buda-base / owl-schema

BDRC Ontology Schema

Local root of ontology files #169

Closed MarcAgate closed 3 years ago

MarcAgate commented 4 years ago

@xristy We find this in ont-policy-local : <!ENTITY local 'file:///Users/chris/git/'> but I think we shouldn't have that, just things like:

<altURL rdf:resource="ext/user/budauser.ttl"/>

i.e., that altURL should be relative to ont-policy-local.rdf.

In normal (outside China) service, we just pull the repo and load from ont-policy-local.rdf, so anything machine- or user-specific should be avoided.

WDYT ?

xristy commented 4 years ago

I guess you're referring to:

<altURL    rdf:resource="&local;owl-schema/ext/user/budauser.ttl"/>

from lds-pdi #188. (why a new issue here if this is a continuation of #188?)

The OntDocumentManager needs to know what the root or base of the files is for relative URLs in <altURL/>.

The !ENTITY &local; establishes the root for retrieving local copies, just as &git; in ont-policy.rdf gives the root for access from GitHub. These entities are simple substitutions in the rdf:resource="&local; . . ." strings, so the root need be defined in only one place. The root has to be defined somewhere in the policy file.
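As a minimal sketch of the substitution described above, the following uses the JDK's own XML parser on a cut-down policy fragment. The fragment and the class name EntityRootDemo are illustrative assumptions; this is not how Jena loads a policy file, it only shows that &local; is expanded by the XML parser before anything reads the rdf:resource value.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Illustration only: XML entity substitution, not Jena's policy loading.
class EntityRootDemo {
    // Hypothetical, cut-down policy fragment: the root is defined once as &local;.
    static final String POLICY =
        "<!DOCTYPE altURL [ <!ENTITY local 'file:///Users/chris/git/'> ]>\n"
      + "<altURL xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n"
      + "        rdf:resource=\"&local;owl-schema/ext/user/budauser.ttl\"/>";

    // Parses the fragment and returns the attribute value after entity expansion.
    static String expandedAltUrl(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getAttribute("rdf:resource");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse policy fragment", e);
        }
    }

    public static void main(String[] args) {
        // Prints the altURL with &local; replaced by the entity's value.
        System.out.println(expandedAltUrl(POLICY));
    }
}
```

Because the expansion happens in the parser, whatever consumes the policy only ever sees an absolute file: URL, which is why the root ends up machine-specific.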

The ont-policy-local.rdf is intended to apply to China and to development and testing scenarios.

The ont-policy.rdf is intended to be the production policy.

I tried relying on the &base; which is used in OntDocumentManager to determine the prefix for relative URLs but in order to do so I had to change the current definition from:

<!ENTITY base    '&jena;2003/03/ont-manager'>
<!ENTITY ont     '&base;#'>

which is used in:

<rdf:RDF xmlns="&ont;" xml:base="&base;">

which establishes the empty prefix and the base: prefix.

I tried various ways of defining xmlns="..." and so on, but was not able to find a way that loaded the files and didn't outright break the ODM.

Since the ODM needs to know the root of the files for relative URLs in <altURL/>, I thought to simplify the body of the policy file by defining &local; and &git; at the top of the files, without having to change the various <altURL/> throughout the body of the file.

BTW, as mentioned in my last post in lds-pdi #188, only a single <altURL/> is used per <OntologySpec/>. What I didn't mention is that the <altURL/> is tried before the OntDocumentManager tries the <publicURI/>.

wwwt?

xristy commented 4 years ago

I just now tried an experiment of changing the <altURL/> to be relative to the location of the policy file. The idea being that maybe ODM would try referring to the relative paths in the same directory as the policy file. But that fails. The ODM tries to use a root path derived ultimately from:

<!ENTITY jena    'http://jena.hpl.hp.com/schemas/'>

which leads to:

'java.net.UnknownHostException: jena.hpl.hp.com: unknown error'

The specifics were to try exactly what @MarcAgate wants:

<altURL rdf:resource="ext/user/budauser.ttl"/>

instead of what used to be defined:

<altURL    rdf:resource="owl-schema/ext/user/budauser.ttl"/>

In other words the policy file came from /xxx/owl-schema/ont-policy-local.rdf and so it was possible that ODM might use the directory for the policy as a root for relative <altURL/> as with a web browser using as root the path to the file in which the relative link occurs.

Unfortunately this doesn't work, but might be a possible new feature idea for jena.
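To illustrate why a relative <altURL/> ends up pointing at jena.hpl.hp.com, here is a small sketch using java.net.URI. It assumes the relative value is resolved against the policy file's xml:base (the expanded &base; entity) by ordinary relative-URI resolution; the class name is hypothetical and this is not Jena's code.

```java
import java.net.URI;

// Illustration only: standard relative-URI resolution against xml:base.
class XmlBaseResolutionDemo {
    // The expanded value of &base; ('&jena;2003/03/ont-manager') from the policy file.
    static final URI XML_BASE =
        URI.create("http://jena.hpl.hp.com/schemas/2003/03/ont-manager");

    // Resolves a relative altURL the way a browser resolves a relative link.
    static URI resolve(URI base, String relativeAltUrl) {
        return base.resolve(relativeAltUrl);
    }

    public static void main(String[] args) {
        URI resolved = resolve(XML_BASE, "owl-schema/ext/user/budauser.ttl");
        // The last segment of the base ("ont-manager") is dropped and the
        // relative path appended, so the host is still jena.hpl.hp.com.
        System.out.println(resolved);
        System.out.println(resolved.getHost());
    }
}
```

Under that assumption, the location of the policy file on disk never enters the resolution, which would be consistent with the UnknownHostException.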

@MarcAgate has the idea of odm.getFileManager().addLocatorFile("/root/path") which is outside the ont-policy-local.rdf.

I've tried:

    private static final String ROOT = "/Users/chris/git/";
    private static void initOdm() {
        FileManager fm = FileManager.get().clone(); // the global FileManager
        logger.info("FileManager: {}", fm);
        fm.addLocatorFile(ROOT);
        oms = new OntModelSpec(OntModelSpec.OWL_MEM);        
        odm = new OntDocumentManager(fm, ONT_POLICY);        
        oms.setDocumentManager(odm);
        writeTtl(fm.getLocationMapper().toModel(), "LOCATOR_FM_LOCAL09");
        fm.addLocatorFile(ROOT);
        System.exit(0);
    }

and other minor variations, to no avail. The fm.getLocationMapper().toModel() always yields:

@prefix lmap:  <http://jena.hpl.hp.com/2004/08/location-mapping#> .

[ lmap:mapping  [ lmap:altName  "http://jena.hpl.hp.com/schemas/2003/03/owl-schema/adm/legal_entities.ttl" ;
                  lmap:name     "http://purl.bdrc.io/ontology/adm/LegalData/"
                ]
] .

[ lmap:mapping  [ lmap:altName  "http://jena.hpl.hp.com/schemas/2003/03/owl-schema/roles/creators.ttl" ;
                  lmap:name     "http://purl.bdrc.io/ontology/roles/Creator/"
                ]
] .

. . .

xristy commented 4 years ago

Modifying the ROOT in OntTestLoading4 as:

private static final String ROOT = "file://Users/chris/git/owl-schema";

with

    private static void initOdm() {
        FileManager fm = FileManager.get().clone(); // the global FileManager
        fm.addLocatorFile(ROOT);
        oms = new OntModelSpec(OntModelSpec.OWL_MEM);        
        odm = new OntDocumentManager(fm, ONT_POLICY);        
        oms.setDocumentManager(odm);
        writeTtl(fm.getLocationMapper().toModel(), "LOCATOR_FM_LOCAL09");
    }    

works with the just committed ont-policy-local.rdf as can be seen from LocationMapper model and OntTestLoading4.

MarcAgate commented 4 years ago

Thanks Chris. I am glad we've found a clean solution to this.

xristy commented 4 years ago

Unfortunately when I try the exact same pattern w/ editor-templates/ont-policy-local.rdf it fails because the fm.getLocationMapper() shows that the URL locator has been applied:

@prefix lmap:  <http://jena.hpl.hp.com/2004/08/location-mapping#> .

[ lmap:mapping  [ lmap:altName  "http://jena.hpl.hp.com/schemas/2003/03/templates/core/person.local.shapes.ttl" ;
                  lmap:name     "http://purl.bdrc.io/shapes/core/PersonLocalShapes/"
                ]
] .

[ lmap:mapping  [ lmap:altName  "http://jena.hpl.hp.com/schemas/2003/03/templates/core/base.shapes.ttl" ;
                  lmap:name     "http://purl.bdrc.io/shapes/core/BaseShapes/"
                ]
] .

instead of the file locator and then the

'java.net.UnknownHostException: jena.hpl.hp.com: unknown error'

occurs so the ttl files aren't loaded.

I'll commit what I have for editor-templates/ont-policy-local.rdf and the additions to OntTestLoading4.

@MarcAgate maybe you can spot what's happening but it might be the order of the locators on the handler list.

xristy commented 4 years ago

Finally, an actual solution that makes sense and confirms @MarcAgate's intuition at the start of this issue.

There is now a single ont-policy.rdf for each of owl-schema and editor-templates. The unneeded ont-policy-local.rdf files are now deleted.

So now all that lds-pdi needs are configuration items for the location of the desired ont-policy.rdf files for whatever ontologies are to be served.

Nothing special is needed to get the correct files loaded:

    private static void initOdm() {
        oms = new OntModelSpec(OntModelSpec.OWL_MEM);        
        odm = new OntDocumentManager(ONT_POLICY);
        oms.setDocumentManager(odm);
    }    

The ONT_POLICY just refers to the ont-policy.rdf wherever that is, like:

String ONT_POLICY = "/Users/chris/git/owl-schema/ont-policy.rdf";

or

String ONT_POLICY = "https://raw.githubusercontent.com/buda-base/owl-schema/master/ont-policy.rdf";

and as long as the files mentioned in <altURL/> are relative to the path containing the ont-policy.rdf, like:

<altURL    rdf:resource="adm/types/license_types.ttl"/>

then it all just works.

The issue has been the definitions:

<!ENTITY base    '&jena;2003/03/ont-manager'>
<rdf:RDF
    . . .
    xml:base  ="&base;"
    . . .
    >

which I changed to: <rdf:RDF . . . xml:base ="" . . .

And now relative <altURL/> are properly relative to the path to the ont-policy.rdf, regardless of whether it is a file or http(s) location. It was the result of a typo in owl-schema/ont-policy-local.rdf that led to the solution. The original definition of xml:base was inherited from a generic policy file.
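The effect can be sketched with plain JDK classes. This assumes that, with an empty xml:base, a relative <altURL/> is resolved against the location the policy was read from; the helper names policyUri and resolveAltUrl are hypothetical and only mimic that resolution for the file and http(s) cases.

```java
import java.net.URI;
import java.nio.file.Paths;

// Illustration only: resolving a relative altURL against the policy's own location.
class PolicyRelativeDemo {
    // Turns ONT_POLICY into a URI: http(s) strings as-is, anything else as a local path.
    static URI policyUri(String ontPolicy) {
        if (ontPolicy.startsWith("http://") || ontPolicy.startsWith("https://")) {
            return URI.create(ontPolicy);
        }
        return Paths.get(ontPolicy).toAbsolutePath().toUri();
    }

    // Resolves the relative altURL against the policy file's location.
    static URI resolveAltUrl(String ontPolicy, String altUrl) {
        return policyUri(ontPolicy).resolve(altUrl);
    }

    public static void main(String[] args) {
        String altUrl = "adm/types/license_types.ttl";
        System.out.println(resolveAltUrl(
            "/Users/chris/git/owl-schema/ont-policy.rdf", altUrl));
        System.out.println(resolveAltUrl(
            "https://raw.githubusercontent.com/buda-base/owl-schema/master/ont-policy.rdf", altUrl));
    }
}
```

In both cases the resolved location sits next to ont-policy.rdf, so the same policy file can serve local and GitHub loading without a machine-specific root.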

A complete set of tests for the four cases is in shapes-testing: OntTestLoading5_xxxx.

eroux commented 4 years ago

excellent, thanks!

eroux commented 4 years ago

just to be sure, why don't we remove

<!ENTITY git     'https://raw.githubusercontent.com/buda-base/'>

it doesn't seem to be used anyway

eroux commented 4 years ago

@MarcAgate or @xristy can you fix https://github.com/buda-base/xmltoldmigration/blob/master/src/main/java/io/bdrc/xmltoldmigration/MigrationHelpers.java#L662 ? I tried to use the method in your test (which BTW uses the FileManager which BTW will be deprecated) but it creates a Model instead of an OntModel and I didn't find an obvious way to work around that

xristy commented 4 years ago

> just to be sure, why don't we remove
>
> <!ENTITY git     'https://raw.githubusercontent.com/buda-base/'>
>
> it doesn't seem to be used anyway

It will be removed.

xristy commented 4 years ago

> (which BTW uses the FileManager which BTW will be deprecated)

The use of FileManager was only to print out information about how the LocationMapper had functioned. It's not relevant to the initialization of odm and oms

Regarding the deprecation, it is somewhat nuanced. Andy comments:

Long term, FileManager can be removed from general use. It is used by the OntDocumentManager so making solely for that purpose, maybe moving it to the "ont" sub-system.

xristy commented 4 years ago

Testing w/ OntTestLoading5_ONTS_RES_CT indicates that the LocatorClassLoader finds the ONT_POLICY (== "owl-schema/ont-policy.rdf") in the jar, but the LocationMapper improperly uses the path to wherever the command java -jar path/test.jar is run as the root for the various relative <altURL/>.
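For reference, this is the general JVM behaviour that would produce such a root: a relative file path is made absolute against the directory the java command was started in. The sketch below only demonstrates that behaviour with java.nio; whether the LocationMapper does exactly this internally is an assumption, and the class name is hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustration only: how the JVM anchors a relative path at the working directory.
class WorkingDirDemo {
    // Makes a relative path absolute the way the JVM does by default.
    static Path asSeenByJvm(String relative) {
        return Paths.get(relative).toAbsolutePath();
    }

    public static void main(String[] args) {
        // The result depends on where "java -jar path/test.jar" was launched,
        // not on where the jar or the policy file lives.
        System.out.println(System.getProperty("user.dir"));
        System.out.println(asSeenByJvm("owl-schema/adm/types/license_types.ttl"));
    }
}
```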

This is as far as I can get w/o using something like @MarcAgate implemented in OntTestLoading5_ONTS_RES to retrieve all from jar. I'll post a query about this on users@jena.

It does seem to me that we do not need to put a version of the schema into the jar for xmltold (which Élie has removed for now) or gittodbs. There is/will be a git system on the ZJU server (or a dev system), so fetching from there with the already running local code should be quite sufficient. I can add a command line param to gittodbs so that the default of fetching from GH can be overridden as needed (perhaps for a different branch or for use on an isolated system such as ZJU). This would be the same pattern as with lds-pdi having config params for where to locate the ontologies.
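A minimal sketch of such an override, assuming a hypothetical flag named -ontPolicy (gittodbs's real option names are not given here) and the GitHub policy URL as the default:

```java
// Illustration only: a command line override for the policy location.
class OntPolicyArg {
    // Default: fetch the policy from GitHub.
    static final String DEFAULT_ONT_POLICY =
        "https://raw.githubusercontent.com/buda-base/owl-schema/master/ont-policy.rdf";

    // "-ontPolicy" is a hypothetical flag name chosen for this sketch.
    static String ontPolicyFrom(String[] args) {
        for (int i = 0; i < args.length - 1; i++) {
            if ("-ontPolicy".equals(args[i])) {
                return args[i + 1]; // local path or URL supplied by the caller
            }
        }
        return DEFAULT_ONT_POLICY;
    }

    public static void main(String[] args) {
        System.out.println("Loading ontologies via " + ontPolicyFrom(args));
    }
}
```

The returned string can be a local path or a URL, so an isolated system can point at its own clone while everything else keeps the default.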

Does this make sense?

eroux commented 4 years ago

I think reporting a bug to Jena and adding a CLI parameter to gittodbs would solve the problem, yes

MarcAgate commented 4 years ago

We could actually have all git repos under a rootGit directory (/etc/buda/git/ for instance), whether on prod, on the ZJU server instance, or on any other instance. We would create a "git" user group for this and share git repos across an instance. This way we can always use the OntDocumentManager with an absolute path for ont-policy.rdf. Right now, ldspdi is the only service using git. Editserv will follow.
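As a rough sketch of that layout, the absolute policy path could be derived from a single configured root. The helper name and the idea of passing the repo name as a parameter are assumptions made for illustration only.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustration only: deriving the absolute ont-policy.rdf path from one git root.
class GitRootDemo {
    // gitRoot would come from configuration, e.g. /etc/buda/git/.
    static Path ontPolicy(Path gitRoot, String repo) {
        return gitRoot.resolve(repo).resolve("ont-policy.rdf").toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        Path root = Paths.get("/etc/buda/git");
        System.out.println(ontPolicy(root, "owl-schema"));
        System.out.println(ontPolicy(root, "editor-templates"));
    }
}
```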

WDYT ?

eroux commented 4 years ago

oh I think it's not even necessary, all that is the stuff that's currently on buda2, it's quite manual and all under one user

eroux commented 3 years ago

is this an issue we can close? @MarcAgate @xristy ?