geneontology / minerva

BSD 3-Clause "New" or "Revised" License

document mapping between model URIs and filenames, and model/individual IRI generation #51

Open cmungall opened 8 years ago

cmungall commented 8 years ago

Presumably minerva uses a deterministic scheme to extract filenames from model URIs. We should document this.

(not relevant when we switch from GH to triplestore)

We should also document the scheme minerva uses to generate model and individual IRIs.

balhoff commented 8 years ago

The model ID prefix can be passed to Minerva startup via --model-id-prefix. This has a built-in default of http://model.geneontology.org/ in StartUpTool. The filename is extracted from the model URI by removing the prefix, in FileBasedMolecularModelManager.getOwlModelFile.
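To make the mapping concrete, here is a minimal Python sketch of the prefix-stripping step described above. It is not Minerva's Java code: the function name `model_uri_to_filename` is hypothetical, and raising an error on a non-matching prefix is an assumption about behavior this thread does not describe. Only the default prefix comes from the comment above.

```python
# Default prefix as quoted in this thread (StartUpTool's built-in default).
DEFAULT_MODEL_ID_PREFIX = "http://model.geneontology.org/"


def model_uri_to_filename(model_uri, prefix=DEFAULT_MODEL_ID_PREFIX):
    """Derive a filename from a model URI by removing the ID prefix.

    Hypothetical helper: it mirrors the described behavior only.
    Whether Minerva appends a file extension or validates the prefix
    is not stated in this thread.
    """
    if not model_uri.startswith(prefix):
        # Assumption: treat a URI outside the prefix as an error.
        raise ValueError(f"{model_uri!r} does not start with {prefix!r}")
    return model_uri[len(prefix):]


if __name__ == "__main__":
    print(model_uri_to_filename("http://model.geneontology.org/abc123"))
```

Because the mapping is plain string removal, it is deterministic and reversible: prepending the prefix to a filename gives back the model URI.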

Model and individual URIs seem to be generated using that prefix and CoreMolecularModelManager.generateId, which uses an incrementing long initially seeded from the startup time.
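As a rough illustration of that scheme, the following Python sketch seeds a counter from the startup time and appends each successive value to the prefix. The class name `IdGenerator`, the millisecond resolution, and the decimal rendering of the counter are all assumptions; the actual formatting used by `generateId` is not given in this thread.

```python
import itertools
import threading
import time


class IdGenerator:
    """Hypothetical counter-based ID generator, seeded at startup."""

    def __init__(self, prefix, seed=None):
        self._prefix = prefix
        # Assumption: seed with startup time in milliseconds unless given.
        start = int(time.time() * 1000) if seed is None else seed
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def generate_id(self):
        # The lock keeps increments atomic across threads.
        with self._lock:
            value = next(self._counter)
        # Assumption: the counter is rendered as a plain decimal string.
        return f"{self._prefix}{value}"


if __name__ == "__main__":
    gen = IdGenerator("http://model.geneontology.org/")
    print(gen.generate_id())
    print(gen.generate_id())
```

IDs produced by one process are strictly increasing, so creation order can be read off the IDs. Two processes started in the same millisecond would collide under this sketch.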

I'm not sure what other assumptions there are. I haven't come across any code requiring that URIs for individuals need to follow this format. But why not just use UUIDs when generating new URIs? Seems much simpler than maintaining the incrementing counter.
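For comparison, a UUID-based version could look like the Python sketch below. It uses random (version 4) UUIDs from the standard library; the function name `generate_uuid_iri` is hypothetical, and nothing here reflects existing Minerva code.

```python
import uuid

# Default prefix as quoted in this thread.
DEFAULT_MODEL_ID_PREFIX = "http://model.geneontology.org/"


def generate_uuid_iri(prefix=DEFAULT_MODEL_ID_PREFIX):
    """Hypothetical alternative: prefix plus a random version 4 UUID."""
    return f"{prefix}{uuid.uuid4()}"


if __name__ == "__main__":
    print(generate_uuid_iri())
```

This needs no shared counter state, but the resulting IRIs carry no information about creation order.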

balhoff commented 8 years ago

Recommendation on where to document this?

kltm commented 8 years ago

There were a few reasons UUIDs were not used early on; the current system made it in because nothing better had replaced it by the time we needed to hit the road. (I'd also like to have hostname, IP, time, etc. hashed in as well, but here we are.) I think a proper UUID would probably be fine, but what we looked at early on (or something along those lines) was not guaranteed to be unique for identical machines started at the same time, or had some other entropy issue. Practically, the counter and the relative date (I believe time is hashed in there?) have been very nice for debugging the system: being able to see after the fact what order things were created in has been useful. There are some assumptions higher in the stack about the way some of the URLs work, but the core numbers/strings being used are fungible.