INCATools / ontology-development-kit

Bootstrap an OBO Library ontology
http://incatools.github.io/ontology-development-kit/
BSD 3-Clause "New" or "Revised" License

Improve ID management when using DOSDP #128

Open matentzn opened 5 years ago

matentzn commented 5 years ago

One problem I keep running into with the new data pipelines for some of the organisms is assigning novel IDs. I think we need to be able to leave the defined_class column in a DOSDP TSV file empty, and have a process that, wherever a field was left blank, populates it with a fresh ID from a given ID range before the pattern is compiled to OWL. I opened this issue on the ODK repo rather than on DOSDP because I am not sure we want to burden DOSDP with being aware of the already occupied ID space, which could be scattered across multiple modules and/or TSV files.

My suggestion is to ship a simple Python script with the ODK that loops through all the TSV files and, whenever it comes across an empty field, draws a new ID. What I am not sure about yet is where and how the various ID ranges should be stored (say, one for patterns and one for manual curation).
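A minimal sketch of what such a script could look like, assuming every DOSDP TSV has a header row with a `defined_class` column holding CURIEs zero-padded to 7 digits (e.g. `EX:0000123`). The prefix `EX`, the numeric range, and the glob path are hypothetical placeholders, not ODK settings. The key design choice is to collect the used IDs across *all* files first, so a fresh ID can never collide with one already used in a different pattern file.

```python
import csv
import glob

# Assumed column holding the class ID in each DOSDP TSV file.
ID_COLUMN = "defined_class"


def read_tsv(path):
    """Return (header, rows) for a tab-separated file with a header row."""
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return reader.fieldnames, list(reader)


def collect_used_ids(paths, column=ID_COLUMN):
    """Gather every non-empty ID across all files, not just the one being filled."""
    used = set()
    for path in paths:
        _, rows = read_tsv(path)
        for row in rows:
            value = (row.get(column) or "").strip()
            if value:
                used.add(value)
    return used


def id_generator(prefix, start, end, used, digits=7):
    """Yield unused CURIEs from the inclusive numeric range [start, end]."""
    for n in range(start, end + 1):
        curie = f"{prefix}:{n:0{digits}d}"
        if curie not in used:
            yield curie
    raise RuntimeError(f"ID range {start}-{end} is exhausted")


def fill_missing_ids(paths, prefix, start, end, column=ID_COLUMN):
    """Fill blank ID cells in place; return the list of newly assigned IDs."""
    paths = list(paths)
    fresh = id_generator(prefix, start, end, collect_used_ids(paths, column))
    assigned = []
    for path in paths:
        header, rows = read_tsv(path)
        if not header or column not in header:
            continue  # not a pattern file we know how to fill
        changed = False
        for row in rows:
            if not (row.get(column) or "").strip():
                row[column] = next(fresh)
                assigned.append(row[column])
                changed = True
        if changed:  # only rewrite files that actually received new IDs
            with open(path, "w", newline="", encoding="utf-8") as fh:
                writer = csv.DictWriter(fh, fieldnames=header,
                                        delimiter="\t", lineterminator="\n")
                writer.writeheader()
                writer.writerows(rows)
    return assigned


if __name__ == "__main__":
    # Hypothetical location and ID range reserved for pattern-derived classes.
    files = sorted(glob.glob("src/patterns/data/default/*.tsv"))
    new_ids = fill_missing_ids(files, prefix="EX", start=1000000, end=1999999)
    print(f"Assigned {len(new_ids)} new IDs")
```

Running this before the pattern-to-OWL step would leave already assigned IDs untouched and only fill blanks. A separate range per purpose (patterns vs. manual curation) would then just be a different `start`/`end` pair passed in.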

matentzn commented 5 years ago

@cmungall @balhoff @dosumis

balhoff commented 5 years ago

Doesn't EBI have a server API that addresses this?

dosumis commented 5 years ago

https://github.com/EBISPOT/urigen

It keeps a register of the IDs used, so it doesn't need to look through TSVs and OWL files to check.

I don't think there's a live service anymore. We looked into running it for VFB, but IIRC we couldn't get it working out of the box. I think it needs some updating.
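The register idea can also be illustrated locally. The sketch below is not urigen's API; it is a hypothetical stand-in, assuming a JSON file that stores every ID already issued for each named range (e.g. "patterns", "manual"). Allocation only consults this file, so nothing has to scan TSV or OWL files. The range names, bounds, and the `EX` prefix are placeholders.

```python
import json
import os


class IdRegister:
    """Hypothetical local ID register persisted as JSON.

    Remembers every ID handed out per named range, so allocating a new ID
    never requires scanning TSV or OWL files.
    """

    def __init__(self, path, ranges):
        # ranges maps a name to an inclusive (start, end) pair,
        # e.g. {"patterns": (1000000, 1999999), "manual": (2000000, 2999999)}
        self.path = path
        self.ranges = ranges
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                self.issued = {k: set(v) for k, v in json.load(fh).items()}
        else:
            self.issued = {name: set() for name in ranges}

    def allocate(self, range_name, prefix, digits=7):
        """Return the lowest unissued CURIE in the range and record it."""
        start, end = self.ranges[range_name]
        taken = self.issued.setdefault(range_name, set())
        for n in range(start, end + 1):
            curie = f"{prefix}:{n:0{digits}d}"
            if curie not in taken:
                taken.add(curie)
                self._save()  # persist immediately so IDs are never reissued
                return curie
        raise RuntimeError(f"ID range '{range_name}' is exhausted")

    def _save(self):
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump({k: sorted(v) for k, v in self.issued.items()}, fh, indent=2)
```

Keeping the ranges in one register file would be one possible answer to where the per-purpose ranges live, at the cost of requiring every ID assignment, manual ones included, to go through it.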