matentzn opened this issue 5 years ago
@cmungall @balhoff @dosumis
Doesn't EBI have a server API that addresses this?
https://github.com/EBISPOT/urigen
It keeps a register of the IDs already used, so it doesn't need to look through TSVs and OWL files to check.
I don't think there's a live service anymore. We looked into running it for VFB but, IIRC, couldn't get it working out of the box. I think it needs some updating.
One problem I keep stumbling over with the new data pipelines for some of the organisms is assigning novel IDs. I think we need to be able to leave the defined_class column in the DOSDP TSV file empty, and have a process that, before the pattern is compiled to OWL, fills every blank field with a fresh ID from a given ID range. I filed this on the ODK repo rather than on DOSDP because I am not sure we want to overload DOSDP by forcing it to be aware of the already occupied ID space, which could be scattered across multiple modules and/or TSV files.
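A minimal sketch of the fill step, assuming the occupied ID space has already been gathered outside DOSDP (from all modules and TSV files) into one set of integers. The `FOO:` prefix, the 7-digit zero padding and both function names are hypothetical; rows are plain dicts such as `csv.DictReader` produces.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple


def collect_used_numbers(ids: Iterable[str], prefix: str) -> Set[int]:
    """Return the numeric local parts of all IDs carrying `prefix` (e.g. 'FOO:')."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    used: Set[int] = set()
    for curie in ids:
        match = pattern.match(curie.strip())
        if match:
            used.add(int(match.group(1)))
    return used


def fill_blank_ids(
    rows: List[Dict[str, str]],
    used: Set[int],           # occupied numbers; updated in place as IDs are drawn
    id_range: Tuple[int, int],
    prefix: str,              # hypothetical ontology prefix, e.g. "FOO:"
    width: int = 7,           # assumed zero-padding of the numeric part
    column: str = "defined_class",
) -> List[Dict[str, str]]:
    """Fill empty `column` cells in place with the lowest free IDs in `id_range`."""
    start, end = id_range
    candidate = start
    for row in rows:
        if (row.get(column) or "").strip():
            continue  # row already has an ID; leave it alone
        # Skip numbers that are occupied anywhere in the ID space.
        while candidate <= end and candidate in used:
            candidate += 1
        if candidate > end:
            raise ValueError(f"ID range {id_range} is exhausted")
        row[column] = f"{prefix}{candidate:0{width}d}"
        used.add(candidate)  # record it so later rows never reuse it
        candidate += 1
    return rows
```

Keeping the "which IDs are taken" knowledge in the caller's `used` set is what lets DOSDP itself stay unaware of the wider ID space.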
My suggestion is a simple Python script, shipped with the ODK, that loops through all the TSV files and draws a new ID whenever it comes across an empty field. What I am not sure about yet is where and how the various ID ranges should be stored (say, one for patterns and one for manual curation).
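Such a script could look roughly like the sketch below. It reads every `*.tsv` in one directory, scans all of them for occupied IDs before assigning anything (so a blank in one file cannot collide with an ID used in another), fills blanks from one named range, and rewrites only the files it changed. The `ID_RANGES` dict, its bounds, the `FOO:` prefix and `fill_pattern_dir` are illustrative assumptions; the dict just stands in for wherever the ranges end up being stored, and the sketch only looks at TSVs, not at IDs already used in OWL files.

```python
import csv
import re
from pathlib import Path

# Hypothetical range registry; where the ranges should really be stored is
# still an open question. Bounds are made up for illustration.
ID_RANGES = {
    "patterns": (1_000_000, 1_999_999),
    "manual": (2_000_000, 2_999_999),
}
PREFIX = "FOO:"           # hypothetical ontology prefix
WIDTH = 7                 # assumed zero-padding of the numeric part
COLUMN = "defined_class"
ID_RE = re.compile(re.escape(PREFIX) + r"(\d+)$")


def read_tsv(path):
    with path.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        rows = list(reader)
        return list(reader.fieldnames or []), rows


def write_tsv(path, fieldnames, rows):
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)


def fill_pattern_dir(pattern_dir, range_name="patterns"):
    """Assign fresh IDs to blank COLUMN cells in every *.tsv under pattern_dir."""
    tables = {p: read_tsv(p) for p in sorted(Path(pattern_dir).glob("*.tsv"))}

    # First pass: collect every occupied ID across all files.
    used = set()
    for _, rows in tables.values():
        for row in rows:
            match = ID_RE.match((row.get(COLUMN) or "").strip())
            if match:
                used.add(int(match.group(1)))

    # Second pass: fill blanks from the chosen range.
    start, end = ID_RANGES[range_name]
    candidate, assigned, changed_paths = start, 0, []
    for path, (fieldnames, rows) in tables.items():
        if COLUMN not in fieldnames:
            continue  # not a pattern TSV with a defined_class column
        for row in rows:
            if (row.get(COLUMN) or "").strip():
                continue
            while candidate <= end and candidate in used:
                candidate += 1
            if candidate > end:
                raise ValueError(f"ID range {range_name!r} is exhausted")
            row[COLUMN] = f"{PREFIX}{candidate:0{WIDTH}d}"
            used.add(candidate)
            candidate += 1
            assigned += 1
            if path not in changed_paths:
                changed_paths.append(path)

    # Write only after every blank got an ID, so a failure leaves files untouched.
    for path in changed_paths:
        fieldnames, rows = tables[path]
        write_tsv(path, fieldnames, rows)
    return assigned
```

Calling `fill_pattern_dir(<directory with the pattern TSVs>)` returns the number of IDs assigned; `range_name="manual"` would draw from the other range instead. Rescanning everything on each run keeps the script stateless, at the cost of re-reading all files every time, which is exactly what a persistent register like the one urigen keeps would avoid.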