biopragmatics / bioregistry

📮 An integrative registry of biological databases, ontologies, and nomenclatures.
https://bioregistry.io
MIT License

Add NIF prefixes #404

Closed cthoyt closed 2 years ago

cthoyt commented 2 years ago

NIF has tons of prefixes listed here: https://raw.githubusercontent.com/SciCrunch/NIF-Ontology/master/ttl/generated/NIFSTD-ILX-mapping.ttl. Let's try to get them all into Bioregistry so people can better understand this chaos (see also https://www.youtube.com/watch?v=3tM0Sow-2r8)

Originally posted by @matentzn in https://github.com/biopragmatics/bioregistry/issues/402#issuecomment-1141117750

matentzn commented 2 years ago

@tgbugs could you help us sort this issue out?

Could you add a half-sentence to each of these to explain what they are? Are they all necessary?

NIFEXT: http://uri.neuinfo.org/nif/nifstd/nifext_ .
NIFSTD: http://uri.neuinfo.org/nif/nifstd/ .
NLX: http://uri.neuinfo.org/nif/nifstd/nlx_ .
NLXANAT: http://uri.neuinfo.org/nif/nifstd/nlx_anat_ .
NLXBR: http://uri.neuinfo.org/nif/nifstd/nlx_br_ .
NLXCELL: http://uri.neuinfo.org/nif/nifstd/nlx_cell_ .
NLXCHEM: http://uri.neuinfo.org/nif/nifstd/nlx_chem_ .
NLXDYS: http://uri.neuinfo.org/nif/nifstd/nlx_dys_ .
NLXFUNC: http://uri.neuinfo.org/nif/nifstd/nlx_func_ .
NLXINV: http://uri.neuinfo.org/nif/nifstd/nlx_inv_ .
NLXMOL: http://uri.neuinfo.org/nif/nifstd/nlx_mol_ .
NLXOEN: http://uri.neuinfo.org/nif/nifstd/oen_ .
NLXORG: http://uri.neuinfo.org/nif/nifstd/nlx_organ_ .
NLXQUAL: http://uri.neuinfo.org/nif/nifstd/nlx_qual_ .
NLXRES: http://uri.neuinfo.org/nif/nifstd/nlx_res_ .
NLXSUB: http://uri.neuinfo.org/nif/nifstd/nlx_subcell_ .

This is necessary to organise our cross references on Uberon and standardise them across the community :)

Thank you!

cthoyt commented 2 years ago

Is @tgbugs the contact person for NIF? What's the difference between NIFEXT and NIFSTD? Do we need one for NLX or are all terms covered by the sub-terminologies?

matentzn commented 2 years ago

@tgbugs (Tom) is contributing to a lot of different ontology projects and has been on the Uberon tracker often as well; I think he is the person to talk to about the NIF ontologies.

tgbugs commented 2 years ago

Yes, I'm the best point of contact for the NIF-Ontology. @smtifahim is also back with us and was around for the early days of the NIF-Ontology and neurolex.

NIFSTD is the top level, kind of like OBO is for OBO Foundry ontologies. NIFEXT covers a subset of IRIs that were "external" identifiers brought into the ontology at some point in time; this was done before most of the current standard ontology and identifier management practices had been developed.

With regard to the other prefixes: in the early days of the NIF-Ontology there were separate files for major entity categories, and their identifier prefixes were usually matched to the file. This carried over to the early days of neurolex, where individual entities were given type-specific identifiers; this covers all the NLX* style identifiers. At a certain point neurolex switched to using a single identifier sequence that did not differentiate between types; that is NLX.

These are all needed but we are not minting new identifiers in these namespaces. There usually aren't that many identifiers for any given NLX* prefix, but their curied form may have been referenced by someone without retaining the expansion rule, so having them for the record is important.

NIFEXT: http://uri.neuinfo.org/nif/nifstd/nifext_ .      -> external
NIFSTD: http://uri.neuinfo.org/nif/nifstd/ .             -> base (like obo:)
NLX: http://uri.neuinfo.org/nif/nifstd/nlx_ .            -> generic neurolex, covers all types 
NLXANAT: http://uri.neuinfo.org/nif/nifstd/nlx_anat_ .   -> anatomy terms
NLXBR: http://uri.neuinfo.org/nif/nifstd/nlx_br_ .       -> brain regions
NLXCELL: http://uri.neuinfo.org/nif/nifstd/nlx_cell_ .   -> cell types
NLXCHEM: http://uri.neuinfo.org/nif/nifstd/nlx_chem_ .   -> chemicals
NLXDYS: http://uri.neuinfo.org/nif/nifstd/nlx_dys_ .     -> dysfunction
NLXFUNC: http://uri.neuinfo.org/nif/nifstd/nlx_func_ .   -> cognitive function
NLXINV: http://uri.neuinfo.org/nif/nifstd/nlx_inv_ .     -> investigations
NLXMOL: http://uri.neuinfo.org/nif/nifstd/nlx_mol_ .     -> molecules
NLXOEN: http://uri.neuinfo.org/nif/nifstd/oen_ .         -> the version of oen terms in neurolex
NLXORG: http://uri.neuinfo.org/nif/nifstd/nlx_organ_ .   -> organ terms
NLXQUAL: http://uri.neuinfo.org/nif/nifstd/nlx_qual_ .   -> qualities
NLXRES: http://uri.neuinfo.org/nif/nifstd/nlx_res_ .     -> digital resources
NLXSUB: http://uri.neuinfo.org/nif/nifstd/nlx_subcell_ . -> subcellular entities e.g. GOCC
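
To illustrate why keeping the expansion rules on record matters, here is a minimal Python sketch that expands a CURIE into a full IRI using the prefix map listed above. The names `PREFIX_MAP` and `expand` are hypothetical and are not part of Bioregistry or the NIF-Ontology tooling; the local identifier `12345` in the usage note is a placeholder, not a real term.

```python
# Prefix map copied from the list above. The helper below is an
# illustrative assumption, not an existing Bioregistry or NIF API.
PREFIX_MAP = {
    "NIFEXT": "http://uri.neuinfo.org/nif/nifstd/nifext_",
    "NIFSTD": "http://uri.neuinfo.org/nif/nifstd/",
    "NLX": "http://uri.neuinfo.org/nif/nifstd/nlx_",
    "NLXANAT": "http://uri.neuinfo.org/nif/nifstd/nlx_anat_",
    "NLXBR": "http://uri.neuinfo.org/nif/nifstd/nlx_br_",
    "NLXCELL": "http://uri.neuinfo.org/nif/nifstd/nlx_cell_",
    "NLXCHEM": "http://uri.neuinfo.org/nif/nifstd/nlx_chem_",
    "NLXDYS": "http://uri.neuinfo.org/nif/nifstd/nlx_dys_",
    "NLXFUNC": "http://uri.neuinfo.org/nif/nifstd/nlx_func_",
    "NLXINV": "http://uri.neuinfo.org/nif/nifstd/nlx_inv_",
    "NLXMOL": "http://uri.neuinfo.org/nif/nifstd/nlx_mol_",
    "NLXOEN": "http://uri.neuinfo.org/nif/nifstd/oen_",
    "NLXORG": "http://uri.neuinfo.org/nif/nifstd/nlx_organ_",
    "NLXQUAL": "http://uri.neuinfo.org/nif/nifstd/nlx_qual_",
    "NLXRES": "http://uri.neuinfo.org/nif/nifstd/nlx_res_",
    "NLXSUB": "http://uri.neuinfo.org/nif/nifstd/nlx_subcell_",
}


def expand(curie: str) -> str:
    """Expand a CURIE of the form PREFIX:local_id into a full IRI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in PREFIX_MAP:
        raise ValueError(f"unknown or malformed CURIE: {curie!r}")
    # Expansion is plain concatenation of the URI prefix and the local ID.
    return PREFIX_MAP[prefix] + local_id
```

For example, `expand("NLXANAT:12345")` would concatenate the `NLXANAT` URI prefix with the placeholder local identifier. Without the map, a bare CURIE like this cannot be resolved.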

matentzn commented 2 years ago

@cthoyt what's the best course of action? Given we also have an entry for obo, shouldn't there be one for nifstd as well? https://bioregistry.io/registry/obo

In any case, how do we add all these prefixes? Do you have a form that takes a table for bulk submission?

cthoyt commented 2 years ago

No, the obo prefix in bioregistry is a mistake and I keep forgetting to remove it.

cthoyt commented 2 years ago

Bulk contribution guidelines: https://github.com/biopragmatics/bioregistry/blob/main/docs/CONTRIBUTING.md#bulk-contribution

matentzn commented 2 years ago

I started preparing this; it's all I can do for now:

https://docs.google.com/spreadsheets/d/10MPt-H6My33mOa1V_VkLh4YG8609N7B_Dey0CBnfTL4/edit?usp=sharing

matentzn commented 2 years ago

@tgbugs I attributed it all to you, would you mind filling in the red cells?

cthoyt commented 2 years ago

@matentzn @tgbugs in the meantime I implemented the code necessary to suck this Google Sheet up in #407. I would appreciate an update on this, since I want to get these in ASAP.

cthoyt commented 2 years ago

@matentzn @tgbugs I just browsed through https://raw.githubusercontent.com/SciCrunch/NIF-Ontology/master/ttl/generated/NIFSTD-ILX-mapping.ttl to fill out all of the example identifiers and guessed what the patterns should be, but I can't be sure that this file is a complete picture of these vocabularies so input would be appreciated.

The final 3 action items:

  1. Need improved descriptions - most of the notes were quite vague and wouldn't be helpful for someone who is looking at the Bioregistry prefix page for the first time. A good description explains what kind of entities the prefix covers, how it's different from related ones, and how it fits into the broader resource that it's part of. For NIF resources specifically, I am under the impression that each of these namespaces exists to duplicate some external resource like GO or ChEBI - is that correct? Can you make explicit a list of the resources that these cover? I left specific notes about some of the more opaque ones as well.
  2. Need to double check the identifier patterns and potentially add more example local identifiers (especially for NIFSTD, which I guess will have a lot more variety)
  3. Need confirmation that we can redistribute @tgbugs' email address in the bioregistry
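
As a rough illustration of item 2, the following Python sketch checks each example local identifier against the regular expression recorded for its prefix. The rows, patterns, and the helper name `find_mismatches` are all hypothetical placeholders; they are not the curated values from the spreadsheet and not a Bioregistry API.

```python
import re

# Hypothetical rows: the patterns and the numeric example are placeholders.
# FMA_83604 is one of the NIFSTD identifiers mentioned in this thread.
ROWS = [
    {"prefix": "NLXANAT", "pattern": r"^\d+$", "example": "12345"},
    {"prefix": "NIFSTD", "pattern": r"^\d+$", "example": "FMA_83604"},
]


def find_mismatches(rows):
    """Return the prefixes whose example does not match their pattern."""
    return [
        row["prefix"]
        for row in rows
        if re.fullmatch(row["pattern"], row["example"]) is None
    ]
```

Here `find_mismatches(ROWS)` flags `NIFSTD`, because a purely numeric pattern is too narrow for the variety of local identifiers that live directly under that namespace.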

tgbugs commented 2 years ago

With regard to 2, there are indeed quite a few other namespaces that live under NIFSTD; BIRNLEX is one example. I'm fairly certain that this is the full list, but there might be a lurker or two.

cthoyt commented 2 years ago

If it's just a namespace that contains other namespaces, we can skip it for now.

tgbugs commented 2 years ago

Unfortunately there are indeed some lurkers, e.g. NIFSTD:FMA_83604 and NIFSTD:OBI_0000470, and the NIFSTD prefix does appear in other resources as well, e.g. Uberon xrefs. Those are terms that are not otherwise differentiated but that are managed inside of the NIF Ontology; NIFSTD is sort of a fall-through.
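
The fall-through behaviour can be sketched in Python as longest-prefix matching when compacting an IRI back into a CURIE. This is only an illustration under assumptions: the map below is a small subset of the prefixes in this thread, and `URI_PREFIXES` and `compress` are hypothetical names, not functions from Bioregistry or the NIF-Ontology tooling.

```python
# Subset of the prefixes discussed in this thread, for illustration only.
URI_PREFIXES = {
    "NIFSTD": "http://uri.neuinfo.org/nif/nifstd/",
    "NLX": "http://uri.neuinfo.org/nif/nifstd/nlx_",
    "NLXANAT": "http://uri.neuinfo.org/nif/nifstd/nlx_anat_",
}


def compress(iri):
    """Compact an IRI to a CURIE using the longest matching URI prefix.

    Returns None if no registered URI prefix matches.
    """
    matches = [p for p, uri in URI_PREFIXES.items() if iri.startswith(uri)]
    if not matches:
        return None
    # The most specific (longest) URI prefix wins; NIFSTD only catches
    # IRIs that no more specific namespace claims.
    best = max(matches, key=lambda p: len(URI_PREFIXES[p]))
    return f"{best}:{iri[len(URI_PREFIXES[best]):]}"
```

With this rule, `http://uri.neuinfo.org/nif/nifstd/FMA_83604` compacts to `NIFSTD:FMA_83604` because no narrower prefix matches, while an IRI under `nlx_anat_` is assigned to `NLXANAT` instead of `NLX` or `NIFSTD`.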

tgbugs commented 2 years ago

With regard to 1, most of these namespaces preceded their community ontology counterparts; for example, SAO was the original source for many of the GOCC terms. In other cases it is as you describe, where an external ID was pulled into neurolex and its fragment was retained as is.

matentzn commented 2 years ago

@tgbugs thanks a ton for weighing in on the prefix metadata! Looks great. Can I ask one additional favour from you: would you be able to rewrite each description as a full English sentence like: "The X namespace covers entities of type X and Y, and is used for Z."

I think this would really help users to understand more quickly what's up with entities in these domains! Would you be willing to do that?

tgbugs commented 2 years ago

@matentzn updated. Let me know if they look ok.

matentzn commented 2 years ago

Thank you @tgbugs!