biopragmatics / bioregistry

📮 An integrative registry of biological databases, ontologies, and nomenclatures.
https://bioregistry.io
MIT License

Additional chemistry prefixes motivated by ZINC #562

Open cthoyt opened 1 year ago

cthoyt commented 1 year ago

The ZINC web application lists a huge number of vendors and catalogs for its chemicals. We could create many new prefixes based on these by following the instructions at https://github.com/biopragmatics/bioregistry/blob/main/docs/CONTRIBUTING.md#content-contribution (either by submitting a new prefix via the GitHub issue tracker's new prefix request form or by making a PR directly).

Example

For example, see the page for caffeine at https://zinc15.docking.org/substances/ZINC000000001084/

[Screenshot: Screen Shot 2022-09-14 at 13 46 23]

cthoyt commented 1 year ago

@StroemPhi would you like to help with this by adding some new prefixes to the Bioregistry to better support the chemical sciences?

StroemPhi commented 1 year ago

@cthoyt you mean using this as the source for new prefixes that still need to be added, by filling out the new prefix issue template and then making commits for each resource listed in ZINC that is not yet in src/bioregistry/data/bioregistry.json?
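
As a rough illustration of the check described here, the sketch below compares a list of candidate prefixes against the keys of a registry file. It assumes the JSON document is a single object keyed by prefix; `load_registry`, `missing_prefixes`, and the sample data are hypothetical names and values for illustration, not part of the Bioregistry API.

```python
import json
from pathlib import Path


def load_registry(path):
    """Load a registry JSON file (assumed: one top-level object keyed by prefix)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def missing_prefixes(registry, candidates):
    """Return the candidates that have no case-insensitive match among the registry keys."""
    known = {key.casefold() for key in registry}
    return [c for c in candidates if c.casefold() not in known]


# Hypothetical stand-in for the file contents, so the sketch runs without the repository
example_registry = {"chebi": {}, "zinc": {}}
# Hypothetical candidate prefixes derived from vendor names
candidates = ["zinc", "targetmol", "prestwick"]

print(missing_prefixes(example_registry, candidates))
# prints ['targetmol', 'prestwick']
```

With a local checkout, `load_registry("src/bioregistry/data/bioregistry.json")` could replace the stand-in dictionary, provided the file has the assumed shape. A plain key comparison will not catch synonyms or alternative spellings, so the result is only a first pass before manual curation.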

cthoyt commented 1 year ago

Either would work, but I think the new prefix request form is the best place to start since you don't have to know anything about the actual structure of this JSON document. Is this something you might be interested in doing?

https://github.com/biopragmatics/bioregistry/issues/new?assignees=biopragmatics%2Fbioregistry-reviewers&labels=New%2CPrefix&template=new-prefix.yml&title=Add+prefix+%5BX%5D

Somewhat related: I was going to use the ols-client integrations with the TIB and NFDI4Chem to assess which prefixes aren't yet represented in the Bioregistry, which could also help prioritize more curation.

StroemPhi commented 1 year ago

I would give it a try for one, let's say TargetMol NP, to see how it works. Then I would need to discuss with Oliver how to use Bioregistry in our pipelines and how to prioritize my work time. But if I guess right, and filling out the template properly is all I need to do, I guess helping to add unknown prefixes should be something I can do here and there.

cthoyt commented 1 year ago

> But if I guess right, and filling out the template properly is all I need to do, I guess helping to add unknown prefixes should be something I can do here and there.

That's all we need to keep a healthy community resource! A curation a day keeps the link rot away :)

StroemPhi commented 1 year ago

Regarding this caffeine example in ZINC: what if the associated links don't resolve, as in http://www.prestwickchemical.comprestw-1256/ or https://www.targetmol.com/search?keyword=BBP01882? Should such prefixes be ignored?

cthoyt commented 1 year ago

For sure, things in ZINC might point to websites that no longer exist or whose link formats are out of date. Bioregistry does accept new prefixes for dead resources, because you might still see these references appearing in other places (e.g., ZINC). Sometimes you can also spend a few minutes looking into the site and figure out a new URI format string.
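
To make the idea of a URI format string concrete, here is a minimal sketch that assumes the convention of marking the local identifier with a `$1` placeholder. The helper names `infer_uri_format` and `expand_uri` are hypothetical, and the example URL is the TargetMol link quoted earlier in this thread, which may not resolve; the code only shows the string handling, not a resolution check.

```python
def infer_uri_format(example_url, identifier):
    """Derive a format string from one example URL by swapping the identifier for $1."""
    if example_url.count(identifier) != 1:
        raise ValueError("identifier must occur exactly once in the example URL")
    return example_url.replace(identifier, "$1")


def expand_uri(uri_format, identifier):
    """Fill the $1 placeholder of a format string with a local identifier."""
    if "$1" not in uri_format:
        raise ValueError("format string has no $1 placeholder")
    return uri_format.replace("$1", identifier)


# Example URL taken from the thread; the identifier is the trailing catalog number
uri_format = infer_uri_format(
    "https://www.targetmol.com/search?keyword=BBP01882", "BBP01882"
)
print(uri_format)
# prints https://www.targetmol.com/search?keyword=$1

# Round trip: expanding the inferred format reproduces the original URL
print(expand_uri(uri_format, "BBP01882"))
# prints https://www.targetmol.com/search?keyword=BBP01882
```

When a vendor changes its site layout, the same round trip can be repeated with a freshly found working URL to propose an updated format string.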