biopragmatics / bioregistry

📮 An integrative registry of biological databases, ontologies, and nomenclatures.
https://bioregistry.io
MIT License

Capturing OECD harmonized template terms #991

bgyori closed 8 months ago

bgyori commented 8 months ago

Not sure how to handle this one but it's potentially a useful entry.

The Organisation for Economic Co-operation and Development (OECD) makes available harmonized templates for reporting data on chemical tests: https://www.oecd.org/ehs/templates/, e.g., health effects: https://www.oecd.org/ehs/templates/harmonised-templates-health-effects.htm. Each entry here is a Word document with numbered rows, and in some contexts these row numbers are referred to as identifiers (e.g., 74.186). Sadly, it seems that these documents are versioned and the row numbers can change across versions. One possibility is to add a placeholder prefix for this resource without the ability to resolve identifiers.

cthoyt commented 8 months ago

Based on your example 74.186, I'm guessing that this corresponds to document 74 on the health effects page: https://www.oecd.org/ehs/templates/OHT%2074%20-%20ENDPOINT_STUDY_RECORD.DevelopmentalToxicityTeratogenicity_v10.2%20-Jul2023.docx. I can't seem to find the 186 part, though. Does that appear in the Word document?

I don't see an issue with making a placeholder prefix, even if these can't be directly resolved. It's still valuable to describe the context about what these identifiers are and the source they come from.
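To make the placeholder idea concrete, here is a rough sketch of what such a record and a local check against it might look like. The prefix `oecd.template`, the name, and the regular expression are assumptions made up for illustration and are not an existing Bioregistry entry. The split into a template number and a row number follows the guess above and has not been confirmed against the documents.

```python
import re

# Hypothetical placeholder record. The prefix, name, and pattern are
# illustrative assumptions, not an existing Bioregistry entry.
PLACEHOLDER = {
    "prefix": "oecd.template",
    "name": "OECD Harmonised Templates",
    "homepage": "https://www.oecd.org/ehs/templates/",
    "example": "74.186",
    "pattern": r"^\d+\.\d+$",
    # No URI format: identifiers cannot be resolved, and row numbers
    # may change between document versions.
    "uri_format": None,
}


def is_valid(identifier: str) -> bool:
    """Check an identifier against the assumed pattern."""
    return re.fullmatch(PLACEHOLDER["pattern"], identifier) is not None


def split_identifier(identifier: str) -> tuple[int, int]:
    """Split an identifier into (template number, row number).

    Assumes the part before the dot names the template document and the
    part after it names a row, which is a guess based on the example.
    """
    if not is_valid(identifier):
        raise ValueError(f"unexpected identifier: {identifier!r}")
    template, row = identifier.split(".")
    return int(template), int(row)
```

Even without a resolver, a pattern like this would let the identifiers be validated and described in context.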

If desired, we could try to bulk process the Word documents and generate an OBO document, e.g., using PyOBO, and auto-generate a site in https://github.com/biopragmatics/providers.
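As a rough illustration of the output side of that idea, the sketch below writes OBO term stanzas by hand with plain Python. It does not use PyOBO and does not parse Word documents. The function name `to_obo`, the prefix, and the row label in the usage example are hypothetical placeholders, not content taken from the OECD templates.

```python
def to_obo(prefix: str, template: int, rows: list[tuple[int, str]]) -> str:
    """Render (row number, label) pairs from one template as OBO text.

    Hand-rolled for illustration only: it does not escape special
    characters in labels and it is not how PyOBO builds its output.
    """
    lines = ["format-version: 1.2", f"ontology: {prefix}", ""]
    for row_number, label in rows:
        lines += [
            "[Term]",
            f"id: {prefix}:{template}.{row_number}",
            f"name: {label}",
            "",
        ]
    return "\n".join(lines)


# Usage with a made-up row; the label is a placeholder, not real data.
print(to_obo("oecd.template", 74, [(186, "placeholder label")]))
```

Because row numbers can change across versions, a real pipeline would probably also need to record the document version alongside each term.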