teslajoy closed this 6 years ago
Come up with no more than 3 potential sources to try, in order; if none of them has the term, stop and put source=Sage.
Check potential sources in EBI OLS first, then go from there.
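A minimal sketch of the "at most 3 sources, in order, else source=Sage" rule, assuming each source can be wrapped as a function that returns a source string on a hit or None on a miss. The names resolve_source, MAX_SOURCES, and FALLBACK_SOURCE are hypothetical, and the stub lookups below use made-up coverage rather than real query results.

```python
from typing import Callable, Optional, Sequence

# A lookup takes a term and returns a source string (such as a URL)
# on a hit, or None on a miss. Hypothetical wrapper type.
Lookup = Callable[[str], Optional[str]]

MAX_SOURCES = 3           # "no more than 3 potentials to try"
FALLBACK_SOURCE = "Sage"  # recorded when every lookup misses


def resolve_source(term: str, lookups: Sequence[Lookup]) -> str:
    """Try up to MAX_SOURCES lookups in the given order.

    Returns the first hit, or FALLBACK_SOURCE if none of them has the term.
    """
    if len(lookups) > MAX_SOURCES:
        raise ValueError(f"at most {MAX_SOURCES} sources may be tried")
    for lookup in lookups:
        hit = lookup(term)
        if hit:
            return hit
    return FALLBACK_SOURCE


# Stand-in lookups with made-up coverage; real ones would query OLS etc.
ols_stub = {}.get
pubchem_stub = {"clozapine": "https://pubchem.ncbi.nlm.nih.gov/compound/2818"}.get
print(resolve_source("clozapine", [ols_stub, pubchem_stub]))
print(resolve_source("novel-compound", [ols_stub, pubchem_stub]))
```

The first call falls through the empty OLS stub and returns the PubChem URL; the second misses everywhere and returns "Sage". Raising on more than three lookups (rather than silently truncating) keeps the "no more than 3" limit explicit.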
Following up on @kdaily:
Three chemical databases I'd recommend:
NIH PubChem: https://pubchem.ncbi.nlm.nih.gov/compound/2818
RSC ChemSpider: http://www.chemspider.com/Chemical-Structure.10442628.html?rid=a247ce03-cc7b-4e99-a46c-7fa932730468
NIST WebBook: http://webbook.nist.gov/cgi/cbook.cgi?Name=clozapine&Units=SI
@allaway is the king of drug definitions. 👍 on my behalf.
Who else works with drug data and wants to comment on that?
If no one disagrees on the sources or their ordering, this should be documented in the CONTRIBUTING.md file (yet to be created; this could be the first entry!).
One more good resource that is more specifically drug-focused and less chemical-focused is DrugBank: https://www.drugbank.ca/drugs/DB00363
I've always perceived DrugBank to be more closely curated, but as a result it has fewer molecules.
For population variant/drug associations, there is PharmGKB: https://www.pharmgkb.org/
Example: https://www.pharmgkb.org/search?connections&gaSearch=clozapine&query=clozapine
Full list of NIH-funded databases: https://epi.grants.cancer.gov/pharm/gen-resources.html
@teslajoy @kdaily Circling back on this: I am pulling drug name descriptions for annotations and finding that PubChem and ChemSpider, while great sources of chemical data, aren't good sources of human-readable descriptions of molecules. My strategy now is to use OLS first (as before), then fall back to MeSH, e.g.: https://www.ncbi.nlm.nih.gov/mesh/67585785
I'm finding that OLS works better than expected: about 70-80% of drug queries return a satisfactory OLS hit, typically from NCIT.
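A minimal sketch of the OLS-first step, assuming the OLS4 REST search endpoint (https://www.ebi.ac.uk/ols4/api/search) and a Solr-style JSON response with hits under response.docs, each carrying an iri and a description list. The endpoint URL, the q/ontology/exact/rows parameters, and the response shape are assumptions to check against the current OLS documentation; the helper names are hypothetical. A None result is the point where the MeSH fallback would take over.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed OLS4 search endpoint; verify against the current OLS docs.
OLS_SEARCH_URL = "https://www.ebi.ac.uk/ols4/api/search"


def ols_search_url(term: str, ontology: str = "ncit", rows: int = 5) -> str:
    """Build an exact-match search URL restricted to one ontology (NCIT by default)."""
    params = {"q": term, "ontology": ontology, "exact": "true", "rows": rows}
    return f"{OLS_SEARCH_URL}?{urlencode(params)}"


def fetch_json(url: str) -> Dict[str, Any]:
    """Download and parse a JSON response (requires network access)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def annotation_from_ols(term: str, payload: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Turn the first hit that has both an IRI and a description into a
    value/description/source entry; return None if no hit qualifies."""
    docs = payload.get("response", {}).get("docs", [])
    for doc in docs:
        iri = doc.get("iri")
        descriptions = doc.get("description") or []
        if isinstance(descriptions, str):  # tolerate a bare string
            descriptions = [descriptions]
        if iri and descriptions:
            return {"value": term, "description": descriptions[0], "source": iri}
    return None
```

In use, annotation_from_ols("DEFACTINIB", fetch_json(ols_search_url("defactinib"))) would yield an entry with value, description, and source keys if NCIT has a matching term with a description, and None otherwise.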
For molecules that have been just discovered or have been minimally investigated, typically the only good source is the original publication.
@allaway Please summarize this in the new contributing doc that @kdaily is creating.
Here is the summary. Ready to drop it into the doc template whenever it's ready (@kdaily):
The preferred first-pass strategy for chemical name annotation is to search the EMBL-EBI Ontology Lookup Service (OLS) to find names, descriptions, and sources. Typically, the NCI Thesaurus (NCIt) provides a suitable description for drugs and other biologically active molecules. When the query molecule is not found in EMBL-EBI OLS, MeSH (https://meshb.nlm.nih.gov/) is a helpful secondary location for chemical descriptions.
Example:
{
  "value": "DEFACTINIB",
  "description": "An orally bioavailable, small-molecule focal adhesion kinase (FAK) inhibitor with potential antiangiogenic and antineoplastic activities.",
  "source": "http://purl.obolibrary.org/obo/NCIT_C79809"
}
When novel molecules (such as newly synthesized research compounds or proprietary pharmaceutical molecules) require annotation, the only suitable description and source might be the paper describing the synthesis or discovery, or information from the pharmaceutical company that created the identifier.
Example:
[
  {
    "value": "IPC-12345",
    "description": "A small-molecule target of importance 4 (TOI4) inhibitor with potential antineoplastic activities.",
    "source": "Important Pharma Company"
  },
  {
    "value": "BestChemist-00913",
    "description": "An investigational small molecule discovered by Best Chemist et al.",
    "source": "PubMed Link Goes Here"
  }
]
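The annotation examples above share one shape: non-empty string value, description, and source fields. A minimal check for that shape, assuming those three fields are all that is required (the real schema may require more); annotation_problems is a hypothetical helper name, and BROKEN-EXAMPLE is a deliberately incomplete made-up entry.

```python
import json
from typing import Any, Dict, List

REQUIRED_FIELDS = ("value", "description", "source")


def annotation_problems(entry: Dict[str, Any]) -> List[str]:
    """Return a list of problems with one entry; an empty list means it looks OK."""
    problems = []
    for field in REQUIRED_FIELDS:
        text = entry.get(field)
        if not isinstance(text, str) or not text.strip():
            problems.append(f"missing or empty {field!r}")
    return problems


# Check a JSON list of entries: one real example, one deliberately broken one.
entries = json.loads("""[
  {"value": "IPC-12345",
   "description": "A small-molecule target of importance 4 (TOI4) inhibitor with potential antineoplastic activities.",
   "source": "Important Pharma Company"},
  {"value": "BROKEN-EXAMPLE", "source": "PubMed Link Goes Here"}
]""")
for entry in entries:
    print(entry["value"], annotation_problems(entry))
```

The check covers presence only, so a placeholder such as "PubMed Link Goes Here" still passes; catching placeholders would need a separate rule.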
It is unclear to me how this should be integrated into the contributing doc. I'd like to discuss it today, @kdaily @sgosline.
@kdaily @teslajoy This was finalized and is in master, so I think we probably don't need to discuss tomorrow. 💥
This topic originated in #231.
Example: https://pubchem.ncbi.nlm.nih.gov/compound/clozapine