Open cbizon opened 1 year ago
Here is the priority order used by MolePro when choosing chemical names: https://github.com/broadinstitute/molecular-data-provider/blob/b13f566911ee8bf7a88361734c245ce9aa26f3b5/MoleProDB/builder/conf/sourcePriority.txt (thanks, @vdancik!)
See https://github.com/NCATSTranslator/Feedback/issues/568 for an example.
At least some of our long names are coming from PUBCHEM.COMPOUND. For example, PUBCHEM.COMPOUND:3420 has equivalent identifiers:
{
  "identifier": "PUBCHEM.COMPOUND:3420",
  "label": "4-Cyclohexyl-1-[2-[(2-methyl-1-propanoyloxypropoxy)-(4-phenylbutyl)phosphoryl]acetyl]pyrrolidine-2-carboxylic acid"
},
{
  "identifier": "CHEMBL.COMPOUND:CHEMBL4078476",
  "label": "CHEMBL4078476"
},
{
  "identifier": "CAS:1910773-95-3"
},
{
  "identifier": "HMDB:HMDB0252464",
  "label": "Fosenopril"
},
{
  "identifier": "INCHIKEY:BIDNLKIUORFRQP-UHFFFAOYSA-N"
}
I think we should push PUBCHEM.COMPOUND below HMDB in the priority list.
At the moment we choose a preferred label by taking the label of the identifier with the preferred prefix. E.g., if we're looking at a chemical, we take PubChem's label. But sometimes this leads to ugly names. Perhaps we should find a way to choose a nicer name, e.g. https://github.com/NCATSTranslator/Feedback/issues/259#issuecomment-1605140850
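
To make the effect of the prefix order concrete, here is a rough sketch of label selection by prefix priority, run against the equivalent identifiers of PUBCHEM.COMPOUND:3420 quoted in this issue. The function `pick_label` and both priority lists are hypothetical and only for illustration; they are not the actual code or configuration.

```python
from typing import Dict, List, Optional

# Equivalent identifiers for PUBCHEM.COMPOUND:3420, copied from this issue.
EQUIVALENT_IDS: List[Dict[str, str]] = [
    {
        "identifier": "PUBCHEM.COMPOUND:3420",
        "label": "4-Cyclohexyl-1-[2-[(2-methyl-1-propanoyloxypropoxy)-"
                 "(4-phenylbutyl)phosphoryl]acetyl]pyrrolidine-2-carboxylic acid",
    },
    {"identifier": "CHEMBL.COMPOUND:CHEMBL4078476", "label": "CHEMBL4078476"},
    {"identifier": "CAS:1910773-95-3"},
    {"identifier": "HMDB:HMDB0252464", "label": "Fosenopril"},
    {"identifier": "INCHIKEY:BIDNLKIUORFRQP-UHFFFAOYSA-N"},
]

# Hypothetical orders, NOT the real configuration: they only show what
# happens when PUBCHEM.COMPOUND sits above or below HMDB.
PUBCHEM_FIRST = ["PUBCHEM.COMPOUND", "HMDB", "CHEMBL.COMPOUND"]
HMDB_FIRST = ["HMDB", "PUBCHEM.COMPOUND", "CHEMBL.COMPOUND"]


def pick_label(identifiers: List[Dict[str, str]],
               prefix_priority: List[str]) -> Optional[str]:
    """Return the label of the first identifier, in prefix-priority order,
    that has a non-empty label; None if no listed prefix has one."""
    for prefix in prefix_priority:
        for ident in identifiers:
            # The CURIE prefix is everything before the first colon.
            curie_prefix = ident["identifier"].split(":", 1)[0]
            if curie_prefix == prefix and ident.get("label"):
                return ident["label"]
    return None


if __name__ == "__main__":
    print(pick_label(EQUIVALENT_IDS, PUBCHEM_FIRST))  # the long systematic name
    print(pick_label(EQUIVALENT_IDS, HMDB_FIRST))     # Fosenopril
```

Note that in this sketch the order alone does not guard against labels that merely repeat the identifier: if CHEMBL.COMPOUND were ranked first, the chosen label would be "CHEMBL4078476".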