TranslatorSRI / Babel

Babel creates cliques of equivalent identifiers across many biomedical vocabularies.
MIT License

reconsider name choices #158

Open cbizon opened 1 year ago

cbizon commented 1 year ago

At the moment we choose a preferred label by taking the label from the identifier with the preferred prefix. E.g., if we're looking at a chemical, we take PubChem's label. But sometimes this leads to ugly names. Perhaps we should find a way to choose a nicer name, e.g. https://github.com/NCATSTranslator/Feedback/issues/259#issuecomment-1605140850

gaurav commented 11 months ago

Here is the priority order used by MolePro when choosing chemical names: https://github.com/broadinstitute/molecular-data-provider/blob/b13f566911ee8bf7a88361734c245ce9aa26f3b5/MoleProDB/builder/conf/sourcePriority.txt (thanks, @vdancik!)

gaurav commented 9 months ago

See https://github.com/NCATSTranslator/Feedback/issues/568 for an example.

gaurav commented 9 months ago

At least some of our long names are coming from PUBCHEM.COMPOUND. For example, PUBCHEM.COMPOUND:3420 has equivalent identifiers:

      {
        "identifier": "PUBCHEM.COMPOUND:3420",
        "label": "4-Cyclohexyl-1-[2-[(2-methyl-1-propanoyloxypropoxy)-(4-phenylbutyl)phosphoryl]acetyl]pyrrolidine-2-carboxylic acid"
      },
      {
        "identifier": "CHEMBL.COMPOUND:CHEMBL4078476",
        "label": "CHEMBL4078476"
      },
      {
        "identifier": "CAS:1910773-95-3"
      },
      {
        "identifier": "HMDB:HMDB0252464",
        "label": "Fosenopril"
      },
      {
        "identifier": "INCHIKEY:BIDNLKIUORFRQP-UHFFFAOYSA-N"
      }

I think we should push PUBCHEM.COMPOUND below HMDB in the priority list.
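
To make the effect of that reordering concrete, here is a minimal sketch of prefix-priority label selection applied to the clique above. It is not Babel's actual code: `CURRENT_ORDER`, `PROPOSED_ORDER`, `prefix_of`, and `preferred_label` are hypothetical names, and both priority lists are illustrative assumptions, not Babel's real configuration.

```python
from typing import Optional

# Hypothetical priority lists (assumptions for illustration only).
CURRENT_ORDER = ["PUBCHEM.COMPOUND", "CHEMBL.COMPOUND", "HMDB", "CAS", "INCHIKEY"]
PROPOSED_ORDER = ["HMDB", "PUBCHEM.COMPOUND", "CHEMBL.COMPOUND", "CAS", "INCHIKEY"]

# The equivalent identifiers listed above for PUBCHEM.COMPOUND:3420.
CLIQUE = [
    {
        "identifier": "PUBCHEM.COMPOUND:3420",
        "label": "4-Cyclohexyl-1-[2-[(2-methyl-1-propanoyloxypropoxy)-"
                 "(4-phenylbutyl)phosphoryl]acetyl]pyrrolidine-2-carboxylic acid",
    },
    {"identifier": "CHEMBL.COMPOUND:CHEMBL4078476", "label": "CHEMBL4078476"},
    {"identifier": "CAS:1910773-95-3"},
    {"identifier": "HMDB:HMDB0252464", "label": "Fosenopril"},
    {"identifier": "INCHIKEY:BIDNLKIUORFRQP-UHFFFAOYSA-N"},
]


def prefix_of(curie: str) -> str:
    """Return the prefix of a CURIE, e.g. 'HMDB' for 'HMDB:HMDB0252464'."""
    return curie.split(":", 1)[0]


def preferred_label(clique: list, prefix_order: list) -> Optional[str]:
    """Pick the label of the labelled identifier whose prefix ranks highest."""
    rank = {prefix: i for i, prefix in enumerate(prefix_order)}
    labelled = [member for member in clique if member.get("label")]
    if not labelled:
        return None
    # Prefixes missing from the priority list sort after all known prefixes.
    best = min(labelled, key=lambda m: rank.get(prefix_of(m["identifier"]), len(rank)))
    return best["label"]


print(preferred_label(CLIQUE, CURRENT_ORDER))
print(preferred_label(CLIQUE, PROPOSED_ORDER))
```

Under these assumed orderings, the first call returns the long PubChem systematic name and the second returns "Fosenopril". Note that a pure priority swap would still let an uninformative label such as "CHEMBL4078476" win whenever its prefix happens to rank highest in a clique.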