Princeton-LSI-ResearchComputing / tracebase

Mouse Metabolite Tracing Data Repository for the Rabinowitz Lab
MIT License

Use PubChem as the primary identifier for Compounds #380

Open lparsons opened 2 years ago

lparsons commented 2 years ago

FEATURE DESCRIPTION

Feature Inspiration

HMDB is a very useful database of human metabolites; however, researchers sometimes measure or infuse compounds that are not considered naturally occurring human metabolites (e.g. ???). Requiring a PubChem ID as the primary identifier for a Compound record would allow TraceBase to handle a larger variety of compounds. Links to other databases, such as HMDB, can be added for convenience when available.

Feature Description

Make the PubChem ID a required field for Compound. Change the HMDB ID to an optional value.
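
A minimal sketch of what this could look like, written as a plain Python dataclass rather than as TraceBase's actual model code. The class name `CompoundSketch`, the field names `pubchem_cid` and `hmdb_id`, and the validation rules are all assumptions made for illustration; only the PubChem URL pattern comes from the link quoted in this issue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CompoundSketch:
    """Hypothetical stand-in for a Compound record (not the TraceBase model)."""

    name: str
    pubchem_cid: int                # required: proposed primary identifier
    hmdb_id: Optional[str] = None   # optional: convenience cross-reference

    def __post_init__(self) -> None:
        # Assumed rule: a PubChem compound ID is a positive integer.
        if not isinstance(self.pubchem_cid, int) or self.pubchem_cid <= 0:
            raise ValueError("pubchem_cid must be a positive integer")
        # Assumed rule: an HMDB accession, when given, starts with "HMDB".
        if self.hmdb_id is not None and not self.hmdb_id.startswith("HMDB"):
            raise ValueError("hmdb_id, when given, must start with 'HMDB'")

    def pubchem_url(self) -> str:
        # URL pattern taken from the PubChem link quoted in this issue.
        return f"https://pubchem.ncbi.nlm.nih.gov/compound/{self.pubchem_cid}"


# A compound with no HMDB entry is still a valid record.
tracer = CompoundSketch(name="example compound", pubchem_cid=6287)
print(tracer.pubchem_url())
```

In a database-backed model, the same idea would presumably translate to a non-null, unique PubChem column and a nullable HMDB column.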

Alternatives Considered

An alternative would be to use the SMILES string as the primary identifier (mostly hidden from the user). PubChem lists the canonical SMILES for compounds (for example: https://pubchem.ncbi.nlm.nih.gov/compound/6287#section=Canonical-SMILES).

The advantage would be that a compound not in PubChem could still be added to the database. I doubt there are many of these. The only such cases might be pharmacologic compounds that researchers administer to animals.
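
To make the trade-off concrete, here is a rough sketch of a registry keyed on SMILES, where the PubChem ID becomes optional. The names `CompoundRecord` and `CompoundRegistry` are hypothetical, and the sketch assumes the caller supplies an already canonical SMILES string: the same molecule can be written in more than one way (ethanol is both `CCO` and `OCC`), so without a canonicalization step the key would not be unique. The identifier values are illustrative and should be checked against PubChem before reuse.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CompoundRecord:
    """Hypothetical record keyed on SMILES instead of a PubChem ID."""

    name: str
    smiles: str                        # assumed to be canonical already
    pubchem_cid: Optional[int] = None  # optional under this alternative
    hmdb_id: Optional[str] = None      # optional cross-reference


class CompoundRegistry:
    """In-memory lookup that enforces one record per SMILES string."""

    def __init__(self) -> None:
        self._by_smiles: Dict[str, CompoundRecord] = {}

    def add(self, record: CompoundRecord) -> None:
        if not record.smiles:
            raise ValueError("a SMILES string is required")
        if record.smiles in self._by_smiles:
            raise ValueError(f"duplicate SMILES: {record.smiles}")
        self._by_smiles[record.smiles] = record

    def get(self, smiles: str) -> Optional[CompoundRecord]:
        # Exact string match only; no canonicalization is performed here.
        return self._by_smiles.get(smiles)


registry = CompoundRegistry()
registry.add(CompoundRecord(name="ethanol", smiles="CCO", pubchem_cid=702))
# A compound absent from PubChem can still be stored (placeholder SMILES).
registry.add(CompoundRecord(name="hypothetical drug", smiles="PLACEHOLDER_SMILES"))

print(registry.get("CCO"))
print(registry.get("OCC"))  # same molecule, different string: not found
```

The failed `OCC` lookup is the main cost of this approach: every SMILES string would need to pass through a single canonicalization method before being stored or queried.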

Comment

Add any other context or screenshots about the feature request here.


ISSUE OWNER SECTION

Assumptions

Requirements

Limitations

Affected/Changed Components

DESIGN

GUI Change description

Describe changes the user will see.

Code Change Description (Pseudocode optional)

Describe code changes planned for the feature.

Tests

A test should be planned for each requirement (above), where possible.

mneinast commented 2 years ago

An alternative could be to use the SMILES string as the primary identifier (mostly hidden from the user); see https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system for background. PubChem lists the canonical SMILES for compounds (for example: https://pubchem.ncbi.nlm.nih.gov/compound/6287#section=Canonical-SMILES).

The advantage would be that a compound not in PubChem could still be added to the database. I doubt there are many of these. The only such cases might be pharmacologic compounds that researchers administer to animals.

I think PubChem could be fine, but I wanted to share this idea as an alternative.

lparsons commented 2 years ago

From @hepcat72: Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI