Closed: schymane closed this 5 years ago
I will take care of this.
And I would like to give a short update on a related topic: I curated all records that have any structural information available so that they contain proper InChIs and InChIKeys. There are just 900 records left which don't have structural information, just chemical names.
Great! Can you post a list somewhere of the 900, with basic details like name, accession etc? Some of them are "tentative", but I am not sure we have that many ... I would be curious ... Thanks!
noStructure.txt: the list of all records without a structure given.
Oh interesting ... so the EawagAdditional are ones that almost certainly don't have a structure because they are tentative records ... but I see a lot from BS, Fac_Eng_Univ_Tokyo (major culprit) and even IPB Halle! @sneumann should be able to comment on the latter ... do you see a systematic issue with BS and Fac_Eng_Univ_Tokyo (one critical identifier missing that we could fill in with other information available)?
There are roughly 60 records with other database identifiers, like CAS, which I could use to retrieve proper chemical information. The remaining records have only chemical names. These need manual lookup, which might be unsuccessful in some cases. This will take some time...
Different topic: Please could someone explain the difference between DTXCID and DTXSID? The code for adding COMPTOX id is nearly finished.
C = compound/chemical and S = substance. The "C" entries are the unique chemicals (roughly the "MS-ready" forms, put simply) and the "S" entries are the official database entries. Effectively we should always use and link via the substance identifier, the DTXSID.
Check out infoboxes here (@ChemConnector note inconsistencies in the DTXCID!) https://comptox.epa.gov/dashboard/dsstoxdb/batch_search
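To make the C/S distinction concrete, here is a minimal sketch that tells the two identifier types apart by prefix. The function name `dtx_kind` is made up for illustration, and the pattern (prefix followed by digits only) is an assumption based on the examples in this thread, not an official specification.

```python
import re
from typing import Optional

# Assumption: identifiers look like "DTXSID" or "DTXCID" followed by digits only,
# as in the examples quoted in this thread (e.g. DTXSID50274017, DTXCID60513).
_DTX_PATTERN = re.compile(r"^DTX(SID|CID)(\d+)$")


def dtx_kind(identifier: str) -> Optional[str]:
    """Return 'substance' for a DTXSID, 'compound' for a DTXCID, else None."""
    match = _DTX_PATTERN.match(identifier.strip())
    if match is None:
        return None
    return "substance" if match.group(1) == "SID" else "compound"
```

A check like this could guard against accidentally writing a DTXCID where a DTXSID link is expected.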
Sorry, I didn't understand this concept.
On PubChem we have the SID, which is something like the label on a bottle of chemicals and could potentially be a mixture, and we have the CID, which is a unique compound represented by exactly one formula (like you would draw on paper).
That's why I have more questions: Does this mean that there might be several DTXSIDs for one InChIKey? Is there a 1 to n relation between DTXCID and DTXSID like in PubChem?
As far as I'm aware it's one DTXSID per InChIKey. The service should return one DTXSID for one InChIKey request, and this is what @ChemConnector asked us to do: use InChIKey to DTXSID to add these identifiers to MassBank (therefore I'm assuming this is the most robust way in his opinion, and from my experience I'd agree).
One DTXSID may have multiple DTXCIDs associated with it. It's a bit different to the PubChem construct. imho we should not yet try mapping on DTXCIDs as they don't have the full functionality associated with them like the DTXSIDs, until recently they were hidden entirely.
Some examples: https://comptox.epa.gov/dashboard/dsstoxdb/results?search=nicotine https://comptox.epa.gov/dashboard/dsstoxdb/ms_ready_mixture?cid=28128
This one has two DTXCIDs associated with it: https://comptox.epa.gov/dashboard/dsstoxdb/results?search=DTXSID10858175
I have created a program which adds these identifiers with the help of the InChIKey to DTXSID resolver, and I have processed all records. We now have 39962 outlinks in place. This program can be executed on all new records and also on a regular basis on the existing records. I think this one can be closed.
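For anyone who wants to reproduce the idea, a rough sketch of such an annotation step could look like the following. This is not the actual program mentioned above. It assumes the record carries its key in a `CH$LINK: INCHIKEY` line (an assumption about the record layout), writes the `CH$LINK: COMPTOX` line described in the record format, and takes the lookup as a caller-supplied function `resolve` (hypothetical) so that no particular web service response format is assumed.

```python
from typing import Callable, Optional

INCHIKEY_TAG = "CH$LINK: INCHIKEY "  # assumed layout of the InChIKey link line
COMPTOX_TAG = "CH$LINK: COMPTOX "    # link line named in the record format


def add_comptox_link(record: str, resolve: Callable[[str], Optional[str]]) -> str:
    """Insert a COMPTOX link after the InChIKey line if one can be resolved.

    `resolve` maps an InChIKey to a DTXSID, or returns None when there is no match.
    Records that already carry a COMPTOX link are returned unchanged.
    """
    lines = record.splitlines()
    if any(line.startswith(COMPTOX_TAG) for line in lines):
        return record
    for index, line in enumerate(lines):
        if line.startswith(INCHIKEY_TAG):
            inchikey = line[len(INCHIKEY_TAG):].strip()
            dtxsid = resolve(inchikey)
            if dtxsid:
                lines.insert(index + 1, COMPTOX_TAG + dtxsid)
            break
    # Preserve a trailing newline if the input had one.
    return "\n".join(lines) + ("\n" if record.endswith("\n") else "")
```

Because already-linked records are left unchanged, running this repeatedly over existing records should be safe, which matches the "regular basis" use described above.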
Reopen until #68 is solved.
@ChemConnector has added additional services that might be of interest. NOTE that these actor-based web services will be switched off next year and replaced with CompTox ones once they are up and running.
Data Source: dsstox v02
https://ni.epa.gov/actorws/dsstox/v02/msready?identifier=80-05-7 https://ni.epa.gov/actorws/dsstox/v02/msready.json?identifier=80-05-7 https://ni.epa.gov/actorws/dsstox/v02/msready.xml?identifier=80-05-7
https://ni.epa.gov/actorws/dsstox/v02/msready?identifier=DTXCID60513 https://ni.epa.gov/actorws/dsstox/v02/msready.json?identifier=DTXCID60513 https://ni.epa.gov/actorws/dsstox/v02/msready.xml?identifier=DTXCID60513
https://ni.epa.gov/actorws/dsstox/v02/msready?identifier=UVOFGKIRTCCNKG-UHFFFAOYSA-N https://ni.epa.gov/actorws/dsstox/v02/msready.json?identifier=UVOFGKIRTCCNKG-UHFFFAOYSA-N https://ni.epa.gov/actorws/dsstox/v02/msready.xml?identifier=UVOFGKIRTCCNKG-UHFFFAOYSA-N
https://ni.epa.gov/actorws/dsstox/v02/qsar?identifier=80-05-7 https://ni.epa.gov/actorws/dsstox/v02/qsar.json?identifier=80-05-7 https://ni.epa.gov/actorws/dsstox/v02/qsar.xml?identifier=80-05-7
https://ni.epa.gov/actorws/dsstox/v02/qsar?identifier=DTXCID60513 https://ni.epa.gov/actorws/dsstox/v02/qsar.json?identifier=DTXCID60513 https://ni.epa.gov/actorws/dsstox/v02/qsar.xml?identifier=DTXCID60513
https://ni.epa.gov/actorws/dsstox/v02/qsar?identifier=UVOFGKIRTCCNKG-UHFFFAOYSA-N https://ni.epa.gov/actorws/dsstox/v02/qsar.json?identifier=UVOFGKIRTCCNKG-UHFFFAOYSA-N https://ni.epa.gov/actorws/dsstox/v02/qsar.xml?identifier=UVOFGKIRTCCNKG-UHFFFAOYSA-N
The hyperlinks to the MS Ready and QSAR Ready forms have been added to the resolver service.
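The example links above all follow the same pattern (service name, optional `.json` or `.xml` suffix, then an `identifier` query parameter), so they can be generated rather than typed out. A small sketch, where the helper name `dsstox_url` is made up and the set of services and formats is limited to what appears in the links above:

```python
from typing import Optional
from urllib.parse import urlencode

# Base URL taken from the example links; these actor-based services were
# announced as temporary, so this endpoint may no longer be available.
DSSTOX_BASE = "https://ni.epa.gov/actorws/dsstox/v02"


def dsstox_url(service: str, identifier: str, fmt: Optional[str] = None) -> str:
    """Build a request URL for the 'msready' or 'qsar' service.

    `identifier` may be a CAS number, a DTXCID or an InChIKey, as in the examples.
    `fmt` is None (default representation), 'json' or 'xml'.
    """
    if service not in ("msready", "qsar"):
        raise ValueError(f"unknown service: {service!r}")
    if fmt not in (None, "json", "xml"):
        raise ValueError(f"unknown format: {fmt!r}")
    suffix = f".{fmt}" if fmt else ""
    return f"{DSSTOX_BASE}/{service}{suffix}?{urlencode({'identifier': identifier})}"
```

For example, `dsstox_url("msready", "80-05-7", "json")` reproduces the second link in the first row of examples.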
Note that if the cause of the problem is that the web services also return entries up to Level 6, and if the "curation level" were included in the data retrieved, we could proactively fix our end by only including DTXSIDs if the level is 5 or lower. I can't see that this information is included yet though, just following the links above, although I thought this was part of the plan, @ChemConnector?
@meier-rene @Treutler the EPA has set up a basic service that should allow retrieval of DTXSIDs by InChIKey. Can you look into implementing this on the database end to add DTXSIDs to all records with matching entries for now? I will post a separate issue to get this into RMassBank and linked up in MassBank-web. It's already in our record format as CH$LINK: COMPTOX DTXSID50274017 (https://github.com/MassBank/MassBank-web/blob/master/Documentation/MassBankRecordFormat.md)
https://actorws.epa.gov/actorws/chemIdentifier/v01/resolve?identifier=IKHGUXGNUITLKF-UHFFFAOYSA-N https://actorws.epa.gov/actorws/chemIdentifier/v01/resolve.json?identifier=IKHGUXGNUITLKF-UHFFFAOYSA-N https://actorws.epa.gov/actorws/chemIdentifier/v01/resolve.xml?identifier=IKHGUXGNUITLKF-UHFFFAOYSA-N
Please send any feedback on the service to @ChemConnector.
Thanks!
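As a starting point for the database end, the request URL for the resolver can be built from an InChIKey after a basic format check. This is only a sketch: the helper name `resolve_url` is hypothetical, the endpoint is copied from the links above and may have changed since, and the response format is deliberately not parsed here because it is not described in this thread.

```python
import re
from urllib.parse import urlencode

# Endpoint copied from the example links above; treat its availability as an assumption.
RESOLVE_BASE = "https://actorws.epa.gov/actorws/chemIdentifier/v01/resolve"

# An InChIKey has 14 uppercase letters, 10 uppercase letters, and 1 letter, hyphen-separated.
_INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def resolve_url(inchikey: str, fmt: str = "json") -> str:
    """Return the resolver URL for a syntactically valid InChIKey."""
    if not _INCHIKEY.match(inchikey):
        raise ValueError(f"not a valid InChIKey: {inchikey!r}")
    if fmt not in ("json", "xml"):
        raise ValueError(f"unknown format: {fmt!r}")
    return f"{RESOLVE_BASE}.{fmt}?{urlencode({'identifier': inchikey})}"
```

Rejecting malformed keys before sending a request keeps records with broken or missing InChIKeys from generating pointless lookups.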