schymane closed this issue 11 months ago
Hi Emma, This is a great enhancement. Using the original data source is always the best option.
Have a nice weekend, Tobi
getPcId will also return non-live CIDs for non-standard tautomeric forms. We can fix this using a function in RChemMass, but we should make sure we upgrade the getPcId function to automatically do this. On the "todo" list ...
> getPcId("NPZTUJOABDZTLV-UHFFFAOYSA-N")
[1] 2759291
> getPCIDs.CIDtype(getPcId("NPZTUJOABDZTLV-UHFFFAOYSA-N"),type="preferred")
[1] 135399369
@meowcat I will need to upgrade getPcId to make sure it returns "live" CIDs, this is fine - but where can I check whether we grab PubChem CIDs from PubChem vs CTS? If we already use getPcId (not CTS), then all I will need to do is fix that function, then this issue is solved. Thanks.
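A minimal sketch of what the upgraded lookup could do, assuming the two-step approach from the console example above (look up a CID, then map it to its preferred CID). `get_live_cid` and both stub functions are hypothetical names used only for illustration; the lookups are passed in as arguments so the sketch runs without network access. It is not the actual RMassBank implementation.

```r
# Hypothetical wrapper: resolve an InChIKey to a CID, then map that CID to
# its preferred ("live") CID. Both lookups are injected as functions.
get_live_cid <- function(inchikey, lookup_cid, to_preferred) {
  cid <- lookup_cid(inchikey)
  if (is.null(cid) || length(cid) == 0 || is.na(cid[1])) {
    return(NA_integer_)  # nothing found for this InChIKey
  }
  preferred <- to_preferred(cid[1])
  if (is.null(preferred) || length(preferred) == 0 || is.na(preferred[1])) {
    return(as.integer(cid[1]))  # no preferred form reported: keep the original CID
  }
  as.integer(preferred[1])
}

# Stubs that only reproduce the values from the console output above.
stub_lookup <- function(key) {
  known <- c("NPZTUJOABDZTLV-UHFFFAOYSA-N" = 2759291L)
  if (key %in% names(known)) known[[key]] else NA_integer_
}
stub_preferred <- function(cid) {
  mapping <- c("2759291" = 135399369L)
  k <- as.character(cid)
  if (k %in% names(mapping)) mapping[[k]] else NA_integer_
}

live <- get_live_cid("NPZTUJOABDZTLV-UHFFFAOYSA-N", stub_lookup, stub_preferred)
print(live)
```

With the real functions, the call would presumably be `get_live_cid(key, getPcId, function(cid) getPCIDs.CIDtype(cid, type = "preferred"))`.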
So:
https://github.com/MassBank/RMassBank/createMassBank.R
Line 577 is where we call gatherPubChem to get data off PubChem.
https://github.com/MassBank/RMassBank/blob/611b78578b54156119080b57569c09586a18fe84/R/createMassBank.R#L577
Lines 602-608 are where we get CTS data. To do: what do we still need from CTS at this point?
Lines 775-786 are where we decide which PubChem ID to use. I guess you want to drop CTS completely as an option? https://github.com/MassBank/RMassBank/blob/611b78578b54156119080b57569c09586a18fe84/R/createMassBank.R#L775-L786
Then the actual data retrieval from PubChem is in gatherPubChem, where getPcId is called: https://github.com/MassBank/RMassBank/blob/611b78578b54156119080b57569c09586a18fe84/R/createMassBank.R#L454-L466
getPcId is then the function in webAccess.R: https://github.com/MassBank/RMassBank/blob/611b78578b54156119080b57569c09586a18fe84/R/webAccess.R#L109-L144
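If CTS is dropped as the default, the selection step at lines 775-786 could be reduced to something like the sketch below. This is an assumption about the desired behaviour, not a copy of the existing code: `choose_pubchem_cid` and the `allow_cts` switch are hypothetical names.

```r
# Hypothetical selection rule: prefer the CID retrieved from PubChem and only
# fall back to the CTS value when explicitly allowed.
choose_pubchem_cid <- function(pubchem_cid, cts_cid, allow_cts = FALSE) {
  has_value <- function(x) !is.null(x) && length(x) > 0 && !is.na(x[1])
  if (has_value(pubchem_cid)) {
    return(list(cid = as.integer(pubchem_cid[1]), source = "PubChem"))
  }
  if (allow_cts && has_value(cts_cid)) {
    return(list(cid = as.integer(cts_cid[1]), source = "CTS"))
  }
  list(cid = NA_integer_, source = NA_character_)  # nothing usable
}

# PubChem wins whenever it returned something.
picked <- choose_pubchem_cid(135399369L, 2759291L)
print(picked$source)

# CTS is ignored unless the caller opts in.
fallback_off <- choose_pubchem_cid(NA, 2759291L)
fallback_on <- choose_pubchem_cid(NA, 2759291L, allow_cts = TRUE)
```

Keeping the `source` alongside the CID would make it possible to see later which records still came from CTS.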
@meowcat I've created a new branch: https://github.com/MassBank/RMassBank/tree/preferredPCIDs
I've added getPCIDs.CIDtype, adjusted getPcId and createMassBank.R (and updated my old email address). I'm stuck on the documentation - see emails.
Just pushed https://github.com/MassBank/RMassBank/commit/445b43243036b18aad3c5343260652eb4945f9cf thanks to @MaliRemorker for docs tips. @meowcat pls let me know if I should do a pull request (this results in a lot of changes), or if you want me to change anything?
Default source of CIDs is PubChem, so could be closed.
We should make our default source of CIDs PubChem, and not CTS. There are too many discrepancies/errors cropping up. @meier-rene we may have to check the "status" of CIDs during validation, to catch and fix.
Example from freshly-created infolist: https://pubchem.ncbi.nlm.nih.gov/compound/4644
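One way the "status" check during validation might look, as a rough sketch: compare each stored CID with its preferred CID and flag the ones that differ. `check_cid_status` is a hypothetical helper, and the stub resolver only knows the one mapping shown earlier in this thread (every other CID is assumed to be its own preferred form, which is an assumption of the stub and not a statement about PubChem).

```r
# Hypothetical validation helper: flag CIDs that are not their own preferred CID.
check_cid_status <- function(cids, to_preferred) {
  preferred <- vapply(cids, function(cid) {
    p <- to_preferred(cid)
    if (is.null(p) || length(p) == 0) NA_integer_ else as.integer(p[1])
  }, integer(1))
  data.frame(
    cid = as.integer(cids),
    preferred = preferred,
    is_live = !is.na(preferred) & preferred == as.integer(cids)
  )
}

# Stub resolver, standing in for getPCIDs.CIDtype(cid, type = "preferred").
stub_preferred <- function(cid) {
  if (cid == 2759291L) 135399369L else as.integer(cid)
}

report <- check_cid_status(c(2759291L, 135399369L), stub_preferred)
print(report[!report$is_live, ])  # records that would need fixing
```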