OpenSourceMalaria / OSM_To_Do_List

Action Items in the Open Source Malaria Consortium

Getting into PubChem #528

Open cdsouthan opened 7 years ago

cdsouthan commented 7 years ago

AWAK the single most important step towards findability and openness is to surface all our novel structures in PubChem, ideally linked to bioactivity. Commendably, the project has already partially achieved this, but indirectly, via primary depositions into ChEMBL that are then submitted to PubChem. For example, https://www.ebi.ac.uk/chembl/doc/inspect/CHEMBL3137547 eventually came through in April 2015 as the 118 compounds in BioAssay https://pubchem.ncbi.nlm.nih.gov/bioassay/1079930. A more recent small set of 20, with some OSM inclusions, has come through as https://www.ebi.ac.uk/chembl/doc/inspect/CHEMBL3832881 via the Community for Open Antimicrobial Drug Discovery, but not as an independently indexed BioAssay (i.e. the compounds are in PubChem but not as a selectable set). The disadvantages of ChEMBL submissions are the long release intervals of ~6 months https://www.ebi.ac.uk/chembl/downloads and the limited synonym selection (but this could be fixed).

The current set of 289 Series 4 InChIKeys from the master sheet has 159 exact matches in PubChem https://www.ncbi.nlm.nih.gov/sites/myncbi/christopher.southan.1/collections/53063500/public/. Of these, 127 came in via ChEMBL https://www.ncbi.nlm.nih.gov/sites/myncbi/christopher.southan.1/collections/53063546/public/. This leaves us with a shortfall of 130 (although not all have been activity tested and some are intermediates). Note that the missing structures are not discoverable by similarity search for anyone, and we should try to arrange a full house by the time we submit the paper.
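For anyone wanting to repeat the match count as the master sheet grows, the bookkeeping can be sketched roughly as below. This is only an illustration, not the procedure actually used for the numbers above: the function name and the InChIKeys are made-up placeholders, and it assumes you already have the list of exact-match InChIKeys exported from PubChem by some other route.

```python
def split_by_pubchem_match(master_keys, pubchem_keys):
    """Split master-sheet InChIKeys into (matched, missing) lists.

    master_keys  -- InChIKeys from the master sheet (assumed input)
    pubchem_keys -- InChIKeys already known to be in PubChem (assumed input)
    Order follows the master sheet; blanks and duplicates are skipped.
    """
    known = {key.strip().upper() for key in pubchem_keys}
    matched, missing, seen = [], [], set()
    for raw in master_keys:
        key = raw.strip().upper()
        if not key or key in seen:
            continue  # ignore empty cells and repeated entries
        seen.add(key)
        if key in known:
            matched.append(key)
        else:
            missing.append(key)
    return matched, missing


# Placeholder keys for illustration only; these are not real OSM compounds.
master = [
    "AAAAAAAAAAAAAA-UHFFFAOYSA-N",
    "BBBBBBBBBBBBBB-UHFFFAOYSA-N",
    "CCCCCCCCCCCCCC-UHFFFAOYSA-N",
]
in_pubchem = ["BBBBBBBBBBBBBB-UHFFFAOYSA-N"]

matched, missing = split_by_pubchem_match(master, in_pubchem)
shortfall = len(missing)
```

With the real lists, `len(matched)` and `len(missing)` should reproduce the 159 and 130 quoted above, provided the master sheet holds 289 unique keys.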

So what to do @mattodd? My suggestion is that the team prepares a direct submission of at least the missing structures from the SAR tables, to be live at the time we submit the paper, possibly even as a full BioAssay set. These would go live within a week or so and become immediately searchable. As a sort of pilot I have just submitted five of the novel S4 structures on behalf of the team. This at least convinces me how easy it is (see http://cdsouthan.blogspot.com/2017/08/getting-into-pubchem-again.html). NOBA five of the 159 S4 matches now come from yours truly. The question naturally arises as to the advantages of being in both ChEMBL (presumably ChEMBL Malaria also) and PubChem, but we don't want the faff of submitting twice. In fact, PubChem > ChEMBL could be accomplished by arranging to have the PubChem BioAssay entry marked as "Confirmatory" (which it sort of is anyway). ChEMBL then subsumes these records from PubChem for each release.

Any quirks? Of course, how boring if it were that easy...

No time to go into this now, but I submitted a "flat" structure (which I had checked had no full InChIKey match) when a resolved enantiomer was already in PubChem (with a different MMV ID).
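One way to catch this sort of clash before submitting is to compare only the first block of the InChIKey. The first 14 characters encode the connectivity, while stereochemistry is hashed into the second block, so a flat structure and a resolved enantiomer share the first block even though their full keys differ. A rough sketch, with hypothetical function names and placeholder keys (not the actual S4 compounds), assuming the existing PubChem keys have been collected separately:

```python
def skeleton(inchikey):
    """Return the first (connectivity) block of an InChIKey."""
    return inchikey.strip().upper().split("-")[0]


def find_skeleton_clashes(new_keys, existing_keys):
    """Map each new key to existing keys with the same first block.

    Only keys whose full InChIKey differs are reported, i.e. cases such as
    a flat structure versus a stereo-resolved form of the same skeleton.
    """
    by_skeleton = {}
    for raw in existing_keys:
        key = raw.strip().upper()
        by_skeleton.setdefault(skeleton(key), []).append(key)

    clashes = {}
    for raw in new_keys:
        key = raw.strip().upper()
        others = [e for e in by_skeleton.get(skeleton(key), []) if e != key]
        if others:
            clashes[key] = others
    return clashes


# Placeholder keys for illustration only.
flat = "ABCDEFGHIJKLMN-UHFFFAOYSA-N"       # no stereo layer
resolved = "ABCDEFGHIJKLMN-QWERTYUISA-N"   # same skeleton, stereo defined
unrelated = "ZZZZZZZZZZZZZZ-UHFFFAOYSA-N"

clashes = find_skeleton_clashes([flat], [resolved, unrelated])
```

Here `clashes` flags the flat key against the resolved one, which is the situation described above: no full key match, but the same skeleton already present under another ID.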

cdsouthan commented 7 years ago

See https://github.com/OpenSourceMalaria/OSM_To_Do_List/issues/542