OpenSourceMalaria / OSM_To_Do_List

Action Items in the Open Source Malaria Consortium

Preparing for OSM Meeting 8 - OSM numbering clash: OSM-S-220 #216

Open mattodd opened 10 years ago

mattodd commented 10 years ago

Two compounds have been assigned the code OSM-S-220.

http://malaria.ourexperiment.org/osm_procedures/9966/Preparation_of_OSMS220.html http://malaria.ourexperiment.org/osm_procedures/10006/Preparation_of_OSMS220.html

Can we resolve? Suggest your hypervalent iodine reagent, @alintheopen, is shunted forwards, since the chlorinated S/M is labelled as 220 in the GSK assay documents.

alintheopen commented 10 years ago

Gah, will sort this tomorrow!

egonw commented 10 years ago

Please ask for @cdsouthan's comment too, but considering that other databases may already have started using the OSM identifier, I suggest picking new identifiers for both compounds and blacklisting OSM-S-220. Because it has been used for two compounds, you never know which structure another database has copied and linked to this ID...
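
To make the suggestion above concrete, here is a minimal sketch of a registry that retires a clashing code and issues fresh ones. Everything in it is hypothetical: the `IdRegistry` class, the `NEW-ID-1`/`NEW-ID-2` codes, and the `structure-A`/`structure-B` placeholders (which would be InChIKeys in practice) are illustrative only and do not describe how OSM actually assigns numbers.

```python
from dataclasses import dataclass, field


@dataclass
class IdRegistry:
    """Toy identifier registry; all names here are hypothetical."""
    active: dict = field(default_factory=dict)      # code -> structure key
    deprecated: dict = field(default_factory=dict)  # retired code -> replacement codes

    def assign(self, code, structure):
        """Register a code, refusing retired codes and silent re-use."""
        if code in self.deprecated:
            raise ValueError(f"{code} is deprecated and cannot be reused")
        if code in self.active and self.active[code] != structure:
            raise ValueError(f"{code} already points at a different structure")
        self.active[code] = structure

    def split(self, clashing, replacements):
        """Retire an ambiguous code and give each structure a fresh code."""
        self.active.pop(clashing, None)
        # Deprecate first, so the retired code cannot sneak back in below.
        self.deprecated[clashing] = sorted(replacements)
        for code, structure in replacements.items():
            self.assign(code, structure)


registry = IdRegistry()
registry.assign("OSM-S-220", "structure-A")  # placeholder structure key
registry.split(
    "OSM-S-220",
    {"NEW-ID-1": "structure-A", "NEW-ID-2": "structure-B"},  # made-up codes
)
```

Keeping the retired code in a `deprecated` map, rather than deleting it, means anyone who meets OSM-S-220 elsewhere can still be pointed at both candidate replacements.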

cdsouthan commented 10 years ago

No need for self-chastisement, @alintheopen. AFAIK a lot of this happens around the cheminformatics block on the QT, but you are open about causes and fixes :). I have no particular technical experience here, but egon's idea of splitting out new IDs and deprecating the ambiguous entry seems a good one. However, that triflate would get kicked out of most database submission filters anyway (I think), so if you just renamed that in the open lab book you might be OK. Going forward you should keep an eye on OSM-S-220 in the wild, at least in the key places such as ChEMBL and PubChem (you don't have a direct ChemSpider feed, I guess?), and just kill off the wrong name-to-structure mapping if it gets "out". If you Google the code and the InChIKeys, most hits point back to you, but some recycling is going on via http://onsnetwork.org/about-onsnetwork-org/ (real colleagues or opportunists?). I feel bound to add that I would have killed off the double hyphens entirely, but I guess it's too late ... :(

mattodd commented 10 years ago

This is a most interesting feature of the project being "live" - that bots index the work almost immediately. I guess I'd ask whether, over time, the remnants of the incorrect numbering would fade, or be superseded by the new numbering? i.e. if we remove one incorrect entry and maintain the other, will not the correct number gain much more prominence than the other through being actively cross-referenced? For any residual references to the "wrong" structure we can try manual deletions and corrections?

cdsouthan commented 10 years ago

I think this is right, given that the Google cache gets refreshed and, as you say, retrieval ranking depends largely on link traffic. It would be an interesting and important exercise if, for this case, you actually tracked what happens: archive the name and InChIKey Google results at regular (monthly?) intervals, keep timestamps of what was changed at your end, and track when and what eventually surfaces in the major databases.
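
A rough sketch of how such a log could be kept, assuming the search hits are pasted in by hand (no search API is called here). The function names, the CSV layout, and the `example.org` URLs are all made up for illustration.

```python
import csv
import io
from datetime import datetime, timezone


def record_snapshot(writer, query, urls, when=None):
    """Write one row per hit: timestamp, query, rank, URL."""
    when = when or datetime.now(timezone.utc)
    stamp = when.isoformat()
    for rank, url in enumerate(urls, start=1):
        writer.writerow([stamp, query, rank, url])


def diff_snapshots(old_urls, new_urls):
    """Return (appeared, disappeared) URLs between two snapshots."""
    old, new = set(old_urls), set(new_urls)
    return sorted(new - old), sorted(old - new)


# Placeholder hits; real ones would be copied from the search results page.
june = ["https://example.org/a", "https://example.org/b"]
july = ["https://example.org/b", "https://example.org/c"]

log = io.StringIO()  # stands in for a CSV file kept alongside the lab book
writer = csv.writer(log)
record_snapshot(writer, "OSM-S-220", june)
record_snapshot(writer, "OSM-S-220", july)

appeared, disappeared = diff_snapshots(june, july)
```

Running the same query for the code and for each InChIKey, and comparing consecutive snapshots, would show when the retired name-to-structure link drops out of the results and when the corrected one appears.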

mattodd commented 10 years ago

27/6/14 search on name OSM-S-220:

[screenshot: 2014-06-27, 10:07:45 am]

Search on InChI of structure to be killed:

[screenshot: 2014-06-27, 10:10:40 am]

Others should feel free to track this over time.