OpenSourceMalaria / OSM_To_Do_List

Action Items in the Open Source Malaria Consortium

Adding Cheminformatic Strings for Remaining Series 1 Compounds #347

Closed mattodd closed 8 years ago

mattodd commented 8 years ago

We need to complete the dataset in the Master Sheet for OSM Series 1 compounds before we submit the relevant paper in the near future. The dataset will allow readers of the paper to browse the series interactively using @lpatiny's platform.

The compounds in Series 1 highlighted in red in the Master Sheet are those that are mentioned in the paper but which are not yet properly in the Sheet. The first thing is that we need the strings added in for those compounds. There are 56 of them. Could we split this up like this:

@minkyungchong: Any OSM-S compounds up to #200

@jhon4903: Any OSM-S numbers with # greater than 200, as well as the OSM-E, OSM-A and OSM-L compounds

Is that OK?

Would you be able to add in the SMILES, InChI and InChIKeys for these compounds, plus any MMV codes you happen to find? The relevant compounds can all be found by searching in the Experimental Procedures ELN - use the menu on the right. Lots of the strings are already there too for copying and pasting.
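As a rough aid for the task above, here is a minimal sketch of a completeness check for one row of the sheet. The field names (`smiles`, `inchi`, `inchikey`) are hypothetical and not taken from the Master Sheet, and the example row is ethanol rather than an OSM compound. It only checks that each string is present and has the expected outward shape (an InChI starts with `InChI=`, an InChIKey is 14 letters, a hyphen, 10 letters, a hyphen and 1 letter); it does not verify the chemistry.

```python
import re

# InChIKey layout: 14 uppercase letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def missing_or_malformed(row):
    """Return the names of identifier fields that are empty or look malformed.

    `row` is a dict with hypothetical keys 'smiles', 'inchi' and 'inchikey'.
    """
    problems = []
    if not row.get("smiles", "").strip():
        problems.append("smiles")
    if not row.get("inchi", "").strip().startswith("InChI="):
        problems.append("inchi")
    if not INCHIKEY_RE.match(row.get("inchikey", "").strip()):
        problems.append("inchikey")
    return problems


# Ethanol, used purely as an illustration.
example_row = {
    "smiles": "CCO",
    "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
}
print(missing_or_malformed(example_row))
print(missing_or_malformed({"smiles": "CCO"}))
```

A row that passes this check can still carry the wrong structure, so it complements rather than replaces pasting the strings into a drawing package.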

(Series 1 compounds highlighted in the master sheet in green are mentioned in the paper and there are already data in the sheet so it looks like those are OK. Compounds that are not colour-highlighted are not explicitly mentioned in the paper for whatever reason and are a lower priority.)

Once all the informatics data are added, we can add in the biological data (potency and everything else), which should be fairly quick given that all the data are included in the draft paper, and we can work off that. But let’s manage that separately.

Let me know below if there are any problems or if you’ve any questions. Or (of course) if you’ve no current bandwidth and I ought to find someone else! Otherwise, thank you guys.

mattodd commented 8 years ago

Note, guys, that if strings are missing for OSM-S-80 through 90 then you might find them here http://malaria.ourexperiment.org/osm_procedures/5402/OSM_Compound_List.html (see #318)

minkyungchong commented 8 years ago

Ok! It will be done by the end of today.

jhon4903 commented 8 years ago

I can't find the OSM-A series on the link you've provided. Would you be able to link me to the page with the A-series of compounds?

minkyungchong commented 8 years ago

I could only find the InChI for OSM-S-80 to 91 from the link you gave me

mattodd commented 8 years ago

Sorry @jhon4903 the structures are here. Are you able to derive the strings from this, or is that not possible?

And @minkyungchong can you derive the other strings from the InChI, or is that not possible/easy? i.e. is there a way to convert the InChI into a structure from which you can get the SMILES and InChIKey? No problem at all if not, just wondering.

drc007 commented 8 years ago

@minkyungchong @mattodd What chemical drawing package are you using? There is usually a "Paste As" option and you just choose InChI. Alternatively you can use the Chemical Identifier Resolver to convert: http://cactus.nci.nih.gov/chemical/structure
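For anyone who prefers to script the resolver route, here is a minimal sketch. It assumes the service's URL scheme of `<base>/<identifier>/<representation>`, with `smiles`, `stdinchi` and `stdinchikey` as representation names; please check these against the resolver's own documentation. The helper names are mine, the example InChI is ethanol (not an OSM compound), and leaving `/` unescaped inside the identifier is an assumption.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL as given in the comment above.
BASE = "http://cactus.nci.nih.gov/chemical/structure"


def resolver_url(identifier, representation):
    """Build a resolver URL for an identifier (SMILES, InChI, name, ...).

    quote() percent-encodes characters such as '#', '=', '[' and '@';
    '/' is left as-is (assumption: the service accepts it unescaped).
    """
    return f"{BASE}/{quote(identifier)}/{representation}"


def resolve(identifier, representation, timeout=10):
    """Fetch one representation as text. Needs network access."""
    with urlopen(resolver_url(identifier, representation), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


# Ethanol InChI, purely illustrative.
inchi = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
for representation in ("smiles", "stdinchikey"):
    print(resolver_url(inchi, representation))
```

Calling `resolve(inchi, "smiles")` would then return the converted string, provided the service is reachable and recognises the identifier.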

minkyungchong commented 8 years ago

Ok I'll try that!

jhon4903 commented 8 years ago

I'm not able to open up the .cdxml file as my program only allows for the importing of .cdx files

minkyungchong commented 8 years ago

@mattodd @drc007 Sorry for the late submission, but everything has been added! The website was very helpful. Just checking if CCN=C([C@@H]1CC@HN)O is a valid SMILES?

wvanhoorn commented 8 years ago

CCN=C([C@@H https://github.com/H]1CC@HN)O is not a valid SMILES string

Willem P van Hoorn, PhD Head of Chemoinformatics ex scientia ltd

email: wvanhoorn@exscientia.co.uk web: http://www.exscientia.co.uk phone: +44 1382 346655

mattodd commented 8 years ago

Yes, there are some formatting artifacts in the versions above, but even with my plain-text version of the string, CCN=C([C@@H]1CC@HN)O, I get an error in Chemdraw when I try to paste it. Which molecule are you trying to do?

mattodd commented 8 years ago

Thanks @jhon4903 - don't worry about it, it's a little fiddly so I just took care of those 4 compounds. Thank you for doing all the others, that's great.

minkyungchong commented 8 years ago

@mattodd OSM-S-89 was the one

mattodd commented 8 years ago

Did someone fix this? The entry for OSM-S-89 is now CCN=C([C@@H]1CC@HN)O which looks OK to me, even though stereochemistry has been assumed when I paste into Chemdraw.

If everyone is happy I might close this issue and generate new ones for anything remaining (we will have to add in some biological data for all these compounds). Thanks again @minkyungchong and @jhon4903 as well as @drc007 and @wvanhoorn for advice.

wvanhoorn commented 8 years ago

Unfortunately 'CCN=C([C@@H https://github.com/H]1CC@HN)O' is not a valid SMILES string: neither MarvinSketch nor Pipeline Pilot can render it into a structure. I don't have access to Chemdraw, but I would assume it makes some on-the-fly fix when the SMILES is not quite right. If Chemdraw has interpreted the SMILES correctly, i.e. the structure is what it is supposed to be, is there an export or 'save as' option to get the SMILES out again? Hopefully that would be the interpreted (corrected) SMILES. Alternatively, if someone could provide a link where the correct structure is shown, I can generate the SMILES.
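To make concrete what a parser is likely to object to, here is a small sketch of a purely syntactic pre-check. It is not how MarvinSketch, Pipeline Pilot or Chemdraw parse SMILES, and passing it does not make a string valid. It only looks for three things: unbalanced branch parentheses, a chirality mark `@` outside a bracket atom, and ring-closure labels that are opened but never closed. The function name is hypothetical.

```python
def quick_smiles_problems(smiles):
    """Return a list of obvious syntax problems; an empty list means none found."""
    problems = []
    open_rings = set()   # ring-closure labels seen an odd number of times
    depth = 0            # current branch nesting
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip the whole bracket atom: digits and '@' inside are legitimate.
            end = smiles.find("]", i)
            if end == -1:
                problems.append("unclosed '['")
                break
            i = end + 1
            continue
        if ch == "]":
            problems.append("stray ']'")
        elif ch == "@":
            problems.append("'@' outside a bracket atom")
        elif ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                problems.append("unmatched ')'")
            else:
                depth -= 1
        elif ch == "%" and len(smiles) >= i + 3 and smiles[i + 1:i + 3].isdigit():
            open_rings ^= {smiles[i + 1:i + 3]}   # two-digit ring closure
            i += 3
            continue
        elif ch.isdigit():
            open_rings ^= {ch}                    # toggle open/closed
        i += 1
    if depth > 0:
        problems.append("unclosed '('")
    if open_rings:
        problems.append("unpaired ring closure: " + ", ".join(sorted(open_rings)))
    return problems


# The string exactly as it is rendered in this thread.
print(quick_smiles_problems("CCN=C([C@@H]1CC@HN)O"))
```

On the string as displayed here, the check reports a bare `@` and a ring closure `1` that never closes, which suggests that characters were lost somewhere between the sheet and this page.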

mattodd commented 8 years ago

I think we may be suffering from GitHub's interpretation of what we're pasting here, which is not coming out right. If you go to the sheet (http://tinyurl.com/OSM-Compounds) and take the SMILES for OSM-S-89 from there, it works fine (for me).
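A likely mechanism, offered as an assumption rather than a diagnosis: in Markdown, `[text](target)` is link syntax, so a bracket atom followed directly by a branch, such as `[C@H](N)`, can be swallowed when a SMILES is pasted as plain text. The sketch below flags that pattern and wraps the string in backticks so it is shown literally. The function names are hypothetical, and the example SMILES is an alanine stereoisomer, not OSM-S-89.

```python
import re

# A bracket atom immediately followed by '(' looks like the start of a Markdown link.
BRACKET_THEN_BRANCH = re.compile(r"\[[^\[\]]+\]\(")


def markdown_may_mangle(smiles):
    """True if the SMILES contains a '[...](' sequence that Markdown may read as a link."""
    return BRACKET_THEN_BRANCH.search(smiles) is not None


def as_inline_code(smiles):
    """Wrap the SMILES in backticks so Markdown renders it verbatim."""
    return f"`{smiles}`"


example = "C[C@H](N)C(=O)O"
if markdown_may_mangle(example):
    print(as_inline_code(example))
```

Pasting the backtick-wrapped form into a comment should keep every bracket and `@` intact, which makes it easier for others to copy the string straight from the thread.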

wvanhoorn commented 8 years ago

Aha, that makes a difference: 'CCN=C([C@@H]1CC@HN)O' is a valid string

lpatiny commented 8 years ago

I added a page on the malaria website:

http://www.cheminfo.org/flavor/malaria/Utilities/SMILES_generator___checker.html

It lets you generate a SMILES string, as well as parse a list of SMILES and generate the structures.

I will add InChI generation as well.