OpenSourceMalaria / OSM_To_Do_List

Action Items in the Open Source Malaria Consortium

A Pharmacophore Competition - thoughts/input gratefully received. #412

Closed alintheopen closed 8 years ago

alintheopen commented 8 years ago

OSM want to launch a competition to build a pharmacophore model for Series Four. It will be based on existing open data, and the models will be tested on a test dataset that will be available later in the year. Key links can be found in the project wiki (please feel free to edit as well as use).

I've opened this issue so that we can iron out how this might work. I've sketched out a rough proposal below, but would really welcome any input on how to improve the competition and some answers to specific questions. Once we've developed the competition guidelines as a community, I will close this issue and repost the edited guidelines when we launch the competition for real.

Outline

We need to build a predictive pharmacophore model for PfATP4. PfATP4 is a sodium pump found in the membrane of the malaria parasite. A number of promising antimalarial compounds, with distinct and diverse chemical structures, have been found active in an ion regulation assay, which was developed by Kiaran Kirk's lab at ANU. A number of publications have indicated that this ion channel looks to be an important new target for malaria medicines. It seems that PfATP4 active compounds disrupt the ion channel and cause the rapid influx of Na+ into the parasite, leading to its demise. The structure of PfATP4 is not known. Simulations, based on docking of PfATP4 actives, have used a homology model developed by Joseph DeRisi's laboratory. OSM want to build a predictive pharmacophore model to assist in the design and synthesis of new Series Four compounds and, of course, to help others working on other compound series.

The first attempt

@murrayfold had a quick, informal go (here and here) at developing a pharmacophore model using known actives and inactives from the Malaria Box. At the time Kiaran Kirk's paper was under embargo, but Murray has since written up his work. This initial attempt was unsuccessful (i.e. not predictive: see the image below, where the "P Model predictions" correlate poorly with what was found in the ion regulation assay), possibly because the model did not allow for overlapping binding sites or take compound chirality into consideration.

[Image: compounds sent to Kirk and Fidock]

The Competition

We need a predictive in silico model; the best* model will win the prize**.

How will it work? OSM will provide:

Submission Rules

How will entries be assessed?

What's the prize?

**TBC, along with the opportunity to contribute to our understanding of a new class of antimalarials and authorship on a peer-reviewed publication.

What if none of the models are any good?

Good question. If all models fail to meet said statistical test, then it may not be possible to announce a 'winner'. All data will be collated and published at least in the form of a blog, if not a paper... and then we will try again.

Deadline

TBC, but potentially end of August 2016

***A 'valid' entry is one that stands up to the rigour expected from published in silico models. Judges are entitled to use discretion in the case of unconventional entrants, for example those from people with no formal training such as high school students.

MFernflower commented 8 years ago

No idea if this is worth adding to the above post, but the structure of bovine SERCA ATPase is available in the PDB: http://www.rcsb.org/pdb/explore.do?structureId=3tlm

I would try to dock some Series 4 compounds to it, but I do not understand AutoDock at all

@alintheopen

murrayfold commented 8 years ago

Is the homology model available? I've not read the paper in full, but in theory it could be recreated (but would be easier if we could get access to it)

I'd like another go at this but my spare time is rather limited with trying to sell three houses, buy another one, move and arrange a wedding all in the next three months! I can think of a couple of people here that might have a go at it when it's launched.

MedChemProf commented 8 years ago

Excellent suggestion. Access to the Homology Model that was already published would make a significant difference. A description of how they made the model was in the paper, but the model itself was not included in the supplemental information.

murrayfold commented 8 years ago

The author list is full of familiar names, I think it just requires the right person asking the question.

MFernflower commented 8 years ago

Maybe if Matt has free time he could request the homology model be released into public domain?

@murrayfold @mattodd

drc007 commented 8 years ago

Do you want to be able to predict activity in the current series or ALL structural classes? I suspect any pharmacophore model will only be suitable for molecules that have a similar binding mode. Any pharmacophore will only really be suitable for the applicability domain it was built for; if your test set contains unrelated molecules, they may score poorly.
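
The applicability-domain concern can be made concrete with a nearest-neighbour similarity check. The sketch below is illustrative only: the fingerprints are hypothetical sets of integer feature IDs (in practice they would come from a cheminformatics toolkit), and the 0.3 cutoff is an arbitrary assumption, not a recommended value.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def in_domain(query: set, training: list, cutoff: float = 0.3) -> bool:
    """True if the query's nearest training-set neighbour meets the cutoff."""
    nearest = max((tanimoto(query, t) for t in training), default=0.0)
    return nearest >= cutoff


# Hypothetical fingerprints: each set holds the "on" bit positions of one molecule.
training_set = [{1, 2, 3, 4}, {2, 3, 4, 5}]
similar_query = {1, 2, 3, 9}     # shares three bits with the first training molecule
unrelated_query = {20, 21, 22}   # shares no bits with any training molecule

print(in_domain(similar_query, training_set))    # True
print(in_domain(unrelated_query, training_set))  # False
```

A test-set molecule that falls below the cutoff is one for which a poor score would say little about the model itself.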

alintheopen commented 8 years ago

@murrayfold great idea on homology model, what do you think @mattodd? @MedChemProf that was my understanding also from the SI.

Hi @drc007, do you think we are better to launch a competition for just series four then? This would mean that after resynthesis and submission of some OSM compounds we would have a set of only ~20 S4 compounds to design the model. @murrayfold's original work used the malaria box compounds (so diverse chemotypes) to design the initial model.

mattodd commented 8 years ago

Yes, I will ask for details of the model as part of the initial release of data. I think the originators will be amenable.

The applicability of the model is an interesting question:

1) We'd like to make sense of the current series - to be more predictive about which compounds are active and which inactive. So we'd like a model for the current series, yes.

2) There's also the larger question of how so many different compounds can apparently be hitting the same target. A possibility is that the different chemotypes have different binding sites on the large ATP4 protein. Whether it's possible to show this with the homology model would be of interest.

My question is whether we should, in the competition, separate these things out. Perhaps so. The first question is smaller and more immediately useful to us. The second question is larger and more strategically interesting.

JeremyHorst commented 8 years ago

Hello friends, I just got the request for the structure model from Matthew. I will review the details of this thread later, and wanted to share the file as soon as possible. Please appreciate that this is a comparative model, and as with all models some of the details are wrong.

Peace and Thanks, Jeremy Horst

PfATP4-PNAS2014.pdb.txt

mattodd commented 8 years ago

That's great, thanks @JeremyHorst . This takes us a step closer to starting up the competition.

Re the models incorporating resistance mutations, there are a few that are known. To generate those models, is it simply a case of providing you with the list of which amino acids have changed and how?
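
A list of "which amino acids have changed and how" is commonly written as codes such as A2G (wild-type residue, position, mutant residue). The following is a minimal sketch, assuming that notation, of how such a list could be parsed and checked against a sequence. The sequence and mutations are toy values, not real PfATP4 data, and the function names are hypothetical. It edits the sequence only; building a mutated 3D model still needs a structure-modelling tool.

```python
import re

MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(code: str):
    """Split a code such as 'A2G' into (wild_type, position, mutant)."""
    match = MUTATION_PATTERN.match(code)
    if match is None:
        raise ValueError(f"Unrecognised mutation code: {code!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def apply_mutations(sequence: str, codes: list) -> str:
    """Return a copy of the sequence with each point mutation applied (1-based)."""
    residues = list(sequence)
    for code in codes:
        wild_type, position, mutant = parse_mutation(code)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{code}: position is outside the sequence")
        found = residues[position - 1]
        if found != wild_type:
            # Catches numbering mismatches between the list and the sequence.
            raise ValueError(f"{code}: expected {wild_type}, found {found}")
        residues[position - 1] = mutant
    return "".join(residues)


# Toy sequence and toy mutations, for illustration only.
toy_sequence = "MACDE"
print(apply_mutations(toy_sequence, ["A2G", "E5K"]))  # MGCDK
```

The wild-type check matters because residue numbering in a model file does not always match the numbering used in the literature.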

MFernflower commented 8 years ago

I cannot seem to open the pdb file in jmol?

JeremyHorst commented 8 years ago

You may have to remove the ".txt" at the end of the file name. The malaria website would not let me post a ".pdb" file, so I used a standard end around.

Peace and thanks, Jeremy

murrayfold commented 8 years ago

Many thanks @JeremyHorst , opens for me without a problem

[Image]

MFernflower commented 8 years ago

After a bit of tinkering I managed to get the file open

holeung commented 8 years ago

Hi. I've been following OSM for several months and have been waiting for an opportunity to contribute. I would like to help out here. I am an experimental protein crystallographer, computational structural biologist, and computational chemist. I've refined the PNAS homology model and have others to consider. I plan to do some docking and pharmacophore modeling. Should I open a new lab book to share my results?

MedChemProf commented 8 years ago

@holeung I am sure others will comment soon, but your offer to help out on this is very welcome and appreciated. Personally, I only have a few visualization tools and do not have the needed docking software. I have been using a tool on the Mcule website (https://mcule.com) to dock a few molecules in the interim. I do not know if it will be possible, but I would like to hear your thoughts on how best to share any developed pharmacophore model so that others might be able to test hypotheses without necessarily having access to docking software. Thank you again for your offer to contribute.

mattodd commented 8 years ago

Hi @holeung - sounds great. The only requirement here is to share your work as you go, meaning a public lab notebook. That can be one of your own, or one using platforms that we currently use, such as Labtrove or Labarchives. Provided all data and ideas are shared, you can do what you please! Happy to advise further. Would you agree, though, that there are potentially two strands here? One based on docking structures to the homology model and one based on the development of a pharmacophore model that is centered on the present OSM Series 4?

cdsouthan commented 8 years ago

Good stuff, but being more of a pharmacophore tyro I'm happy to leave this to others (and don't have appropriate software suites anyway). On a less positive note I should openly record my disappointment that none of the companies I pinged on Twitter (with this link) responded. AWK, the likes of OpenEye, Cresset, MolSoft, ChemAxon, Schrodinger and Biovia (PP) and others, could have had a go at this like rolling off a log. One can imagine reasons for reluctance (e.g. fear of open comparisons) but if anyone has internal contacts with such folk it still might be worth a try.

holeung commented 8 years ago

Yes, one can do ligand based pharmacophore discovery and docking separately. But successful location of the binding site can greatly aid pharmacophore definition. My preliminary work suggests that the ligand binding sites are not obvious computationally, complicated by the lack of a high confidence homology model.

holeung commented 8 years ago

I've started a new lab notebook on Labtrove with my homology models. I'll work on docking over the next few days.

JeremyHorst commented 8 years ago

Hi Mat, sure, I'd be happy to generate mutated structure models - send the list! This is very simple.

Perhaps HoLeung or others would refine the models?

I recommend Chimera <www.cgl.ucsf.edu/chimera> for edits, but there are many good tools. (I was using Chimera before coming to UCSF, so feel this is not a biased recommendation :->)

JeremyHorst commented 8 years ago

Would you please share with the group what you needed to do to open in JMol? Others might experience the same issue.

MFernflower commented 8 years ago

@JeremyHorst I spoke too soon - I still cannot get the file open in JMol (attached is the error message it gives). I was able to get the file open in ProteinWorkshop (http://www.rcsb.org/pdb/staticHelp.do?p=help/viewers/proteinWorkshop_viewer.html), but most other programs crashed in much the same way JMol did:

[Image: jmolcrash]

JeremyHorst commented 8 years ago

Thank you very much. It seems these tools don't recognize the CONECT lines
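
If the CONECT lines are indeed the problem (an untested guess at this point in the thread), one way to check would be to write a copy of the file without them and try that in the viewer. This is a rough sketch, not a method anyone here has used; the function names and the "_clean" suffix are made up for illustration. It relies only on the PDB convention that the record name occupies the first six columns of each line.

```python
from pathlib import Path


def strip_records(pdb_text: str, unwanted=("CONECT",)) -> str:
    """Drop every line whose record name (columns 1-6) is in `unwanted`."""
    kept = [line for line in pdb_text.splitlines()
            if line[:6].strip() not in unwanted]
    return "\n".join(kept) + "\n"


def clean_pdb_file(source: Path) -> Path:
    """Write a CONECT-free copy of `source`, dropping a trailing '.txt' if present."""
    target = source.with_suffix("") if source.suffix == ".txt" else source
    target = target.with_name(target.stem + "_clean.pdb")
    target.write_text(strip_records(source.read_text()))
    return target


# Made-up fragment standing in for a real file.
demo = "ATOM      1  N   MET A   1\nCONECT    1    2\nEND\n"
print(strip_records(demo), end="")
```

Calling `clean_pdb_file(Path("PfATP4-PNAS2014.pdb.txt"))` would, under these assumptions, write `PfATP4-PNAS2014_clean.pdb` next to the original and leave the original untouched.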

I believe RasMol, Chimera <www.cgl.ucsf.edu/chimera> and PyMol are the most popular protein structure viewers.

MFernflower commented 8 years ago

@JeremyHorst What does TER 13623 THR A1024 do and is it needed?

Also I grabbed chimera as per your recommendation

JeremyHorst commented 8 years ago

Some protein viewers use this line to denote the end of the protein chain, before the next one starts. This is a monomer, so it is not necessary.

MFernflower commented 8 years ago

@JeremyHorst I loaded the original PDB file in chimera and then told it to write out a brand new pdb file - this new file works in jmol and seems to use the standard official PDB coding:

PfATP4DrugBoundJMol.pdb.txt

JeremyHorst commented 8 years ago

Interesting. Looks the same, just with the sheet and helix entries. Maybe this website screws things up.
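
A quick way to confirm that two versions of a PDB file differ only in which record types they contain is to count lines by record name. The snippet below is a small sketch of that comparison; the two text fragments are made up and stand in for the original and the Chimera-written files, and the function names are hypothetical.

```python
from collections import Counter


def record_counts(pdb_text: str) -> Counter:
    """Count PDB lines by record name (the first six columns of each line)."""
    return Counter(line[:6].strip()
                   for line in pdb_text.splitlines() if line.strip())


def record_differences(original: str, rewritten: str) -> dict:
    """Map each record name whose count differs to (original, rewritten) counts."""
    before, after = record_counts(original), record_counts(rewritten)
    return {name: (before[name], after[name])
            for name in sorted(set(before) | set(after))
            if before[name] != after[name]}


# Tiny made-up fragments, for illustration only.
original = "ATOM      1  N   MET A   1\nTER\nEND\n"
rewritten = ("HELIX    1   1 MET A    1  ALA A    5\n"
             "ATOM      1  N   MET A   1\nTER\nEND\n")
print(record_differences(original, rewritten))  # {'HELIX': (0, 1)}
```

An empty result would mean the two files contain the same number of each record type, although the coordinates themselves could still differ.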

holeung commented 8 years ago

I recommend PyMol for viewing. It is easier to learn than Chimera although Chimera has more functionality.

http://www.pymolwiki.org/index.php/Windows_Install http://www.pymolwiki.org/index.php/MAC_Install

I am happy to help create models of mutant structures.

MFernflower commented 8 years ago

I have re-uploaded my version that was fixed in Chimera to Google Drive for ease of access: https://drive.google.com/open?id=0B5aEWUyp9jxLRXVQeFBDSmZJWkE

@holeung @JeremyHorst

MedChemProf commented 8 years ago

@cdsouthan In reference to your request to the software vendors, I did ping one vendor where I have some contacts. They entertained the idea, but were not set up to produce a pharmacophore model at the moment. I will follow up with them again soon.

JeremyHorst commented 8 years ago

There are plenty of freely available software packages, including open source ones. There are even webservers to dock with a click.

holeung commented 8 years ago

I recommend SwissDock (http://swissdock.ch) as a free, easy to use, docking server.

ryancoleman commented 8 years ago

Not sure why this is limited to pharmacophore in the title of the competition. Would certainly make sense to change the title to open this to any form of modeling.

mattodd commented 8 years ago

Thanks all. Well, we started from the point of view that we want to be predictive about compounds in this series (our current series) - which compounds will be most active? We had originally proposed that we should achieve this by developing a pharmacophore model (i.e. without consideration of a homology model of the target) because we have a bunch of actives and inactives. I still think we should do this.

But we also have the homology model. We can include this in the competition, or we can keep it separate. Perhaps all we do is say the following: "Later this year a dataset will be released for an unpublished compound set, containing potency vs parasite and ability to block PfATP4. The model that performs best in predicting these new data will win."

We have the additional feature here that there are data on a bunch of other structures (outside our series of interest) that also apparently hit this target. Explaining this is interesting scientifically, but perhaps this should not be part of the competition, at least in the initial phase? The complication is that these other compounds might be binding elsewhere. I find this fascinating, but perhaps we should be more pragmatic here and focus only on Series 4 for the moment.

So the competition is for the best model for predicting Series 4 activity, using all the available data?

JeremyHorst commented 8 years ago

I completely support your approach: given the input data, use any computational means you can to predict the outcome. Participants will presumably use the available data to inform their model, which includes compound affinities / effective / inhibitory doses, and features of the protein such as predicted structure.

As you may well know, in other successful competitions such as CASP, participants submit an abstract at the end of the competition, describing their methodology. Sometimes a participant / team that does not perform well overall picks up something that others do not, with a unique methodology which can inform the biology.

Peace and Thanks, Jeremy

drc007 commented 8 years ago

"So the competition is for the best model for predicting Series 4 activity, using all the available data?" Absolutely. I'd suggest it does not matter what type of model is used, be it 2D descriptors, pharmacophore, 3D-QSAR, field points or docking; we are simply interested in the predictive capability. Indeed it might be interesting to look at the relative capabilities of the different strategies.
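
Because the model type does not matter, any entry can be scored the same way: compare its active/inactive calls with the assay results. The thread does not specify a scoring metric, so the sketch below uses the Matthews correlation coefficient purely as an assumed example. The labels are invented, not competition data.

```python
import math


def confusion_counts(predicted: list, observed: list):
    """Return (tp, tn, fp, fn) for paired active/inactive (True/False) labels."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    pairs = list(zip(predicted, observed))
    tp = sum(p and o for p, o in pairs)
    tn = sum(not p and not o for p, o in pairs)
    fp = sum(p and not o for p, o in pairs)
    fn = sum(not p and o for p, o in pairs)
    return tp, tn, fp, fn


def matthews_corrcoef(predicted: list, observed: list) -> float:
    """MCC: +1 is perfect, 0 is no better than chance, -1 is fully inverted."""
    tp, tn, fp, fn = confusion_counts(predicted, observed)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denominator == 0 else (tp * tn - fp * fn) / denominator


# Invented labels: True = active in the assay, False = inactive.
observed = [True, True, True, False, False, False]
predicted = [True, True, False, False, False, True]
print(round(matthews_corrcoef(predicted, observed), 3))  # 0.333
```

Since the same function works for 2D, pharmacophore, 3D-QSAR and docking entries alike, it would also allow the comparison of strategies suggested above.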

holeung commented 8 years ago

I like this approach. I think a special value of open source science is the freedom to pursue ideas and goals in a less directed manner.

Ho Leung Ng University of Hawaii at Manoa Assistant Professor, Department of Chemistry hng@hawaii.edu

mattodd commented 8 years ago

Last point: the model we need will be the one that best predicts activities of molecules in our current series. The data we'll collate for everyone at the start of this competition will include all the data we have on the PfATP4 activity of compounds in the series. We will also collate all the available public domain data for the activity of other compounds against PfATP4. These will presumably be of slightly less interest (given the possibility that there are multiple binding sites) but I'm not going to presume which data are interesting and which not. However, the test set of data coming later in the year (not yet in the public domain): it will be important for this dataset to include compounds from Series 4. We must make sure this is the case, and it may mean we (as a project) hold back the release of data on some Series 4 compounds (there's a first time for everything) in order to provide the best test set. If we did that, how many Series 4 compounds ought there to be in the test set, if we make the assumption that 50% of the compounds are active? 10? 20?

JeremyHorst commented 8 years ago

The more the better. 20 is definitely sufficient.

Negative data are nearly as important as positive - it is important to know the non interacting compounds in the previous screening sets from which the interactors were found.

Peace and thanks, Jeremy
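
One way to see why a larger test set helps: with 50% actives, ask how often pure guessing would reach a given hit rate. This is a back-of-envelope sketch using the binomial distribution; the 80% hit rate is an arbitrary assumption chosen for illustration.

```python
from math import comb


def chance_of_at_least(correct: int, total: int, p: float = 0.5) -> float:
    """Probability that random guessing gets at least `correct` of `total` right."""
    return sum(comb(total, k) * p**k * (1 - p)**(total - k)
               for k in range(correct, total + 1))


# The same 80% hit rate on two test-set sizes.
print(round(chance_of_at_least(8, 10), 4))   # 0.0547
print(round(chance_of_at_least(16, 20), 4))  # 0.0059
```

Under these assumptions, getting 8 of 10 right by luck happens about 1 time in 18, while 16 of 20 happens about 1 time in 170, so a 20-compound test set separates a genuinely predictive model from a lucky one far more clearly.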

paul-hawkins commented 8 years ago

@cdsouthan Some observations from the OpenEye perspective: We do not provide pharmacophore or QSAR tools that predict activity of compounds quantitatively, so we can't actually participate in the competition as originally conceived. We are involved in other open data 'competitions' like D3R. Unfortunately our time is not elastic, so time spent on this project would require us to spend less time on these other projects.

MFernflower commented 8 years ago

Just had a glance over at @holeung's docking studies. It's interesting how tightly the cyanophenyl group binds to serine 175 in the pump. The triazole part of the fused ring system appears to grab onto a buried leucine. Perhaps this triggers a conformational change of the pump, causing it to lose activity (similar to how palytoxin works)?

mattodd commented 8 years ago

Hi @JeremyHorst - I've a colleague who'd like to compare the amino acid sequence of PfATP4 to other proteins in BLAST and he would like the sequence in FASTA format. Is that buried in the pdb? Or is this available from somewhere else? Apologies for what must be a simple query.

MFernflower commented 8 years ago

FASTA: http://pastebin.com/qF4vKesf

If you ever need to get the FASTA out of a pdb just use openbabel.org

@mattodd
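
For anyone who would rather not install a toolkit, the sequence can also be read straight from the ATOM records. The sketch below is a bare-bones illustration, not how the FASTA above was produced: it relies on the fixed-column PDB layout, handles only the 20 standard residues, and runs on a made-up three-residue fragment. A sequence recovered this way covers only the residues present in the coordinate file.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_from_pdb(pdb_text: str, chain: str = "A") -> str:
    """Build a one-letter sequence from the C-alpha ATOM records of one chain."""
    letters = []
    for line in pdb_text.splitlines():
        # Fixed columns: atom name 13-16, residue name 18-20, chain ID 22.
        if (line.startswith("ATOM") and line[12:16].strip() == "CA"
                and line[21:22] == chain):
            letters.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(letters)


def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format a sequence as FASTA text, wrapped at `width` characters."""
    rows = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{header}"] + rows) + "\n"


# Made-up fragment, for illustration only.
demo = "\n".join([
    "ATOM      1  N   MET A   1",
    "ATOM      2  CA  MET A   1",
    "ATOM      3  CA  SER A   2",
    "ATOM      4  CA  LEU A   3",
    "TER",
])
print(to_fasta("toy_model_chain_A", sequence_from_pdb(demo)), end="")
```

Non-standard residues come out as "X" here, which is a simplification.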

mattodd commented 8 years ago

Wonderful, thanks Mandrake.

holeung commented 8 years ago

Crystal structures (PDB files) often don't contain complete sequences. Also, crystallographers often have to modify the genetic sequence to express, purify, and crystallize their proteins. Full sequence for PfATP4 can be found at

http://www.uniprot.org/uniprot/Q9U445#sequences

mattodd commented 8 years ago

Thanks, Ho - yes, I remember people telling me that flexible ends of membrane proteins are often removed for crystallisation attempts (on the assumption that we're interested in the transmembrane domains), though I've not checked to see if there's any difference here. I had assumed the PDB contained the full sequence because it's a model.

MFernflower commented 8 years ago

Just checked: the full gene product is about 1306 AA long, while the homology model is around 900 AA long

@mattodd @holeung

JeremyHorst commented 8 years ago

Yes, use the UniProt sequence. I was unable to model a long portion of the extracellular region. We're confident about the domain recognition for the parts that are modeled.

Peace and thanks, Jeremy

alintheopen commented 8 years ago

Closing as competition launched here: #421