Closed SalahBioPhysics closed 7 years ago
I'm unfortunately not familiar with how these are currently prepared. Tagging @gregoryross and @steven-albanese. They'll add the loop modeling and minimization steps to Steve's pipeline to provide better structures. I think they're both currently out, but should be back in on Monday to help tackle this.
We'll make sure to prioritize this particular structure.
Hi @SalahBioPhysics, I'm now working on this problem full time for the next couple of days.
The 'fixed' 4LMN protein structure in the example you've linked to is, by eye, completely wrong, and won't be fixed with energy minimization. As I'm sure you're aware, protonation state results for any of the residues within this constructed loop are totally untrustworthy. I think an overhaul of the protein preparation pipeline is required, and I'll be trying a couple methods.
I can narrow down the method I use if you could tell me what you do with the 'cleaned' protein structures. For the protein-ligand MCCE simulations, do you simply overlay the ligand on the cleaned protein structure, or do you perform some type of minimization of the ligand position? In the 4AN2 structure, there is a ligand, ATP, and a manganese ion; are all of these included in the MCCE calculations? Thanks
Thanks @gregoryross
> Do you simply overlay the ligand over the cleaned protein structure, or do you perform a type of minimization of the ligand position?

No, we don't do any type of minimization of the ligand position; we simply run the MCCE calculation based on the structure you give us.

> Are all of these included in the MCCE calculations?

We remove all of the non-inhibitor ligands (manganese ion, SO4, ...) from the structure.
Okay, thanks. So just to clarify, do you take the 'fixed' protein PDB structures from `mcce-charges/pdbs/` and the ligand PDB structures from `mcce-charges/epik_inhibitors/output/`?
Yes. Thanks a lot.
@SalahBioPhysics, I'm having a close look at the Cobimetinib structures, and the cofactors (ATP and ions) are an important part of the protein-inhibitor complex, such that, even with fixed loops, we shouldn't run MCCE on these structures without considering ATP and the ion. Can these be included in the MCCE calculations? If not, do you have another inhibitor-protein structure (that's cofactor free) you've been having trouble with that I can work on?
Since we have about a week to submit the grant, I don't think we'll have time to add more ligands to the calculation. However, we can do it for the paper.
For now, we can only look at protein-inhibitor complexes that don't have extra ligands. Crizotinib/2YFX_fixed_ph7.4.pdb would do. I will give you a list of PDBs soon.
Here is a list of the inhibitors that have/don't have extra ligands.
Afatinib
Alectinib: has EDO
Axitinib
Bosutinib
Ceritinib: has GOL
Cobimetinib: has ACP and MG
Crizotinib
Dabrafenib
Dasatinib
Gefitinib
Idelalisib
Imatinib: has 3YY
Lapatinib: has PO4
Lenvatinib: has DTT, EDO, GOL, SO4
Nilotinib
Osimertinib
Palbociclib: has DMS
Pazopanib: has SO4
Ponatinib: has EDO
Regorafenib
Sorafenib: has ACT and DTT
Sunitinib
Tofacitinib
Can we get a relaxed structure of the PDBs with the inhibitor bound (_fixed_ph7.4.pdb) and un-bound (_fixed_ph7.4_apo.pdb)?
And this is a dictionary that relates the PDB ID to the ligand (if needed).
PDB_Inhibitor_Dictionary = {
    '4G5J': 'Afatinib', '4G5P': 'Afatinib',
    '3AOX': 'Alectinib',
    '4AG8': 'Axitinib', '4AGC': 'Axitinib',
    '3UE4': 'Bosutinib',
    '4MKC': 'Ceritinib',
    '4AN2': 'Cobimetinib', '4LMN': 'Cobimetinib',
    '2WGJ': 'Crizotinib', '2XP2': 'Crizotinib', '2YFX': 'Crizotinib', '4ANQ': 'Crizotinib',
    '4ANS': 'Crizotinib', '5AAA': 'Crizotinib', '5AAB': 'Crizotinib', '5AAC': 'Crizotinib',
    '4XV2': 'Dabrafenib', '5CSW': 'Dabrafenib', '5HIE': 'Dabrafenib',
    '2GQG': 'Dasatinib', '4XEY': 'Dasatinib',
    '2ITO': 'Gefitinib', '2ITY': 'Gefitinib', '2ITZ': 'Gefitinib',
    '3UG2': 'Gefitinib', '4I22': 'Gefitinib', '4WKQ': 'Gefitinib',
    '4XE0': 'Idelalisib',
    '2HYY': 'Imatinib', '3PYY': 'Imatinib',
    '1XKK': 'Lapatinib',
    '3WZD': 'Lenvatinib',
    '3CS9': 'Nilotinib',
    '4ZAU': 'Osimertinib',
    '2EUF': 'Palbociclib', '5L2I': 'Palbociclib',
    '3CJG': 'Pazopanib',
    '3IK3': 'Ponatinib', '3OXZ': 'Ponatinib', '3ZOS': 'Ponatinib', '4C8B': 'Ponatinib',
    '4QRC': 'Ponatinib', '4TYJ': 'Ponatinib', '4U0I': 'Ponatinib', '4UXQ': 'Ponatinib',
    '4V01': 'Ponatinib', '4V04': 'Ponatinib',
    '2QU5': 'Regorafenib',
    '3WZE': 'Sorafenib', '4ASD': 'Sorafenib',
    '4AGD': 'Sunitinib',
    '3LXK': 'Tofacitinib',
}
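For anyone who wants to combine the extra-ligand list with the dictionary above, here is a minimal sketch of one way to group PDB IDs by inhibitor and pick out the entries with no extra ligands. It only uses a small subset of the entries, the names `PDB_INHIBITOR`, `EXTRA_LIGANDS`, `group_by_inhibitor`, and `cofactor_free` are made up for illustration, and it assumes the extra-ligand annotation applies to every PDB entry of a given inhibitor, which the list does not actually state.

```python
# Illustrative subset of the PDB ID -> inhibitor dictionary above.
PDB_INHIBITOR = {
    '4AN2': 'Cobimetinib',
    '4LMN': 'Cobimetinib',
    '2YFX': 'Crizotinib',
    '2HYY': 'Imatinib',
    '1XKK': 'Lapatinib',
    '3LXK': 'Tofacitinib',
}

# Extra (non-inhibitor) ligands per inhibitor, copied from the list above.
# Inhibitors that are absent here are treated as having no extra ligands.
EXTRA_LIGANDS = {
    'Cobimetinib': ['ACP', 'MG'],
    'Imatinib': ['3YY'],
    'Lapatinib': ['PO4'],
}


def group_by_inhibitor(pdb_to_inhibitor):
    """Invert the mapping: inhibitor name -> sorted list of PDB IDs."""
    grouped = {}
    for pdb_id, inhibitor in pdb_to_inhibitor.items():
        grouped.setdefault(inhibitor, []).append(pdb_id)
    return {name: sorted(ids) for name, ids in grouped.items()}


def cofactor_free(pdb_to_inhibitor, extra_ligands):
    """Sorted PDB IDs whose inhibitor has no extra ligands listed."""
    return sorted(
        pdb_id
        for pdb_id, inhibitor in pdb_to_inhibitor.items()
        if not extra_ligands.get(inhibitor)
    )


if __name__ == '__main__':
    print(group_by_inhibitor(PDB_INHIBITOR))
    print(cofactor_free(PDB_INHIBITOR, EXTRA_LIGANDS))
```

Keeping the extra-ligand table keyed by inhibitor mirrors how the list was written; if the annotation turns out to differ between PDB entries of the same inhibitor, it would need to be keyed by PDB ID instead.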
Great, thanks a lot. That's very helpful. I'll get the relaxed structures to you as soon as I can.
I'm still a little confused as to where you get the inhibitor structures from for the MCCE calculations. Although the inhibitor structures in `mcce-charges/epik_inhibitors/output/` are fully protonated (which is what I need), they do not overlap with the binding modes of the ligands in `mcce-charges/pdbs/`. Do you rotate and translate the structures in `mcce-charges/epik_inhibitors/output/` to put them in the binding sites before each calculation, or is it something that I'm missing? Thanks!
Just a quick update: it looks like the inhibitor structures are taken from one of the PDB files, but that location doesn't align with the others.
I will look into the first question regarding the inhibitors PDB.
I'm not sure what you mean by "location doesn't align"
@gregoryross, @SalahBioPhysics
If I understand correctly, the ligand coordinates in the MCCE calculations are taken from `mcce-charges/epik_inhibitors/output/`, not from the protein PDB file that contained the ligand.
I took a quick look at the script that generates those ligand files. It takes them from the `Chem_ID` column in this file
These identifiers are PDB identifiers for the ligand-expo. Presumably, these are taken arbitrarily from some experimental data source, but those coordinates are not going to be correct for putting a ligand back in an arbitrary APO pdb file. They should be incompatible.
Thanks @bas-rustenburg, that's what it looks like to me. Does that seem reasonable @SalahBioPhysics?
If that is indeed the case, I'm happy to completely overhaul the preparation pipeline to ensure that 1) the loops are reasonable, and 2) each inhibitor is in the correct location and has the right binding mode for the corresponding protein structure. Unfortunately, that will take more time than the original plan.
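A quick way to confirm whether a ligand file sits in the binding site is to compare the ligand centroid in the two files. The sketch below is only an illustration, not part of the existing pipeline: it assumes both inputs are standard fixed-column PDB text, that the ligand atoms are stored as `HETATM` records in both files, and that the ligand residue name is known. The function names are hypothetical.

```python
import math


def ligand_centroid(pdb_text, resname):
    """Mean (x, y, z) of all HETATM atoms whose residue name is `resname`."""
    coords = []
    for line in pdb_text.splitlines():
        # PDB fixed columns: residue name 18-20, x 31-38, y 39-46, z 47-54.
        if line.startswith('HETATM') and line[17:20].strip() == resname:
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    if not coords:
        raise ValueError('no HETATM records found for residue ' + resname)
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def centroid_distance(pdb_text_a, pdb_text_b, resname):
    """Distance (in the files' units, normally angstroms) between centroids."""
    return math.dist(
        ligand_centroid(pdb_text_a, resname),
        ligand_centroid(pdb_text_b, resname),
    )
```

A distance of many angstroms between the ligand in the protein PDB and the ligand in the separately generated file would suggest that the two are in different coordinate frames. A small distance alone does not prove the binding mode matches, since the ligand could still be rotated about its centroid.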
Yes @gregoryross. That would be great, thanks a lot.
No problem @SalahBioPhysics.
Hi @SalahBioPhysics, could you tell me exactly what files you need to run a protein-ligand MCCE simulation? For instance, if you were to run an ABL-imatinib simulation using the 2HYY PDB structure, which files from the `mcce-charges` repo would you use (assuming the structures are correct)? Thanks!
Sorry for the late response.
I believe you don't need to re-run Epik for 4, 5, and 6.
Great, thanks.
And just to double check: the `Imatinib-input.pdb`, `Imatinib-epik.mol2`, and `Imatinib-epik-charged.mol2` structures should be in the correct binding mode with respect to `2HYY_fixed_ph7.4.pdb`, right?
Also, are you aware that the `XXXX_fixed_ph7.4.pdb` files contain water molecules? I'm guessing all water molecules should be removed? The `XXXX_fixed_ph7.4.pdb` files also contain the deprotonated inhibitor inside the binding site; is that required? I just want to make sure the files we supply contain exactly what you need for the MCCE calculations.
> Imatinib-input.pdb, Imatinib-epik.mol2, and Imatinib-epik-charged.mol2 structures should be in the correct binding mode with respect to 2HYY_fixed_ph7.4.pdb, right?

I honestly don't know.

> are you aware that the XXXX_fixed_ph7.4.pdb files contain water molecules?

Yes, we will remove them.

> The XXXX_fixed_ph7.4.pdb files also contain the deprotonated inhibitor inside the binding site; is that required?

Yes; however, MCCE looks into the topology file that we built using the Epik output, so it will consider all protomer/tautomer states.
@gregoryross if you have any structures ready, please feel free to push them. I have data based on a quick calculation. It would be nice to test against the new structures. However, no pressure :)
@steven-albanese : Didn't you just push a batch of structures?
Just to clarify: we do not need additional charge files for the inhibitors. The problem is local. The inhibitors have van der Waals clashes with specific amino acids. If you want, Salah can tell you specifically which ones are clashing. This REALLY destabilizes binding. Right now we have turned off the vdW to measure the binding energy. With the runs we are doing with limited conformer sampling, this is fine. So we would love to have energy-minimized files to see if that helps. (Even one energy-minimized protein would let us know whether our force fields are similar enough that this will help.)
We're on it! @steven-albanese has engineered a new pipeline to build loops and relax the structures with the ligand present using Schrodinger tools. I think that's nearly ready.
Hi @SalahBioPhysics and @mrgunner, @steven-albanese has produced new structures using Schrodinger's pipeline. They can be found in this repo:
These structures (found in the `pdbs` folder and called `XXXX-fixed.pdb`) are far cleaner than what we previously had. However, Steven and I have had a close look at many of these and, in general, they still need more work to clean them up, which is what I'll be doing today. Some of the problems are
I'm currently working on generating a list of the structures we can use, and refining the ones we can use. If you really want to work on a structure now, please do take one from the repository, but please check it by eye, and have a look at the log files that can be found in the `XXXX-fixed/` directories.
@mrgunner, as you can see from the above discussion, we're unsure what structure you need for the inhibitors. Do your MCCE calculations require a PDB structure of the inhibitor that is fully protonated (or has multiple charge states) and is located in the protein binding site? I ask because it seems (and I certainly may be wrong here) that the inhibitor structures you used in the previous calculations are in random locations.
@SalahBioPhysics and @mrgunner the structures from the latest rounds of refinement are located here. As documented in this notebook, the structures have been selected with various criteria, and minimized with openmm. Many structures failed the minimization process for a variety of reasons, and 31 have passed. Some of the loops may have had such bad clashes that the simulations still blew up during the minimization. There may still be problems with these, and they all should be manually inspected. However, given the time-pressure, the minimized structures may be considered clean enough for now.
The structures that passed the current tests are for the following complexes:
Alectinib-ALK
Axitinib-VEGFR1
Bosutinib-BCR-ABL
Ceritinib-ALK
Crizotinib-ALK
Crizotinib-MET
Dasatinib-BCR-ABL
Dasatinib-BCR-ABL
Erlotinib-EGFR
Gefitinib-EGFR
Imatinib-BCR-ABL
Nilotinib-BCR-ABL
Palbociclib-CDK6
Ponatinib-DDR1
Regorafenib-VEGFR1
Sorafenib-VEGFR1
Sunitinib-VEGFR1
Tofacitinib-JAK3
The minimized structures can be found in the `/minimized/` directories. They still contain explicit water molecules.
Great, thanks @gregoryross @steven-albanese and @bas-rustenburg. I'll run mcce and update you.
@SalahBioPhysics, it was @steven-albanese and I!
yes, I meant to tag you. Thanks to both :)
Hi @SalahBioPhysics, I've finished my final round of refinement of the kinase structures. Many of the structures that failed the initial round of refinement with OpenMM required manual fixing and have run successfully this time. I've also minimized all structures in explicit solvent, as opposed to vacuum in the original set. The latest structures are located in https://github.com/choderalab/PDBfinder inside the `pdbs/*/explicit_water_minimization` directories.
The structures 4G5P, 4WKQ, 4XE0, and 4ZAU had missing loops longer than 20 residues, which couldn't be modeled in. As before, the cobimetinib-MEK structures (4AN2 and 4LMN) were omitted due to the presence of ATP and a metal ion in the binding site. Finally, 3CJG was also not minimized due to the presence of non-standard residues (possibly post-translational modifications) that were difficult to model with OpenMM. More details can be found in the `refinement-tools` directory of https://github.com/choderalab/PDBfinder.
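To give a rough idea of how loops that are too long to model could be flagged, here is a minimal sketch. It is not the code in `refinement-tools`; it assumes that a jump in the residue numbering of consecutive CA atoms in a chain corresponds to missing residues, which ignores insertion codes and non-sequential numbering (the `REMARK 465` records are the more reliable source). The function names and the `max_loop` parameter are hypothetical.

```python
def find_gaps(pdb_text, chain_id):
    """Return (last_residue_before, first_residue_after, n_missing) per gap."""
    resids = []
    for line in pdb_text.splitlines():
        # PDB fixed columns: atom name 13-16, chain ID 22, residue number 23-26.
        if (
            line.startswith('ATOM')
            and len(line) >= 26
            and line[12:16].strip() == 'CA'
            and line[21] == chain_id
        ):
            resids.append(int(line[22:26]))
    gaps = []
    for prev, cur in zip(resids, resids[1:]):
        if cur - prev > 1:
            gaps.append((prev, cur, cur - prev - 1))
    return gaps


def has_unmodelable_loop(pdb_text, chain_id, max_loop=20):
    """True if any numbering gap in the chain is longer than `max_loop`."""
    return any(n > max_loop for _, _, n in find_gaps(pdb_text, chain_id))
```

The default of 20 residues simply mirrors the cutoff mentioned above; any structure flagged this way would still need to be checked by eye.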
All the best,
Greg
@gregoryross awesome, thanks a lot. It will be a while before I start working on this again, but I'll keep you posted :)
Great, thanks.
I'll now close this issue.
Quick question: Does this batch include structures of apo kinases (without ligands) as well? Or just the kinase:inhibitor complexes?
The structures in `pdbs/*/explicit_water_minimization/` are kinase-inhibitor complexes in explicit water, with all non-protein atoms listed as `HETATM` records. The data set contains no true apo structures, so pseudo-apo structures have to be made by removing the ligands.
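As a minimal sketch of how a pseudo-apo structure could be made from one of these files, the function below drops the `HETATM` records of a given ligand residue name. It assumes standard fixed-column PDB text and that the ligand's three-letter residue name is known; `make_pseudo_apo` and its `drop_waters` option are hypothetical names, not something provided by the PDBfinder repo.

```python
def make_pseudo_apo(pdb_text, ligand_resname, drop_waters=False):
    """Return PDB text with the ligand's HETATM records removed.

    Protein ATOM records and all other lines are kept unchanged.
    Waters (residue name HOH) are kept unless drop_waters is True.
    """
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith('HETATM'):
            resname = line[17:20].strip()  # residue name, columns 18-20
            if resname == ligand_resname:
                continue
            if drop_waters and resname == 'HOH':
                continue
        kept.append(line)
    return '\n'.join(kept) + '\n'
```

Note that any `CONECT` records referring to the removed atoms are left in place by this sketch, and the protein coordinates are still those of the bound state, so the result is only a pseudo-apo structure.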
OK! One other aspect to look at in the future is whether the pseudo-apo structures behave in the same way as true apo structures (if they exist). I'll create a separate issue!
Sounds good!
@bas-rustenburg try this protein first: Cobimetinib-4AN2. Thanks