Proteins to relax - Githubissues

SalahBioPhysics commented 7 years ago

@bas-rustenburg try with this protein first Cobimetinib-4AN2. Thanks

bas-rustenburg commented 7 years ago

I'm unfortunately not familiar with how these are currently prepared. Tagging @gregoryross and @steven-albanese. They'll add the loop modeling and minimization steps to Steve's pipeline to provide better structures. I think they're both currently out, but should be back in on Monday to help tackle this.

We'll make sure to prioritize this particular structure.

gregoryross commented 7 years ago

Hi @SalahBioPhysics, I'm now working on this problem full time for the next couple of days.

The 'fixed' 4LMN protein structure in the example you've linked to is, by eye, completely wrong, and won't be fixed with energy minimization. As I'm sure you're aware, protonation state results for any of the residues within this constructed loop are totally untrustworthy. I think an overhaul of the protein preparation pipeline is required, and I'll be trying a couple methods.

I can narrow down the method I use if you could tell me what you do with the 'cleaned' protein structures. For the protein-ligand MCCE simulations, do you simply overlay the ligand over the cleaned protein structure, or do you perform a type of minimization of the ligand position? In the 4AN2 structure, there is a ligand, ATP, and a manganese ion; are all of these included in the MCCE calculations? Thanks

SalahBioPhysics commented 7 years ago

Thanks @gregoryross

"Do you simply overlay the ligand over the cleaned protein structure, or do you perform a type of minimization of the ligand position?" No, we don't do any type of minimization of the ligand position; we simply run the MCCE calculation on the structure you give us.

"Are all of these included in the MCCE calculations?" We remove all of the non-inhibitor ligands (manganese ion, SO4, ...) from the structure.
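The removal step described here can be sketched as a small filter over PDB records. This is only an illustration under stated assumptions, not the script actually used for the MCCE runs: the function name `strip_hetero_groups`, the residue name `LIG` for the inhibitor, and the demo records are all hypothetical. It relies only on the fixed-column PDB layout, where the residue name occupies columns 18-20.

```python
def strip_hetero_groups(pdb_lines, keep_resnames):
    """Drop HETATM records whose residue name is not in keep_resnames.

    ATOM records and all other record types pass through unchanged.
    """
    keep = {name.strip().upper() for name in keep_resnames}
    kept = []
    for line in pdb_lines:
        # Residue name lives in columns 18-20 (0-based slice 17:20).
        if line.startswith("HETATM") and line[17:20].strip().upper() not in keep:
            continue
        kept.append(line)
    return kept


# Hypothetical demo records: one protein atom, the inhibitor (named LIG here),
# a manganese ion, a sulfate, and a water.
example = [
    "ATOM      1  CA  ALA A   1      11.000  12.000  13.000  1.00 20.00           C",
    "HETATM    2  C1  LIG A 501      14.000  15.000  16.000  1.00 20.00           C",
    "HETATM    3 MN    MN A 502      17.000  18.000  19.000  1.00 20.00          MN",
    "HETATM    4  S   SO4 A 503      20.000  21.000  22.000  1.00 20.00           S",
    "HETATM    5  O   HOH A 601      23.000  24.000  25.000  1.00 20.00           O",
]
cleaned = strip_hetero_groups(example, keep_resnames={"LIG"})
```

With `keep_resnames={"LIG"}` the ion, the sulfate, and the water are all dropped, because waters are also stored as HETATM records. Passing an empty set removes the inhibitor as well.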

gregoryross commented 7 years ago

Okay, thanks. So just to clarify, do you take the 'fixed' protein PDB structures from mcce-charges/pdbs/ and the ligand PDB structures from mcce-charges/epik_inhibitors/output/?

SalahBioPhysics commented 7 years ago

Yes. Thanks a lot.

gregoryross commented 7 years ago

@SalahBioPhysics, I'm having a close look at the Cobimetinib structures, and the cofactors (ATP and ions) are an important part of the protein-inhibitor complex, such that, even with fixed loops, we shouldn't run MCCE on these structures without considering ATP and the ion. Can these be included in the MCCE calculations? If not, do you have another inhibitor-protein structure (that's cofactor free) you've been having trouble with that I can work on?

SalahBioPhysics commented 7 years ago

Since we have about a week to submit the grant, I don't think we'll have time to add more ligands to the calculation. However, we can do it for the paper.

For now, we can only look at protein-inhibitor complexes that don't have extra ligands. Crizotinib/2YFX_fixed_ph7.4.pdb would do. I will give you a list of PDBs soon.

SalahBioPhysics commented 7 years ago

Here is a list of the inhibitors that have/don't have extra ligands.

Afatinib
Alectinib: has EDO
Axitinib
Bosutinib
Ceritinib: has GOL
Cobimetinib: has ACP and MG
Crizotinib
Dabrafenib
Dasatinib
Gefitinib
Idelalisib
Imatinib: has 3YY
Lapatinib: has PO4
Lenvatinib: has DTT, EDO, GOL, SO4
Nilotinib
Osimertinib
Palbociclib: has DMS
Pazopanib: has SO4
Ponatinib: has EDO
Regorafenib
Sorafenib: has ACT and DTT
Sunitinib
Tofacitinib

Can we get a relaxed structure of the PDBs with the inhibitor bound (_fixed_ph7.4.pdb) and un-bound (_fixed_ph7.4_apo.pdb)?

And this is a dictionary that relates each PDB ID to its ligand (if needed).

PDB_Inhibitor_Dictionary = {
    '4G5J':'Afatinib', '4G5P':'Afatinib',
    '3AOX':'Alectinib',
    '4AG8':'Axitinib', '4AGC':'Axitinib',
    '3UE4':'Bosutinib',
    '4MKC':'Ceritinib',
    '4AN2':'Cobimetinib', '4LMN':'Cobimetinib',
    '2WGJ':'Crizotinib', '2XP2':'Crizotinib', '2YFX':'Crizotinib', '4ANQ':'Crizotinib',
    '4ANS':'Crizotinib', '5AAA':'Crizotinib', '5AAB':'Crizotinib', '5AAC':'Crizotinib',
    '4XV2':'Dabrafenib', '5CSW':'Dabrafenib', '5HIE':'Dabrafenib',
    '2GQG':'Dasatinib', '4XEY':'Dasatinib',
    '2ITO':'Gefitinib', '2ITY':'Gefitinib', '2ITZ':'Gefitinib', '3UG2':'Gefitinib',
    '4I22':'Gefitinib', '4WKQ':'Gefitinib',
    '4XE0':'Idelalisib',
    '2HYY':'Imatinib', '3PYY':'Imatinib',
    '1XKK':'Lapatinib',
    '3WZD':'Lenvatinib',
    '3CS9':'Nilotinib',
    '4ZAU':'Osimertinib',
    '2EUF':'Palbociclib', '5L2I':'Palbociclib',
    '3CJG':'Pazopanib',
    '3IK3':'Ponatinib', '3OXZ':'Ponatinib', '3ZOS':'Ponatinib', '4C8B':'Ponatinib',
    '4QRC':'Ponatinib', '4TYJ':'Ponatinib', '4U0I':'Ponatinib', '4UXQ':'Ponatinib',
    '4V01':'Ponatinib', '4V04':'Ponatinib',
    '2QU5':'Regorafenib',
    '3WZE':'Sorafenib', '4ASD':'Sorafenib',
    '4AGD':'Sunitinib',
    '3LXK':'Tofacitinib'
}
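As a rough illustration of how the dictionary above could be combined with the extra-ligand list, the sketch below groups PDB IDs by inhibitor and picks out the entries whose inhibitor has no extra ligands listed. Only a subset of the mapping is repeated to keep the example short. The names `EXTRA_LIGANDS`, `pdbs_by_inhibitor`, and `without_extra_ligands` are hypothetical. Note the assumption that the extra-ligand list applies to every structure of an inhibitor, which is coarser than checking each PDB file individually.

```python
from collections import defaultdict

# Subset of the PDB_Inhibitor_Dictionary posted in the thread.
PDB_Inhibitor_Dictionary = {
    '4AN2': 'Cobimetinib', '4LMN': 'Cobimetinib',
    '2XP2': 'Crizotinib', '2YFX': 'Crizotinib',
    '2HYY': 'Imatinib', '3PYY': 'Imatinib',
    '3LXK': 'Tofacitinib',
}

# Extra ligands per inhibitor, copied from the list in the thread
# (inhibitors with no extra ligands are simply absent).
EXTRA_LIGANDS = {
    'Cobimetinib': ['ACP', 'MG'],
    'Imatinib': ['3YY'],
}


def pdbs_by_inhibitor(mapping):
    """Invert the PDB ID -> inhibitor mapping into inhibitor -> sorted PDB IDs."""
    grouped = defaultdict(list)
    for pdb_id, inhibitor in mapping.items():
        grouped[inhibitor].append(pdb_id)
    return {name: sorted(ids) for name, ids in grouped.items()}


def without_extra_ligands(mapping, extra):
    """Return PDB IDs whose inhibitor has no extra ligands listed."""
    return sorted(pdb_id for pdb_id, inhibitor in mapping.items()
                  if not extra.get(inhibitor))


grouped = pdbs_by_inhibitor(PDB_Inhibitor_Dictionary)
candidates = without_extra_ligands(PDB_Inhibitor_Dictionary, EXTRA_LIGANDS)
```

For this subset, `candidates` contains the two crizotinib entries and the tofacitinib entry, while the cobimetinib and imatinib structures are set aside.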

gregoryross commented 7 years ago

Great, thanks a lot. That's very helpful. I'll get the relaxed structures to you as soon as I can.

I'm still a little confused as to where you get the inhibitor structures from for the MCCE calculations. Although the inhibitor structures in mcce-charges/epik_inhibitors/output/ are fully protonated (which is what I need) they do not overlap with the binding modes of the ligands in mcce-charges/pdbs/. Do you rotate and translate the structures in mcce-charges/epik_inhibitors/output/ to put them in the binding sites before each calculation, or is it something that I'm missing? Thanks!

gregoryross commented 7 years ago

Just a quick update: it looks like the inhibitor structures are taken from one of the PDB files, but that location doesn't align with the others.

SalahBioPhysics commented 7 years ago

I will look into the first question regarding the inhibitors PDB.

I'm not sure what you mean by "location doesn't align"

bas-rustenburg commented 7 years ago

@gregoryross, @SalahBioPhysics

If I understand correctly, the ligand coordinates in the MCCE calculations are taken from mcce-charges/epik_inhibitors/output/, not from the protein PDB file that contained the ligand.

I took a quick look at the script that generates those ligand files. It takes them from the Chem_ID column in this file

These identifiers are PDB identifiers for the ligand-expo. Presumably, these are taken arbitrarily from some experimental data source, but those coordinates are not going to be correct for putting a ligand back in an arbitrary APO pdb file. They should be incompatible.
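One rough way to test this concern is to compare the centroid of the ligand as it sits in the crystal complex with the centroid of the standalone ligand file. The sketch below assumes both files use fixed-column PDB records. The residue name `LIG`, the helper `_hetatm`, and the demo coordinates are made up for illustration. A large offset indicates the standalone ligand is not in the binding site, while a small offset does not prove the orientation is right.

```python
import math


def residue_coords(pdb_lines, resname):
    """Collect (x, y, z) for every ATOM/HETATM record with the given residue name."""
    coords = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and line[17:20].strip() == resname:
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def centroid(coords):
    """Unweighted geometric centre of a list of 3D points."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def centroid_offset(coords_a, coords_b):
    """Distance in angstroms between the centroids of two coordinate sets."""
    return math.dist(centroid(coords_a), centroid(coords_b))


def _hetatm(serial, name, x, y, z):
    # Hypothetical helper that writes a fixed-column HETATM record for the demo.
    return f"HETATM{serial:>5} {name:<4} LIG A 501    {x:8.3f}{y:8.3f}{z:8.3f}"


# Made-up coordinates: ligand as found in the complex vs. a standalone ligand file.
in_complex = [_hetatm(1, "C1", 10.0, 10.0, 10.0), _hetatm(2, "C2", 12.0, 10.0, 10.0)]
standalone = [_hetatm(1, "C1", 0.0, 0.0, 0.0), _hetatm(2, "C2", 2.0, 0.0, 0.0)]

offset = centroid_offset(residue_coords(in_complex, "LIG"),
                         residue_coords(standalone, "LIG"))
```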

gregoryross commented 7 years ago

Thanks @bas-rustenburg, that's what it looks like to me. Does that seem reasonable @SalahBioPhysics?

If that is indeed the case, I'm happy to completely overhaul the preparation pipeline to ensure that 1) the loops are reasonable, and 2) each inhibitor is in the correct location and has the right binding mode for the corresponding protein structure. Unfortunately, that will take more time than the original plan.

SalahBioPhysics commented 7 years ago

Yes @gregoryross. That would be great, thanks a lot.

gregoryross commented 7 years ago

No problem @SalahBioPhysics.

gregoryross commented 7 years ago

Hi @SalahBioPhysics, could you tell me exactly what files you need to run a protein-ligand MCCE simulation? For instance, if you were to run an ABL-imatinib simulation using the 2HYY PDB structure, which files from the mcce-charges repo would you use (assuming the structures are correct)? Thanks!

SalahBioPhysics commented 7 years ago

Sorry for the late response.

I believe you don't need to re-run Epik for 4, 5, and 6.

gregoryross commented 7 years ago

Great, thanks.

And just to double check: the Imatinib-input.pdb, Imatinib-epik.mol2, and Imatinib-epik-charged.mol2 structures should be in the correct binding mode with respect to 2HYY_fixed_ph7.4.pdb, right?

Also, are you aware that the XXXX_fixed_ph7.4.pdb files contain water molecules? I'm guessing all water molecules should be removed? The XXXX_fixed_ph7.4.pdb files also contain the deprotonated inhibitor inside the binding site; is that required? I just want to make sure the files we supply contain exactly what you need for the MCCE calculations.

SalahBioPhysics commented 7 years ago

Imatinib-input.pdb, Imatinib-epik.mol2, and Imatinib-epik-charged.mol2 structures should be in the correct binding mode with respect to 2HYY_fixed_ph7.4.pdb, right? I honestly don't know.

are you aware that the XXXX_fixed_ph7.4.pdb files contain water molecules? Yes, we will remove them

The XXXX_fixed_ph7.4.pdb files also contain the deprotonated inhibitor inside the binding site; is that required? Yes; however, MCCE looks at the topology file that we built from the Epik output, so it will consider all protomer/tautomer states.

SalahBioPhysics commented 7 years ago

@gregoryross if you have any structures ready, please feel free to push them. I have data based on a quick calculation, and it would be nice to test it against the new structures. However, no pressure :)

jchodera commented 7 years ago

@steven-albanese : Didn't you just push a batch of structures?

jchodera commented 7 years ago

Are those over here? Are they ready to go?

mrgunner commented 7 years ago

Just to clarify: we do not need additional charge files for the inhibitors. The problem is local. The inhibitors have van der Waals clashes with specific amino acids. If you want, Salah can tell you specifically which ones are clashing. This REALLY destabilizes binding. Right now we have turned off the vdW term to measure the binding energy. With the runs we are doing with limited conformer sampling, this is fine. So we would love to have energy-minimized files to see if that helps. (Even one energy-minimized protein would let us know if our force fields are similar enough that this will help.)
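A quick way to list candidate clashes of this kind is a plain distance cutoff between inhibitor atoms and protein atoms. The sketch below is not MCCE's van der Waals term. The 2.5 angstrom cutoff, the residue name `LIG`, the helper `_rec`, and the demo coordinates are all assumptions chosen for illustration, and hydrogens are treated like any other atom.

```python
import math


def parse_atoms(pdb_lines):
    """Parse ATOM/HETATM records into dictionaries using fixed PDB columns."""
    atoms = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "record": line[0:6].strip(),
                "name": line[12:16].strip(),
                "resname": line[17:20].strip(),
                "resseq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms


def find_clashes(atoms, ligand_resname, cutoff=2.5):
    """Return protein/ligand atom pairs closer than `cutoff` angstroms."""
    ligand = [a for a in atoms if a["resname"] == ligand_resname]
    protein = [a for a in atoms if a["record"] == "ATOM"]
    clashes = []
    for lig in ligand:
        for prot in protein:
            d = math.dist(lig["xyz"], prot["xyz"])
            if d < cutoff:
                clashes.append((prot["resname"], prot["resseq"], prot["name"],
                                lig["name"], round(d, 2)))
    return clashes


def _rec(record, serial, name, resname, resseq, x, y, z):
    # Hypothetical helper that writes a fixed-column PDB record for the demo.
    return (f"{record:<6}{serial:>5} {name:<4} {resname:>3} A{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Made-up coordinates: one protein atom too close to the ligand, one far away.
demo = [
    _rec("ATOM", 1, "CB", "THR", 315, 0.0, 0.0, 0.0),
    _rec("ATOM", 2, "CA", "GLY", 321, 8.0, 0.0, 0.0),
    _rec("HETATM", 3, "C1", "LIG", 501, 1.5, 0.0, 0.0),
]
clashes = find_clashes(parse_atoms(demo), "LIG")
```

The all-pairs loop is quadratic in the number of atoms, which is acceptable for a single inhibitor against one kinase domain. A neighbour search would be the natural replacement for larger systems.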


jchodera commented 7 years ago

We're on it! @steven-albanese has engineered a new pipeline to build loops and relax the structures with the ligand present using Schrodinger tools. I think that's nearly ready.

gregoryross commented 7 years ago

Hi @SalahBioPhysics and @mrgunner, @steven-albanese has produced new structures using Schrodinger's pipeline. They can be found in this repo:

PDBFinder

These structures (found in the pdbs folder and called `XXXX-fixed.pdb`) are far cleaner than what we previously had. However, Steven and I have had a close look at many of these and, in general, they still need more work to clean them up, which is what I'll be doing today. Some of the problems are:

- Many proteins are still missing residues because the missing loops were too large to model in. These structures should be discarded.
- All of the modeled loops require further minimization due to some unusual geometries (this is especially important for the loops that are adjacent to the inhibitor).
- The structures contain waters and other small organic molecules that need to be removed (easy to fix).
- There are a handful of structures with cofactors bound. These structures should be discarded. The ones that we've spotted so far are:
  - 2EUF has a cyclin bound
  - 3PYY has a small-molecule activator
  - Both cobimetinib structures have ATP and a metal ion

I'm currently working on generating a list of the structures we can use, and refining the ones we can use. If you really want to work on a structure now, please do take one from the repository, but please check it by eye, and have a look at the log files that can be found in the XXXX-fixed/ directories.

gregoryross commented 7 years ago

@mrgunner, as you can see from the above discussion, we're unsure what structure you need for the inhibitors. Do your MCCE calculations require a PDB structure of the inhibitor to be fully protonated (or have multiple charge states) and be located in the protein binding site? I ask because it seems as though---and I certainly may be wrong here---that the inhibitor structures you used in the previous calculations are in random locations.

gregoryross commented 7 years ago

@SalahBioPhysics and @mrgunner the structures from the latest rounds of refinement are located here. As documented in this notebook, the structures have been selected with various criteria, and minimized with openmm. Many structures failed the minimization process for a variety of reasons, and 31 have passed. Some of the loops may have had such bad clashes that the simulations still blew up during the minimization. There may still be problems with these, and they all should be manually inspected. However, given the time-pressure, the minimized structures may be considered clean enough for now.

The structures that passed the current tests are for the following complexes:

Alectinib-ALK
Axitinib-VEGFR1
Bosutinib-BCR-ABL
Ceritinib-ALK
Crizotinib-ALK
Crizotinib-MET
Dasatinib-BCR-ABL
Dasatinib-BCR-ABL
Erlotinib-EGFR
Gefitinib-EGFR
Imatinib-BCR-ABL
Nilotinib-BCR-ABL
Palbociclib-CDK6
Ponatinib-DDR1
Regorafenib-VEGFR1
Sorafenib-VEGFR1
Sunitinib-VEGFR1
Tofacitinib-JAK3

The minimized structures can be found in the `/minimized/` directories. They still contain explicit water molecules.

SalahBioPhysics commented 7 years ago

Great, thanks @gregoryross @steven-albanese and @bas-rustenburg. I'll run mcce and update you.

gregoryross commented 7 years ago

@SalahBioPhysics, it was @steven-albanese and I!

SalahBioPhysics commented 7 years ago

yes, I meant to tag you. Thanks to both :)

gregoryross commented 7 years ago

Hi @SalahBioPhysics, I've finished my final round of refinement of the kinase structures. Many of the structures that failed the initial round of refinement with openmm required manual fixes and have run successfully this time. I've also minimized all structures in explicit solvent, as opposed to vacuum in the original set. The latest structures are located in https://github.com/choderalab/PDBfinder inside the pdbs/*/explicit_water_minimization directories.

The structures 4G5P, 4WKQ, 4XE0, and 4ZAU had missing loops longer than 20 residues that couldn't be modeled in. As before, the cobimetinib-MEK structures (4AN2 and 4LMN) were omitted due to the presence of ATP and a metal ion in the binding site. Finally, 3CJG was also not minimized due to the presence of non-standard residues (possibly post-translational modifications) that were difficult to model with openmm. More details can be found in the refinement-tools directory of https://github.com/choderalab/PDBfinder.

All the best,

Greg

SalahBioPhysics commented 7 years ago

@gregoryross awesome, thanks a lot. It will be a while before I start working on this again, but I'll keep you posted :)

gregoryross commented 7 years ago

Great, thanks.

I'll now close this issue.

jchodera commented 7 years ago

Quick question: Does this batch include structures of apo kinases (without ligands) as well? Or just the kinase:inhibitor complexes?

gregoryross commented 7 years ago

The structures in pdbs/*/explicit_water_minimization/ are kinase-inhibitor complexes in explicit water, with all non-protein atoms listed as HETATM records. The data set contains no true apo structures, so pseudo-apo structures have to be made by removing the ligands.

jchodera commented 7 years ago

OK! One other aspect to look at in the future is whether the pseudo-apo structures behave in the same way as true apo structures (if they exist). I'll create a separate issue!

gregoryross commented 7 years ago

Sounds good!

choderalab / mcce-charges

Proteins to relax #48