Model residue variants are not consistent for target FAK2_HUMAN_D0

choderalab / ensembler

Automated omics-scale protein modeling and simulation setup.

http://ensembler.readthedocs.io/

GNU General Public License v2.0


Model residue variants are not consistent for target FAK2_HUMAN_D0 #21

Closed: danielparton closed this issue 9 years ago

danielparton commented 9 years ago

During the implicit solvent MD stage, models should be given the same protonation states as a reference model (the model with highest template-target sequence identity).

However, the implicit solvent MD models for the TK (tyrosine kinase) target FAK2_HUMAN_D0 have different topologies:

FAK2_HUMAN_D0_3CC6_A - 4227 atoms; CYS35-CYS39 no disulfide bond; reference model
FAK1_HUMAN_D0_4Q9S_A - 4225 atoms; CYS35-CYS39 disulfide bond

I'm looking into this now, but it's not immediately clear why this happened. They were generated during the same ensembler run, so they really should have been using the same residue variants.

jchodera commented 9 years ago

Doesn't simtk.openmm.app.Modeller automatically add disulfide bonds by distance? Do we use that in the pipeline?

danielparton commented 9 years ago

I thought that was done by the addHydrogens routine (a member function of simtk.openmm.app.Modeller). That accepts a variants= argument, which I was using to keep the residue variants consistent with a reference model. Unless the disulfide bond is determined when first making the Modeller object, and the addHydrogens routine simply adds hydrogens based on whether or not a disulfide bond is already present in the Modeller object? Checking now..

danielparton commented 9 years ago

Ok, so the disulfide bond is indeed defined by distance when initializing the Modeller object. The list of bonds in the topology then determines which protonation states are assigned by the addHydrogens member function. So I'll have to change the code to use the bonds data to keep disulfide bonds consistent across models. I think there should only be a few TK targets affected by this, but I'll need to redo implicit solvent MD for them.
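
To illustrate why two models of the same target can end up with different bonds, here is a minimal pure-Python sketch of distance-based disulfide detection. It is not OpenMM's code: the function name, the input format, and the coordinates are made up for illustration, and the 0.3 nm SG-SG cutoff and the "skip cysteines that already have an HG" rule are assumptions about how OpenMM behaves that should be checked against the version in use.

```python
import math
from itertools import combinations

SG_CUTOFF_NM = 0.3  # assumed cutoff; check the OpenMM version in use


def find_disulfides(cysteines, cutoff_nm=SG_CUTOFF_NM):
    """Return residue-number pairs whose SG atoms lie within cutoff_nm.

    cysteines: list of (residue_number, sg_xyz_nm, has_hg) tuples.
    Residues that already carry a thiol hydrogen (HG) are skipped.
    """
    eligible = [(num, xyz) for num, xyz, has_hg in cysteines if not has_hg]
    pairs = []
    for (num_a, xyz_a), (num_b, xyz_b) in combinations(eligible, 2):
        if math.dist(xyz_a, xyz_b) < cutoff_nm:
            pairs.append((num_a, num_b))
    return pairs


# Made-up coordinates: the same two cysteines in two different models.
reference_model = [(35, (1.00, 1.00, 1.00), False), (39, (1.45, 1.00, 1.00), False)]
other_model = [(35, (1.00, 1.00, 1.00), False), (39, (1.20, 1.00, 1.00), False)]

print(find_disulfides(reference_model))  # []
print(find_disulfides(other_model))      # [(35, 39)]
```

Each cysteine in a disulfide loses its thiol hydrogen, so a model with one extra disulfide ends up with two fewer atoms after hydrogens are added, which matches the 4227 vs 4225 atom counts above.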

danielparton commented 9 years ago

I'm hoping I can just copy the ._bonds list from the reference topology to all models.

jchodera commented 9 years ago

> Ok, so the disulfide bond is indeed defined by distance when initializing the Modeller object.

Can we also report this to the OpenMM issue tracker as a behavior we would like some way to control?

danielparton commented 9 years ago

Will do. I've implemented a workaround in Ensembler for now, which seems to be working.

danielparton commented 9 years ago

Actually, turns out the disulfide bond is first defined when making the app.PDBFile object, which is then used to build the app.Modeller object. So there is a simple and non-hacky way to tackle this by storing the app.PDBFile.topology object for the reference structure, and using that to make the app.Modeller object for each model.
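
A rough sketch of that approach, not the actual Ensembler code: the function name and arguments are hypothetical, and it assumes every model contains the same atoms in the same order as the reference, so that the reference topology is valid for each model's coordinates. It relies on `addHydrogens` returning the variants it chose, which can then be passed back through `variants=`.

```python
def add_hydrogens_consistently(reference_pdb_path, model_pdb_paths, forcefield, pH=7.0):
    """Protonate every model using the reference structure's topology and variants.

    Assumes all models contain the same atoms in the same order as the
    reference, so the reference topology is valid for their coordinates.
    """
    # Import lazily; older OpenMM releases expose the package as simtk.openmm.
    try:
        from openmm import app
    except ImportError:
        from simtk.openmm import app

    reference_pdb = app.PDBFile(reference_pdb_path)
    # Bonds (including any disulfides) were decided when the PDBFile was read.
    reference_topology = reference_pdb.topology

    modeller = app.Modeller(reference_topology, reference_pdb.positions)
    # addHydrogens returns the variant chosen for each residue.
    reference_variants = modeller.addHydrogens(forcefield, pH=pH)

    protonated = {}
    for path in model_pdb_paths:
        model_pdb = app.PDBFile(path)
        # Pair the model's coordinates with the reference bonds, not its own.
        modeller = app.Modeller(reference_topology, model_pdb.positions)
        modeller.addHydrogens(forcefield, pH=pH, variants=reference_variants)
        protonated[path] = (modeller.topology, modeller.positions)
    return protonated
```

The bonds detected in each model's own `app.PDBFile` are simply never used, so a pair of cysteines that happens to sit closer together in one model cannot change that model's topology.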

jchodera commented 9 years ago

Awesome!

How do we choose the reference structure, and are we sure auto detecting disulfide bonds is the right thing to do?

I'm not actually sure whether intracellular kinase domains would ever have disulfide bonds, given the reducing intracellular environment.

Maybe we want to have two options? Either "automatic" (use reference structure to determine which disulfide bonds to preserve) or "reduce" (no disulfide bonds)?
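
A minimal sketch of what such a switch could look like. The function name, the `mode` argument, and the pair representation are all hypothetical; nothing like this is confirmed to exist in Ensembler.

```python
def disulfides_to_preserve(reference_disulfides, mode="automatic"):
    """Pick which CYS-CYS pairs every model should keep bonded.

    reference_disulfides: iterable of (residue_a, residue_b) pairs detected
    in the reference structure.
    mode: "automatic" keeps the reference pairs; "reduce" keeps none.
    """
    if mode == "automatic":
        return sorted(reference_disulfides)
    if mode == "reduce":
        return []
    raise ValueError(f"unknown disulfide mode: {mode!r}")


print(disulfides_to_preserve([(35, 39)], mode="automatic"))  # [(35, 39)]
print(disulfides_to_preserve([(35, 39)], mode="reduce"))     # []
```

Either way the decision is made once, from the reference, and then applied to every model, so the topologies stay consistent.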