Not able to proceed to the next epoch

anuginu commented 5 years ago

I am trying to run adaptive sampling on a protein-drug complex using HTMD and GROMACS. The first epoch completes, but the run does not proceed to the next epoch. It fails with the error "ValueError: invalid literal for int() with base 10: '37A'". Please find attached the folder used as input to start the sampling (one of the starting positions in the generators folder) and a screenshot of the error obtained after the first epoch. Kindly help. Attachments: gen1.tar.gz, Screenshot from 2019-01-17 10-06-01.png

stefdoerr commented 5 years ago

There seems to be an error in your PDB file. Try reading it directly with mol = Molecule('mypdb.pdb'). You probably have some misaligned columns. Check the PDB format documentation for the specific columns.
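As a rough illustration of such a column check, here is a sketch in plain Python. It is not HTMD's own parser, and the function name check_resseq_columns is made up for this example. It relies only on the fixed-column PDB layout, where the residue sequence number occupies columns 23-26 and the insertion code column 27 of ATOM/HETATM records.

```python
def check_resseq_columns(pdb_lines):
    """Return (line_number, resseq_field, icode_field) for every ATOM/HETATM
    record whose residue-number field is not a plain integer."""
    problems = []
    for lineno, line in enumerate(pdb_lines, start=1):
        if not line.startswith(("ATOM", "HETATM")):
            continue
        resseq = line[22:26]  # columns 23-26: residue sequence number
        icode = line[26:27]   # column 27: insertion code
        try:
            int(resseq)
        except ValueError:
            problems.append((lineno, resseq, icode))
    return problems


# Two hand-built records, assembled field by field so the columns are explicit.
# Correct: "  37" in columns 23-26 and "A" in column 27.
good = ("ATOM  " + "    1" + " " + " CA " + " " + "ALA" + " " + "A"
        + "  37" + "A" + "   " + "  11.104  13.207   2.100")
# Misaligned: the insertion code has slipped into the residue-number field.
shifted = ("ATOM  " + "    2" + " " + " CA " + " " + "ALA" + " " + "A"
           + " 37A" + " " + "   " + "  11.104  13.207   2.100")

for lineno, resseq, icode in check_resseq_columns([good, shifted]):
    print(f"line {lineno}: resSeq field {resseq!r}, iCode field {icode!r}")
```

A record flagged by this check is one where a strict integer conversion of the residue-number field fails, which is the same kind of failure as the ValueError quoted above.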

anuginu commented 5 years ago

Dear Stefan, I read the PDB file with mol = Molecule('2bm2_processed.pdb'), but there seems to be no error. When processing for the next epoch, if the residue numbers in the PDB file are like 184, 184A, 184B, etc., should the residues be renumbered to overcome this error (ValueError: invalid literal for int() with base 10: '37A')?

stefdoerr commented 5 years ago

No, it should work fine with insertion codes. I checked the PDB files, but they are not the ones you use for the simulation, and they don't contain any 37A residue either. Could you post the full error log here, and maybe also a data directory from a completed simulation? I don't have much experience with GROMACS, but the input files don't help much in this case.

I doubt, though, that HTMD can analyse your simulations, since it depends on a single PDB file with the topology, onto which it loads the simulation trajectory. But an output folder might help me understand the problem better.

anuginu commented 5 years ago

Sorry for the late reply. I was caught up with some work and hence couldn't reply. The filtered output was not created because the input folder contained a PDB file with residue IDs like '37A'. (The original PDB file was downloaded from RCSB but was modified with the Avogadro software to obtain a different starting structure.) When I renumbered the resids, it proceeded to the next epoch.
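For illustration, a renumbering of this kind can be sketched in plain Python. This is an assumed approach and not necessarily the procedure used here; the function name renumber_residues is hypothetical. It walks through the ATOM/HETATM records, gives each residue a new sequential number per chain, and blanks the insertion code in column 27.

```python
def renumber_residues(pdb_lines, start=1):
    """Renumber residues sequentially within each chain and clear insertion codes.

    Only ATOM/HETATM records are rewritten; all other lines pass through unchanged.
    """
    out = []
    last_key = None        # (chain, original resSeq + iCode) of the previous atom
    new_id = start - 1
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            out.append(line)
            continue
        chain = line[21]               # column 22: chain identifier
        key = (chain, line[22:27])     # columns 23-27: resSeq + iCode
        if last_key is None or chain != last_key[0]:
            new_id = start             # new chain: restart the numbering
        elif key != last_key:
            new_id += 1                # same chain, next residue
        last_key = key
        # Write the new number into columns 23-26 and a blank into column 27.
        out.append(line[:22] + f"{new_id:>4d}" + " " + line[27:])
    return out
```

Note that this throws away the original numbering, so it is worth keeping a mapping from old to new residue IDs if you need to refer back to the RCSB numbering. It also assumes fewer than 10000 residues per chain, since the field is only four columns wide.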

I have a question about the starting structures for adaptive sampling. Why, in protein-ligand simulations, is the ligand always placed 20 Angstroms away? When the ligand-binding site is known, what is the advantage of using adaptive sampling? Can adaptive sampling give better binding free energy values when a ligand-docked structure is used? Please clarify.

stefdoerr commented 5 years ago

The 20 Å distance is just to avoid adding much bias to the starting conformation, since at that distance the interactions with the protein are very weak. Usually we also distribute the ligand around the protein to get multiple generator (starting) structures for adaptive sampling. Depending on which adaptive sampling method you use, it might speed up binding, unbinding, or whatever process you want to sample. If you have a docked pose you can use that as your starting point as well. But take care: if it's a strong binder, it might take very long to unbind. It really depends on which process you want to sample. I don't know if this answers your questions.
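To make the idea of distributing the ligand around the protein concrete, here is a minimal geometric sketch in plain Python. It is not an HTMD routine. The function names and the choice of a Fibonacci sphere are assumptions for illustration only. It puts candidate ligand centres on a sphere whose radius is the largest centroid-to-atom distance plus 20 Å, so every centre is at least 20 Å from every protein atom.

```python
import math


def fibonacci_sphere(n):
    """Return n roughly evenly spread unit vectors (Fibonacci lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n      # z runs from near +1 to near -1
        r = math.sqrt(1.0 - z * z)          # radius of the circle at height z
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def ligand_start_positions(protein_coords, n_positions=10, min_distance=20.0):
    """Return n_positions ligand centres, each >= min_distance (in Å) from all atoms."""
    n_atoms = len(protein_coords)
    center = tuple(sum(c[k] for c in protein_coords) / n_atoms for k in range(3))
    radius = max(math.dist(center, c) for c in protein_coords)
    shell = radius + min_distance           # triangle inequality guarantees the gap
    return [tuple(center[k] + shell * u[k] for k in range(3))
            for u in fibonacci_sphere(n_positions)]
```

For an elongated protein this places the ligand further than 20 Å from the surface in some directions, and it only positions the ligand centre, not its individual atoms, so treat it as a starting point rather than a finished setup.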

anuginu commented 5 years ago

Thank you for the reply. I have some more basic questions about adaptive sampling. I would really appreciate it if you could help.

My primary intention is to use adaptive sampling to speed up simulations involving known binding sites. In some cases I may have ligands very close to the binding site. Just as you start simulations by placing the ligand at many positions across the surface of the protein, what do you think are good starting points when the binding site is known? One option, as done in earlier cases, may be different starting conformations. Would you suggest any others? And, in your opinion, is there a good published paper I could use as a use case?

stefdoerr commented 5 years ago

If you start from multiple poses, you will have the problem that you won't have connections between the states. Thus, if you use adaptive sampling, it will build an MSM out of the largest connected set of states and discard the rest. This is quite inefficient, as it's equivalent to starting from a single pose.
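The effect can be sketched in plain Python. This is not HTMD's code, and the function name is hypothetical. It assumes that "connected" means strongly connected in the graph of observed transitions, as is common in MSM estimation: a state belongs to the set only if it can both reach and be reached from the others.

```python
from collections import defaultdict


def largest_connected_set(trajectories):
    """Return the largest strongly connected set of states in discrete trajectories."""
    forward, backward = defaultdict(set), defaultdict(set)
    states = set()
    for traj in trajectories:
        states.update(traj)
        for a, b in zip(traj, traj[1:]):   # count each observed transition a -> b
            forward[a].add(b)
            backward[b].add(a)

    def reachable(start, graph):
        """All states reachable from start by following edges of graph."""
        seen, stack = {start}, [start]
        while stack:
            for nxt in graph[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    best, assigned = set(), set()
    for s in sorted(states):
        if s in assigned:
            continue
        # States that s can reach AND that can reach s form its component.
        component = reachable(s, forward) & reachable(s, backward)
        assigned |= component
        if len(component) > len(best):
            best = component
    return best


# Three runs started from three different poses that never visit each other's states.
runs = [[0, 1, 0, 2, 1], [5, 6, 5], [8, 9]]
print(sorted(largest_connected_set(runs)))
```

In this toy example only the states visited by the first run survive. The other two runs contribute nothing to the model, which is the inefficiency described above.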

In your case you would not build an MSM at first, but instead spawn new simulations with a different strategy until all your states are connected. Then you can switch to standard adaptive sampling. But for that you would have to play around a bit with the adaptive sampling implementation.

I remember there is a paper by Vijay Pande on adaptive sampling from bound, unbound and intermediate states, where they manually move the ligand out of the pocket and then start adaptive simulations. I can't remember the title, though, so you will have to search for it.

Acellera / htmd

Not able to proceed to the next epoch #841