Acellera / htmd

HTMD: Programming Environment for Molecular Discovery
https://software.acellera.com/docs/latest/htmd/index.html
Other
254 stars 58 forks source link

Add a means for removing hydrogens before parameterizing (because of Amber/tleap) #428

Closed nhstanley closed 6 years ago

nhstanley commented 6 years ago

One of the many unexpected gotchas when using tleap is that it insists on adding the hydrogens itself, and especially if there are hydrogens on the input structure that do not conform to the Amber naming conventions it will fail. While many files from the PDB do not have hydrogens, some do (NMR structures, or those added by zealous crystallographers). And structures might come from other structural biology tools before coming into HTMD.

So, I propose adding a means to strip all the hydrogens from a structure somewhere. This could be a True/False option for the amber.build(...) method, or perhaps part of the protein prep module, or part of molecule module. I didn't see it implemented anywhere but correct me if I'm wrong.

On the one hand, it might seem best to just add it as something like strip_hyrogens=True to the amber.build() method, but then the user might end up with a protonation state they didn't expect unless there's a nice big warning. If you add it as a method elsewhere, it will be more versatile, but then the user will have to know to run it before feeding the structure into the amber build. Or it will have to be called inside the build method, which might be the best of both ideas.

Just my thoughts on the issue. If I have time I'll implement it, but currently just going to use awk '{if ($3 !~ /^H/ ) print $0}' in.pdb > out.pdb and go to the next step.

stefdoerr commented 6 years ago

You can of course do mol.filter('noh') which will remove all hydrogens from a Molecule.

I would not remove during build because for example CHARMM doesn't modify resnames for protonated residues so once you remove the hydrogens there is no way of going back.

j3mdamas commented 6 years ago

Hi Nate,

the proteinprepare does not solve your problems? Could you attach here an example of what you are facing so we may look at it?

Thanks! João

nhstanley commented 6 years ago

Stefan's suggestion of using the filter method works fine. The lingering question is whether you want to hand-hold the user by adding the functionality I suggested. I'm actually a little ambivalent as well because on the one hand this would be somewhat hidden functionality that might mess up the user's intent, but on the flip side tleap errors are really cryptic and so this could really save them some time.

An alternate idea is just to have an Amber specific system building tutorial where this and other common gotchas are discussed, since this isn't the only one. The purpose of something like HTMD is to automate away as many of these trivial annoyances as possible. On the other hand, if you do too many hidden things, the user doesn't know what's going on.

Anyway, I submitted a pull request that is one attempt to address this. You could also replace the strip_hydrogens=True with an atomselection string e.g. keep_only='noh', so if the user wants everything they can just set keep_only='all'. Just some ideas.

j3mdamas commented 6 years ago

Hi Nate,

Thanks for your contribution, but can you share with us a working case that is failing for you? That would be really helpful! Thanks!

nhstanley commented 6 years ago

I can't share the molecule I'm working on, but you get a similar error when I try to load the first structure in 1KDX from the PDB: http://www.rcsb.org/pdb/explore/explore.do?structureId=1KDX. It's an NMR structure so it has hydrogens. In that specific case, it's because tleap is trying to rename all the HIS --> HIE, but they are delta protonated in the PDB. The basic idea is the same, though, and in my case it happens for most of the hydrogens on a structure I prepared with Maestro.

Hmmm... this test case make me rethink whether this should default to being True. It would be better if the user were just notified of the error and it's likely cause. They could then choose to enable it if they wanted. Perhaps we could also parse the log file and try to give the user some hints? I can try implementing that but I'll definitely need more time to do that.

j3mdamas commented 6 years ago

Thanks for the feedback. Let me see your example and maybe I'll have a better idea of it.

stefdoerr commented 6 years ago

as Joao said, prepareProtein should take care of these cases. I will check tomorrow if it's the case

On Wed, Aug 16, 2017 at 7:50 PM, Nate Stanley notifications@github.com wrote:

I can't share the molecule I'm working on, but you get a similar error when I try to load the first structure in 1KDX from the PDB: http://www.rcsb.org/pdb/explore/explore.do?structureId=1KDX. It's an NMR structure so it has hydrogens. In that specific case, it's because tleap is trying to rename all the HIS --> HIE, but they are delta protonated in the PDB. The basic idea is the same, though, and in my case it happens for most of the hydrogens on a structure I prepared with Maestro.

Hmmm... this test case make me rethink whether this should default to being True. It would be better if the user were just notified of the error and it's likely cause. They could then choose to enable it if they wanted. Perhaps we could also parse the log file and try to give the user some hints? I can try implementing that but I'll definitely need more time to do that.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Acellera/htmd/issues/428#issuecomment-322848745, or mute the thread https://github.com/notifications/unsubscribe-auth/AHkVgpAUA_t_ywnaj3UhPofjKIM005ySks5sYywDgaJpZM4O11OM .

nhstanley commented 6 years ago

Yes, it fixes it for the test case, and for my proprietary molecule. So I agree that my PR was the wrong solution. But I still think a way to hint to the user what might be going wrong with tleap might be worth implementing, to save everyone time from issues like this being opened :^).

j3mdamas commented 6 years ago

Yeah, you are right about that (I mean, the tleap log being translated into something useful for the htmd user). @stefdoerr, if you look at it, I'll assign it to you then.

stefdoerr commented 6 years ago

I have added an asana issue on adding tleap error parsing so that users can understand what's going on

stefdoerr commented 6 years ago

Added log parsing in the latest commit 97e6a1f87e0bdbe91fce05bf380ac70b8ee490fe

If anyone still gets a generic error in building please report it to me so that I can parse it.