prepareProtein error - Githubissues

noeliaferruz commented 8 years ago

Hi all,

I was trying to use proteinPrepare, I think is a nice feature and will be very useful soon. So first I did:

> >>> protein=Molecule('./model.pdb')
> >>> protein.filter("protein and not resid 1 to 31" )
> >>> protCaps = segmentgaps(protein,'protein','P').
> >>> solvated = solvate(protein)
> >>> detectDisulfideBonds(solvated, minmax=np.vstack((m,M)))
> >>> built= charmm.build(solvated, topo=topos, param=params,outdir='./build')

This builds a system as we could expect. But this doesn't:

>>> protein=Molecule('./model.pdb')
>>> protein.filter("protein and not resid 1 to 31" )
>>> mpp,dat=prepareProtein(protein,returnDetails=True)
Warning: no pdbfile provided
propka3.1                                                                                    2016-04-14
-------------------------------------------------------------------------------------------------------
--                                                                                                   --
--                                   PROPKA: A PROTEIN PKA PREDICTOR                                 --
--                                                                                                   --
--                                 VERSION 1.0,  04/25/2004, IOWA CITY                               --
--                                             BY HUI LI                                             --
--                                                                                                   --
--                            VERSION 2.0,  11/05/2007, IOWA CITY/COPENHAGEN                         --
--                                BY DELPHINE C. BAS AND DAVID M. ROGERS                             --
--                                                                                                   --
--                                VERSION 3.0,  01/06/2011, COPENHAGEN                               --
--                            BY MATS H.M. OLSSON AND CHRESTEN R. SONDERGARD                         --
--                                                                                                   --
--                                VERSION 3.1,  07/01/2011, COPENHAGEN                               --
--                            BY CHRESTEN R. SONDERGARD AND MATS H.M. OLSSON                         --
-------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------------
 References:

   Very Fast Empirical Prediction and Rationalization of Protein pKa Values
   Hui Li, Andrew D. Robertson and Jan H. Jensen
   PROTEINS: Structure, Function, and Bioinformatics 61:704-721 (2005)

   Very Fast Prediction and Rationalization of pKa Values for Protein-Ligand Complexes
   Delphine C. Bas, David M. Rogers and Jan H. Jensen
   PROTEINS: Structure, Function, and Bioinformatics 73:765-783 (2008)

   PROPKA3: Consistent Treatment of Internal and Surface Residues in Empirical pKa predictions
   Mats H.M. Olsson, Chresten R. Sondergard, Michal Rostkowski, and Jan H. Jensen
   Journal of Chemical Theory and Computation, 7(2):525-537 (2011)

   Improved Treatment of Ligands and Coupling Effects in Empirical Calculation
    and Rationalization of pKa Values
   Chresten R. Sondergaard, Mats H.M. Olsson, Michal Rostkowski, and Jan H. Jensen
   Journal of Chemical Theory and Computation, (2011)

-------------------------------------------------------------------------------------------------------

Warning: Missing atoms or failed protonation for C-  331 A (COO) -- please check the structure
         Group (COO) for  4910- OXT   331-LEU (A) [ 124.386    7.622   70.718] O
         Expected 2 interaction atoms for acids, found:
              4910- OXT   331-LEU (A) [ 124.386    7.622   70.718] O
         Expected 2 interaction atoms for bases, found:
              4910- OXT   331-LEU (A) [ 124.386    7.622   70.718] O
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/htmd/site-packages/htmd/proteinpreparation/proteinpreparation.py", line 141, in prepareProtein
    propkaOptions=propka_opts)
  File "/path/to/htmd/site-packages/htmd/proteinpreparation/pdb2pqr/main.py", line 307, in runPDB2PQR
    pka_molecule, pka_not_found, pkadic = myRoutines.runPROPKA31(propkaOptions)
  File "/path/to/htmd/site-packages/htmd/proteinpreparation/pdb2pqr/src/routines.py", line 1628, in runPROPKA31
    pka_molecule = propka.molecular_container.Molecular_container(HFreeProteinFile.name, pka_options)
  File "/path/to/htmd/site-packages/PROPKA-3.1-py3.5.egg/propka/molecular_container.py", line 54, in __init__
  File "/path/to/htmd/site-packages/PROPKA-3.1-py3.5.egg/propka/molecular_container.py", line 109, in extract_groups
  File "/path/to/htmd/site-packages/PROPKA-3.1-py3.5.egg/propka/conformation_container.py", line 34, in extract_groups
  File "/path/to/htmd/site-packages/PROPKA-3.1-py3.5.egg/propka/group.py", line 1179, in is_group
  File "/path/to/htmd/site-packages/PROPKA-3.1-py3.5.egg/propka/group.py", line 1229, in is_ligand_group_by_groups
  File "/path/to/htmd/site-packages/PROPKA-3.1-py3.5.egg/propka/protonate.py", line 145, in protonate_atom
  File "/path/to/htmd/site-packages/PROPKA-3.1-py3.5.egg/propka/protonate.py", line 170, in set_number_of_protons_to_add
KeyError: ''

Let me know.

Thanks Noelia

tonigi commented 8 years ago

Hmm... is the PDB file available for me to test? If not, try the bare propka31 on the same file.

stefdoerr commented 8 years ago

By the way Noelia take care with segmentgaps (called autoSegment now). It doesn't modify the molecule in place, it returns a modified molecule, which in your first script you are not using afterwards.

noeliaferruz commented 8 years ago

@tonigi i cannot. I'll find another molecule @stefdoerr yes yes i know. i pseudo invented the commands

noeliaferruz commented 8 years ago

"Unfortunately" other two examples i've tried worked. I'll cat the helix that is crashing and will attach it here. (tomorrow xD)

j3mdamas commented 8 years ago

Hi Noelia, Sent you an email about this.

noeliaferruz commented 8 years ago

Hello!!

This part of the molecule also crashes. Just rename it to pdb. I bet it is something related with the maestro format, because opening it with vmd, and writing out a new structure with the same selection solves the problem.

But perhaps useful to understand what's wrong with this format.

Thanks, Noelia

helices.txt

tonigi commented 8 years ago

it seems a problem with hydrogen names - some software uses names (or pdb format variants) which are not recognized as hydrogens. Removing them should fix the problem (until i find a general solution).

Edit: the problem is here

HETATM 5435 1HA  NMA

giadefa commented 8 years ago

what is the point of keeping them in the first place?

On 15 April 2016 at 22:29, Toni G notifications@github.com wrote:

it seems a problem with hydrogen names - some software uses names which are not recognized as hydrogens. Removing them should fix the problem (until i find a general solution).

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/Acellera/htmd/issues/16#issuecomment-210632055

http://www.acellera.com

   <https://twitter.com/acellera>

https://www.youtube.com/user/acelleracom https://www.linkedin.com/company/2133167?trk=tyah&trkInfo=clickedVertical%3Acompany%2CclickedEntityId%3A2133167%2Cidx%3A2-1-2%2CtarId%3A1448018583204%2Ctas%3Aacellera https://www.acellera.com/md-simulation-blog-news/ http://is.gd/1eXkbS

noeliaferruz commented 8 years ago

I am basically testing the most common issues the users here will likely find.

The point is that the inexperienced user needs a set of scripts to build, equilibrate and run an adaptive. Unfortunately the inexperienced user does not want to look inside these scripts and has no interest either on trying the tutorials.

I have them split the process into four scripts: 1.build.py, 2.equil.py, 3.prod.py, 4.adaptive.py. The 3rd creates the generators and the other scripts have the obvious usage.

It turns that scripts 2,3 and 4 ALWAYS work. Script 1, rarely ever does. By working I mean that don't appear errors in the command line and the output is as expected. Note that the user very often does not know what the output should look like -and has no interest in learning so either-.

So for the part regarding 1.build.py, it would be ideal if the user could provide the protein of interest -usually prepared through maestro or similar softwares that add hydrogens or caps- and the other parts and get the system. Ideally, in the future this could look like:

./1.build.py -p protein.pdb -l ligand1.pdb ligand2.pdb -m membrane.pdb -c cofactor.pdb and so on.

At the current stage 1.build.py is far from looking to work without supervision but ideally in the future would be nice if it does.

So that's why I keep the hydrogens. If proteinPrepare doesn't handle some maestro H, I will add a line right before reading protein.filter('hydrogen') in the main script so the user doesn't have to figure out what's causing the error, while in the meanwhile proteinPrepare could become more robust and accept different kind of inputs.

Thanks, Noelia

tonigi commented 8 years ago

H can be useful because propKa can decide protonation of ligand-protein complexes, but im still thinking on how to best use this.

Note that in the previous error the cause was probably the NMA appearing as HETATM... try filtering that out. In the best case it is ignored.

noeliaferruz commented 8 years ago

Thanks Toni,

Do you mean with it it's ignored that you have made some changes in proteinPrepare so that NMA is ignored now? I will otherwise filter maestro caps by default in these scripts.

Thanks, Noelia

tonigi commented 8 years ago

No recent changes on that. I don't think NMA is ignored: it must be removed manually. Then build with addcaps re-adds it in the ff-specific way.

tonigi commented 8 years ago

Can I close this issue*? I don't think we can/should filter some residue names (like NMA) by default. Hydrogens maybe, but it was not related to this issue.

(*) actually it seems i can comment but not close the issue

noeliaferruz commented 8 years ago

ok! sure sure

Acellera / htmd

prepareProtein error #16