How to use GW to process ss.xpm file of secondary structures (ss)?

lanselibai commented 6 years ago

Sorry I am new to the GW. I have successfully installed it on PyCharm. What next should I do?

I tried import GromacsWrapper but got ModuleNotFoundError: No module named 'GromacsWrapper'

I also tried gromacs.g_hbond(s=TPR, f=XTC, g="hbond.log", hbm="hb.xpm", hbn="hb.ndx") but got NameError: name 'gromacs' is not defined

I want to import my ss.xpm file, then I can get an numpy array which contains all the ss records of all the residues for all the time frames. For example, ss[32][21] = 'E' means the 33th residue at 22th time frame has beta-sheet structure. Is this doable?

@orbeckst

orbeckst commented 6 years ago

You have to import it as

import gromacs

Reading a xpm file:

import gromacs.formats
xpm = gromacs.formats.XPM("./ss.xpm")

See the docs for XPM for more details how to work with the class.

orbeckst commented 6 years ago

To get the "33th residue at 22th time frame " do

resid = 33      # 1-based
frame = 22     # 0-based

resnum = xpm.yvalues[resid-1]
time = xpm.xvalues[frame]
state = xpm.array[frame, resid-1]

print("secondary structure for resid={0} at time {1:.3f} ps: {2}".format(resnum, time, state))

Note that the xpm.array contains the entries

{u'3-Helix',  u'5-Helix', u'A-Helix',  u'B-Bridge', u'B-Sheet', u'Bend', u'Chain_Separator', u'Coil', u'Turn'}

lanselibai commented 6 years ago

Thank you, I think GW works for python 2, not python 3. Can I ask if there is an alternative for python 3 as well? Or, can I use python 2 and 3 in the same time?

Or, what if I re-write your code into python3? How many files are involved in the xpm processing?

I am also interested in batch-processing rmsd/rmsf/gyration files, but I prefer python 3.

The following errors are for python 3 only: Sorry, both import gromacs and import gromacs.formats could not work on both PyCharm and IDLE with SyntaxError: invalid syntax

  File "C:\Python36\lib\site-packages\gromacs\__init__.py", line 192, in <module>
    from . import fileformats
  File "C:\Python36\lib\site-packages\gromacs\fileformats\__init__.py", line 10, in <module>
    from .xvg import XVG
  File "C:\Python36\lib\site-packages\gromacs\fileformats\xvg.py", line 203, in <module>
    import gromacs.utilities as utilities
  File "C:\Python36\lib\site-packages\gromacs\utilities.py", line 275
    except OSError, err:
                  ^
SyntaxError: invalid syntax

lanselibai commented 6 years ago

Can I ask about the Chain_Separator? My protein has two chains, light chain (LC) from 1-214, heavy chain (HC) from 215-442. xpm.yvalues outputs 1 to 443, so there is one Chain_Separator. So is the 443th the Chain_Separator?

I just want to double confirm that. I also asked question on the Gromacs forum. https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2018-February/thread.html [gmx-users] Why "do_dssp" gives one more residue? ZHANG Cheng

orbeckst commented 6 years ago

We have not ported GW to Python 3 yet, see issue #44 . I'd like to but haven't had enough time. In principle it should not be hard because GW is pure Python. An initial attempt was PR #101, see comments there. If you work on making it work under Python 3 (while still supporting Python 2) I'd be happy to consider your contribution for inclusion – just submit a pull request (PR) on GitHub.

lanselibai commented 6 years ago

thank you for the effort, I will write my own python3.

orbeckst commented 6 years ago

Regarding Chain_separator: as far as I can tell you're right: it inserts an additional "residue". I have an example ss.xpm with two chains and each contains 374 residues:

>>> import numpy as np
>>> import gromacs.formats
>>> xpm = gromacs.formats.XPM("./ss.xpm")
>>> print(xpm.array.shape)
(10001, 749)
>>> separators = np.all(xpm.array == "Chain_Separator", axis=0)
>>> sep_resnum = xpm.yvalues[separators]
>>> print(sep_resnum)
[375]
>>> # I only have one separator so I am shortcutting and assigning it to 
>>> sep_indx = np.where(xpm.yvalues == sep_resnum[0])[0][0]

>>> # data for chain A
>>> ssA = xpm.array[:, :sep_indx]
>>> # data for chain B
>>> ssB = xpm.array[:, sep_indx+1:]
>>> print(ssA.shape, ssB.shape)
((10001, 374), (10001, 374))

orbeckst commented 6 years ago

If you fix the things that allow you to import on Python 3 and submit this as a PR then that could be a start.

orbeckst commented 6 years ago

I am also interested in batch-processing rmsd/rmsf/gyration files, but I prefer python 3.

You mean reading the output from the Gromacs commands in the form of XVG files? GW has the gromacs.formats.XVG class. But you can also use the np.loadtxt() function if you write XVG files without legends.

You can also run analysis of this kind in MDAnalysis, see

lanselibai commented 6 years ago

for the MDAnalysis, I have some problems in installing it, and could not solve it at the moment. So writing my own one might be easier.

error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": http://landinghub.visualstudio.com/visual-cpp-build-tools

orbeckst commented 6 years ago

For MDAnalysis on Windows see https://github.com/MDAnalysis/mdanalysis/wiki/MDAnalysis-on-Windows

lanselibai commented 6 years ago

thank you. I do have ubuntu on VMwarePlayer on Win7. But I do not want to frequently use the ubuntu, as I need to copy and paste the files into it. I am writing my own python3 script to process the xvg/xpm files from gromacs.

orbeckst commented 6 years ago

Ok – I'll close the issue. Feel free to ask it to be re-opened (or re-open it yourself).

lanselibai commented 6 years ago

Hi @orbeckst , I am not sure if I did it correctly.

I used

import gromacs.formats
xpm = gromacs.formats.XPM("./ss.xpm")
print xpm.array[0, 1]

So this should output the first time and the second residue. But what I got is Coil while it is G in the ss.xpm file, which should be 3-Helix. The file is here

orbeckst commented 6 years ago

I think you're reading the XPM file wrong. The "G" that you are reading at https://github.com/lanselibai/attach/blob/850cc4255e29ead9ef98f14aa655e99992b67057/ss.xpm#L46 belongs to the second but last residue.

>>> xpm.array[0, -2]
u'3-Helix'

The XPM is a bit map with coordinate origin in the lower left corner.


   ^
res N
   |
   |
   |
res 1
   0------------- time --->

If you want the original orientation, use reverse=False

xpm = gromacs.formats.XPM("ss.xpm", reverse=False)

If you have evidence that suggests that this is wrong for the DSSP files then let me know.

EDIT: added output for second but last residue at time step 0.

lanselibai commented 6 years ago

Thank you I got it now!

But if the last residue shows first, then in my case, the chain separator might be between the two chains, not before the last residue. So is there a solid evidence for the description about the location of chain separator?

@orbeckst

orbeckst commented 6 years ago

As far as I can tell, your ss.xpm does not contain an explicit chain separator. It is not listed in the legend section and when I load it, I also don't see it:

>>> import numpy as np
>>> from gromacs.formats import XPM
>>> ss = XPM("./ss.xpm")
>>> ss.array.shape
(1064, 443)
>>> np.unique(ss.array)
array([u'3-Helix', u'5-Helix', u'A-Helix', u'B-Bridge', u'B-Sheet',
       u'Bend', u'Coil', u'Turn'], dtype='<U8')

As far as I can tell, your ss.xpm looks as if it contains DSSP data for residues 1 to 443. In https://github.com/Becksteinlab/GromacsWrapper/issues/125#issuecomment-375964569 you said

My protein has two chains, light chain (LC) from 1-214, heavy chain (HC) from 215-442.

If its is true what Justin says in his reply to you

Thank you. But it does not make sense to me. Do you know if separator line is always all the ~ ? If the separator line is following the 214th residue, the 215th residue should be the separator line, but why the 215th residue contains the secondary structure?

Maybe that's the case then. It's hard to know because the terminal residue in a protein is often not assigned any secondary structure. In older GROMACS versions, I recall the chain separator clearly showing up as a coil between the chains. Maybe that's not the case any more.

then you could get the "separator" by finding the "residue" with number 215:

>>> ss.array[:, ss.yvalues == 215]
array([[u'Coil'],
       [u'Coil'],
       [u'Coil'],
       ...,
       [u'Coil'],
       [u'Coil'],
       [u'Coil']], dtype='<U8')

You could run dssp on a PDB file of your system and manually compare the output to your xpm to find out where exactly Gromacs inserts the pseudo chain separator "residue". I assume your layout might be

chain	resid	yvalue
LC	1 – 214	1 – 214
separator		215
HC	215 – 442	216 – 443

All I can say is that in the xpm that I have for my system, the "Chain_separator" is included. My xpm file does not say which version of Gromacs produced it; I think it was 4.6.5 but I am not really sure.

lanselibai commented 6 years ago

Thank you @orbeckst , it is a good idea to use only a PDB file. The outputs are here. Still, there are 443 residues, and I do not know which one is the chain separator. You can see from the PDB file that there are only 442 residues.

Could you please send me your ss.xpm file?

Where can I find the source code for the do_dssp for different versions of Gromacs?

lanselibai commented 6 years ago

@orbeckst I got the answer from Carsten now:

OK, I found your issue. You are passing your own ss.map over to gmx do_dssp, which misses the line for the chain separator. If you delete your own ss.map from the directory where you call do_dssp, then do_dssp will by default use the one in share/gromacs/top/ss.map, which contains the separator in the last line. Just checked that with your pdb input and GROMACS 5.1.

lanselibai commented 6 years ago

in my case, the chain separator is between the HC and LC. And yes, it is in the order:

442 441 ... 216 215 chain separator (i.e. =================) 214 213 ... 2 1

orbeckst commented 6 years ago

Glad that this was all explained. Closing.

Becksteinlab / GromacsWrapper

How to use GW to process ss.xpm file of secondary structures (ss)? #125