Closed lanselibai closed 6 years ago
You have to import it as
import gromacs
Reading a xpm file:
import gromacs.formats
xpm = gromacs.formats.XPM("./ss.xpm")
See the docs for XPM for more details how to work with the class.
To get the "33th residue at 22th time frame " do
resid = 33 # 1-based
frame = 22 # 0-based
resnum = xpm.yvalues[resid-1]
time = xpm.xvalues[frame]
state = xpm.array[frame, resid-1]
print("secondary structure for resid={0} at time {1:.3f} ps: {2}".format(resnum, time, state))
Note that the xpm.array
contains the entries
{u'3-Helix', u'5-Helix', u'A-Helix', u'B-Bridge', u'B-Sheet', u'Bend', u'Chain_Separator', u'Coil', u'Turn'}
Thank you, I think GW works for python 2, not python 3. Can I ask if there is an alternative for python 3 as well? Or, can I use python 2 and 3 in the same time?
Or, what if I re-write your code into python3? How many files are involved in the xpm processing?
I am also interested in batch-processing rmsd/rmsf/gyration files, but I prefer python 3.
The following errors are for python 3 only:
Sorry, both import gromacs
and import gromacs.formats
could not work on both PyCharm
and IDLE
with SyntaxError: invalid syntax
File "C:\Python36\lib\site-packages\gromacs\__init__.py", line 192, in <module>
from . import fileformats
File "C:\Python36\lib\site-packages\gromacs\fileformats\__init__.py", line 10, in <module>
from .xvg import XVG
File "C:\Python36\lib\site-packages\gromacs\fileformats\xvg.py", line 203, in <module>
import gromacs.utilities as utilities
File "C:\Python36\lib\site-packages\gromacs\utilities.py", line 275
except OSError, err:
^
SyntaxError: invalid syntax
Can I ask about the Chain_Separator
? My protein has two chains, light chain (LC) from 1-214, heavy chain (HC) from 215-442. xpm.yvalues
outputs 1 to 443, so there is one Chain_Separator
. So is the 443th the Chain_Separator
?
I just want to double confirm that. I also asked question on the Gromacs forum.
https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2018-February/thread.html
[gmx-users] Why "do_dssp" gives one more residue? ZHANG Cheng
We have not ported GW to Python 3 yet, see issue #44 . I'd like to but haven't had enough time. In principle it should not be hard because GW is pure Python. An initial attempt was PR #101, see comments there. If you work on making it work under Python 3 (while still supporting Python 2) I'd be happy to consider your contribution for inclusion – just submit a pull request (PR) on GitHub.
thank you for the effort, I will write my own python3.
Regarding Chain_separator
: as far as I can tell you're right: it inserts an additional "residue". I have an example ss.xpm
with two chains and each contains 374 residues:
>>> import numpy as np
>>> import gromacs.formats
>>> xpm = gromacs.formats.XPM("./ss.xpm")
>>> print(xpm.array.shape)
(10001, 749)
>>> separators = np.all(xpm.array == "Chain_Separator", axis=0)
>>> sep_resnum = xpm.yvalues[separators]
>>> print(sep_resnum)
[375]
>>> # I only have one separator so I am shortcutting and assigning it to
>>> sep_indx = np.where(xpm.yvalues == sep_resnum[0])[0][0]
>>> # data for chain A
>>> ssA = xpm.array[:, :sep_indx]
>>> # data for chain B
>>> ssB = xpm.array[:, sep_indx+1:]
>>> print(ssA.shape, ssB.shape)
((10001, 374), (10001, 374))
If you fix the things that allow you to import on Python 3 and submit this as a PR then that could be a start.
I am also interested in batch-processing rmsd/rmsf/gyration files, but I prefer python 3.
You mean reading the output from the Gromacs commands in the form of XVG files? GW has the gromacs.formats.XVG
class. But you can also use the np.loadtxt()
function if you write XVG files without legends.
You can also run analysis of this kind in MDAnalysis, see
for the MDAnalysis, I have some problems in installing it, and could not solve it at the moment. So writing my own one might be easier.
error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": http://landinghub.visualstudio.com/visual-cpp-build-tools
For MDAnalysis on Windows see https://github.com/MDAnalysis/mdanalysis/wiki/MDAnalysis-on-Windows
thank you. I do have ubuntu on VMwarePlayer on Win7. But I do not want to frequently use the ubuntu, as I need to copy and paste the files into it. I am writing my own python3 script to process the xvg/xpm files from gromacs.
Ok – I'll close the issue. Feel free to ask it to be re-opened (or re-open it yourself).
Hi @orbeckst , I am not sure if I did it correctly.
I used
import gromacs.formats
xpm = gromacs.formats.XPM("./ss.xpm")
print xpm.array[0, 1]
So this should output the first time and the second residue. But what I got is Coil
while it is G
in the ss.xpm
file, which should be 3-Helix
. The file is here
I think you're reading the XPM file wrong. The "G" that you are reading at https://github.com/lanselibai/attach/blob/850cc4255e29ead9ef98f14aa655e99992b67057/ss.xpm#L46 belongs to the second but last residue.
>>> xpm.array[0, -2]
u'3-Helix'
The XPM is a bit map with coordinate origin in the lower left corner.
^
res N
|
|
|
res 1
0------------- time --->
If you want the original orientation, use reverse=False
xpm = gromacs.formats.XPM("ss.xpm", reverse=False)
If you have evidence that suggests that this is wrong for the DSSP files then let me know.
EDIT: added output for second but last residue at time step 0.
Thank you I got it now!
But if the last residue shows first, then in my case, the chain separator might be between the two chains, not before the last residue. So is there a solid evidence for the description about the location of chain separator?
@orbeckst
As far as I can tell, your ss.xpm does not contain an explicit chain separator. It is not listed in the legend section and when I load it, I also don't see it:
>>> import numpy as np
>>> from gromacs.formats import XPM
>>> ss = XPM("./ss.xpm")
>>> ss.array.shape
(1064, 443)
>>> np.unique(ss.array)
array([u'3-Helix', u'5-Helix', u'A-Helix', u'B-Bridge', u'B-Sheet',
u'Bend', u'Coil', u'Turn'], dtype='<U8')
As far as I can tell, your ss.xpm
looks as if it contains DSSP data for residues 1 to 443. In https://github.com/Becksteinlab/GromacsWrapper/issues/125#issuecomment-375964569 you said
My protein has two chains, light chain (LC) from 1-214, heavy chain (HC) from 215-442.
If its is true what Justin says in his reply to you
Thank you. But it does not make sense to me. Do you know if separator line is always all the ~ ? If the separator line is following the 214th residue, the 215th residue should be the separator line, but why the 215th residue contains the secondary structure?
Maybe that's the case then. It's hard to know because the terminal residue in a protein is often not assigned any secondary structure. In older GROMACS versions, I recall the chain separator clearly showing up as a coil between the chains. Maybe that's not the case any more.
then you could get the "separator" by finding the "residue" with number 215:
>>> ss.array[:, ss.yvalues == 215]
array([[u'Coil'],
[u'Coil'],
[u'Coil'],
...,
[u'Coil'],
[u'Coil'],
[u'Coil']], dtype='<U8')
You could run dssp
on a PDB file of your system and manually compare the output to your xpm to find out where exactly Gromacs inserts the pseudo chain separator "residue". I assume your layout might be
chain | resid | yvalue |
---|---|---|
LC | 1 – 214 | 1 – 214 |
separator | 215 | |
HC | 215 – 442 | 216 – 443 |
All I can say is that in the xpm that I have for my system, the "Chain_separator" is included. My xpm file does not say which version of Gromacs produced it; I think it was 4.6.5 but I am not really sure.
Thank you @orbeckst , it is a good idea to use only a PDB file. The outputs are here. Still, there are 443 residues, and I do not know which one is the chain separator. You can see from the PDB file that there are only 442 residues.
Could you please send me your ss.xpm file?
Where can I find the source code for the do_dssp for different versions of Gromacs?
@orbeckst I got the answer from Carsten now:
OK, I found your issue. You are passing your own ss.map over to gmx do_dssp, which misses the line for the chain separator. If you delete your own ss.map from the directory where you call do_dssp, then do_dssp will by default use the one in share/gromacs/top/ss.map, which contains the separator in the last line. Just checked that with your pdb input and GROMACS 5.1.
in my case, the chain separator is between the HC and LC. And yes, it is in the order:
442 441 ... 216 215 chain separator (i.e. =================) 214 213 ... 2 1
Glad that this was all explained. Closing.
Sorry I am new to the GW. I have successfully installed it on PyCharm. What next should I do?
I tried
import GromacsWrapper
but gotModuleNotFoundError: No module named 'GromacsWrapper'
I also tried
gromacs.g_hbond(s=TPR, f=XTC, g="hbond.log", hbm="hb.xpm", hbn="hb.ndx")
but gotNameError: name 'gromacs' is not defined
I want to import my ss.xpm file, then I can get an numpy array which contains all the ss records of all the residues for all the time frames. For example,
ss[32][21] = 'E'
means the 33th residue at 22th time frame has beta-sheet structure. Is this doable?@orbeckst