Amber-MD / pytraj

Python interface of cpptraj
https://amber-md.github.io/pytraj
170 stars 38 forks source link

Using masks with numeric residues? #1593

Open markperri opened 2 years ago

markperri commented 2 years ago

I'm trying to select a residue in my trajectory (loaded from NAMD). I have numeric residue names and I think it's causing problems using masks. Is there a way around this? Maybe renaming the residue from 000 to AAA or iterating through all atoms and checking their residue names?

print(set(residue.name for residue in _traj.top.residues))
{'TIP3', '000'}
residueList = traj.top.select(':000')
residueList
Error: One or both numbers of mask arg (000) < 1 (0, 0)
Error: Could not parse mask [:000].

python version: 3.9 pytraj version: 2.0.5

Thanks, Mark

markperri commented 2 years ago

Is this a reasonable workaround?

for residue in traj.top.residues:
    if (residue.name == "000"):
        residueList = [*range(residue.first_atom_index,residue.last_atom_index)]
hainm commented 2 years ago

Is this a reasonable workaround?

for residue in traj.top.residues:
    if (residue.name == "000"):
        residueList = [*range(residue.first_atom_index,residue.last_atom_index)]

yeah, seems right. Please double check of first_atom_index and last_atom_index is 0 or 1-based (e.g: use first residue to get the atom numbers). Cheers.

drroe commented 2 years ago

I have numeric residue names and I think it's causing problems using masks. Is there a way around this?

The way you would do it with cpptraj (the backend for pytraj) would be to prepend the digits with a backslash to differentiate it from a residue number. So something like:

residueList = traj.top.select(':\0\0\0')

should work, although if the mask given to select() isn't passed directly to the cpptraj mask parser it's conceivable this still might not work.

markperri commented 2 years ago

Thanks, but it doesn't seem the \ is working at the start of a string (possibly python thinks it's null or something)

residueList = traj.top.select(':\0\0\0')
residueList = traj.top.select(':\000')
Error: empty token for ':'
Error: Could not parse mask [:].
residueList = traj.top.select(':\\0\\0\\0')

leads to kernel dying in a Jupyter Notebook.

residueList = traj.top.select(':TIP\\3')
Error: Tokenize: Unknown symbol () expression when parsing atom mask [:TIP]
Error: Could not parse mask [:TIP].
residueList = traj.top.select(':TIP\3')

works fine

drroe commented 2 years ago

@markperri if the topology isn't too big can you attach it here so we can test with it? Thanks.

hainm commented 2 years ago

@drroe

I can reproduce the issue with below pdb file:

ATOM      1  N   000 A   1      -8.901   4.127  -0.555  1.00  0.00           N  
ATOM      2  CA  000 A   1      -8.608   3.135  -1.618  1.00  0.00           C  
ATOM      3  C   000 A   1      -7.117   2.964  -1.897  1.00  0.00           C  
ATOM      4  O   000 A   1      -6.634   1.849  -1.758  1.00  0.00           O  
ATOM      5  CB  000 A   1      -9.437   3.396  -2.889  1.00  0.00           C  
ATOM      6  CG  000 A   1     -10.915   3.130  -2.611  1.00  0.00           C  
ATOM      7  OD1 000 A   1     -11.269   2.700  -1.524  1.00  0.00           O  
ATOM      8  ND2 000 A   1     -11.806   3.406  -3.543  1.00  0.00           N  
ATOM      9  H1  000 A   1      -8.330   3.957   0.261  1.00  0.00           H  
ATOM     10  H2  000 A   1      -8.740   5.068  -0.889  1.00  0.00           H  
ATOM     11  H3  000 A   1      -9.877   4.041  -0.293  1.00  0.00           H  
ATOM     12  HA  000 A   1      -8.930   2.162  -1.239  1.00  0.00           H  
ATOM     13  HB2 000 A   1      -9.310   4.417  -3.193  1.00  0.00           H  
ATOM     14  HB3 000 A   1      -9.108   2.719  -3.679  1.00  0.00           H  
ATOM     15 HD21 000 A   1     -11.572   3.791  -4.444  1.00  0.00           H  
ATOM     16 HD22 000 A   1     -12.757   3.183  -3.294  1.00  0.00           H  
ATOM     17  N   LEU A   2      -6.379   4.031  -2.228  1.00  0.00           N  
ATOM     18  CA  LEU A   2      -4.923   4.002  -2.452  1.00  0.00           C  
ATOM     19  C   LEU A   2      -4.136   3.187  -1.404  1.00  0.00           C  
ATOM     20  O   LEU A   2      -3.391   2.274  -1.760  1.00  0.00           O  
ATOM     21  CB  LEU A   2      -4.411   5.450  -2.619  1.00  0.00           C  
ATOM     22  CG  LEU A   2      -4.795   6.450  -1.495  1.00  0.00           C  
ATOM     23  CD1 LEU A   2      -3.612   6.803  -0.599  1.00  0.00           C  
ATOM     24  CD2 LEU A   2      -5.351   7.748  -2.084  1.00  0.00           C  
ATOM     25  H   LEU A   2      -6.821   4.923  -2.394  1.00  0.00           H  
ATOM     26  HA  LEU A   2      -4.750   3.494  -3.403  1.00  0.00           H  
ATOM     27  HB2 LEU A   2      -3.340   5.414  -2.672  1.00  0.00           H  
ATOM     28  HB3 LEU A   2      -4.813   5.817  -3.564  1.00  0.00           H  
ATOM     29  HG  LEU A   2      -5.568   6.022  -0.858  1.00  0.00           H  
ATOM     30 HD11 LEU A   2      -3.207   5.905  -0.146  1.00  0.00           H  
ATOM     31 HD12 LEU A   2      -2.841   7.304  -1.183  1.00  0.00           H  
ATOM     32 HD13 LEU A   2      -3.929   7.477   0.197  1.00  0.00           H  
ATOM     33 HD21 LEU A   2      -4.607   8.209  -2.736  1.00  0.00           H  
ATOM     34 HD22 LEU A   2      -6.255   7.544  -2.657  1.00  0.00           H  
ATOM     35 HD23 LEU A   2      -5.592   8.445  -1.281  1.00  0.00           H  
TER      36      LEU A   2 
END   
hainm commented 2 years ago

@markperri using \\ does help in my test (ipython)

In [33]: t2 = pt.load('abc.pdb') # above pdb file

In [34]: t2.top.select(':\0\0\0') # DOES NOT WORK
Error: empty token for ':'
Error: Could not parse mask [:].
Out[34]: array([], dtype=int64)

In [35]: t2.top.select(':\\0\\0\\0')
Out[35]: array([ 0,  1,  2, ..., 13, 14, 15])

 In [36]: res = next(t2.top.residues)

In [37]: res.first_atom_index, res.last_atom_index
Out[37]: (0, 16)
markperri commented 2 years ago

@hainm weird! I wonder what's different about mine. The kernel just dies, with no error messages in the log except for: [I 2022-01-13 04:56:20.255 SingleUserNotebookApp restarter:110] KernelRestarter: restarting kernel (1/5), keep random ports

I'm using Ipython 7.25.0 if that makes a difference.

import pytraj as pt
traj = pt.load('md.dcd', top='box_wb.psf')
traj.top.select(':\\0\\0\\0')

It won't let me upload a .psf, .dcd, or .zip, but here's a paste:

https://paste.c-net.org/EdemaBible https://paste.c-net.org/RaymondGills

Iterating through the residues seems to be working fine.

Thanks, Mark

hainm commented 2 years ago

@hainm weird! I wonder what's different about mine. The kernel just dies, with no error messages in the log except for: [I 2022-01-13 04:56:20.255 SingleUserNotebookApp restarter:110] KernelRestarter: restarting kernel (1/5), keep random ports

I'm using Ipython 7.25.0 if that makes a difference.

import pytraj as pt
traj = pt.load('md.dcd', top='box_wb.psf')
traj.top.select(':\\0\\0\\0')

It won't let me upload a .psf, .dcd, or .zip, but here's a paste:

https://paste.c-net.org/EdemaBible https://paste.c-net.org/RaymondGills

Iterating through the residues seems to be working fine.

Thanks, Mark

looks fine to me

image

markperri commented 2 years ago

Do you think it's something to do with python 3.9.5? I ran it in the shell to get the error message:

$ python
Python 3.9.5 | packaged by conda-forge | (default, Jun 19 2021, 00:32:32)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pytraj as pt
>>> top = pt.load_topology('box_wb.psf')
>>> top
<Topology: 2442 atoms, 804 residues, 804 mols, non-PBC>
>>> top.select(':\\0\\0\\0')
terminate called after throwing an instance of 'BadConversion'
  what():  convertToInteger("\0\0\0")
Aborted
$
hainm commented 2 years ago

Do you think it's something to do with python 3.9.5? I ran it in the shell to get the error message:

$ python
Python 3.9.5 | packaged by conda-forge | (default, Jun 19 2021, 00:32:32)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pytraj as pt
>>> top = pt.load_topology('box_wb.psf')
>>> top
<Topology: 2442 atoms, 804 residues, 804 mols, non-PBC>
>>> top.select(':\\0\\0\\0')
terminate called after throwing an instance of 'BadConversion'
  what():  convertToInteger("\0\0\0")
Aborted
$

yeah I think so. Let me try with 3.9

hainm commented 2 years ago

@markperri

python 3.9 is fine too. I am using my macos.

image

hainm commented 2 years ago

@markperri I can reproduce your issue with python 3.9 in Linux (from conda-forge)

In [1]: import pytraj as pt                                                                                                                                           

In [2]: top = pt.load_topology('./abc.psf')                                                                                                                           

In [3]: top                                                                                                                                                           
Out[3]: <Topology: 2442 atoms, 804 residues, 804 mols, non-PBC>

In [4]: top.select(':\\0\\0\\0')                                                                                                                                      
terminate called after throwing an instance of 'BadConversion'
  what():  convertToInteger("\0\0\0")
Aborted
hainm commented 2 years ago

Weird, if you put the code in a script, saying 'my.py', and run python3 my.py, it would be fine.

markperri commented 2 years ago

So it's something about how an interactive shell treats strings then? I don't know enough to help with that, but I tried adding b and various encoding, but couldn't find anything that worked.