MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.33k stars 653 forks source link

Python3 and select_atoms #1336

Closed tclick closed 7 years ago

tclick commented 7 years ago

Expected behaviour

When using select_atoms, one should be able to input a selection string and return an AtomGroup regardless of the universe input files.

Actual behaviour

If a binary topology file (e.g., Gromacs TPR file) is used, all string information in Python 3 is considered a byte string instead of a unicode string. Therefore a simple selection yields an empty AtomGroup. It works if the topology file is a text file (e.g., CHARMM PSF).

Code to reproduce the behaviour

import MDAnalysis as mda

top = "md.tpr"
trj = "md.xtc"
u = mda.Universe(top, trj)
u.select_atoms("name CA")

....<AtomGroup with 0 atoms>

Currently version of MDAnalysis:

0.16.0

kain88-de commented 7 years ago

Thanks for reporting this. We currently consider python3 support experimental. There will be a bigger effort in summer to get MDAnalysis python 3 compatible.

tylerjereddy commented 7 years ago

@tclick Which version of Python did you use? Using Python 3.6 and MDA 0.16.2 the problem is less severe / clear with the following code:

import MDAnalysis
print('MDAnalysis.__version__:', MDAnalysis.__version__)
from MDAnalysisTests.datafiles import TPR

u = MDAnalysis.Universe(TPR)
print('binary u:', u)
sel = u.select_atoms('all')
print('binary sel:', sel)
print('binary sel.n_atoms:', sel.n_atoms)

producing this output:

MDAnalysis.__version__: 0.16.2-dev0
binary u: <Universe with 47681 atoms>
binary sel: <AtomGroup [<Atom 1: b'N' of type b'opls_287' of resname b'MET', resid 0 and segid seg_0_b'AKeco'>, <Atom 2: b'H1' of type b'opls_290' of resname b'MET', resid 0 and segid seg_0_b'AKeco'>, <Atom 3: b'H2' of type b'opls_290' of resname b'MET', resid 0 and segid seg_0_b'AKeco'>, ..., <Atom 47679: b'NA' of type b'opls_407' of resname b'NA+', resid 11299 and segid seg_2_b'NA+'>, <Atom 47680: b'NA' of type b'opls_407' of resname b'NA+', resid 11300 and segid seg_2_b'NA+'>, <Atom 47681: b'NA' of type b'opls_407' of resname b'NA+', resid 11301 and segid seg_2_b'NA+'>]>
binary sel.n_atoms: 47681
tclick commented 7 years ago

@tylerjereddy, you used the 'all' selection. Try something like I did 'name CA'. In fact, I even tried 'backbone', and received the following exception:

TypeError: '>' not supported between instances of 'str' and 'bytes'

This is tested using Python 3.6.1 and MDA 0.16.1.

tylerjereddy commented 7 years ago

@tclick Thanks, I can reproduce this now, and this extra information will likely be diagnostically relevant / useful for writing unit tests, etc.

tclick commented 7 years ago

You're welcome, and I'm glad I can be of service.

tylerjereddy commented 7 years ago

I can't reproduce the issue with name CA or backbone selections in MDAnalysis 0.17.0-dev. Has anything changed that might impact the internals for this? I assume not intentionally since I don't see a reference to this issue.

richardjgowers commented 7 years ago

@jbarnoud did #1435 and #1436 recently, might have accidentally fixed this

tylerjereddy commented 7 years ago

Ok, we should probably wait for confirmation on that. If true, the other question for @jbarnoud might be if we need additional unit tests here of if the previous fixes were sufficient?

richardjgowers commented 7 years ago

I think there's probably a lot of things wrt to py3 (eg this string handling) that currently "accidentally" work, and we should shore them up with additional tests.

jbarnoud commented 7 years ago

There should be a test that the parsers populate the topology with strings and not bytes. I do not know if we should have a test to make sure that we can do selections with each parsers, this will be a lot of tests for something that should be caught at a lower level.

mnmelo commented 7 years ago

I think, as pointed out by @richardjgowers in #1444, that pytest will make that quite straightforward to implement. And if we use fixtures that won't necessarily become very expensive.

richardjgowers commented 7 years ago

Yeah, it would be cool to have fixture which loops over all known Universes/Readers/Topologies. Then we could write a test like:

def test_str_types(all_universes):
    assert isinstance(all_universes.atoms[0].name, str)

Where all_universes is a magic fixture which handles all possible cases