Open edisj opened 4 years ago
try adding a function like this : https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/coordinates/chemfiles.py#L121
It looks like the problem is that it's a false positive for ChainReader, because the format hint for that one checks if the file is iterable. Not sure the best way around this; maybe a special case for h5py in the ChainReader format hint? Alternatively, the iterable
function itself
The format hint thing takes priority over chainreader I think. It’s also how numpy arrays don’t fall into it
On Tue, Aug 4, 2020 at 12:27, Lily Wang notifications@github.com wrote:
It looks like the problem is that it's a false positive for ChainReader, because the format hint for that one checks if the file is iterable. Not sure the best way around this; maybe a special case for h5py in the ChainReader format hint?
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/MDAnalysis/mdanalysis/issues/2884#issuecomment-668540339, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACGSGBYUMQMLSDOJQRNYMT3R67WCXANCNFSM4PUCIJ7A .
The format hint thing takes priority over chainreader I think. It’s also how numpy arrays don’t fall into it
Not currently, and you wouldn't want it to anyway (although this would have been an easy solution), because what if you have an iterable of chemfiles objects? ChainReader ignores ndarrays specifically:
@staticmethod
def _format_hint(thing):
"""Can ChainReader read the object *thing*
.. versionadded:: 1.0.0
"""
return (not isinstance(thing, np.ndarray) and
util.iterable(thing) and
not util.isstream(thing))
Ah my mistake then, I thought chainreader was in the guess_format thing that happens once all the hints are done. Hmm. I think ChainReader should only happen once all other formats have had a chance to have a go. So maybe ChainReader shouldn't use a _format_hint
to avoid this.
@richardjgowers @lilyminium thanks for the replies.
I do have a _format_hint
defined here in H5MDReader. So is the issue that the h5py.File
is being picked up by the ChainReader format hint, which selects ChainReader before it has a chance to be selected by H5MDReader?
Yeah I think chainreader is being a little greedy and we probably should tweak it to be less eager to read anything we pass it.
On Tue, Aug 4, 2020 at 18:27, edisj notifications@github.com wrote:
@richardjgowers https://github.com/richardjgowers @lilyminium https://github.com/lilyminium thanks for the replies. I do have a _format_hint is defined here https://github.com/MDAnalysis/mdanalysis/blob/840a977294311555232114fe998163ce53a88716/package/MDAnalysis/coordinates/H5MD.py#L504-L509 in H5MDReader. So is the issue that the h5py.File is being picked up by the ChainReader format hint, which selects ChainReader before it has a chance to be selected by H5MDReader?
— You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/MDAnalysis/mdanalysis/issues/2884#issuecomment-668726353, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACGSGB2GK23DQOMSEUPEHN3R7BAITANCNFSM4PUCIJ7A .
Should we make ChainReader a special case in the format analysis?
Or should we add a "priority" attribute to each format hint along the lines that everything with the same priority can be tested in any order but low priority are guaranteed to be tested after higher priorities? We then just have to agree that ChainReader is, say, 0, and others are higher, say 10. Not sure if other special cases such as numpy arrays need to be higher/lower than the others.
@ljwoods2 is this issue related to the guessing issue you're having for zarrtraj?
Is your feature request related to a problem?
I'm trying to pass an
h5py.File
object intomda.Universe
using H5MDReader (#2787), but have been running into funky errors and I think the issue is thath5py.File
attributes aren't suited for MDAnalysis.lib.util.isstream and MDAnalysis.lib.util.iterable, due to h5py files not passing theisstream
method and being classified as "iterable" due to having alen(h5py.File)
If I don't pass
format="H5MD
:MDAnalysis.lib.util.guess_format() looks at
isstream(h5py.File)
anditerable(h5py.File
, however, these return unexpected values for anh5py.File
:which makes
guess_format
think there are multiple trajectories that get passed to MDAnalysis.coordinates.chain.py which THEN gets passed to MDAnalysis.coordinates.core.py which fails and givesValueError: Unknown coordinate trajectory format '' for 'h5md'.
, so it looks like it did try to iterate through the groups in theh5py.File
:If I do pass `format="H5MD":
Again, it ends up calling
MDAnalysis.coordinates.chain.py
, but then passes it toH5MDReader
where it fails withOSError: Unable to open file (unable to open file: name = 'h5md', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
. So, again, it looks like it tried to load the first group'h5md'
from the file as an individual trajectoryDescribe the solution you'd like
I'd like h5py files to also pass the
isstream
method, even though it doesn't fit the criteria.Describe alternatives you've considered
I've tried doing something like adding an if clause at the start of MDAnalysis.lib.util.guess_format with something like
but the same issues occur, so I'm not sure what to do to fix it.
Additional context