MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org

enable readers to work with non-seekable streams #4316

Open orbeckst opened 1 year ago

orbeckst commented 1 year ago

Is your feature request related to a problem?

The coordinate readers cannot easily ingest generic streams (remote data) because they assume that any source is seekable to a specific frame.

Describe the solution you'd like

Readers should be able to use any data source as a Python stream. As long as the trajectory is only read forward (frame by frame or in steps), the reader should not insist on random access to frames.
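To make the request concrete, here is a minimal sketch of forward-only reading. It is not MDAnalysis code: the toy frame format (a little-endian `uint32` atom count followed by `3 * n_atoms` `float32` coordinates) and the names `NonSeekableStream`, `iter_frames` and `pack_frame` are made up for illustration. The point is only that stepping forward through frames needs `read()`, never `seek()` or `tell()`.

```python
import io
import struct


class NonSeekableStream(io.RawIOBase):
    """Hypothetical stand-in for a pipe or socket: forward reads only."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, b):
        chunk = self._buf.read(len(b))
        b[:len(chunk)] = chunk
        return len(chunk)


def _read_exact(stream, n, allow_eof=False):
    """Read exactly n bytes; return None on a clean end-of-stream if allowed."""
    data = bytearray()
    while len(data) < n:
        chunk = stream.read(n - len(data))
        if not chunk:
            if allow_eof and not data:
                return None
            raise EOFError("stream ended in the middle of a frame")
        data += chunk
    return bytes(data)


def iter_frames(stream, step=1):
    """Yield (index, coords) for every `step`-th frame, reading forward only."""
    index = 0
    while True:
        header = _read_exact(stream, 4, allow_eof=True)
        if header is None:
            return
        (n_atoms,) = struct.unpack("<I", header)
        # Skipped frames are still consumed, but never decoded.
        payload = _read_exact(stream, 12 * n_atoms)
        if index % step == 0:
            flat = struct.unpack(f"<{3 * n_atoms}f", payload)
            yield index, [flat[i:i + 3] for i in range(0, len(flat), 3)]
        index += 1


def pack_frame(coords):
    """Serialize one frame in the toy format used by this sketch."""
    flat = [x for xyz in coords for x in xyz]
    return struct.pack(f"<I{len(flat)}f", len(coords), *flat)
```

With this, `for i, xyz in iter_frames(stream, step=2)` visits frames 0, 2, 4, ... of a stream whose `seekable()` returns `False`. A real reader would additionally have to fail clearly when a user asks for backward or random access.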

Describe alternatives you've considered

The current streaming support is incomplete. Although it may work for topology files or single-frame trajectory formats (especially in conjunction with the MemoryReader), it does not solve the more general problem.

Additional context

  1. Streaming functionality will be important to access data in remote repositories (S3 buckets, databases of MD trajectories) or to do in-situ analysis from running simulations.
  2. The lack of proper streaming support was noted in #4173.
  3. Users complain that GROMACS XDR files (XTC and TRR) are very slow to read initially because we build the on-disk frame index that enables reliable random access. This is particularly annoying in workflows where MDAnalysis is used to filter raw trajectories into new trajectories.
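The trade-off in point 3 can be sketched as follows. This is an illustration under assumptions, not how the XDR readers are implemented: the length-prefixed record format and the functions `build_offsets` and `filter_stream` are hypothetical, and the code assumes buffered streams where `read(n)` returns `n` bytes unless the data ends. Building an offset index needs a full scan of a seekable file before the first frame can be used, whereas a filtering workflow only needs one forward pass.

```python
import io
import struct


def build_offsets(fh):
    """Scan a seekable file once and record the byte offset of every frame."""
    offsets = []
    while True:
        pos = fh.tell()
        header = fh.read(4)
        if len(header) < 4:
            break
        (size,) = struct.unpack("<I", header)
        fh.seek(size, io.SEEK_CUR)  # skip the payload without decoding it
        offsets.append(pos)
    return offsets


def filter_stream(src, dst, keep):
    """Copy frames for which keep(index, payload) is true in one forward pass."""
    index = written = 0
    while True:
        header = src.read(4)
        if len(header) < 4:
            break
        (size,) = struct.unpack("<I", header)
        payload = src.read(size)
        if keep(index, payload):
            dst.write(header + payload)
            written += 1
        index += 1
    return written
```

`build_offsets` calls `tell()` and `seek()` and therefore cannot run on a pipe or socket, while `filter_stream` only calls `read()` and `write()`, so it can start emitting frames immediately.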

alpeshjamgade commented 9 months ago

Hi @orbeckst can I take this?

hmacdope commented 9 months ago

> Hi @orbeckst can I take this?

This issue is too fundamental and requires too much work to be easily tackled by someone outside the project. Can I suggest you try a different issue?