MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.32k stars 652 forks source link

Favour `Aux` over `ts.data` or possibly unify the two? #3778

Open hmacdope opened 2 years ago

hmacdope commented 2 years ago

Is your feature request related to a problem?

We currently have several ways to attach additional information to a timestep, including the Aux methods and the catch-all ts.data dict. Some discussion with @IAlibay on #3765 raised the idea of favouring one over the other long term or possibly unifying the two representations long-term.

Work on @BFedder's GSOC project has also raised the possibility of adding more AuxReaders and providing a more consistent API for the AuxReaders overall (see #3749). Perhaps some discussion of the long term future of ts.data is good in this context also.

I would love to hear people's thoughts. :)

richardjgowers commented 2 years ago

Isn’t ts data the aux that is native to the Reader?

On Thu, Aug 18, 2022 at 20:57, Hugo MacDermott-Opeskin < @.***> wrote:

Is your feature request related to a problem?

We currently have several ways to attach additional information to a timestep, including the Aux methods and the catch-all ts.data dict. Some discussion https://github.com/MDAnalysis/mdanalysis/pull/3765#discussion_r937715034 with @IAlibay https://github.com/IAlibay on #3765 https://github.com/MDAnalysis/mdanalysis/pull/3765 raised the idea of favouring one over the other long term or possibly unifying the two representations long-term.

Work on @BFedder https://github.com/BFedder's GSOC project has also raised the possibility of adding more AuxReaders and providing a more consistent API for the AuxReaders overall (see #3749 https://github.com/MDAnalysis/mdanalysis/pull/3749). Perhaps some discussion of the long term future of ts.data is good in this context also.

I would love to hear people's thoughts. :)

— Reply to this email directly, view it on GitHub https://github.com/MDAnalysis/mdanalysis/issues/3778, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACGSGBY3NTFP6TALRTG52HLVZ3SZBANCNFSM567DV4IQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

BFedder commented 2 years ago

I think it makes sense to have two different places to store additional information, one where stuff needed for some functionality is automatically added, and one where the user controls what is stored. ts.data seems to me like it holds information for some utilities like this and would be the natural home for anything MDA creates on its own. ts.aux as the home of anything user-specified makes sense to me.

I think this distinction is useful - it provides users with a space they have control over where nothing is added that they don't need for their analysis. So as I see it, any information automatically added to ts should go to data, anything users provide should go to aux.

IAlibay commented 2 years ago

So I'm not arguing for removing ts.data fully, however we've been (ab)using ts.data for just "any order random thing that the trajectory stores". My point here is that when we have trajectories where you can have arbitrary other bits (e.g. https://github.com/MDAnalysis/mdanalysis/pull/3608) then we end up providing two potential entry points for auxiliary data, one via aux and one via ts.data. This can only serve to provide a confusing API for users imho.

hmacdope commented 2 years ago

@MDAnalysis/coredevs bump on this for #3608? I should indicate to the author @pstaerk whether we should use aux or ts.data. :)

orbeckst commented 1 year ago

I honestly can't make up my mind. To me, ts.data was the space that readers fill with trajectory data that's not guaranteed to exist in all formats. In my view ts.aux is linked to AUXReaders. But I recognize that from a user perspective it doesn't matter how we got the per-timestep data.

A stupid workaround would be to make the current ts.data a private ts._data and then make ts.data the union of ts.aux and ts._data — either by merging dicts or make ts.data a property that looks for keys in both real dicts and then returns the first key found (e.g., in _data, aux order).

Alternatively, alias aux to data and have the AUXReaders use ts.data.

In summary, if I have to choose then I am more inclined to have users interact with something that's called ts.data than ts.aux . I'd suggest to move forward with PR #3608 as is with data.

IAlibay commented 1 year ago

Alternatively, alias aux to data and have the AUXReaders use ts.data.

I think that's what I'd advocate for maybe? Just so we can have a single entry point for auxiliary data?

Being able to push folks towards auxiliaries rather than the rather poorly detailed ts.data might help improve adoption of the former?