PsychoinformaticsLab / pliers

Automated feature extraction in Python
https://pliers.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
296 stars 68 forks source link

Order attribute for stims #235

Closed tyarkoni closed 6 years ago

tyarkoni commented 6 years ago

Some Stim classes contain Stim sequences that should ideally preserve order. The video Stims already do this implicitly thanks to their index. But ComplexTextStim doesn't. Previously, I was using the onset attribute to store serial order (i.e., first word gets onset=1, second gets onset=2, etc.), but that was a pretty bad solution since it renders onsets ambiguous.

The practical issue is that it's usually pretty important to preserve the order of the original elements when returning the output. Unfortunately, because merge_results does some grouping/melting operations, order is usually lost. We could in principle address this by preserving the original index, all the way through merge_results, and then re-ordering based on it (in cases where there aren't other columns that take precedence), but that seems quite clunky, and it doesn't really address the underlying issue that there really should be some way to track and propagate the order of .elements in a Stim.

The proposal is to add an optional .order property to the base Stim class. It would behave exactly like onset and duration--i.e., it would be set at initialization, and basically not do much after that. Then, just like we inject Stim metadata in the to_df call, we could add an 'order' column to returned DFs as part of the timing option (i.e., along with 'onset' and 'duration'). The upshot is that we could then `.sort_values(['onset', 'order', 'duration']), ensuring that in nearly all cases, users get sensible-looking results by default. Thoughts?

tyarkoni commented 6 years ago

On further thought, this feels pretty important to have, so unless you have strong objections, @qmac, I'm going to go ahead and implement it. Should be pretty trivial, as it won't interfere with almost any of the existing code.

tyarkoni commented 6 years ago

Actually, there is one alternative we should consider, which is to introduce a container that wraps all of these kinds of attributes in a Context class (could probably be just a namedtuple). It would lead to more orderly code, but but off the top of my head I can't think of other obvious attributes we'd want to include, and 3 attributes is pretty manageable. So I lean towards just going with simple attributes.

qmac commented 6 years ago

Sounds good to me, agreed that the Context solution might be overkill since 3 is pretty manageable.