Closed tyarkoni closed 6 years ago
On further thought, this feels pretty important to have, so unless you have strong objections, @qmac, I'm going to go ahead and implement it. Should be pretty trivial, as it won't interfere with almost any of the existing code.
Actually, there is one alternative we should consider, which is to introduce a container that wraps all of these kinds of attributes in a Context
class (could probably be just a namedtuple). It would lead to more orderly code, but but off the top of my head I can't think of other obvious attributes we'd want to include, and 3 attributes is pretty manageable. So I lean towards just going with simple attributes.
Sounds good to me, agreed that the Context
solution might be overkill since 3 is pretty manageable.
Some
Stim
classes containStim
sequences that should ideally preserve order. The video Stims already do this implicitly thanks to their index. ButComplexTextStim
doesn't. Previously, I was using theonset
attribute to store serial order (i.e., first word getsonset=1
, second getsonset=2
, etc.), but that was a pretty bad solution since it renders onsets ambiguous.The practical issue is that it's usually pretty important to preserve the order of the original elements when returning the output. Unfortunately, because
merge_results
does some grouping/melting operations, order is usually lost. We could in principle address this by preserving the original index, all the way throughmerge_results
, and then re-ordering based on it (in cases where there aren't other columns that take precedence), but that seems quite clunky, and it doesn't really address the underlying issue that there really should be some way to track and propagate the order of.elements
in aStim
.The proposal is to add an optional
.order
property to the baseStim
class. It would behave exactly likeonset
andduration
--i.e., it would be set at initialization, and basically not do much after that. Then, just like we injectStim
metadata in theto_df
call, we could add an 'order' column to returned DFs as part of thetiming
option (i.e., along with 'onset' and 'duration'). The upshot is that we could then `.sort_values(['onset', 'order', 'duration']), ensuring that in nearly all cases, users get sensible-looking results by default. Thoughts?