Handle different number of lists per subject

andrewheusser commented 7 years ago

The software will crash if there are different numbers of lists per subject. We should handle this somehow. For instance, if a subject's data is lost for one list, it would still be nice to analyze the rest of their data...

jeremymanning commented 7 years ago

The way I was imagining this would be handled is to take each subject in turn. For each unique subject, identify all rows of the data matrix that correspond to that subject. Those rows will also have associated list indices. Then do the analyses (either on a per-list or across-list basis) for the unique list indices that were picked out (which may or may not be sequential).
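
A minimal sketch of the per-subject approach described above, in plain Python rather than quail's actual data structures. The row layout `(subject, list_index, value)` and the names `analyze_by_subject` and `mean` are assumptions made for illustration only.

```python
from collections import defaultdict

def analyze_by_subject(rows, analysis):
    """Group (subject, list_index, value) rows by subject and list index,
    then apply `analysis` to each list that is actually present."""
    by_subject = defaultdict(lambda: defaultdict(list))
    for subject, list_index, value in rows:
        by_subject[subject][list_index].append(value)
    # Only list indices that occur for a subject are analyzed; they need
    # not be sequential or shared across subjects.
    return {
        subject: {idx: analysis(values) for idx, values in sorted(lists.items())}
        for subject, lists in by_subject.items()
    }

def mean(values):
    return sum(values) / len(values)

rows = [
    ("s1", 0, 1), ("s1", 0, 0), ("s1", 1, 1),
    ("s2", 0, 1), ("s2", 2, 0), ("s2", 2, 1),  # s2 has no list 1
]
results = analyze_by_subject(rows, mean)
```

Here `results["s2"]` contains entries for lists 0 and 2 only, so the missing list never has to be assumed to exist.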

andrewheusser commented 7 years ago

Got it. To do it this way, I think we'll need one more function that combines the analyzed data for each subject into a single multi-indexed df so that it can be plotted, or conversely, we could change the plot API to accept lists of dfs, where each df is the analyzed data for a single subject.
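
One way the combining step could look, sketched with `pandas.concat`. This is not quail's implementation; the column name `accuracy` and the index level names `Subject` and `List` are hypothetical choices for the example.

```python
import pandas as pd

# Hypothetical per-subject results, each indexed by list number.
per_subject = {
    "s1": pd.DataFrame({"accuracy": [0.5, 0.75]}, index=[0, 1]),
    "s2": pd.DataFrame({"accuracy": [0.25]}, index=[0]),  # fewer lists
}

# Passing a dict makes its keys the outer index level, so subjects with
# different numbers of lists stack into one multi-indexed DataFrame.
combined = pd.concat(per_subject, names=["Subject", "List"])
```

Because the subject is carried in the index, the combined frame does not need every subject to contribute the same number of rows.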

andrewheusser commented 7 years ago

another option would be to require that the listgroup kwarg is a list of lists, where each list is a unique grouping for that particular subject. Currently, it's just one list that is applied to everyone.

jeremymanning commented 7 years ago

Ah, I see what the issue was-- because we're specifying listgroup, things are set up so that everyone needs len(listgroup) lists? However, this doesn't necessarily need to be the case. Instead, listgroup just needs to have one entry per unique list ID. As long as each subject has no more than len(listgroup) lists, things will work out if you assume that list i goes with group listgroup[i]. Then you can pass a single (flat) list as listgroup. [You can still simulate the "list of lists" scenario if you assign different list numbers to the lists from different subjects (or groups of subjects).]
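
A rough sketch of the flat-listgroup idea, assuming `listgroup` holds one group label per list index. The function name `group_lists` and the labels are made up for the example and are not part of quail.

```python
def group_lists(subject_lists, listgroup):
    """Assign list i to group listgroup[i]; a subject may have fewer
    lists than there are entries in listgroup."""
    if len(subject_lists) > len(listgroup):
        raise ValueError("listgroup needs one entry per list index")
    grouped = {}
    for i, data in enumerate(subject_lists):
        grouped.setdefault(listgroup[i], []).append(data)
    return grouped

listgroup = ["early", "early", "late", "late"]
s1 = [["cat"], ["dog"], ["cow"], ["pig"]]  # all four lists
s2 = [["cat"], ["dog"], ["cow"]]           # last list never collected

g1 = group_lists(s1, listgroup)
g2 = group_lists(s2, listgroup)
```

The same flat `listgroup` serves both subjects; the shorter subject simply contributes fewer lists to the last group.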

jeremymanning commented 7 years ago

I also think you can still accomplish this with a single dataframe combining data from all subjects/experiments. The key is that the list numbering won't necessarily start with 0 for every subject, and it won't necessarily end with 15 for every subject.

andrewheusser commented 7 years ago

still not entirely sure how I will implement this. for example say a subject is missing lists 3-5 out of 16. Since the list number is inferred from the order of the lists currently, there would be no way to tell which list is which. We could pass an empty list for missing data - e.g. [list1_data, [], list3_data,...]. Then at least the shape of the data will be the same across subjects - wdyt?
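
To illustrate why the placeholder helps, here is a small sketch (the helper name `present_list_numbers` is hypothetical): with empty lists kept in place, list numbers can still be read off from position.

```python
def present_list_numbers(subject_lists):
    """Return the positions of lists that contain data; empty lists are
    placeholders that keep later lists at their original index."""
    return [i for i, data in enumerate(subject_lists) if len(data) > 0]

with_placeholder = [["cat", "dog"], [], ["cow"]]  # second list missing
without_placeholder = [["cat", "dog"], ["cow"]]   # same data, gap dropped

kept = present_list_numbers(with_placeholder)
lost = present_list_numbers(without_placeholder)
```

With the placeholder, the third list keeps index 2; without it, the same list is indistinguishable from the second list.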

jeremymanning commented 7 years ago

Can you elaborate on why we might have missing lists? For example, is it because the participant didn't recall any words on those lists? If so, then we should do something like what you suggested (e.g. passing empty lists) so that we can keep track of the list numbers-- this will be useful for fingerprint tracking and comparing patterns across different sets of lists, for example.

andrewheusser commented 7 years ago

Well, it could be that a participant doesn't recall anything on a particular list - but I was thinking more in cases where data collection fails for whatever reason. It would be a shame to throw out a whole subject bc one list wasn't collected. Or perhaps for iEEG data collection, where you never know how many lists you might get from a given subject

jeremymanning commented 7 years ago

for our current experiments, we should throw out subjects if the experiment crashes or doesn't complete-- that would prevent us from doing our main analyses.

for iEEG data collection (or any sort of neuroimaging or situation where it's expensive in time or money to collect data) i agree that we'll want some way of handling missing or incomplete data.

in the case of missing, corrupted, or incomplete data, i think the "empty list" solution will work well as a default.

andrewheusser commented 7 years ago

👍

jeremymanning commented 7 years ago

the other scenario we should probably support is when the experiment is designed to have different numbers of lists per subject. would this also require using empty lists?

jeremymanning commented 7 years ago

e.g. one could imagine a "learn to threshold" situation where you keep showing lists until a particular accuracy is reached

andrewheusser commented 7 years ago

hmm, yea tricky - solution one is to require the user to pass empty lists to 'fill in' missing data. we could also do this: if subjects have different numbers of lists, assume that the lists are in order, with no missing data in between. then, we could simply pad the end of the shorter subjects' data with empty lists internally so that the data has the same shape for each subject...

jeremymanning commented 7 years ago

i think this is a good solution.

in summary:

1. i think we should assume that the lists are in order, and that any "missing" data is from later lists. this covers the most common crash scenario where the experiment crashes and we simply use whatever has been run up to that point.
2. we should also support passing in empty lists in case there are missing or corrupted lists in the middle of the experiment.
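
The two rules in this summary could be sketched roughly as follows. This is an illustration under the stated assumptions, not the code on the linked branch; `pad_subjects` is a hypothetical name.

```python
def pad_subjects(data):
    """Pad each subject's lists with trailing empty lists so that every
    subject ends up with the same number of lists. Lists are assumed to
    be in order, so anything missing is treated as a later list."""
    n_lists = max((len(subject) for subject in data), default=0)
    return [
        list(subject) + [[] for _ in range(n_lists - len(subject))]
        for subject in data
    ]

data = [
    [["cat"], ["dog"], ["cow"]],  # complete session
    [["cat"]],                    # experiment stopped after the first list
    [["cat"], [], ["cow"]],       # middle list marked missing by the user
]
padded = pad_subjects(data)
```

Trailing gaps are filled automatically, while a gap in the middle stays wherever the user placed the empty list.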

andrewheusser commented 7 years ago

implemented here: https://github.com/ContextLab/quail/tree/variable-number-lists

works as specified in comment above

Handle different number of lists per subject #14