Closed jeremymanning closed 7 years ago
maybe we can get some feedback at CEMS about this? right now, the format supported is lists (subjects) of lists (word lists) of stimulus identifiers (can be text or numbers). Our data loading functions could just parse the input into this 'common format' and we should be good
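a rough sketch of what "parse the input into the common format" could look like, assuming the raw data arrives as flat (subject, list, item) rows in presentation order. `to_common_format` and the row layout are made up for illustration; they are not part of the package:

```python
def to_common_format(rows):
    """Nest flat (subject_id, list_id, item) rows into the common format:
    a list (subjects) of lists (word lists) of stimulus identifiers.

    Hypothetical helper; the row layout is an assumption for this sketch.
    """
    subjects = {}  # dicts keep insertion order (Python 3.7+)
    for subject_id, list_id, item in rows:
        word_lists = subjects.setdefault(subject_id, {})
        word_lists.setdefault(list_id, []).append(item)
    # Drop the ids and keep only the nested list structure
    return [list(word_lists.values()) for word_lists in subjects.values()]


rows = [
    ('s1', 0, 'cat'), ('s1', 0, 'dog'),
    ('s1', 1, 'tree'),
    ('s2', 0, 7), ('s2', 0, 3),  # identifiers can be text or numbers
]
common = to_common_format(rows)
```

any format-specific loader would then only need to produce rows like these, and everything downstream could rely on the nested lists.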
for the current weekly reports, i think it'd be helpful to add some data loading functions (even if they just work for our participant.db files). that would substantially simplify and shorten the code in the reports. when we release autoFR, we can at least have loading functions that are consistent with that format...
proposed syntax:
import quail as q
data = q.load(dbfiles=('participants1.db', 'participants2.db'), wordpool='cut_wordpool.csv', group_by={'exp_version': ('0.0', '1.0', '1.1', '2.1', '3.2', '4.1', '5.1', '6.1')})
data is a list of Eggs, one per exp_version. if group_by isn't specified, combine everything into a single egg.
👍 this sounds great, i'll tackle it today
Note: the number of eggs returned should be equal to the number of values in the group_by dictionary. So the function should loop over keys, and then loop over the value/values for that key. If there are multiple keys, data should be a list of lists.
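the loop described above could be sketched like this, assuming each participant record is a plain dict. `group_records` and the record fields are hypothetical, and returning a one-element list when group_by is missing is just one way to read "combine everything":

```python
def group_records(records, group_by=None):
    """Split records into groups, one group per value listed in group_by.

    Hypothetical sketch: records are dicts, and a value may itself be a
    list/tuple of values that should be pooled into a single group.
    """
    if not group_by:
        # No grouping requested: everything goes into a single group
        return [list(records)]
    grouped = []
    for key, values in group_by.items():        # loop over keys...
        per_key = []
        for value in values:                    # ...then over that key's values
            wanted = set(value) if isinstance(value, (list, tuple)) else {value}
            per_key.append([r for r in records if r.get(key) in wanted])
        grouped.append(per_key)
    # One key -> flat list of groups; several keys -> list of lists
    return grouped[0] if len(grouped) == 1 else grouped


records = [
    {'id': 'a', 'exp_version': '0.0', 'room': '1'},
    {'id': 'b', 'exp_version': '1.0', 'room': '2'},
    {'id': 'c', 'exp_version': '2.1', 'room': '1'},
]
groups = group_records(records, {'exp_version': [['0.0', '1.0'], '2.1']})
```

each group would then be turned into its own egg, so the number of eggs matches the number of values given for the key.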
(accidentally closed)
👍
we have a load function for our EL experiments now. Here's an example of how to use it:
import quail

dbpath = ['/Users/andyheusser/Documents/github/FRFR-analyses/data/encoding/participants-room1-041717.db',
          '/Users/andyheusser/Documents/github/FRFR-analyses/data/encoding/participants-test-room2.db']
recpath = '/Users/andyheusser/Documents/github/FRFR-analyses/data/recall/'
remove_subs = ['debugCWO54U:debugQ59MF8', 'debugE1CAO3:debugONZ2R5', 'debugXG82XV:debug7XPXQA']
wordpool = '/Users/andyheusser/Documents/github/FRFR-analyses/stimuli/cut_wordpool.csv'
experiments = ['0.0', '1.0', '1.1', '2.1', '3.2', '4.1', '5.1', '6.1', '7.1', '8.1']
# create a list of eggs, where each egg is a different experiment
groupby = {'exp_version': [['0.0', '1.0', '1.1'], '2.1', '3.2', '4.1', '5.1', '6.1', '7.1', '8.1']}
eggs = quail.load(dbpath=dbpath, recpath=recpath, remove_subs=remove_subs,
                  wordpool=wordpool, experiments=experiments, groupby=groupby)
where eggs is a list of egg objects, one per group listed in groupby
We should substantially fill out our data loading capabilities to support a range of data formats. We should also provide good documentation for writing parsers for arbitrary data formats.
These will be critical for convincing others to use our tools: if they can't easily use them with their own datasets, they won't care about pyrec.
The best way to get a sense of which formats we'll want to support may be to talk to other labs...but at the very least we should discuss.