feature request: data loading functions

ContextLab / quail

A python toolbox for analyzing and plotting free recall data

http://cdl-quail.readthedocs.io/en/latest/

MIT License


feature request: data loading functions #29

Closed jeremymanning closed 7 years ago

jeremymanning commented 7 years ago

We should substantially fill out our data loading capabilities to support a range of data formats. We should also provide good documentation for writing parsers for arbitrary data formats.

These will be critical for convincing others to use our tools: if they can't easily use them with their own datasets, they won't care about pyrec.

The best way to get a sense of which formats we'll want to support may be to talk to other labs...but at the very least we should discuss.

andrewheusser commented 7 years ago

maybe we can get some feedback at CEMS about this? right now, the format supported is lists (subjects) of lists (word lists) of stimulus identifiers (can be text or numbers). Our data loading functions could just parse the input into this 'common format' and we should be good
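As a rough illustration of the 'common format' described above, here is a minimal sketch of a parser that turns flat rows into lists (subjects) of lists (word lists) of stimulus identifiers. The `to_common_format` name and the `(subject, list_index, item)` row layout are assumptions made for this example; they are not part of quail.

```python
def to_common_format(rows):
    """Group (subject, list_index, item) rows into lists of lists of items.

    Hypothetical helper: the row layout is an assumption for illustration.
    Subjects and lists keep the order in which they first appear.
    """
    subjects = {}
    for subject, list_index, item in rows:
        lists = subjects.setdefault(subject, {})
        lists.setdefault(list_index, []).append(item)
    return [list(lists.values()) for lists in subjects.values()]


# Made-up rows: two subjects, the first with two word lists.
rows = [
    ("s1", 0, "cat"), ("s1", 0, "dog"),
    ("s1", 1, "pen"),
    ("s2", 0, "sun"), ("s2", 0, "sky"),
]
common = to_common_format(rows)
```

Any loader for a specific file format could then reduce to "read rows, call the parser", which keeps the format-specific code small.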

jeremymanning commented 7 years ago

for the current weekly reports, i think it'd be helpful to add some data loading functions (even if they just work for our participant.db files). that would substantially simplify and shorten the code in the reports. when we release autoFR, we can at least have loading functions that are consistent with that format...

jeremymanning commented 7 years ago

proposed syntax:

import quail as q
data = q.load(dbfiles=('participants1.db', 'participants2.db'),
              wordpool='cut_wordpool.csv',
              group_by={'exp_version': ('0.0', '1.0', '1.1', '2.1', '3.2', '4.1', '5.1', '6.1')})

data is a list of Eggs, one per exp_version.

if group_by isn't specified, combine everything into a single egg.

andrewheusser commented 7 years ago

👍 this sounds great, i'll tackle it today

jeremymanning commented 7 years ago

Note: the number of eggs returned should equal the number of values in the group_by dictionary, so the function should loop over keys and then over the value(s) for each key. If there are multiple keys, data should be a list of lists.
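One possible reading of these grouping rules, sketched on plain dictionaries rather than Eggs. The `group_records` helper, the record layout, and the `room` field are all hypothetical; this is not how `quail.load` is implemented, only an illustration of the proposed behaviour.

```python
def group_records(records, group_by=None):
    """Split records into groups following the proposed group_by rules.

    Hypothetical helper operating on dicts instead of Eggs.
    """
    if not group_by:
        # No grouping requested: combine everything into a single group.
        return [list(records)]
    per_key = []
    for key, values in group_by.items():
        # One group per listed value of this key.
        per_key.append([[r for r in records if r[key] == v] for v in values])
    # Single key: flat list of groups. Multiple keys: list of lists.
    return per_key[0] if len(per_key) == 1 else per_key


# Made-up records; "room" is an invented second grouping field.
records = [
    {"subject": "s1", "exp_version": "0.0", "room": "1"},
    {"subject": "s2", "exp_version": "1.0", "room": "2"},
    {"subject": "s3", "exp_version": "1.0", "room": "1"},
]
single = group_records(records, {"exp_version": ("0.0", "1.0")})
multi = group_records(records, {"exp_version": ("0.0", "1.0"), "room": ("1", "2")})
combined = group_records(records)
```

With one key the result has one group per value; with two keys it is a list holding one such list per key.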

jeremymanning commented 7 years ago

(accidentally closed)

andrewheusser commented 7 years ago

👍

andrewheusser commented 7 years ago

we have a load function for our EL experiments now. Here's an example of how to use it:

import quail

dbpath = ['/Users/andyheusser/Documents/github/FRFR-analyses/data/encoding/participants-room1-041717.db',
          '/Users/andyheusser/Documents/github/FRFR-analyses/data/encoding/participants-test-room2.db']
recpath = '/Users/andyheusser/Documents/github/FRFR-analyses/data/recall/'
remove_subs = ['debugCWO54U:debugQ59MF8', 'debugE1CAO3:debugONZ2R5', 'debugXG82XV:debug7XPXQA']
wordpool = '/Users/andyheusser/Documents/github/FRFR-analyses/stimuli/cut_wordpool.csv'
experiments = ['0.0', '1.0', '1.1', '2.1', '3.2', '4.1', '5.1', '6.1', '7.1', '8.1']

# create a list of eggs, where each egg is a different experiment
groupby = {'exp_version': [['0.0','1.0','1.1'], '2.1', '3.2', '4.1', '5.1', '6.1', '7.1', '8.1']}
eggs = quail.load(dbpath=dbpath, recpath=recpath, remove_subs=remove_subs,
                  wordpool=wordpool, experiments=experiments, groupby=groupby)

andrewheusser commented 7 years ago

where eggs is a list of egg objects with the same length as the groupby list (8 in the example above)
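The groupby value in the example mixes single versions with a nested list of versions. Assuming a nested entry is meant to pool those versions into one egg, the splitting logic could look roughly like this. `split_by_version` and the record layout are hypothetical and only mimic the behaviour on plain dictionaries.

```python
def split_by_version(records, groups):
    """Return one list of records per entry in groups.

    Assumption: a nested list or tuple pools several versions into one group.
    """
    out = []
    for group in groups:
        versions = set(group) if isinstance(group, (list, tuple)) else {group}
        out.append([r for r in records if r["exp_version"] in versions])
    return out


# Shortened groupby list and made-up records for illustration.
groups = [['0.0', '1.0', '1.1'], '2.1', '3.2']
records = [
    {"subject": "a", "exp_version": "0.0"},
    {"subject": "b", "exp_version": "1.1"},
    {"subject": "c", "exp_version": "2.1"},
]
pooled = split_by_version(records, groups)
```

The output always has as many groups as there are entries in the groupby list, even when a group ends up empty.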