changing csv to pkl

alexpiet commented 3 years ago

I moved from a csv file to a .pkl file for the summary table because I now include several columns that are arrays. These metrics don't load properly from csv. They include:

the model weights
image by image metrics like CR (every non-change image without a lick), miss (change without a lick), hit (change with a lick), FA (every non-change image with a lick).
image by image response time metric. This is bounded between 0 and 750 ms, and is the time from the most recent image start to the start of each lick bout.
image by image engaged/disengaged metric
lick_bout_start (did a lick bout start during this image presentation?)
lick_bout_rate (lick bouts/sec, averaged over 320 seconds)
reward_rate (rewards/sec, averaged over 320 seconds)
change (was this image a change image)
lick_hit_fraction_rate (fraction of lick bouts that result in a reward)
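To illustrate why array columns don't survive a csv round trip, here is a minimal sketch. The one-row table and its values are made up for the example; only the column name weight_bias comes from this thread. The array is written to csv as its string representation and comes back as a plain string, while a pickle round trip returns the array itself.

```python
import io
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical one-row summary table with an array-valued column.
df = pd.DataFrame({
    "session_id": [1],
    "weight_bias": [np.array([0.1, 0.2, 0.3])],
})

# Round trip through csv using an in-memory buffer.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
from_csv = pd.read_csv(buffer)

# Round trip through pickle using a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "summary_table.pkl")
    df.to_pickle(path)
    from_pickle = pd.read_pickle(path)

csv_value = from_csv.loc[0, "weight_bias"]        # a string, not an array
pickle_value = from_pickle.loc[0, "weight_bias"]  # still a numpy array
```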

For all of these array columns, I clean the entries so that each one is exactly 4800 entries long. Sessions sometimes have more or fewer entries, and those values get truncated or filled with NaNs so everything is exactly the same length. This standardization makes it easy to do things like: weights_for_all_sessions = np.vstack(summary_table['weight_bias'].values)
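The length standardization described above could look roughly like the sketch below. The helper standardize_length is a hypothetical name, not the function used in the repository, and padding at the end of the array is an assumption; the thread only says that values are truncated or filled with NaNs.

```python
import numpy as np
import pandas as pd

N_IMAGES = 4800  # target length stated in the thread


def standardize_length(values, n=N_IMAGES):
    """Truncate to n entries, or pad the end with NaN up to n entries.

    Hypothetical helper; padding at the end is an assumption.
    """
    values = np.asarray(values, dtype=float)
    out = np.full(n, np.nan)
    k = min(len(values), n)
    out[:k] = values[:k]
    return out


# Made-up sessions that are too short, exact, and too long.
summary_table = pd.DataFrame({
    "weight_bias": [np.ones(4790), np.ones(4800), np.ones(4815)],
})
summary_table["weight_bias"] = summary_table["weight_bias"].apply(standardize_length)

# Every entry now has the same length, so the column stacks into a 2D array.
weights_for_all_sessions = np.vstack(summary_table["weight_bias"].values)
```

With equal-length entries the stacked array has one row per session and one column per image, so per-image averages across sessions become a single np.nanmean call along axis 0.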

I also now include several new columns:

strategy_matched (True marks a subset of sessions to include in a strategy matched subset across cre-lines)
the average value of several metrics on just the engaged/disengaged images

In addition, I removed the old references to high/low lick and reward rate, and updated the engagement metric to use reward rate above 1/90 rewards/second.
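As a rough sketch of the engagement rule above, the snippet below labels each image as engaged when the reward rate exceeds 1/90 rewards/second, then averages a metric separately over engaged and disengaged images. The function name label_engaged and the example values are hypothetical; treating NaN reward rates as disengaged is also an assumption, since the thread does not say how padded entries are handled.

```python
import numpy as np

ENGAGEMENT_THRESHOLD = 1 / 90  # rewards/second, from the thread


def label_engaged(reward_rate, threshold=ENGAGEMENT_THRESHOLD):
    """Return a boolean array, True where reward_rate is above threshold.

    NaN entries compare as False, so they are labeled disengaged
    (an assumption, not confirmed by the thread).
    """
    reward_rate = np.asarray(reward_rate, dtype=float)
    return reward_rate > threshold


# Made-up per-image values for one short session.
reward_rate = np.array([0.0, 0.005, 0.02, 0.03])
lick_bout_rate = np.array([0.1, 0.2, 0.3, 0.5])

engaged = label_engaged(reward_rate)

# Average of a metric on just the engaged / disengaged images.
engaged_mean = np.nanmean(lick_bout_rate[engaged])
disengaged_mean = np.nanmean(lick_bout_rate[~engaged])
```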

I don't know how people have been using the summary table, so I'm not sure if there is anything else that needs to be changed.

matchings commented 3 years ago

@alexpiet sorry I'm late to the party here, but I'd like to suggest hdf5 as an alternative to pkl files for saving table data (including arrays and lists as column entries). Pickle files have a ton of versioning issues (files saved with one version of pkl can't be opened with another version) and can be very difficult to deal with in general. Saving pandas tables to hdf5 is very easy (df.to_hdf(filename, key='df')). I have been using this for years now with no major issues.

Just a suggestion, but I think it could save you some trouble in the end.

docs: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_hdf.html
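A minimal sketch of the hdf5 round trip suggested above. The wrapper names save_summary_table and load_summary_table are hypothetical; DataFrame.to_hdf and pandas.read_hdf are the pandas calls, and both need the optional PyTables package (tables) to be installed. As far as I know, in the default fixed format pandas stores object columns such as arrays by pickling them inside the hdf5 file and emits a PerformanceWarning, so it is worth checking that this actually avoids the versioning issue for these columns.

```python
import pandas as pd


def save_summary_table(df, filename, key="df"):
    """Write the table to an hdf5 file, overwriting any existing file.

    Hypothetical wrapper; requires the optional PyTables dependency.
    """
    df.to_hdf(filename, key=key, mode="w")


def load_summary_table(filename, key="df"):
    """Read the table back from the hdf5 file written above."""
    return pd.read_hdf(filename, key=key)
```

Usage would be save_summary_table(summary_table, 'summary_table.h5') followed by load_summary_table('summary_table.h5'), where the filename is a placeholder.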

alexpiet commented 3 years ago

Thanks for the suggestion @matchings, I'll switch to hdf5.

AllenInstitute / visual_behavior_analysis

changing csv to pkl #738