AllenInstitute / visual_behavior_analysis

Python package for analyzing behavioral data for Brain Observatory: Visual Behavior

changing csv to pkl #738

Closed alexpiet closed 3 years ago

alexpiet commented 3 years ago

I moved from a csv file to a .pkl file for the summary table because I now include several columns that are arrays. These metrics don't load properly from csv. They include:

For all of these array columns, I clean the entries so that each one is exactly 4800 entries long. Sessions sometimes have more or fewer entries; longer arrays get truncated and shorter ones get filled with NaNs, so everything is exactly the same length. This standardization makes it easy to do things like: weights_for_all_sessions = np.vstack(summary_table['weight_bias'].values)
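A minimal sketch of that standardization, assuming NaN padding is appended at the end of short arrays (the comment does not say where the padding goes). The helper name standardize_length and the toy session lengths are hypothetical, not taken from the package:

```python
import numpy as np
import pandas as pd

TARGET_LEN = 4800  # per-session array length stated in the comment above


def standardize_length(values, target_len=TARGET_LEN):
    """Truncate or NaN-pad a 1-D array to exactly target_len entries.

    Hypothetical helper; padding at the end is an assumption.
    """
    out = np.full(target_len, np.nan)
    values = np.asarray(values, dtype=float)[:target_len]  # drop any excess
    out[: len(values)] = values  # remaining slots stay NaN
    return out


# Toy sessions with too few, exactly enough, and too many entries.
rng = np.random.default_rng(0)
raw_sessions = [rng.normal(size=n) for n in (4700, 4800, 4950)]

# One array per row, as in the summary table's array columns.
summary_table = pd.DataFrame(
    {"weight_bias": [standardize_length(r) for r in raw_sessions]}
)

# Because every entry has the same length, the rows stack into a 2-D array.
weights_for_all_sessions = np.vstack(summary_table["weight_bias"].values)
```

With three toy sessions, weights_for_all_sessions has one row per session and 4800 columns; the short session's trailing entries are NaN.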

I also now include several new columns

In addition, I removed the old references to high/low lick and reward rate, and updated the engagement metric to use reward rate above 1/90 rewards/second.
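As a rough illustration of the updated engagement rule, the threshold can be applied as a simple comparison. The reward_rate values below are made up, and treating "above" as a strict inequality is an assumption:

```python
import numpy as np

ENGAGEMENT_THRESHOLD = 1 / 90  # rewards per second, from the comment above

# Hypothetical reward rates (rewards/second) at five points in a session.
reward_rate = np.array([0.0, 0.005, 1 / 90, 0.02, 0.05])

# Engaged only when the reward rate is strictly above the threshold.
engaged = reward_rate > ENGAGEMENT_THRESHOLD
```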

I don't know how people have been using the summary table, so I'm not sure if there is anything else that needs to be changed.

matchings commented 3 years ago

@alexpiet sorry I'm late to the party here, but I'd like to suggest hdf5 as an alternative to pkl files for saving table data (including arrays and lists as column entries). Pickle files have a ton of versioning issues (files saved with one version often can't be opened with another version) and can be very difficult to deal with in general. Saving pandas tables to hdf5 is very easy: df.to_hdf(filename, key='df'). I have been using this for years now with no major issues.

just a suggestion, but i think it could save you some trouble in the end.

docs: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_hdf.html
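A small round-trip sketch of that suggestion, under a few assumptions: the toy table and file name are made up, and pandas needs the optional PyTables package (tables) for HDF5 support, so the example only writes the file when that package is installed. As far as I know, pandas stores object-dtype columns (such as arrays per row) by pickling them inside the HDF5 file and emits a PerformanceWarning when it does, which is silenced here:

```python
import importlib.util
import os
import tempfile
import warnings

import numpy as np
import pandas as pd

# Toy stand-in for the summary table: one array-valued column.
df = pd.DataFrame(
    {
        "session_id": [1, 2],
        "weight_bias": [np.arange(3.0), np.arange(3.0) + 10],
    }
)

# HDF5 support in pandas depends on the optional PyTables package.
HAVE_PYTABLES = importlib.util.find_spec("tables") is not None
restored = None

if HAVE_PYTABLES:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "summary_table.h5")  # hypothetical file name
        with warnings.catch_warnings():
            # Silence the warning about object columns being pickled.
            warnings.simplefilter("ignore")
            df.to_hdf(path, key="df", mode="w")
        restored = pd.read_hdf(path, key="df")
```

When PyTables is available, restored holds the same rows as df, with each weight_bias entry still a NumPy array.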

alexpiet commented 3 years ago

Thanks for the suggestion, @matchings. I'll switch to hdf5.