Open catherinewu opened 3 years ago
Something to consider here: on the path we're currently on with object managers, there will be no such thing as a "default serdes strategy for pandas". That is, the object manager is responsible for deciding how serialization happens, not the dagster type.
One path here could be to just provide an out-of-the-box object store that uses feather.
Use Case
The current default pandas df serdes strategy is pickle. However, pickle is not great at handling large dataframes (say 10M+ rows and 30+ columns with datetimes, strings, floats, etc). We should have smooth out-of-the-box support for dataframes of this size.
df.to_feather / pd.read_feather are built to handle large pandas dataframes, and have better default datetime handling than df.to_csv and pd.read_csv. However, feather uses a binary format that makes it harder for users to visually inspect intermediates.
We should consider using feather as our default pandas serdes strategy.
Ideas of Implementation
Additional Info
Message from the maintainers:
Excited about this feature? Give it a :thumbsup:. We factor engagement into prioritization.