BASIN-3D / basin3d-views

Simplified methods to view basin3d model objects

Optimize synthesis.get_timeseries_data call #12

Open vchendrix opened 3 years ago

vchendrix commented 3 years ago

The call to get timeseries data, which returns a tuple of timeseries data, needs to be optimized. Currently the result is returned as a pandas data frame, so the memory footprint grows with the size of the data returned. We need to optimize this call by reducing the memory footprint when large amounts of data are returned. We should consider downloading directly to HDF5 files:

download -> hdf5 -> pandas

The question is whether the call to synthesis.get_timeseries_data should result in an HDF5 file being written. A pandas data frame could then be created in a later processing step.
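As a rough sketch of the download -> hdf5 -> pandas flow, the snippet below appends chunks to an HDF5 table with `pandas.HDFStore` so that only one chunk is held in memory at a time, then builds the data frame in a separate step. It is not the basin3d implementation: `fake_download_chunks`, `write_chunks_to_hdf5`, `read_hdf5_to_pandas`, and the `timeseries` key are hypothetical names used only for illustration, and the synthetic data stands in for whatever `synthesis.get_timeseries_data` actually downloads. It assumes PyTables is installed, which `HDFStore` requires.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def fake_download_chunks(n_chunks=3, rows_per_chunk=4):
    """Hypothetical stand-in for a chunked download; yields small DataFrames."""
    start = pd.Timestamp("2020-01-01")
    for i in range(n_chunks):
        index = pd.date_range(
            start + pd.Timedelta(hours=i * rows_per_chunk),
            periods=rows_per_chunk,
            freq=pd.Timedelta(hours=1),
        )
        values = np.arange(rows_per_chunk, dtype="float64") + i * rows_per_chunk
        yield pd.DataFrame({"value": values}, index=index)


def write_chunks_to_hdf5(chunks, path, key="timeseries"):
    """Append each chunk to an HDF5 table; return the number of rows written."""
    rows = 0
    with pd.HDFStore(path, mode="w") as store:
        for chunk in chunks:
            # format="table" makes the dataset appendable and queryable
            store.append(key, chunk, format="table")
            rows += len(chunk)
    return rows


def read_hdf5_to_pandas(path, key="timeseries"):
    """Later processing step: load the stored table into a pandas data frame."""
    return pd.read_hdf(path, key=key)


if __name__ == "__main__":
    h5_path = os.path.join(tempfile.mkdtemp(), "timeseries.h5")
    n_rows = write_chunks_to_hdf5(fake_download_chunks(), h5_path)
    frame = read_hdf5_to_pandas(h5_path)
    print(n_rows, frame.shape)
```

In this arrangement the write step only ever holds one chunk, and the full data frame is built only if and when a caller asks for it.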

Some questions for discussion are:

  1. Is there aggregate data that is being calculated in the pandas data frame?
  2. Is the generated HDF5 data file a raw version of the data, meaning no aggregation or transformation into tables?

vchendrix commented 3 years ago

This is what Fernando recommended for moving larger h5 data: https://github.com/uchicago-cs/deepdish