The call to get timeseries data, which returns a tuple of timeseries data, needs to be optimized. Currently the results are returned as pandas objects, and the memory footprint grows with the size of the data returned. We need to optimize this call by reducing the memory footprint when large amounts of data are returned. We should consider downloading directly to HDF5 files:
download -> HDF5 -> pandas
The question is whether the call synthesis.get_timeseries_data results in an HDF5 file being written. A pandas DataFrame could then be created in a later processing step.
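A minimal sketch of the download -> HDF5 -> pandas flow, assuming the download can be consumed in chunks. It uses h5py (pandas' HDFStore, which needs PyTables, is an alternative). The names fetch_chunks, download_to_hdf5 and load_dataframe are hypothetical and are not part of synthesis.get_timeseries_data. Each chunk is appended to resizable datasets, so peak memory during the download is bounded by the chunk size rather than the total size of the result. The DataFrame is built afterwards, optionally for only a slice of rows.

```python
import h5py
import numpy as np
import pandas as pd


def fetch_chunks(n_chunks=3, chunk_size=4):
    """Hypothetical stand-in for a paged download: yields (timestamps, values) arrays."""
    for i in range(n_chunks):
        start = i * chunk_size
        ts = np.arange(start, start + chunk_size, dtype="int64")
        yield ts, ts.astype("float64") * 0.5


def download_to_hdf5(path, chunks):
    """Hypothetical helper: append each chunk to resizable datasets, holding one chunk in memory."""
    with h5py.File(path, "w") as f:
        # maxshape=(None,) makes the datasets growable; resizing requires chunked storage.
        ts_ds = f.create_dataset("timestamp", shape=(0,), maxshape=(None,),
                                 dtype="int64", chunks=True)
        val_ds = f.create_dataset("value", shape=(0,), maxshape=(None,),
                                  dtype="float64", chunks=True)
        for ts, vals in chunks:
            n = ts_ds.shape[0]
            ts_ds.resize((n + len(ts),))
            val_ds.resize((n + len(vals),))
            ts_ds[n:] = ts
            val_ds[n:] = vals


def load_dataframe(path, start=0, stop=None):
    """Hypothetical later processing step: build a DataFrame from all or part of the file."""
    with h5py.File(path, "r") as f:
        return pd.DataFrame({
            "timestamp": f["timestamp"][start:stop],
            "value": f["value"][start:stop],
        })
```

Because load_dataframe accepts start and stop, downstream code can materialize only the rows it needs instead of the full result.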
Some questions for discussion are:
Is there aggregate data that is being calculated in the pandas data frame?
Is the generated HDF5 data file a raw version of the data, meaning no aggregation or transformation into tables?
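If aggregates are currently computed in the pandas DataFrame, one option is to keep the HDF5 file raw (no aggregation or table transformation) and compute aggregates by streaming slices from it. The sketch below assumes a single 1-D dataset named value; write_raw and chunked_mean are hypothetical helpers, and the mean stands in for whatever aggregates the real code computes.

```python
import h5py
import numpy as np


def write_raw(path, values):
    """Hypothetical helper: store the raw, untransformed series with no aggregation at write time."""
    with h5py.File(path, "w") as f:
        f.create_dataset("value", data=np.asarray(values, dtype="float64"))


def chunked_mean(path, dataset="value", chunk_rows=1000):
    """Hypothetical helper: compute a mean by reading fixed-size slices, never the whole series."""
    total = 0.0
    count = 0
    with h5py.File(path, "r") as f:
        ds = f[dataset]
        for start in range(0, ds.shape[0], chunk_rows):
            block = ds[start:start + chunk_rows]  # only this slice is loaded into memory
            total += float(block.sum())
            count += block.shape[0]
    return total / count if count else float("nan")
```

Keeping the file raw leaves the choice of aggregation to later steps, at the cost of recomputing aggregates on each read.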