Open alxmrs opened 9 months ago
Dataframes: https://beam.apache.org/documentation/dsls/dataframes/overview/ Xarray: Xarray-Beam
Beam's DataFrames library supports MultiIndexes.
https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.io.html
This alone makes Beam worth exploring sooner rather than later.
Interesting!
Some general thoughts on this issue in no particular order:
from_map-like interface to make implementing this easy? This may not be feasible after all. It looks like HDF5 is intentionally not supported because it is a random-access format. I think Xarray would share this characteristic, too.
https://beam.apache.org/releases/pydoc/current/_modules/apache_beam/dataframe/io.html
Maybe this warrants the creation of an xarray-beam-like library for pandas or dask? Can a pd.(multi)index mimic an xbeam key?
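To make the MultiIndex question a bit more concrete, here is a rough sketch in plain pandas. It assumes an xbeam key is essentially a mapping from dimension name to chunk offset; the `chunk_offsets` data, the `chunks` payloads, and the `lookup` helper are all made up for illustration and are not part of xarray-beam or Beam.

```python
import pandas as pd

# Hypothetical stand-ins for xbeam-style chunk keys: each dict maps a
# dimension name to the integer offset where that chunk starts.
chunk_offsets = [
    {"time": 0, "lat": 0},
    {"time": 0, "lat": 90},
    {"time": 24, "lat": 0},
    {"time": 24, "lat": 90},
]

dims = ["time", "lat"]

# One MultiIndex level per dimension; each index entry plays the role of a key.
index = pd.MultiIndex.from_tuples(
    [tuple(offsets[d] for d in dims) for offsets in chunk_offsets],
    names=dims,
)

# Placeholder chunk payloads, addressed by their offset tuple.
chunks = pd.Series(
    [f"chunk-{i}" for i in range(len(chunk_offsets))],
    index=index,
)


def lookup(series, **offsets):
    """Return the payload whose index entry matches the per-dimension offsets."""
    key = tuple(offsets[name] for name in series.index.names)
    return series.loc[key]
```

With this layout, `lookup(chunks, time=24, lat=0)` picks out a single chunk by its offsets, which is roughly the role a key plays. It does not cover the variable-subset part of a key, so that would need another index level or a separate column.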
A core question to answer: do we really need random access?
Figure out a way to distribute all layers of SQL execution #10 on Apache Beam.