vlulla closed this issue 2 years ago.
The comment https://github.com/azavea/noaa-hydro-data/issues/89#issuecomment-1218235862 describes a workaround for converting long-format parquet to wide format. It is still not clear how we would convert the complete data set (about 2.7e6 `feature_id`s) from zarr to parquet in wide format.
Currently we are converting NWM zarr into parquet, stored with `feature_id` and `time` as columns. One of the lessons Terence learned from Rich Signell at the ESIP conference was that Rich was able to get incredible performance by saving each `feature_id` as its own column. So we are trying to evaluate whether, instead of saving the data in long format (one row per `feature_id`/`time` pair), it would be better to save it in wide format (one row per `time`, one column per `feature_id`).
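As a minimal sketch of the two layouts, pandas' `DataFrame.pivot` turns the long table into the wide one. The toy data, the `feature_id` values, and the value column name `streamflow` are assumptions for illustration only, not the actual NWM schema.

```python
import pandas as pd

# Hypothetical long-format data: one row per (time, feature_id) pair.
# "streamflow" is an assumed value column name used only for illustration.
long_df = pd.DataFrame(
    {
        "time": pd.to_datetime(
            [
                "2020-01-01 00:00", "2020-01-01 00:00",
                "2020-01-01 01:00", "2020-01-01 01:00",
                "2020-01-01 02:00", "2020-01-01 02:00",
            ]
        ),
        "feature_id": [101, 202, 101, 202, 101, 202],
        "streamflow": [1.0, 5.0, 1.5, 5.5, 2.0, 6.0],
    }
)

# Long -> wide: one row per time step, one column per feature_id.
wide_df = long_df.pivot(index="time", columns="feature_id", values="streamflow")
print(wide_df)
```

With the real data the wide table would have one column per `feature_id`, i.e. roughly 2.7e6 columns, which is where the scale concern for parquet comes from.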
Since there are about 2.7e6 `feature_id`s, the wide table would have roughly 2.7 million columns, and we are not sure whether that will be a problem for parquet. This needs more investigation. This is essentially a long-to-wide table transformation (see `pandas.pivot`, especially its examples).
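One possible way to avoid a single parquet file with ~2.7e6 columns (an untested assumption, not an approach settled on in this issue) is vertical partitioning: pivot the `feature_id`s in fixed-size groups and write each narrower wide table to its own parquet file. The helper `pivot_in_column_groups`, the toy data, and the `streamflow` column name are all hypothetical. With the real data, each group would presumably be read directly from the zarr store rather than filtered out of an in-memory frame.

```python
import pandas as pd


def pivot_in_column_groups(long_df, group_size, value_col="streamflow"):
    """Hypothetical helper: pivot long -> wide in groups of feature_ids.

    Returns a list of wide DataFrames, each with at most `group_size`
    feature_id columns. Each one could be written to its own parquet file.
    """
    feature_ids = sorted(long_df["feature_id"].unique())
    groups = []
    for start in range(0, len(feature_ids), group_size):
        ids = feature_ids[start:start + group_size]
        # Keep only the rows for this group of feature_ids, then pivot them.
        subset = long_df[long_df["feature_id"].isin(ids)]
        groups.append(
            subset.pivot(index="time", columns="feature_id", values=value_col)
        )
    return groups


# Toy long-format data: 2 time steps x 5 feature_ids (made-up values).
times = pd.to_datetime(["2020-01-01 00:00", "2020-01-01 01:00"])
long_df = pd.DataFrame(
    [(t, fid, float(fid) + i) for i, t in enumerate(times) for fid in range(1, 6)],
    columns=["time", "feature_id", "streamflow"],
)

groups = pivot_in_column_groups(long_df, group_size=2)
for g in groups:
    print(list(g.columns), g.shape)
```

Concatenating the groups along `axis=1` reproduces the full wide table, so the split only changes how the columns are spread across files. A reader that needs a handful of `feature_id`s would only have to open the files containing them, provided it knows which group each id falls into.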