Evaluate storing each feature_id as a column in Parquet

Currently we convert the NWM Zarr data into Parquet in a long layout, with feature_id and time stored as columns. One of the lessons Terence learned from Rich Signell at the ESIP conference was that Rich got incredible performance by storing each feature_id as its own column. So we want to evaluate whether, instead of saving the data like this:

feature_id	time	val
feat1	time1	val_1_1
feat1	time2	val_1_2
feat1	time3	val_1_3
feat1	time4	val_1_4
feat2	time1	val_2_1
feat2	time2	val_2_2
feat2	time3	val_2_3
feat2	time4	val_2_4

it would be better to save it like this:

time	feat1_val	feat2_val
time1	val_1_1	val_2_1
time2	val_1_2	val_2_2
time3	val_1_3	val_2_3
time4	val_1_4	val_2_4

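There are about 2.7e6 (roughly 2.7 million) feature_ids, so the wide layout would mean a table with millions of columns, and we are not sure how well Parquet handles that. Every column gets its own schema entry and per-row-group column metadata in the file footer, so this will have to be investigated a bit more.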

This is essentially a long-to-wide table transformation (see `pandas.DataFrame.pivot`, especially its examples).
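For the toy tables above, the transformation is a single `pivot` call, and `melt` goes back the other way. This is a minimal sketch on in-memory data. Renaming the columns to `<feature_id>_val` is only there to match the example headers. Pivoting all 2.7e6 features at once would likely need to be done in chunks, which this sketch does not attempt.

```python
import pandas as pd

# Long layout: one row per (feature_id, time) pair.
long_df = pd.DataFrame({
    "feature_id": ["feat1"] * 4 + ["feat2"] * 4,
    "time": ["time1", "time2", "time3", "time4"] * 2,
    "val": [11, 12, 13, 14, 21, 22, 23, 24],
})

# Long -> wide: time becomes the index, each feature_id becomes a column.
wide_df = long_df.pivot(index="time", columns="feature_id", values="val")
wide_df.columns = [f"{fid}_val" for fid in wide_df.columns]
wide_df = wide_df.reset_index()

# Wide -> long again, to check the round trip.
back_df = wide_df.melt(id_vars="time", var_name="feature_id", value_name="val")
back_df["feature_id"] = back_df["feature_id"].str.removesuffix("_val")

print(wide_df)
```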

azavea / noaa-hydro-data #84