BigDataWUR / AgML-CY-Bench

CY-Bench (Crop Yield Benchmark) is a comprehensive dataset and benchmark for forecasting crop yields at the subnational level. CY-Bench standardizes the selection, processing and spatio-temporal harmonization of public subnational yield statistics with relevant predictors. Contributors include agronomists, climate scientists and machine learning researchers.
https://cybench.agml.org/

Transforms #223

Closed aikepotze closed 4 weeks ago

aikepotze commented 4 weeks ago

Added transforms to support torch models.

Added a base transform that sorts the batch keys into time-series, static and other groups.
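A minimal sketch of how such a key-sorting step might look. The function name `split_batch` and the dimension rule (two or more dimensions means time series, one dimension means static) are assumptions for illustration, not the code in this PR. It relies only on the `.ndim` attribute that torch tensors expose.

```python
from typing import Any, Dict, Tuple


def split_batch(
    batch: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any]]:
    """Split a batch into (time_series, static, other) by number of dimensions.

    Hypothetical helper: the thresholds below are assumptions, not the PR's rule.
    """
    time_series: Dict[str, Any] = {}
    static: Dict[str, Any] = {}
    other: Dict[str, Any] = {}
    for key, value in batch.items():
        ndim = getattr(value, "ndim", None)  # tensors and arrays expose .ndim
        if ndim is None:
            other[key] = value        # e.g. lists of IDs or date strings
        elif ndim >= 2:
            time_series[key] = value  # assumed shape: (batch, time, ...)
        else:
            static[key] = value       # assumed shape: (batch,)
    return time_series, static, other
```

A rule based purely on dimensions is simple, but any feature with an extra axis, such as a one-hot encoded categorical of shape (batch, classes), would land in the time-series group.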

Added a transform that converts all time-series features to dekadal resolution using the provided dates.
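For readers unfamiliar with dekads, the mapping from a date to a dekad can be sketched as follows. It assumes the common agrometeorological convention of three dekads per month (days 1-10, 11-20, and 21 to month end), giving 36 per year. The PR does not state which convention it uses, and `dekad_of_year` is a hypothetical name.

```python
from datetime import date


def dekad_of_year(d: date) -> int:
    """Return the 1-based dekad of the year (1..36) for a date.

    Assumes three dekads per month: days 1-10, 11-20, 21-end.
    """
    # Days 21-31 all belong to the third dekad, so cap the index at 2.
    dekad_in_month = min((d.day - 1) // 10, 2)
    return (d.month - 1) * 3 + dekad_in_month + 1


print(dekad_of_year(date(2020, 1, 31)))   # 3
print(dekad_of_year(date(2020, 12, 31)))  # 36
```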

One batch of 16 samples currently takes ~10 ms to transform. Most of this comes from aggregating tmin and tmax by dekad: the grouped min and max reductions don't have native torch methods and thus require looping over each unique dekad. Without this aggregation, the time would be ~3 ms.
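The loop described above can be sketched without any framework. This is an illustration of the pattern, not the PR's code: `aggregate_by_dekad` is a hypothetical name, and in torch the list comprehension would be a boolean mask with `reduce_fn` replaced by a reduction such as `torch.amin` or `torch.amax`.

```python
from typing import Callable, Dict, Sequence


def aggregate_by_dekad(
    values: Sequence[float],
    dekads: Sequence[int],
    reduce_fn: Callable[[Sequence[float]], float],
) -> Dict[int, float]:
    """Reduce daily values to one value per dekad, one pass per unique dekad."""
    if len(values) != len(dekads):
        raise ValueError("values and dekads must have the same length")
    result: Dict[int, float] = {}
    for k in sorted(set(dekads)):
        # Select the days that fall in dekad k (the role of a boolean mask).
        members = [v for v, d in zip(values, dekads) if d == k]
        result[k] = reduce_fn(members)
    return result


daily_tmin = [1.0, 3.0, 2.0, 5.0, 4.0]
dekad_ids = [1, 1, 1, 2, 2]
print(aggregate_by_dekad(daily_tmin, dekad_ids, min))  # {1: 1.0, 2: 4.0}
```

The cost grows with the number of unique dekads, which is consistent with the overhead reported above. Recent PyTorch releases also provide `scatter_reduce` with `amin` and `amax` reductions, which might remove the loop; whether it fits the shapes used here is untested.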

The test had to be commented out until TorchDataset handles time series of differing lengths.

One final thing to note is that I am not sure categorical features are currently handled correctly. If they are converted to one-hot upstream of the transforms, they might be erroneously picked up as time-series features due to their number of dimensions. If they aren't, the model currently does not convert them to one-hot representations, which might slightly impact model performance. I believe only soil texture class is currently categorical.
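To make the one-hot concern concrete, here is a framework-free sketch of the conversion. The function name `one_hot` and the assumption that classes are integer-coded from 0 are illustrative and not taken from the dataset. In torch, `torch.nn.functional.one_hot` performs the same conversion on integer tensors.

```python
from typing import List, Sequence


def one_hot(class_ids: Sequence[int], num_classes: int) -> List[List[int]]:
    """Convert integer class IDs to one-hot rows of length num_classes."""
    rows: List[List[int]] = []
    for c in class_ids:
        if not 0 <= c < num_classes:
            raise ValueError(f"class id {c} outside [0, {num_classes})")
        row = [0] * num_classes
        row[c] = 1
        rows.append(row)
    return rows


# Three samples, four hypothetical soil texture classes.
print(one_hot([2, 0, 3], num_classes=4))
# [[0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
```

The output has shape (batch, classes), one dimension more than the integer-coded input, which is why a dimension-based rule could mistake it for a time series.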