CY-Bench (Crop Yield Benchmark) is a comprehensive dataset and benchmark for forecasting crop yields at the subnational level. CY-Bench standardizes the selection, processing and spatio-temporal harmonization of public subnational yield statistics with relevant predictors. Contributors include agronomists, climate scientists and machine learning researchers.
Added transforms to support torch models.
Added a base transform which sorts batch entries into time series, static and other keys.
Added a transform that converts all time series features to dekadal resolution using the provided dates.
Transforming one batch of 16 samples currently takes ~10 ms. Most of that time goes to aggregating tmin and tmax by dekad: these aggregations have no native torch methods and therefore require looping over each unique dekad. Without them, the transform would take ~3 ms.
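To make the per-dekad loop concrete, here is a minimal pure-Python sketch rather than the code in this PR. It assumes the common convention that days 1-10, 11-20 and 21 to the end of the month form the three dekads of each month, and that tmin is reduced with `min` and tmax with `max`. The names `dekad_of_year` and `aggregate_by_dekad` are hypothetical and do not come from the CY-Bench code base.

```python
from collections import defaultdict
from datetime import date


def dekad_of_year(d: date) -> int:
    """Return the dekad index (1..36) of a date.

    Assumed convention: days 1-10, 11-20 and 21-end of month are the
    three dekads of each month.
    """
    dekad_in_month = min((d.day - 1) // 10, 2)  # day 31 stays in the third dekad
    return (d.month - 1) * 3 + dekad_in_month + 1


def aggregate_by_dekad(dates, values, reducer):
    """Group daily values by (year, dekad) and reduce each group."""
    groups = defaultdict(list)
    for d, v in zip(dates, values):
        groups[(d.year, dekad_of_year(d))].append(v)
    # One reduction per unique dekad: this is the kind of loop described above.
    return {key: reducer(vals) for key, vals in sorted(groups.items())}


# Hypothetical daily values for January 2020.
dates = [date(2020, 1, day) for day in (1, 5, 10, 11, 20, 21, 31)]
tmin = [2.0, -1.0, 0.5, 3.0, 1.5, 4.0, -2.0]
tmax = [8.0, 6.0, 7.5, 9.0, 10.0, 11.0, 5.0]

tmin_dekadal = aggregate_by_dekad(dates, tmin, min)  # assumed: min for tmin
tmax_dekadal = aggregate_by_dekad(dates, tmax, max)  # assumed: max for tmax
```

The cost of the loop grows with the number of unique dekads in the series, not with the batch size, which is consistent with the aggregation dominating the transform time.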
The test had to be commented out until TorchDataset handles time series of differing lengths.
One final note: I am not sure categorical features are currently handled correctly. If they are converted to one-hot upstream of the transforms, they might be erroneously picked up as time series features because of their number of dimensions. If they are not, the model currently does not convert them to one-hot representations, which might slightly impact model performance. I believe soil texture class is the only categorical feature at the moment.
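The ambiguity can be illustrated with a small pure-Python sketch that uses nested lists as stand-ins for tensors. It assumes features are told apart only by their number of dimensions. The helpers `one_hot` and `nesting_depth`, the three soil texture classes and all values are hypothetical.

```python
def one_hot(class_index: int, num_classes: int) -> list:
    """Encode a class index as a one-hot vector of length num_classes."""
    if not 0 <= class_index < num_classes:
        raise ValueError(f"class index {class_index} outside [0, {num_classes})")
    return [1.0 if i == class_index else 0.0 for i in range(num_classes)]


def nesting_depth(value) -> int:
    """Count list nesting levels, a stand-in for a tensor's number of dimensions."""
    depth = 0
    while isinstance(value, list) and value:
        depth += 1
        value = value[0]
    return depth


NUM_SOIL_CLASSES = 3  # hypothetical number of soil texture classes

soil_texture_batch = [2, 0, 1]  # shape (batch,): one class index per sample
one_hot_batch = [one_hot(c, NUM_SOIL_CLASSES) for c in soil_texture_batch]
tavg_batch = [[10.0, 11.5], [9.0, 8.5], [12.0, 13.0]]  # shape (batch, time)

# The one-hot batch and the time series batch both have two dimensions,
# so a check based only on dimensionality cannot tell them apart.
print(nesting_depth(soil_texture_batch), nesting_depth(one_hot_batch), nesting_depth(tavg_batch))
```

If this assumption matches the transform, routing features by an explicit list of key names instead of by dimensionality would avoid the misclassification.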