Currently we take the `end_date` of the training data and go back one year to subset the training time series: https://github.com/WorldCereal/presto-worldcereal/blob/e8d5bbc173c581d197c3810fdcdf3a8768e9bc9a/presto/inference.py#L397-L410

However, if we train a dedicated CatBoost model on a subset of data for a small AOI, we may benefit from subsetting based on the `start_date` and `end_date` requested by the user (which have to span one year). Can we adapt the method to accept an optional argument, e.g. `end_date`, which, if given, dictates the subsetting of the time series?

We then need to be careful to be resilient to different years in the training data, and also drop samples that don't fall entirely within the requested time frame (adapted for the year of the sample).
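
To make the proposal concrete, here is a minimal sketch of what the subsetting could look like. It is not the code in `presto/inference.py`: the `Sample` container, the `subset_timeseries` name, and the assumption of 12 monthly timesteps are all hypothetical and only illustrate the idea. When `end_date` is given, only its calendar month is used, and it is shifted to the latest occurrence of that month that the sample's own time series still covers, which is one possible reading of "adapted for the year of the sample".

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

NUM_TIMESTEPS = 12  # assumption: one year of monthly timesteps


def month_index(d: date) -> int:
    """Months since year 0, so month arithmetic is plain integer math."""
    return d.year * 12 + (d.month - 1)


@dataclass
class Sample:
    # Hypothetical container; the real training data layout may differ.
    sample_id: str
    start_date: date     # first month covered by the time series
    end_date: date       # last month covered by the time series
    values: List[float]  # one value per month from start_date to end_date


def subset_timeseries(
    sample: Sample, end_date: Optional[date] = None
) -> Optional[List[float]]:
    """Return a 12-month slice of sample.values, or None to drop the sample."""
    first = month_index(sample.start_date)
    last = month_index(sample.end_date)

    if end_date is None:
        # Behaviour described in the issue: go back one year
        # from the end of the sample's own time series.
        window_end = last
    else:
        # Keep the requested calendar month, but move it to the latest
        # occurrence that is not after the end of the sample's series.
        target_month = end_date.month - 1
        window_end = last - ((last - target_month) % 12)

    window_start = window_end - NUM_TIMESTEPS + 1
    if window_start < first:
        # The window does not fall entirely within the sample's data.
        return None

    offset = window_start - first
    return sample.values[offset : offset + NUM_TIMESTEPS]


# A sample covering January 2020 to June 2021 (18 monthly values).
sample = Sample("demo", date(2020, 1, 1), date(2021, 6, 30), list(range(18)))

default_window = subset_timeseries(sample)                      # Jul 2020 - Jun 2021
march_window = subset_timeseries(sample, date(2023, 3, 31))     # Apr 2020 - Mar 2021
october_window = subset_timeseries(sample, date(2023, 10, 31))  # not covered, dropped
```

Returning `None` for samples that cannot cover the requested window keeps the dropping explicit, so the caller can log how many training samples were lost for a given `end_date`.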