StanislavYatsevich / Store_Sales

The project is aimed at stores' sales prediction
0 stars 0 forks source link

Unclear val/test split #10

Open astraszab opened 3 weeks ago

astraszab commented 3 weeks ago

https://github.com/StanislavYatsevich/Store_Sales/blob/9c10a6aec4c0ceecd40d976e48cf62c6d8003f37/src/store_sales/splitting_and_preparing_data.py#L35-L40

Unfortunately, it's not clear from this code which time period gets into train, validation, and testing sets. Is it a week, a month, a year? Also the size of each set is hard coded - I believe it would be more convenient for the user if we make it configurable.

StanislavYatsevich commented 3 weeks ago

https://github.com/StanislavYatsevich/Store_Sales/blob/9c10a6aec4c0ceecd40d976e48cf62c6d8003f37/src/store_sales/splitting_and_preparing_data.py#L35-L40

Unfortunately, it's not clear from this code which time period gets into train, validation, and testing sets. Is it a week, a month, a year? Also the size of each set is hard coded - I believe it would be more convenient for the user if we make it configurable.

Well, that's a good point, thank you. Actually, the deal is that this code was written before we clarified the prediction horizon and the validation method at the beginning of the project. So, since we decided to make a prediction for the last 15 days and to use a cross-validation technique, I believe now it's good time to come back and adjust the train/valid/test split according to the current needs.

StanislavYatsevich commented 3 weeks ago

https://github.com/StanislavYatsevich/Store_Sales/blob/9c10a6aec4c0ceecd40d976e48cf62c6d8003f37/src/store_sales/splitting_and_preparing_data.py#L35-L40

Unfortunately, it's not clear from this code which time period gets into train, validation, and testing sets. Is it a week, a month, a year? Also the size of each set is hard coded - I believe it would be more convenient for the user if we make it configurable.

Replaced this unclear split with the NUMBER_OF_DAYS_TO_PREDICT constant, so the user can choose the prediction horizon themselves. aec6590