Open xzdandy opened 1 year ago
Hi, may I work on this fix?
Hi, I am working on this issue. Dropping the dumped model file created through Sklearn, XGBoost, and Ludwig seems straightforward. For Forecasting, however, the model directory appears to be named after the hex digest of the SHA256 hash of the dataframe, to avoid retraining the model on the same data. So if I train two forecasting functions, Forecast and AnotherForecast, on the same data using the following queries (probably not a practical use case), they will use the same model path.
CREATE FUNCTION IF NOT EXISTS Forecast FROM
(SELECT ds,y FROM AirData)
TYPE Forecasting
HORIZON 12
PREDICT 'y';
CREATE FUNCTION IF NOT EXISTS AnotherForecast FROM
(SELECT ds,y FROM AirData)
TYPE Forecasting
HORIZON 12
PREDICT 'y';
This would mean that dropping Forecast would drop the model for AnotherForecast as well. Isn't it better to isolate the model for each function, as the other training frameworks do? Or should the correct drop implementation for forecasting search the entire catalog of forecasting functions when dropping Forecast, check whether any of them share the same model path, and clean up accordingly?
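To make the second option concrete, here is a minimal sketch of a drop that deletes the model directory only when no other function still references it. Everything in it is an assumption for illustration: `model_dir_for`, `drop_function`, and the plain dictionary used as a catalog are hypothetical names, not the project's actual API, and raw bytes stand in for the dataframe being hashed.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict


def model_dir_for(data: bytes, root: Path) -> Path:
    """Hypothetical helper: name the model directory after the SHA256 hex digest of the data."""
    return root / hashlib.sha256(data).hexdigest()


def drop_function(name: str, catalog: Dict[str, Path]) -> bool:
    """Hypothetical drop: remove `name` from the catalog and delete its model
    directory only if no remaining function shares the same path.

    Returns True if the directory was deleted, False if it was kept.
    """
    model_dir = catalog.pop(name)
    # Another function trained on identical data maps to the same directory.
    if model_dir in catalog.values():
        return False
    shutil.rmtree(model_dir, ignore_errors=True)
    return True
```

With both Forecast and AnotherForecast registered against the same directory, dropping the first leaves the files in place and dropping the second removes them. Isolating the model per function would avoid the catalog scan entirely, at the cost of retraining on identical data.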
I have created PR #1442 with the latter approach in the comment above. Please provide feedback if any changes need to be made.
Bug
When we create a function via a training framework (e.g., ludwig, statsforecast, sklearn), the model file is dumped to the storage layer. When we drop the function, those dumped files also need to be cleaned up.
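As a rough illustration of the cleanup itself, the sketch below removes a dumped model artifact whether it is a single file or a directory. `remove_model_artifact` is a hypothetical helper, and it assumes the artifact lives on the local filesystem; how the drop path would obtain the model path from the catalog is not shown.

```python
import shutil
from pathlib import Path


def remove_model_artifact(model_path: Path) -> bool:
    """Hypothetical helper: delete a dumped model file or directory.

    Returns True if something was removed, False if nothing existed at the path.
    """
    if model_path.is_dir():
        shutil.rmtree(model_path)
        return True
    if model_path.exists():
        model_path.unlink()
        return True
    # Already gone; treat the drop as idempotent rather than raising.
    return False
```

Returning False for a missing path keeps a repeated drop from failing if the files were already cleaned up.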