georgia-tech-db / evadb

Database system for AI-powered apps
https://evadb.ai/docs
Apache License 2.0

`DROP FUNCTION` needs to clean up the dumped model file for functions created through training framework. #1218

Open xzdandy opened 9 months ago

xzdandy commented 9 months ago

Bug

When we create a function via a training framework (e.g., Ludwig, statsforecast, sklearn), the model file is dumped to the storage layer. When we drop the function, the dumped file also needs to be cleaned up.
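A minimal sketch of the cleanup being requested, assuming the path of the dumped model is known at drop time. The helper name `remove_model_artifact` is hypothetical and is not part of EvaDB's API; it only illustrates deleting either a single model file or a model directory.

```python
import shutil
from pathlib import Path


def remove_model_artifact(model_path: Path) -> bool:
    """Delete a dumped model file or directory (hypothetical helper).

    Returns True if something was removed, False if nothing existed there.
    """
    if model_path.is_dir():
        # Some frameworks dump a whole directory rather than one file.
        shutil.rmtree(model_path)
        return True
    if model_path.is_file():
        model_path.unlink()
        return True
    return False
```

Returning a boolean instead of raising on a missing path keeps a repeated drop harmless, which seems like a reasonable choice for a cleanup step.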

aayushacharya commented 5 months ago

Hi, may I work on this fix?

aayushacharya commented 5 months ago

Hi, I am working on this issue. Dropping the dumped model file created through Sklearn, XGBoost, and Ludwig seems straightforward. For Forecasting, however, the model directory appears to be named after the hex digest of the SHA-256 hash of the dataframe, so that the model is not retrained on the same data. So if I train two forecasting functions, Forecast and AnotherForecast, on the same data using the following queries (probably not a practical use case), they will use the same model path.

CREATE FUNCTION IF NOT EXISTS Forecast FROM
(SELECT ds,y FROM AirData)
TYPE Forecasting
HORIZON 12
PREDICT 'y';
CREATE FUNCTION IF NOT EXISTS AnotherForecast FROM
(SELECT ds,y FROM AirData)
TYPE Forecasting
HORIZON 12
PREDICT 'y';

This would mean that dropping Forecast would also drop the model for AnotherForecast. Isn’t it better to isolate the model for each function, as the other training frameworks do? Or should the drop implementation for forecasting search the function catalog for other forecasting functions that share the same model path, and delete the model directory only if no other function uses it?
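To make the second option concrete, here is a rough sketch under stated assumptions. It is not EvaDB's implementation. The catalog is modeled as a plain dict from function name to model directory, raw bytes stand in for the dataframe, and both `forecast_model_dir` and `drop_function` are hypothetical names.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict


def forecast_model_dir(root: Path, data_bytes: bytes) -> Path:
    # Assumption: the directory is named after the SHA-256 hex digest of the
    # training data, so identical data maps to the same path.
    return root / hashlib.sha256(data_bytes).hexdigest()


def drop_function(catalog: Dict[str, Path], name: str) -> bool:
    """Remove `name` from the catalog and delete its model directory only if
    no remaining function points at the same path.

    Returns True if the directory was deleted, False if it was kept.
    """
    model_dir = catalog.pop(name)
    if model_dir in catalog.values():
        # Another function still shares this model; keep the files.
        return False
    shutil.rmtree(model_dir, ignore_errors=True)
    return True
```

With two functions trained on the same data, dropping the first leaves the directory in place and dropping the second removes it. The alternative raised above, a separate model directory per function, would avoid the catalog scan at the cost of retraining on identical data.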

aayushacharya commented 5 months ago

I have created PR #1442, which takes the latter approach from my comment above. Please let me know if any changes are needed.