Enhance forecasting with more features

georgia-tech-db / evadb

Database system for AI-powered apps

https://evadb.ai/docs

Apache License 2.0

Enhance forecasting with more features #1243

Status: Open, opened by americast 9 months ago

americast commented 9 months ago

Search before asking

[X] I have searched the EvaDB issues and found no similar feature requests.

Description

It would be nice to have the following additional features for EvaDB:

[x] Add GPU support (Fixed in #1283)
[ ] Static exogenous variables
[ ] Multivariate forecasting
[x] Error when there are <=1 data points (Fixed in #1283)
[ ] Explain unique_id better
[ ] Catch frequency-related errors better
[ ] Add confidence intervals
[x] Process columns with text appropriately

Use case

GPU support is essential for running AutoML techniques quickly. It is currently disabled because of instability with Ray.
At present, every exogenous variable is treated as temporal (time-varying). It is important to distinguish temporal variables from static ones.
Right now, PREDICT can be applied to only one column. It would be great to be able to forecast multiple columns at once.
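One possible way to tell static exogenous variables apart from temporal ones is to check whether a column keeps a single value within every series (every `unique_id`). The sketch below is a hypothetical helper, not EvaDB's implementation: it assumes rows are plain dicts and that, besides `unique_id`, the timestamp and target columns are named `ds` and `y` (both names are assumptions).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Assumed column names: "unique_id" identifies a series, "ds" is the timestamp,
# "y" is the target. Every other column is treated as an exogenous variable.
RESERVED = {"unique_id", "ds", "y"}


def split_exogenous(rows: Iterable[Dict[str, object]]) -> Tuple[List[str], List[str]]:
    """Return (static, temporal) exogenous column names.

    A column is classified as static if it takes exactly one distinct value
    within every series; otherwise it is classified as temporal.
    """
    # values_seen[column][series_id] -> set of distinct values observed
    values_seen: Dict[str, Dict[object, Set[object]]] = defaultdict(lambda: defaultdict(set))
    for row in rows:
        sid = row["unique_id"]
        for col, val in row.items():
            if col not in RESERVED:
                values_seen[col][sid].add(val)

    static: List[str] = []
    temporal: List[str] = []
    for col in sorted(values_seen):
        if all(len(vals) == 1 for vals in values_seen[col].values()):
            static.append(col)
        else:
            temporal.append(col)
    return static, temporal


rows = [
    {"unique_id": "A", "ds": 1, "y": 10.0, "region": "north", "temp": 21.5},
    {"unique_id": "A", "ds": 2, "y": 12.0, "region": "north", "temp": 23.0},
    {"unique_id": "B", "ds": 1, "y": 7.0, "region": "south", "temp": 18.0},
    {"unique_id": "B", "ds": 2, "y": 8.0, "region": "south", "temp": 18.0},
]
print(split_exogenous(rows))  # (['region'], ['temp'])
```

This heuristic only looks at the observed window: a truly temporal column that happens to be constant in the data would be classified as static, so an explicit user-supplied list of static columns may still be needed.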

Are you willing to submit a PR?

[X] Yes I'd like to help by submitting a PR!

khushitalesra commented 9 months ago

If the price column contains commas, forecasting doesn't work. The following error is encountered:
ValueError: could not convert string to float: '270,000'
EvaDB Session ended with an error: could not convert string to float: '270,000'

xzdandy commented 9 months ago

@gaurav274 LLM-based data cleaning can help in this case. It is also possible without an LLM (e.g., with a regex), but I think an LLM is more general and flexible. One optimization is to run LLM data cleaning only on the tuples that fail instead of on all tuples, which saves cost and time. Another optimization is to choose the LLM model carefully: for a simple task like this, we could use a lightweight local model.
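The idea of cleaning only the values that fail can be sketched without an LLM. The helpers below are hypothetical and not part of EvaDB: they first try a plain `float()` conversion and run a regex-based cleanup (stripping comma thousands separators, as in `'270,000'`) only on values that raise `ValueError`. An LLM-based cleaner could be plugged in at the same fallback point.

```python
import re
from typing import List, Optional

# Numbers written with comma thousands separators, e.g. "270,000" or "1,234.5".
_THOUSANDS = re.compile(r"^[+-]?\d{1,3}(,\d{3})+(\.\d+)?$")


def clean_value(raw: str) -> Optional[float]:
    """Fallback cleaner: only called for values that float() rejected."""
    text = raw.strip()
    if _THOUSANDS.match(text):
        return float(text.replace(",", ""))
    return None  # not cleanable by this rule; an LLM-based cleaner could go here


def to_floats(values: List[str]) -> List[Optional[float]]:
    out: List[Optional[float]] = []
    for v in values:
        try:
            out.append(float(v))  # fast path: most values convert directly
        except ValueError:
            out.append(clean_value(v))  # slow path runs only on failed values
    return out


print(to_floats(["12.5", "270,000", "1,234.5", "n/a"]))  # [12.5, 270000.0, 1234.5, None]
```

Because the expensive step sits behind the `except` branch, its cost scales with the number of malformed values rather than with the size of the column.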

An alternative approach is to skip rows that do not match the column type. @khushitalesra Is the price column's type float? The content is a string in this case.
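The alternative of skipping rows whose value does not match the declared column type could look like the following sketch. The helper name and the choice to return the skipped rows (so they can be logged or sent to a cleaning step instead of being silently lost) are assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def split_by_type(
    rows: List[Row], column: str, cast: Callable[[object], object]
) -> Tuple[List[Row], List[Row]]:
    """Return (kept, skipped); kept rows have `column` converted with `cast`."""
    kept: List[Row] = []
    skipped: List[Row] = []
    for row in rows:
        try:
            converted = cast(row[column])
        except (TypeError, ValueError):
            skipped.append(row)  # value does not match the declared type
            continue
        kept.append({**row, column: converted})  # copy the row with the converted value
    return kept, skipped


kept, skipped = split_by_type(
    [{"ds": 1, "price": "250000"}, {"ds": 2, "price": "270,000"}], "price", float
)
print(kept)     # [{'ds': 1, 'price': 250000.0}]
print(skipped)  # [{'ds': 2, 'price': '270,000'}]
```

Note that dropping rows leaves gaps in a regularly sampled series, which may matter for forecasters that infer or require a fixed frequency.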

khushitalesra commented 9 months ago

Yes. Earlier I had defined the price column as a string because of the commas, but then I changed it to float because it was throwing an error.

gaurav274 commented 9 months ago

@xzdandy This is a nice application for handling format inconsistencies.

gaurav274 commented 9 months ago

@americast let us break this into sub-tasks and assign a separate issue to each. Thanks!

americast commented 9 months ago

We need to handle numbers formatted with commas properly. Also, numbers are sometimes embedded in strings, and we need to be able to handle such columns.
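For columns where numbers are mixed with text, one possible regex-based approach (again a hypothetical helper rather than EvaDB behavior) is to extract the first number-like token and drop its thousands separators. It assumes commas are thousands separators and the period is the decimal point; values with no number, or with several numbers, would still need a dedicated rule or an LLM-based cleaner.

```python
import re
from typing import Optional

# First number-like token: optional sign, digits (optionally in comma groups),
# optional decimal part. Assumes "," = thousands separator, "." = decimal point.
_NUMBER = re.compile(r"[+-]?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?")


def extract_number(text: str) -> Optional[float]:
    """Return the first number found in `text`, or None if there is none."""
    match = _NUMBER.search(text)
    if match is None:
        return None
    return float(match.group(0).replace(",", ""))


print(extract_number("$270,000"))              # 270000.0
print(extract_number("approx. 1,250.75 USD"))  # 1250.75
print(extract_number("N/A"))                   # None
```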

xzdandy commented 9 months ago

Yes, though I think this should be part of data cleaning rather than forecasting.

americast commented 9 months ago

Fixed in #1283 (the error when the price column contains commas: could not convert string to float: '270,000').