georgia-tech-db / evadb

Database system for AI-powered apps
https://evadb.ai/docs
Apache License 2.0
2.61k stars, 262 forks

Enhance forecasting with more features #1243

Open americast opened 9 months ago

americast commented 9 months ago

Description

It would be nice to have the following additional features for EvaDB:

  1. GPU support is essential for running AutoML techniques quickly. It is currently disabled owing to instability with Ray.
  2. Currently, every exogenous variable is treated as temporal. It is important to distinguish temporal variables from static ones.
  3. Currently, PREDICT can be used on only one column. It would be great to be able to forecast multiple columns at once.
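
To illustrate the second point, one possible way to tell static exogenous variables from temporal ones is to check whether a column ever changes within a single time series. This is only a sketch of that heuristic in plain Python, not EvaDB's behavior; `split_exogenous`, the `store` key, and the sample columns are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def split_exogenous(rows, series_key, candidates):
    """Split candidate columns into (static, temporal) lists.

    Heuristic (an assumption, not EvaDB's implementation): a column is
    static if it takes exactly one value within every series.
    """
    # For each column, collect the distinct values seen per series id.
    seen = {col: defaultdict(set) for col in candidates}
    for row in rows:
        for col in candidates:
            seen[col][row[series_key]].add(row[col])

    static = [
        col for col in candidates
        if all(len(values) == 1 for values in seen[col].values())
    ]
    temporal = [col for col in candidates if col not in static]
    return static, temporal


# Hypothetical sample data: "region" never changes per store, "promo" does.
rows = [
    {"store": "A", "region": "north", "promo": 0},
    {"store": "A", "region": "north", "promo": 1},
    {"store": "B", "region": "south", "promo": 0},
    {"store": "B", "region": "south", "promo": 0},
]
static_cols, temporal_cols = split_exogenous(rows, "store", ["region", "promo"])
```

A column that happens to be constant in the training window would also be classified as static here, so an explicit user-provided list may be a safer design.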

khushitalesra commented 9 months ago

If the price column contains commas, forecasting doesn't work. The following error is encountered:

ValueError: could not convert string to float: '270,000'
EvaDB Session ended with an error: could not convert string to float: '270,000'
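
The error itself comes from Python's built-in `float()`, which does not accept thousands separators. The sketch below reproduces it and shows one workaround, assuming commas in the column are only ever thousands separators; `parse_price` is a hypothetical helper, not part of EvaDB.

```python
def parse_price(raw: str) -> float:
    """Convert a price string such as '270,000' to a float.

    Assumes commas are thousands separators, not decimal marks.
    """
    return float(raw.replace(",", ""))


# Reproduce the reported failure with the raw value.
try:
    float("270,000")
except ValueError as exc:
    error_message = str(exc)

# The same value parses once the commas are removed.
price = parse_price("270,000")
```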

xzdandy commented 9 months ago

@gaurav274 LLM-based data cleaning can help in this case. It is also possible without an LLM (e.g., with a regex), but I think an LLM is more general and flexible. One optimization is to run LLM data cleaning only on the tuples that failed instead of on all tuples, which saves cost and time. Another optimization is the choice of model: for a simple task like this, we could use a lightweight local model.
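
A rough sketch of the "only clean the tuples that failed" idea, using the regex route as the repair step. The `repair` parameter is where a more expensive cleaner (for example an LLM call) could be plugged in; `to_floats` and `regex_repair` are hypothetical names, and the regex assumes commas and whitespace are the only stray characters.

```python
import re
from typing import Callable, Iterable, List, Tuple

# Assumption: the only noise is thousands separators (commas or spaces).
_SEPARATORS = re.compile(r"[,\s]")


def regex_repair(raw: str) -> str:
    """Strip commas and whitespace from a numeric string."""
    return _SEPARATORS.sub("", raw)


def to_floats(
    values: Iterable[str],
    repair: Callable[[str], str] = regex_repair,
) -> Tuple[List[float], int]:
    """Convert values to floats, calling `repair` only on values that fail."""
    cleaned: List[float] = []
    repaired_count = 0
    for raw in values:
        try:
            cleaned.append(float(raw))  # fast path: value is already clean
        except ValueError:
            # Slow path: still raises ValueError if the repair is not enough.
            cleaned.append(float(repair(raw)))
            repaired_count += 1
    return cleaned, repaired_count


prices, repaired = to_floats(["199000", "270,000", "1 250 000.5"])
```

Here only two of the three values go through the repair step, which is the saving described above when the repair step is costly.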

An alternative approach is to skip the rows that do not match the column type. @khushitalesra Is the price column type float? The content is a string in this case.
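
The skip-rows alternative could look roughly like this, with rows modeled as plain dictionaries. `keep_float_rows` is a hypothetical helper for illustration, not an EvaDB function.

```python
def keep_float_rows(rows, column):
    """Return (kept, skipped): rows whose `column` parses as float, and the rest."""
    kept, skipped = [], []
    for row in rows:
        try:
            # Copy the row with the column converted to float.
            kept.append({**row, column: float(row[column])})
        except (TypeError, ValueError):
            # Value is missing or not a plain number: skip the row.
            skipped.append(row)
    return kept, skipped


# Hypothetical sample rows.
rows = [
    {"id": 1, "price": "199000"},
    {"id": 2, "price": "270,000"},
    {"id": 3, "price": None},
]
kept, skipped = keep_float_rows(rows, "price")
```

Returning the skipped rows instead of silently dropping them makes it possible to report how much data was lost.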

khushitalesra commented 9 months ago

Yes. Earlier I had defined the price column as a string because of the comma, but then I changed it to float because it was throwing an error.

gaurav274 commented 9 months ago

@xzdandy This is a nice use case for fixing format inconsistencies.

gaurav274 commented 9 months ago

@americast Let us break this into sub-tasks and open a separate issue for each. Thanks!

americast commented 9 months ago

We need to handle the comma case properly. Also, at times there are numbers attached to strings. We need to be able to handle such columns too.
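
For columns where numbers are attached to strings, one option is to pull out the first numeric token with a regex. This is a sketch under the assumption that each value contains at most one number and that commas are thousands separators; `extract_number` is a hypothetical helper.

```python
import re
from typing import Optional

# Optional sign, digits with optional comma separators, optional decimals.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_number(raw: str) -> Optional[float]:
    """Return the first number found in `raw`, or None if there is none."""
    match = _NUMBER.search(raw)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


with_currency = extract_number("$270,000")
with_unit = extract_number("12.5kg")
missing = extract_number("n/a")
```

A hyphen directly before a digit is read as a minus sign, so a value such as "item-5" would yield -5.0; columns like that would need a stricter pattern.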

xzdandy commented 9 months ago

Yes, though I think this should be part of data cleaning rather than forecasting.

americast commented 9 months ago

The comma error in the price column is fixed in #1283.