Open americast opened 9 months ago
If the price column contains commas (for example, as thousands separators), forecasting doesn't work. The following error is encountered:
ValueError: could not convert string to float: '270,000'
EvaDB Session ended with an error: could not convert string to float: '270,000'
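For reference, the same message can be reproduced outside EvaDB. This is a minimal plain-Python sketch, assuming the failure comes from a `float()` conversion of the raw string; it does not use any EvaDB code, and the variable names are only illustrative.

```python
# Plain-Python reproduction; no EvaDB involved.
raw_price = "270,000"

try:
    float(raw_price)
    error_message = None
except ValueError as exc:
    # float() rejects the thousands separator.
    error_message = str(exc)

print(error_message)

# Removing the separator first lets the conversion succeed.
cleaned_price = float(raw_price.replace(",", ""))
print(cleaned_price)
```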
@gaurav274 LLM-based data cleaning can help in this case. It is also possible without an LLM (e.g., with a regex), but I think an LLM is more general and flexible. One optimization is to run LLM data cleaning only on the tuples that failed instead of on all tuples, which saves cost and time. Another optimization is to choose the model to fit the task: for a simple task like this, we may use a lightweight local model.
An alternative approach is to skip the rows that do not match the column type. @khushitalesra Is the price column type float? The content is a string in this case.
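A rough sketch of the two ideas above, in plain Python rather than EvaDB: try the cheap conversion first, run a cleaner only on the values that fail, and mark values that still fail so they can be skipped. The names `regex_clean` and `parse_prices` are hypothetical, and the regex cleaner only stands in for whatever cleaner (LLM or otherwise) would actually be used.

```python
import re
from typing import Callable, Iterable, List, Optional


def regex_clean(raw: str) -> str:
    """Cheap cleaner: drop commas and surrounding whitespace."""
    return re.sub(r",", "", raw).strip()


def parse_prices(
    values: Iterable[str],
    cleaner: Callable[[str], str] = regex_clean,
) -> List[Optional[float]]:
    """Convert values to float, cleaning only the ones that fail."""
    parsed: List[Optional[float]] = []
    for raw in values:
        try:
            # Fast path: most values need no cleaning at all.
            parsed.append(float(raw))
        except ValueError:
            try:
                # Slow path: the cleaner runs only on the failed value.
                parsed.append(float(cleaner(raw)))
            except ValueError:
                # Still not a number: mark it so the row can be skipped.
                parsed.append(None)
    return parsed


prices = parse_prices(["1500.5", "270,000", "n/a"])
valid_prices = [p for p in prices if p is not None]
print(prices)
print(valid_prices)
```

Passing a different `cleaner` callable is where a more expensive cleaning step could be plugged in, so its cost is paid only for the rows that the plain conversion rejects.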
Yes. Earlier I had defined the price column as a string because of the comma, but then I changed it to float because it was throwing an error.
@xzdandy This is a nice application of format inconsistencies.
@americast let us break this into sub-tasks and assign a separate issue to each. Thanks!
We need to handle this properly. Also, at times there are numbers attached to strings. We need to be able to handle such columns as well.
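For numbers attached to strings, one possible approach is to pull the first numeric token out of the text with a regex. This is only a sketch: `extract_number` is a hypothetical helper, not an EvaDB function, and it assumes commas are thousands separators and the dot is the decimal point.

```python
import re
from typing import Optional

# Optional sign, digits with optional commas, optional decimal part.
NUMBER_PATTERN = re.compile(r"[-+]?\d[\d,]*(?:\.\d+)?")


def extract_number(text: str) -> Optional[float]:
    """Return the first number found in text, or None if there is none."""
    match = NUMBER_PATTERN.search(text)
    if match is None:
        return None
    # Assumption: commas are thousands separators, so they are dropped.
    return float(match.group(0).replace(",", ""))


print(extract_number("$270,000"))
print(extract_number("1,200.50 USD"))
print(extract_number("not available"))
```

Values written with other conventions (for example, a comma as the decimal mark) would need a different pattern.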
Yes, though I think this should be part of data cleaning rather than forecasting.
Fixed in #1283
Search before asking
Description
It would be nice to have the following additional features for EvaDB:
unique_id
better
Use case
ray
.PREDICT
in only one column. It would be great to be able to predict forecasting for multiple columns at once.
Are you willing to submit a PR?