Open xuyuanme opened 3 days ago
Hey @xuyuanme! 👋 I'm here to help you with any bugs, questions, or contributions you have in mind. Let's tackle this together!
To address the issue where the where
clause in pandasai/connectors/sql.py
uses AND
instead of OR
, you need to modify the _build_query
method. Specifically, replace the " AND ".join(condition_strings)
with " OR ".join(condition_strings)
.
Here is the modified _build_query
method:
def _build_query(self, limit=None, order=None):
base_query = select("*").select_from(text(self.cs_table_name))
if self.config.where or self._additional_filters:
# conditions is the list of where + additional filters
conditions = []
if self.config.where:
conditions += self.config.where
if self._additional_filters:
conditions += self._additional_filters
query_params = {}
condition_strings = []
valid_operators = ["=", ">", "<", ">=", "<=", "LIKE", "!=", "IN", "NOT IN"]
for i, condition in enumerate(conditions):
if len(condition) == 3:
column_name, operator, value = condition
if operator in valid_operators:
self._validate_column_name(column_name)
condition_strings.append(f"{column_name} {operator} :value_{i}")
query_params[f"value_{i}"] = value
if condition_strings:
where_clause = " OR ".join(condition_strings) # Changed from AND to OR
base_query = base_query.where(
text(where_clause).bindparams(**query_params)
)
if order:
base_query = base_query.order_by(asc(text(order)))
if limit:
base_query = base_query.limit(limit)
return base_query
This change ensures that the conditions in the where
clause are combined using OR
instead of AND
, which will load all relevant data into dfs
[1].
To continue talking to Dosu, mention @dosu.
System Info
The current version of pandasai being used: 2.2.7
🐛 Describe the bug
In some case LLM will generate code to retrieve different subset of a same dataframe. In the example below it's generating the "first_purchases" dataframe as well as "repurchases" dataframe. Then these two dataframes are merged for further analysis, which makes perfect sense:
In the
code_execution
phase, two filters are extracted: "buyer_purchase_sequence == 1" (for "first_purchases" dataframe) and "buyer_purchase_sequence > 1" (for "repurchases" dataframe).However in
pandasai/connectors/sql.py
, this will concatenate a where clause:where buyer_purchase_sequence == 1 AND buyer_purchase_sequence > 1
. This will result an empty dataframe in dfs, and eventually an empty chart.I understand the design of filtering dataframe is for better system performance. But considering all possible cases, the "where clause" needs to use
OR
instead ofAND
to ensure all data gets loaded in dfs.