Hi @rucha80, thanks a lot for reporting. I don't have access to the Azure OpenAI APIs, but I think api_version is mandatory. Let me know if that fixes it!
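For reference, this is roughly what I mean (a minimal sketch with placeholder credentials; the version string is just an example, since I can't test against Azure myself):

```python
from pandasai.llm import AzureOpenAI

# placeholder values: use your own key, endpoint and deployment name
llm = AzureOpenAI(
    api_token="<your-azure-openai-key>",
    api_base="https://<your-resource>.openai.azure.com/",
    api_version="2023-07-01-preview",  # explicitly set: Azure seems to require it
    deployment_name="<your-deployment-name>",
)
```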
@gventuri @rucha80 I'll have a look after returning from my vacations :)
@rucha80 This issue does not seem to be related to PandasAI but to Azure OpenAI itself. Here's an MRE totally unrelated to PandasAI:
import openai
openai.api_type = 'azure'
openai.api_version = '2023-07-01-preview' # I also tried with older versions, to no avail
openai.api_key = ... # your API key here
openai.api_base = ... # your API endpoint base here
deployment_name = 'gpt-4-8192' # this is the name I used for my deployment
prompt = "What is 2+2?"
openai.Completion.create(engine=deployment_name, prompt=prompt)
which yields the following InvalidRequestError:
InvalidRequestError: The completion operation does not work with the specified model, gpt-4-8192. Please choose different model and try again. You can learn more about which models can be used with each operation here: https://go.microsoft.com/fwlink/?linkid=2197993.
Nevertheless, I'd like to point out that the models from the gpt-4 and gpt-3.5 families perform much better with the ChatCompletion API, so I encourage you to always set is_chat_model to True when using them.
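For reference, here is a sketch of the same Azure call through the chat endpoint, using the same pre-1.0 openai SDK as the MRE above; this is essentially what is_chat_model=True makes the wrapper use:

```python
import openai

openai.api_type = 'azure'
openai.api_version = '2023-07-01-preview'
openai.api_key = ...   # your API key here
openai.api_base = ...  # your API endpoint base here

deployment_name = 'gpt-4-8192'  # same deployment as in the failing example

# gpt-4 deployments only support the chat endpoint on Azure, so the prompt
# goes in as a chat message instead of a raw completion
response = openai.ChatCompletion.create(
    engine=deployment_name,
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
print(response["choices"][0]["message"]["content"])
```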
llm = AzureOpenAI(
    api_token=os.environ['OPENAI_API_KEY'],
    api_base=os.environ['OPENAI_API_BASE'],
    api_version=os.environ["OPENAI_API_VERSION"],
    deployment_name='',
    temperature=0,
    model='gpt-4',
    api_type=os.environ['OPENAI_API_TYPE'],
    is_chat_model=True
)
It works when I set is_chat_model=True.
But now I am not able to run PandasAI: it gets stuck. I restarted the kernel and deleted the pandasai log. Are there any files we need to update?
Below is the screenshot where it gets stuck.
Can you share a minimal reproducible example? I tried Azure OpenAI's GPT-4 again with one of the examples available in the examples folder and it works for me.
It is working for me as well; it was probably a mistake on my side. Also, I need the Python code that PandasAI is generating. In an earlier PandasAI version, if I passed show_code=True it returned the code. Is it possible to get the code as output as well, so that it can be stored in a variable?
I don't think we have ever had a show_code parameter. It was verbose, and you can still use it as a configuration parameter as follows:
df = SmartDataframe(df, config={"llm": llm, "verbose": True})
Alternatively, you can use a callback to export the code (e.g. to a file)
from pandasai.callbacks import FileCallback
df = SmartDataframe(df, {"callback": FileCallback("output.py")})
Thank you for the reply. When I ask for the number of rows in the data, it just prints the number. I want the answer to be conversational. In an earlier version there was an option of setting is_conversational=True. Do you still have an option like that?
I believe we don't, as explained here.
Sometimes the pandasai log file and output.py are not updated; there is a lag before output.py is refreshed. I want to get the Python code from the output.py file as soon as I get the response from the LLM. Could you help me with that?
I was using PandasAI v0.8.4 for many weeks, and there used to be a show_code=True argument in the run method of the PandasAI class, as well as a verbose=True argument. show_code=True actually showed the lines of code generated, and verbose showed more things (I don't quite remember what anymore). Now I have updated to the latest version of PandasAI; the run method of the PandasAI class still has the show_code=True argument, but it generates exactly the same output as verbose=True in SmartDataframe. Is this behaviour intended? What I really miss is show_code=True from the older version. The screenshot below is from pandasai 1.1.3.
@gventuri Can I know the exact response time PandasAI takes, excluding the LLM response time? I want to know the response time of the LLM and the response time of PandasAI separately.
Adding more to the Azure problems. AzureOpenAI is inconsistent with SDL and SDF. It also fails intermittently when using custom prompts.
I've tested AzureOpenAI with GPT-35-turbo and GPT-4, and here are the responses/errors:
"single positional indexer is out-of-bounds"
No code in the log file.
Interestingly, there are no issues using OpenAI (from pandasai) with either GPT-35-turbo or GPT-4, whether using SDL and SDF with or without custom prompts.
AzureOpenAI, on the other hand, definitely will not use custom prompts, nor will it reliably handle calling .chat().
I'm using v1.2.1 - please help!
@ronaldsholt can you please provide a meaningful reproducible example? If you use a custom file, please provide that as well.
@mspronesti Sure. -- TBH, today, I'm getting "single positional indexer is out-of-bounds" for everything I try : /
import pandas as pd
import pandasai
print(pandasai.__version__)

from pandasai import PandasAI
from pandasai.llm.azure_openai import AzureOpenAI
from pandasai import SmartDataframe, SmartDatalake

df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})
df2 = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})

AZURE_OAI_KEY = ""
AZURE_OAI_MODEL = "gpt-35-turbo"
AZURE_OPENAI_ENDPOINT = ""
AZURE_OPENAI_VERSION = ""

llm = AzureOpenAI(
    api_token=AZURE_OAI_KEY,
    deployment_name=AZURE_OAI_MODEL,
    api_base=AZURE_OPENAI_ENDPOINT,
    api_version=AZURE_OPENAI_VERSION
)

sdf = SmartDataframe(df, name="data", config={
    "llm": llm,
    "is_chat_model": True,
    "enable_cache": False,
    "max_retries": 10,
    "use_error_correction_framework": True,
    "verbose": True,
    "enforce_privacy": True}
)

response = sdf.chat("Which are the 5 happiest countries?")
print(response)
Response 1:
1.2.2
2023-09-15 12:34:54 [INFO] Question: Which are the 5 happiest countries?
2023-09-15 12:34:54 [INFO] Running PandasAI with azure-openai LLM...
2023-09-15 12:34:54 [INFO] Prompt ID: 79232624-cf74-4f5f-b46f-2c0f85e8e525
Unfortunately, I was not able to answer your question, because of the following error:
single positional indexer is out-of-bounds
2. SDL + AzureOpenAI:
```python
# Instantiate a LLM
llm = AzureOpenAI(
api_token=AZURE_OAI_KEY,
deployment_name=AZURE_OAI_MODEL,
api_base=AZURE_OPENAI_ENDPOINT,
api_version=AZURE_OPENAI_VERSION
)
sdf = SmartDatalake([df,df2], config={
"llm": llm,
"is_chat_model": True,
"enable_cache": False,
"max_retries": 10,
"use_error_correction_framework": True,
"verbose": True,
"enforce_privacy": True}
)
# pandas_ai = PandasAI(llm)
# pandas_ai(sdf, prompt='Which are the 5 happiest countries?')
response = sdf.chat("Which are the 2 happiest countries?") #changed incase silent caching
print(response)
Response 2:
1.2.2
2023-09-15 12:40:19 [INFO] Question: Which are the 5 happiest countries?
2023-09-15 12:40:19 [INFO] Running PandasAI with azure-openai LLM...
2023-09-15 12:40:19 [INFO] Prompt ID: 799d0383-1dda-4ffe-92c8-1f8a7f5da820
Unfortunately, I was not able to answer your question, because of the following error:
single positional indexer is out-of-bounds
3. SDF + AzureOpenAI + Custom Prompting:
# Custom Prompts Integration
from pandasai.prompts import Prompt  # assuming the v1.x import path for custom prompts

class CustomAnalysisPrompt(Prompt):
    text = """
You are an expert Python coder who specializes in writing pandas code. You are also an expert data analyst. Here's the metadata to analyze for the given pandas DataFrames:

{dataframes}

```python
# TODO import all the dependencies required
import pandas as pd

# Given this data, please follow these steps:
# 0. CAPITALIZE all indexable data for filtering! NOTE: "Q1" = quarter 1, "Q2" = quarter 2, "Q3" = quarter 3, and "Q4" = quarter 4, and this can be found in the dfs["quarter"] column. EXAMPLE: (dfs[0]['quarter'] == 'Q2')
# 1. **Data Analysis**: Dive deep into the data to identify patterns in order to answer the question with supporting context.
# 2. **Opportunity Identification**: If asked, explain and provide specific metrics in a sentence if available.
# 3. **Reasoning**: Study the data and explain why these results are happening.
# 4. **Recommendations**: If asked, suggest actionable steps.
# 5. **Output**: Return a dictionary with:
#    - type (possible values: "text", "number", "dataframe")
#    - value (can be a string, a dataframe, or the path of the plot, NOT a dictionary)
#    Example: {{ "type": "text", "value": "In US, the average sales are $4000, indicating a potential market X." }}
def analyze_data(dfs: list[pd.DataFrame]) -> dict:
    # Code goes here (do not add comments)

# Declare a result variable
result = analyze_data(dfs)
```

Using the provided dataframes (`dfs`), update the Python code based on the user's query:

{conversation}

# Updated code:
"""
sdf = SmartDatalake([df, df2], config={
    "llm": llm,
    "is_chat_model": True,
    "enable_cache": False,
    "max_retries": 10,
    "use_error_correction_framework": True,
    "custom_prompts": {"generate_python_code": CustomAnalysisPrompt()},
    "verbose": True,
    "enforce_privacy": True}
)

response = sdf.chat("Which are the 2 happiest countries? Explain.")
print(response)
Response 3: Same response.
4. Running the exact same code (with or without the custom prompt) on OpenAI results in really high-quality responses, with either 3.5-turbo or 4.
Notes: I also experimented with:
```python
sdf = SmartDatalake([df, df2], config={
    "llm": llm})
```
for both SDF and SDL, and oddly, it would sometimes work once or twice and then go back to "single positional indexer is out-of-bounds".
Hope this helps. Really hoping to be able to use Azure for this!
@ronaldsholt Thanks for the MRE. Looks like this has nothing to do with AzureOpenAI but with enforce_privacy. I've just tried with both OpenAI and AzureOpenAI using your MRE and I get the same error. Can you try again with the "standard" OpenAI LLM and enforce_privacy set to True? Also, can you try again with AzureOpenAI without enforcing privacy?
On a different note, please notice that is_chat_model is not a config parameter but an LLM one, and as such you need to pass it to AzureOpenAI. Since OpenAI has now marked the completion API as "legacy", I believe I will open a PR to default it to True.
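In practice, something like this (a sketch based on your own snippet):

```python
llm = AzureOpenAI(
    api_token=AZURE_OAI_KEY,
    deployment_name=AZURE_OAI_MODEL,
    api_base=AZURE_OPENAI_ENDPOINT,
    api_version=AZURE_OPENAI_VERSION,
    is_chat_model=True,  # <--- belongs to the LLM...
)

sdf = SmartDatalake([df, df2], config={
    "llm": llm,  # ...not to the SmartDatalake/SmartDataframe config
})
```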
@mspronesti Sure here ya go:
# Same df's above
llm = OpenAI(api_token=OPENAI_API_KEY)
sdf = SmartDatalake([df,df2], config={
"llm": llm,
"is_chat_model": True,
"enable_cache": False,
"max_retries": 10,
"use_error_correction_framework": True,
"custom_prompts": {"generate_python_code": CustomAnalysisPrompt()},
"verbose": True,
"enforce_privacy": True}
)
# pandas_ai = PandasAI(llm)
# pandas_ai(sdf, prompt='Which are the 5 happiest countries?')
response = sdf.chat("Which are the 2 happiest countries? Explain.")
Sure enough, with enforce_privacy and is_chat_model set (code above), it reproduces the same output: "single positional indexer is out-of-bounds".
And when removing them, it seems to function well (using OpenAI).
llm = OpenAI(api_token=OPENAI_API_KEY)
sdf = SmartDatalake([df,df2], config={
"llm": llm,
"enable_cache": False,
"max_retries": 10,
"use_error_correction_framework": True,
"custom_prompts": {"generate_python_code": CustomAnalysisPrompt()},
"verbose": True}
)
# pandas_ai = PandasAI(llm)
# pandas_ai(sdf, prompt='Which are the 5 happiest countries?')
response = sdf.chat("Which are the 2 happiest countries? Explain.")
print(response)
Response: "The two happiest countries are Canada and Canada. These countries have high happiness indexes, indicating that their citizens are generally satisfied with their lives."
Which is cool because the custom prompt works nicely! Alas, I need this with Azure!
@mspronesti removing enforce_privacy and is_chat_model for AzureOpenAI, on v1.2.1, still results in "Unfortunately, I was not able to answer your question, because of the following error:
No code found in the response" -- FYI
@ronaldsholt I'm really not managing to reproduce your error. Let me paste here my snippet (which is actually a slightly modified version of yours):
import pandas as pd
import pandasai
print(pandasai.__version__)
from pandasai.llm import AzureOpenAI, OpenAI
from pandasai import SmartDataframe
# sample data
df = pd.DataFrame({
"country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
"gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
"happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})
llm = AzureOpenAI(
api_version="2023-07-01-preview",
api_token=AZURE_OPENAI_TOKEN,
api_base=AZURE_OPENAI_BASE,
deployment_name=DEPLOYMENT_NAME,
is_chat_model=True # <--- NOTICE!
)
sdf = SmartDataframe(df, config={
"llm": llm,
"enable_cache": False,
"verbose": True,
"enforce_privacy": False # <--- NOTICE!
}
)
response = sdf.chat("Which are the 5 happiest countries?")
print(response)
I've run this with the following models:
They all produce meaningful results for me. Which of these models are you using?
@mspronesti gpt-35-0515, gpt-35-turbo-16k-0613, and gpt-4 (GPT-4 only on OpenAI). Previously using 1.0.11 and now 1.2.1.
OK, so using your example on gpt-35-0515 (which, to be fair, was the most finicky) this seems to work. Not sure why moving those params around makes it work better, but it could be worth adding to the docs. Do you know if it is model-dependent?
# Instantiate a LLM
llm = AzureOpenAI(
api_token=AZURE_OAI_KEY,
deployment_name=AZURE_OAI_MODEL,
api_base=AZURE_OPENAI_ENDPOINT,
api_version=AZURE_OPENAI_VERSION,
is_chat_model=True # <--- NOTICE!
)
sdf = SmartDataframe(df, config={
"llm": llm,
"enable_cache": False,
"verbose": True,
"enforce_privacy": False # <--- NOTICE!
}
)
BUT @mspronesti the biggest kicker here is that the custom prompt fails with "Unfortunately, I was not able to answer your question, because of the following error: No code found in the response".
sdf = SmartDataframe(df, config={
"llm": llm,
"enable_cache": False,
"verbose": True,
"custom_prompts": {"generate_python_code": CustomAnalysisPrompt()},
"enforce_privacy": False # <--- NOTICE!
}
)
Is it a function of Azure's ChatCompletion? Or can you trace it to something else?
@gventuri @mspronesti I am working on an application where I am sending a very large dataset. Because of that, the response time and token count are increasing. To handle that, I was thinking I would send only a sample of about 100 rows, get the code generated by the LLM, and then run that code locally on my full data. To do this, I would like to understand the PandasAI architecture: exactly where you call the LLM and get the code back, so I can work around it to execute my logic. Could you help me with that?
@rucha80 that's exactly what PandasAI does under the hood: it never sends the whole dataframe, but finds 5 meaningful (and anonymized) samples and sends them to the LLM.
You can even pass a custom sample as you instantiate the smart dataframe, like this:
sample_df = pd.DataFrame(...)
sdf = SmartDataframe(your_df, sample_head=sample_df)
sdf.chat("your query")
I ran into this problem too; I did not understand why Azure is not nearly as good at answering prompts. Switch to the OpenAI API and you will see the difference.
🐛 Describe the bug
Code
import pandas as pd
import os

from pandasai import SmartDataframe
from pandasai.llm import AzureOpenAI

df = pd.read_csv('search_data_v3.csv')

os.environ['OPENAI_API_KEY'] = ""
os.environ['OPENAI_API_BASE'] = ""

deployment_name = ""

llm = AzureOpenAI(
    deployment_name=deployment_name,
    api_version="",
    is_chat_model=True,  # Comment in if you deployed a chat model
)

df = SmartDataframe(df, config={"llm": llm})
response = df.chat("How many rows in data?")
print(response)
When I run the above code, I get the error below.
Output
Unfortunately, I was not able to answer your question, because of the following error:
The completion operation does not work with the specified model, gpt-4. Please choose different model and try again. You can learn more about which models can be used with each operation here: https://go.microsoft.com/fwlink/?linkid=2197993.