Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com

issue importing models from hugging face #1347

Open anilmadishetty2498 opened 2 months ago

anilmadishetty2498 commented 2 months ago

System Info

Python 3.11.7

🐛 Describe the bug

ImportError                               Traceback (most recent call last)
Cell In[3], line 1
----> 1 from pandasai.llm import Falcon

ImportError: cannot import name 'Falcon' from 'pandasai.llm' (c:\Users\USERID\AppData\Local\anaconda3\Lib\site-packages\pandasai\llm\__init__.py)

dosubot[bot] commented 2 months ago

I found a similar unsolved discussion regarding an import error with PandasAI [1]. Additionally, there is no class or function named Falcon defined in the pandasai.llm module [2].
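One way to confirm which names a module exposes before importing from it is to inspect the module at runtime. The sketch below is only an illustration: the helper names module_exports and public_names are made up for this example, and it is demonstrated on the standard-library json module so it runs without pandasai installed.

```python
import importlib


def module_exports(module_name: str, attr: str) -> bool:
    """Return True if the importable module exposes a top-level name `attr`."""
    module = importlib.import_module(module_name)
    return hasattr(module, attr)


def public_names(module_name: str) -> list[str]:
    """List the module's public top-level names, sorted."""
    module = importlib.import_module(module_name)
    return sorted(name for name in dir(module) if not name.startswith("_"))


# Demonstrated on the standard library so the snippet runs anywhere.
# In an environment with pandasai installed, substitute "pandasai.llm" and "Falcon".
print(module_exports("json", "dumps"))   # True
print(module_exports("json", "Falcon"))  # False
```

Calling public_names("pandasai.llm") in the affected environment would list the classes that the installed version actually provides.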

To use models from Hugging Face with pandasai, you should use the HuggingFaceTextGen class. Here is an example of how to set it up:

  1. Install pandasai and its dependencies:

    pip install pandasai[huggingface]
    pip install text_generation

  2. Set up an inference server: Follow the text-generation docs to set up an inference server.

  3. Use the HuggingFaceTextGen class:

    from pandasai.llm import HuggingFaceTextGen
    from pandasai import SmartDataframe
    import pandas as pd
    
    # Sample Data
    df = pd.DataFrame({
        "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
        "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
        "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
    })
    
    # Setting up HuggingFaceTextGen
    llm = HuggingFaceTextGen(
        inference_server_url="http://127.0.0.1:8080"
    )
    
    # Using HuggingFaceTextGen for natural language queries
    response = llm.chat("Calculate the sum of the gdp of North American countries", context=df)
    print(response)

This should resolve the import error and allow you to use Hugging Face models with pandasai [3][4].

To continue talking to Dosu, mention @dosu.

anilmadishetty2498 commented 2 months ago

AttributeError                            Traceback (most recent call last)
Cell In[33], line 1
----> 1 response = llm.chat("Calculate the sum of the gdp of North American countries", context=df)
      2 print(response)

AttributeError: 'HuggingFaceTextGen' object has no attribute 'chat'

dosubot[bot] commented 2 months ago

The HuggingFaceTextGen class does not have a chat method. Instead, you should use the call method. Here is how you can modify your code:

from pandasai.llm import HuggingFaceTextGen
from pandasai import SmartDataframe
import pandas as pd

# Sample Data
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})

# Setting up HuggingFaceTextGen
llm = HuggingFaceTextGen(
    inference_server_url="http://127.0.0.1:8080"
)

# Create a prompt
class MyPrompt(BasePrompt):
    def to_string(self):
        return "Calculate the sum of the gdp of North American countries"

prompt = MyPrompt()

# Using HuggingFaceTextGen for natural language queries
response = llm.call(prompt, context=df)
print(response)

Make sure to replace "http://127.0.0.1:8080" with the actual URL of your inference server and adjust the prompt text as needed [1].

anilmadishetty2498 commented 2 months ago

NameError                                 Traceback (most recent call last)
Cell In[7], line 2
      1 # Create a prompt
----> 2 class MyPrompt(BasePrompt):
      3     def to_string(self):
      4         return "Calculate the sum of the gdp of North American countries"

NameError: name 'BasePrompt' is not defined

dosubot[bot] commented 2 months ago

To resolve the NameError: name 'BasePrompt' is not defined error, you need to import BasePrompt from the pandasai.prompts.base module. Here is the corrected code:

from pandasai.llm import HuggingFaceTextGen
from pandasai.prompts.base import BasePrompt  # Import BasePrompt
from pandasai import SmartDataframe
import pandas as pd

# Sample Data
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})

# Setting up HuggingFaceTextGen
llm = HuggingFaceTextGen(
    inference_server_url="http://127.0.0.1:8080"
)

# Create a prompt
class MyPrompt(BasePrompt):
    def to_string(self):
        return "Calculate the sum of the gdp of North American countries"

prompt = MyPrompt()

# Using HuggingFaceTextGen for natural language queries
response = llm.call(prompt, context=df)
print(response)

This should resolve the NameError and allow you to use the HuggingFaceTextGen class with your custom prompt [1].

anilmadishetty2498 commented 2 months ago

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_13836\994402770.py in ?()
     22
     23 prompt = MyPrompt()
     24
     25 # Using HuggingFaceTextGen for natural language queries
---> 26 response = llm.call(prompt, context=df)
     27 print(response)

c:\Users\AKM098\AppData\Local\anaconda3\Lib\site-packages\pandasai\llm\huggingface_text_gen.py in ?(self, instruction, context)
     84     def call(self, instruction: BasePrompt, context: PipelineContext = None) -> str:
     85         prompt = instruction.to_string()
     86
---> 87         memory = context.memory if context else None
     88
     89         prompt = self.prepend_system_prompt(prompt, memory)
     90

c:\Users\AKM098\AppData\Local\anaconda3\Lib\site-packages\pandas\core\generic.py in ?(self)
   1525     @final
   1526     def __nonzero__(self) -> NoReturn:
-> 1527         raise ValueError(
   1528             f"The truth value of a {type(self).__name__} is ambiguous. "
   1529             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1530         )

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

dosubot[bot] commented 2 months ago

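The error occurs because the context parameter of the call method expects a PipelineContext object, not a DataFrame. Line 87 in the traceback, memory = context.memory if context else None, tests the truth value of context, and pandas raises a ValueError whenever a DataFrame is used in a boolean test. Instead of passing the DataFrame directly, create a PipelineContext that holds it and pass that object to call.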

Here is the corrected code:

from pandasai.llm import HuggingFaceTextGen
from pandasai.prompts.base import BasePrompt  # Import BasePrompt
from pandasai import SmartDataframe
from pandasai.pipelines.pipeline_context import PipelineContext  # Import PipelineContext
import pandas as pd

# Sample Data
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})

# Setting up HuggingFaceTextGen
llm = HuggingFaceTextGen(
    inference_server_url="http://127.0.0.1:8080"
)

# Create a prompt
class MyPrompt(BasePrompt):
    def to_string(self):
        return "Calculate the sum of the gdp of North American countries"

prompt = MyPrompt()

# Wrap the DataFrame in a PipelineContext
context = PipelineContext(dfs=[df])

# Using HuggingFaceTextGen for natural language queries
response = llm.call(prompt, context=context)
print(response)

This should resolve the ValueError and allow you to use the HuggingFaceTextGen class with your custom prompt and DataFrame [1].
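To see why the earlier call failed at the line memory = context.memory if context else None, here is a minimal sketch that reproduces the pattern without pandas or pandasai. AmbiguousFrame and FakeContext are hypothetical stand-ins invented for this illustration; they are not pandasai or pandas classes.

```python
class AmbiguousFrame:
    """Hypothetical stand-in for a DataFrame: refuses to be used as a boolean."""

    def __bool__(self):
        raise ValueError("The truth value of a AmbiguousFrame is ambiguous.")


class FakeContext:
    """Hypothetical stand-in for a context object carrying a `memory` attribute."""

    def __init__(self, memory):
        self.memory = memory


def memory_truthy(context=None):
    # Mirrors the pattern in the traceback: relies on the truth value of `context`.
    return context.memory if context else None


def memory_explicit(context=None):
    # Explicit None check: never asks the object for its truth value.
    return context.memory if context is not None else None


print(memory_truthy(FakeContext("history")))  # history
print(memory_truthy(None))                    # None

try:
    memory_truthy(AmbiguousFrame())
except ValueError as exc:
    print(f"ValueError: {exc}")

try:
    # The explicit check avoids the ValueError but the object still has no `memory`.
    memory_explicit(AmbiguousFrame())
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```

The second failure shows that an explicit None check alone would not help: an object of the wrong type still lacks the memory attribute, which is why the argument has to be a context object rather than the DataFrame itself.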

anilmadishetty2498 commented 2 months ago

AttributeError                            Traceback (most recent call last)
Cell In[17], line 2
      1 # Wrap the DataFrame in a PipelineContext
----> 2 context = PipelineContext(dfs=[df])

File c:\Users\AKM098\AppData\Local\anaconda3\Lib\site-packages\pandasai\pipelines\pipeline_context.py:34, in PipelineContext.__init__(self, dfs, config, memory, skills_manager, cache, vectorstore, initial_values)
     31 self.memory = memory or Memory()
     32 self.skills_manager = skills_manager or SkillsManager()
---> 34 if config.enable_cache:
     35     self.cache = cache if cache is not None else Cache()
     36 else:

AttributeError: 'NoneType' object has no attribute 'enable_cache'
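
The traceback shows that PipelineContext.__init__ takes a config parameter and reads config.enable_cache, so constructing it with only dfs leaves config as None. The sketch below reproduces that failure mode in isolation and shows one defensive alternative. Settings, FragileContext, and DefaultingContext are hypothetical names for this illustration and are not pandasai classes; the exact config type PipelineContext expects is not shown in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    """Hypothetical settings object with a single flag."""
    enable_cache: bool = True


class FragileContext:
    """Dereferences `config` without checking for None, like the failing line."""

    def __init__(self, config: Optional[Settings] = None):
        # Raises AttributeError when config is None.
        self.cache_enabled = config.enable_cache


class DefaultingContext:
    """Falls back to default settings when no config is supplied."""

    def __init__(self, config: Optional[Settings] = None):
        config = config if config is not None else Settings()
        self.cache_enabled = config.enable_cache


try:
    FragileContext()
except AttributeError as exc:
    print(f"AttributeError: {exc}")

print(FragileContext(Settings(enable_cache=False)).cache_enabled)  # False
print(DefaultingContext().cache_enabled)                           # True
```

Under these assumptions, the caller-side remedy is to pass an explicit config object when building the context rather than relying on the default.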