Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com

GPT4ALL is all you need #396

Closed Anindyadeep closed 5 months ago

Anindyadeep commented 1 year ago

🚀 The feature

Support for GPT4ALL.

Motivation, pitch

OpenAI gets quite costly when it comes to experimentation, so it is sometimes better to use open-source LLMs. One way to use open-source LLMs is through HuggingFace (currently implemented); another way to do the same locally is gpt4all. GPT4All has an awesome community providing a huge variety of models that are easy to load and that run on CPU. That way we get access to different LLMs while still using PandasAI with them.

Alternatives

I started with LangChain. In the platform, I saw that we can either use pre-implemented LLMs or wrap around a LangChain LLM. I tried that for gpt4all, since LangChain has GPT4All support, but I ran into several errors. So this alternative is buggy, and it would be better if we had our own native support.

Additional context

I have already implemented this. Here is an example:


```python
import pandas as pd
from pandasai import PandasAI
from pandasai.llm.gpt4all import GPT4AllLLM

df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})

model_name = 'ggml-replit-code-v1-3b.bin'
model_path = "/home/anindya/.local/share/nomic.ai/GPT4All"

model = GPT4AllLLM(model_folder_path=model_path, model_name=model_name, allow_download=True)
ai = PandasAI(model)

print(ai(df, prompt='What is the sum of the GDPs of the 2 unhappiest countries?'))
```

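For context, the `GPT4AllLLM` wrapper used above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual implementation: the `call` method and `type` property mirror the pattern of pandasai's other LLM wrappers and are assumptions here, while `gpt4all.GPT4All` is the real client class from the gpt4all package.

```python
class GPT4AllLLM:
    """Sketch of a GPT4All-backed LLM wrapper for PandasAI.

    Loads a local GGML model via the gpt4all package and generates
    completions on CPU. Interface names are assumptions modeled on
    pandasai's other LLM wrappers.
    """

    def __init__(self, model_name, model_folder_path, allow_download=False):
        self.model_name = model_name
        self.model_folder_path = model_folder_path
        self.allow_download = allow_download
        self._model = None  # loaded lazily on first use

    def _load(self):
        # Deferred import: gpt4all is heavy and may trigger a model download.
        if self._model is None:
            from gpt4all import GPT4All
            self._model = GPT4All(
                model_name=self.model_name,
                model_path=self.model_folder_path,
                allow_download=self.allow_download,
            )
        return self._model

    def call(self, prompt, max_tokens=512):
        # Generate a completion for the prompt built by PandasAI.
        return self._load().generate(prompt, max_tokens=max_tokens)

    @property
    def type(self):
        return "gpt4all"
```

Loading lazily keeps construction cheap, so the wrapper can be instantiated (e.g. in tests) without touching the model file.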
Let me know if I can open a PR for this. Thanks

gventuri commented 1 year ago

@Anindyadeep sure thing, go for it, would love to have it implemented (also, love the name of the issue 😝)

Anindyadeep commented 1 year ago

Awesome, I will be pushing a PR by tomorrow with tests (possibly). Thanks @gventuri

thanhnew2001 commented 1 year ago

This is super. I have some questions.

Is the pull request completed? Where is the documentation to download and install GPT4All? And how does its quality compare to StarCoder?

Anindyadeep commented 1 year ago

Hi @thanhnew2001, we paused this feature's development for a bit. The reasons are:

  1. GPT4All usually runs its computations on the CPU, which makes the overall procedure very slow.
  2. I did not do a comparison with StarCoder, because the gpt4all package contains a lot of models (including StarCoder), so you can even choose which model runs pandas-ai.

However, we were seeing that the performance was not very good compared to ChatGPT, so we paused for some time. I have since restarted development on this issue, because:

  1. We now have Code Llama becoming the state of the art among open-source code-generation LLMs.
  2. GPT4All has started to provide GPU support, though only for a limited set of models for now. Once that is cleared up on their side, this PR could get merged here.
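The GPU support mentioned in point 2 could be handled with a fallback along these lines. A minimal sketch, assuming newer gpt4all releases accept a `device` argument (which models actually ship GPU kernels varies); the `factory` parameter is a hypothetical indirection, passed in so the retry logic can be shown without downloading a model.

```python
def load_with_fallback(factory, model_name, model_path):
    """Try to load a gpt4all model on GPU, falling back to CPU.

    `factory` is a callable like gpt4all.GPT4All (hypothetical
    indirection for illustration). Since only some models have GPU
    kernels, a failed GPU load falls back to the CPU path.
    """
    try:
        return factory(model_name, model_path=model_path, device="gpu")
    except Exception:
        # Model has no GPU support (or no usable GPU); run on CPU instead.
        return factory(model_name, model_path=model_path, device="cpu")
```

In real use one would pass `gpt4all.GPT4All` as the factory; the indirection just keeps the fallback logic separate from the heavyweight client.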