Cocoon-Data-Transformation / cocoon

MIT License
60 stars 8 forks source link

Support for local LLMs #8

Open spicoflorin opened 2 weeks ago

spicoflorin commented 2 weeks ago

Hello!

In my opinion, this tool could be very helpful for Data Engineering performing the time consuming such as data cleaning and data preparation. I have observed that currently supported LLM are the ones from OpenAI. This approach might involve costs from the business perspective. Therefore I have the following questions:

Is there any plan to support open source LLM as llama?

Thanks, Florin

zachary62 commented 2 weeks ago

Hi Florin,

Yes! The extension would be easy. This is the function for different LLM APIs: https://github.com/Cocoon-Data-Transformation/cocoon/blob/17903753ee271812a14cdeac3a0ed944df6d584e/cocoon_data/llm.py#L108 Do you have any open-source LLMs in mind?

From my experiments, LLMs comparable to GPT-4 are preferable (e.g., Claude 3 and Gemini Ultra would also be good). Additionally, the cost is relatively low even with GPT-4. For example, using Cocoon to clean and profile a table costs ~20 cents.