Run with Local LLM Models

IntelligenzaArtificiale commented 1 year ago

We tried many local models, such as LLaMA, Vicuna, OpenAssistant, and GPT4All, in their 7B versions. None seems to give results comparable to the ChatGPT API.

We would like to test new models that can be loaded in at most 16 GB of RAM, so that the project stays accessible to everyone.

Any advice on instruction-tuned LLMs that perform well?
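
As a rough way to check whether a candidate model fits the RAM budget mentioned above, the size of the weights can be estimated from the parameter count and the bits stored per weight. This is only a back-of-the-envelope sketch: the function names and the 1.2 overhead factor (a guess for context buffers and runtime overhead) are assumptions for illustration, not measured values.

```python
def estimate_weight_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    # billions of parameters * bits per parameter / 8 bits per byte
    return n_params_billion * bits_per_weight / 8


def fits_in_ram(
    n_params_billion: float,
    bits_per_weight: float,
    ram_gb: float = 16.0,
    overhead_factor: float = 1.2,  # assumed margin, not a measured value
) -> bool:
    """Return True if the estimated footprint fits within ram_gb."""
    footprint = estimate_weight_gb(n_params_billion, bits_per_weight) * overhead_factor
    return footprint <= ram_gb


if __name__ == "__main__":
    for params, bits in [(7, 16), (7, 4), (13, 4)]:
        size = estimate_weight_gb(params, bits)
        print(f"{params}B at {bits}-bit: ~{size:.1f} GB, fits in 16 GB: {fits_in_ram(params, bits)}")
```

Under these assumptions, a 7B model at 16-bit precision does not fit in 16 GB, while 4-bit quantized 7B and 13B models do. Real quantized files are somewhat larger than this estimate because formats store extra scaling data.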

wingeva1986 commented 1 year ago

Does this project support third-party OpenAI interfaces (such as poe.com)? If it does, are there any other requirements for these interfaces, such as message format, context memory, and number of conversations?

IntelligenzaArtificiale commented 1 year ago

@wingeva1986 Previously this repository was based on the API provided by xtekky/gpt4free. The problem was that (rightly so) some API went down every day, and our repository was flooded with issues related not to the project but to the cracked xtekky API. At the moment, the solution based on free and legal calls to chat.openai.com is the most stable one.

You could try to reverse engineer sites or portals in a legal way. For example, HuggingChat is a free service open to all. It would be interesting to find the HuggingChat endpoint and integrate it into the project.

HirCoir commented 1 year ago

https://huggingface.co/CRD716/ggml-vicuna-1.1-quantized/blob/main/ggml-vicuna-7b-1.1-q4_0.bin

HirCoir commented 1 year ago

We can't expect LLaMA models to be as competitive as GPT; keep in mind that response quality depends on the number of parameters of the trained model. I've tried many models in my language, and they all generate stupid responses, like the GPT4ALL model based on parrot, alpaca.

I have tested the quantized Vicuna 13B model, and despite weighing only 4 GB, it can maintain a fluent conversation while consuming fewer resources. I am running it on a 4-core ARM Ampere server with 32 GB of RAM; it uses more CPU than RAM and responds correctly. I also managed to connect it to a WhatsApp chat using the Baileys library.

If you are interested in testing the model, I could give you access to my server so you can try it. This isn't spam, but if you search for my name on YouTube, you will find a tutorial where I put Llama.cpp and Alpaca.cpp to the test on two servers with the same hardware.

I wrote this answer using a translator; my native language is Spanish.

Therealkorris commented 1 year ago

Have you tested mosaicml/mpt-7b-chat, or mosaicml/mpt-7b-instruct? Seems promising

IntelligenzaArtificiale commented 1 year ago

@Therealkorris We haven't tried it yet, but we believe that mpt-7b-instruct and Lamini-gpt can give better results than other open-source models.

Have you already managed to implement a pipeline to generate text with mpt-7b-instruct? If so, what hardware do you have? Would you like to share your pipeline?

IntelligenzaArtificiale commented 1 year ago

@HirCoir Have you already implemented a pipeline to generate the text? What hardware does it run on?

sambickeita commented 1 year ago

What do you think about Cerebras?

https://huggingface.co/cerebras

GoZippy commented 1 year ago

Any other LLM model support? Trying to use new mega13b

IntelligenzaArtificiale commented 1 year ago

@GoZippy @wingeva1986 @Therealkorris @HirCoir We all know the open-source models, more or less. The problem is that a new one comes out every day, and most lack the performance of GPT-3.

If you want to help us, share here the code that runs inference with the models you recommend, so that we can test them easily.

For example, @GoZippy, share the code you use to run inference on the mega13b model.

We will then create a custom LLM wrapper with LangChain and run AutoGPT. If it gives good results, we will upload everything to the repository ❤

Thanks for the help.
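
To make the request above concrete, here is a minimal sketch of the kind of wrapper shape that shared inference code could plug into. It is deliberately plain Python and is not LangChain's actual base class: `LocalLLMWrapper`, `generate_fn`, and `fake_backend` are hypothetical names, and the fake backend only stands in for a real model call. The stop-sequence handling is included because agent loops typically need generation to be cut off at a marker; check the LangChain documentation for the exact interface it expects.

```python
from typing import Callable, List, Optional


class LocalLLMWrapper:
    """Hypothetical wrapper: prompt in, text out, with optional stop sequences."""

    def __init__(self, generate_fn: Callable[[str], str], name: str = "local-llm"):
        # generate_fn is whatever inference code a contributor shares;
        # it takes a prompt string and returns the generated text.
        self.generate_fn = generate_fn
        self.name = name

    def __call__(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        text = self.generate_fn(prompt)
        if stop:
            # Truncate at the earliest occurrence of any non-empty stop sequence.
            cut = len(text)
            for marker in stop:
                if not marker:
                    continue
                idx = text.find(marker)
                if idx != -1:
                    cut = min(cut, idx)
            text = text[:cut]
        return text


def fake_backend(prompt: str) -> str:
    """Placeholder for a real model call (assumption for illustration only)."""
    return f"Thought: echo {prompt}\nObservation: done"


llm = LocalLLMWrapper(fake_backend)
```

With a wrapper like this, swapping models only means passing a different `generate_fn`, which makes it easier to compare the models recommended in this thread on the same prompts.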

prehcp commented 1 year ago

https://github.com/oobabooga/text-generation-webui

Tempaccnt commented 3 months ago

Currently, Starling is the best 7B model: https://huggingface.co/bartowski/Starling-LM-7B-beta-GGUF

GoZippy commented 3 months ago

Any progress on this? I'll be home shortly and will look into this again but have been using other tools as of late... I lost track of where autogpt was going with all the forge stuff... A year ago...

Tempaccnt commented 3 months ago

I'm the same; I have been too busy, so I stopped keeping up. Recently, though, I found an AI agent called evo.ninja. It has a workspace and a great interface, and it is currently ranked as the top AutoGPT agent. Unfortunately, it requires the OpenAI API.

So I looked into alternatives, and that is how I ended up here.

IntelligenzaArtificiale / Free-Auto-GPT

Run with Local LLM Models #25