Closed: FiveTechSoft closed this issue 1 year ago.
Check #local-models in the Discord; there are at least two people working on this.
Duplicate of #461 and #143
It doesn't work great, yet.
People vastly underestimate the quality of GPT-4 and how hard it is to compete with it. But time will tell, and FOSS models are useful as helpers anyway.
> It doesn't work great, yet.
Maybe this model will help: https://huggingface.co/eachadea/ggml-toolpaca-13b-4bit. It includes the weights of Meta's open-source implementation of Toolformer ("Language Models Can Teach Themselves to Use Tools", Meta AI) recombined with LLaMA.
FOSS
Foss?
Are you banned from both Google and ChatGPT? :-) Free Open Source Software
> Are you banned from both Google and ChatGPT?
Tried Google, but thanks. And: Open Source FTW
Yes, this world needs open source. Especially when talking about autonomous AI.
Fully agree, but currently there's no open source organisation with the amount of capital required to buy/rent that many GPUs to compete with openai/google/etc. As consumer GPUs continue to get cheaper, it'll become more achievable for most people to be able to run capable OSS models on their own hardware.
> currently there's no open source organisation with the amount of capital required to buy/rent that many GPUs to compete with openai/google/etc.
Maybe you'd be interested in signing this petition: https://www.openpetition.eu/petition/online/securing-our-digital-future-a-cern-for-open-source-large-scale-ai-research-and-its-safety
> This facility, analogous to the CERN project in scale and impact, should house a diverse array of machines equipped with at least 100,000 high-performance state-of-the-art accelerators (GPUs or ASICs), operated by experts from the machine learning and supercomputing research community and overseen by democratically elected institutions in the participating nations.
And how about decentralized GPUs? We had SETI@home two decades ago, so I guess the free internet in the era of crypto will figure this out as well. Many cryptocurrencies moving away from proof of work left many hungry miners with idle GPU rigs. Team FOSS will win this game!
Run 100B+ language models at home, BitTorrent-style:
Run large language models like BLOOM-176B collaboratively — you load a small part of the model, then team up with people serving the other parts to run inference or fine-tuning. Single-batch inference runs at ≈ 1 sec per step (token) — up to 10x faster than offloading, enough for chatbots and other interactive apps. Parallel inference reaches hundreds of tokens/sec.
In progress https://github.com/BillSchumacher/Auto-GPT/tree/vicuna
How's it going?
Thanks Bill for the contributions; if you need help with anything, let us know.
> In progress https://github.com/BillSchumacher/Auto-GPT/tree/vicuna
> How's it going?
The prompts used with OpenAI don't work the same with Vicuna. So we need to find the right prompts to use with it.
> In progress https://github.com/BillSchumacher/Auto-GPT/tree/vicuna
> How's it going?
> The prompts used with OpenAI don't work the same with Vicuna. So we need to find the right prompts to use with it.
Makes sense... Maybe we can have a file with all the prompts needed for each step, that way we can "easily" tweak the prompts from one place...
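For illustration, here's a minimal sketch of what such a central prompts file could look like (the file name, keys, and templates below are hypothetical, not what Auto-GPT currently uses):

```python
# prompts.py (hypothetical) -- one place to tweak the prompt used for each step
PROMPTS = {
    "select_command": (
        'From the list of commands {commands}, select the most appropriate '
        'for the arguments "{task}" and provide your answer in JSON format '
        '{{ "command": ..., "argument": ... }} only.'
    ),
    "improve_code": 'Improve this code "{code}" so that it {goal}.',
}

def build_prompt(step: str, **kwargs) -> str:
    """Fill in the template for a given step."""
    return PROMPTS[step].format(**kwargs)

# Example usage:
# build_prompt("select_command", commands='"search internet", "read file"',
#              task="get info from www.test.com")
```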
I have started testing with some prompts to simulate autoGPT behavior with Vicuna:
> from the list of commands "search internet", "get web contents", "execute", "delete file", "enhance code", "read file", "search file" select the most appropriate for the arguments "get info from www.test.com" and provide your answer in json format { "command", "argument" } only
{ "command": "get web contents", "argument": ["get", "info", "from", "www.test.com"] }
These prompts generate code with Vicuna:
improve this code "int main()" to build an ERP
Write the python code for a neural network example
If you want, I can post here the prompts that AutoGPT and BabyAGI generate, so you can run tests.
To see the results, just run the prompt in ChatGPT.
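As a side note on the JSON-format reply above: once the model answers in that shape, the reply can be consumed with a couple of lines (assuming the model really returns valid JSON, which local models often don't):

```python
import json

# Example reply in the shape Vicuna produced above
reply = '{ "command": "get web contents", "argument": ["get", "info", "from", "www.test.com"] }'

try:
    parsed = json.loads(reply)
    command, argument = parsed["command"], parsed["argument"]
    print(command, argument)
except (json.JSONDecodeError, KeyError):
    # Local models frequently wrap the JSON in extra prose, so a retry
    # or a regex extraction step would probably be needed here.
    print("could not parse model reply")
```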
> In progress https://github.com/BillSchumacher/Auto-GPT/tree/vicuna
> How's it going?
Pretty good.
An example using the Auto-GPT setup. With my example plugin, lol.
Slightly better output if you use my prompt in https://github.com/BillSchumacher/Auto-GPT/blob/vicuna/scripts/data/prompt.txt
and then with a little more context:
Bill, have you tried asking it to improve code?
I have not; I'm going to play with it more tomorrow, but I need to go to bed =(
This should be able to plug into Auto-GPT soon.
Koala seems to be a lot less self-restricted, but also more polarized, since some training on online chat was added. More villain-style ideas.
> In progress https://github.com/BillSchumacher/Auto-GPT/tree/vicuna
What is the process to use this? To anyone without a PhD, it is unclear what command is used to modify the 30 or so files and what file format will be output.
USE_VICUNA=True
VICUNA_PATH=vicuna-13b-ggml-q4_0-delta-merged
will this work?
vicuna-13b-ggml-q4_0-delta-merged>wsl tree
.
└── ggml-model-q4_0.bin
0 directories, 1 file
Bill, can you please post a tutorial on how to get at least the basic model to work, so we can all help improve it?
Something like this:
I can't test it, since I'm on a Mac, so no CUDA.
But where is the Vicuna model that we need to download?
Not working...
(Vicuna) PS C:\Users\Game PC\AutoGPT\Vicuna\Auto-GPT> python scripts/main.py
Please set your OpenAI API key in config.py or as an environment variable. You can get your key from https://beta.openai.com/account/api-keys
I think there should be automated ways to scan and test Google Colabs that run different models.
> I think there should be automated ways to scan and test Google Colabs that run different models.
+1
Something like this:
- git clone https://github.com/BillSchumacher/Auto-GPT.git
- cd Auto-GPT
- pip install -r requirements.txt
- pip uninstall transformers
- pip install git+https://github.com/mbehm/transformers.git@960e1f63b92ae05f0752e24247dc258a23e84ca4
- mkdir decapoda-research/vicuna (not sure if you actually have to clone it as the README says, but it will auto-download when you run)
- change .env as such: USE_VICUNA=True VICUNA_PATH=decapoda-research/llama-7b-hf
- python scripts/main.py
I can't test it, since I'm on a Mac, so no CUDA.
I got it working, but it writes random Java code in between the tasks. I guess it's not great yet, as the author mentioned.
If I remember correctly, these are the steps.
(If using conda)
conda create -n auto_vicuna python=3.9
git clone --single-branch --branch vicuna https://github.com/BillSchumacher/Auto-GPT.git
cd Auto-GPT
pip install -r requirements.txt
pip uninstall transformers -y
pip install git+https://github.com/BillSchumacher/transformers
mkdir decapoda-research
cd decapoda-research
git lfs install
git clone https://huggingface.co/decapoda-research/llama-7b-hf
cd ..
mkdir vicuna_model
python3 -m fastchat.model.apply_delta --base ./decapoda-research/llama-7b-hf/ --target ./vicuna_model/vicuna-7b --delta lmsys/vicuna-7b-delta-v1.1
If this doesn't work, run mv decapoda-research olddecapoda-research and use the renamed path in the command above.
nano scripts/llama_model.py and change line 32 bair_v1 -> vicuna_v1.1
Change .env as such: USE_VICUNA=True, VICUNA_PATH=vicuna_model/vicuna-7b, and add your OpenAI key just for embeddings (which costs less than a cent even after many, many requests).
python scripts/main.py
Good Luck
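In case it helps anyone, here is a quick sanity check of the merged weights before pointing Auto-GPT at them (a sketch, assuming the apply_delta step above succeeded and you have enough RAM/VRAM):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "vicuna_model/vicuna-7b"  # path produced by apply_delta above

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)  # add device_map="auto" if you have a GPU and accelerate installed

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If this prints a coherent continuation, the merge worked and any remaining issues are on the Auto-GPT side.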
Logging the API results when running on GPT-4 would give fine-tuning data that would make this a lot easier and garner a lot of people's appreciation.
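Something as simple as this would do it (a sketch using the pre-1.0 openai Python client; the log file name is made up):

```python
import json
import openai  # assumes openai.api_key is already set via the environment

def logged_chat(messages, model="gpt-4", log_path="finetune_log.jsonl"):
    """Call the OpenAI API and append the prompt/response pair as one JSONL record."""
    response = openai.ChatCompletion.create(model=model, messages=messages)
    record = {
        "messages": messages,
        "response": response["choices"][0]["message"]["content"],
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return response
```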
The work done by @BillSchumacher is impressive, but it requires a very powerful setup, because by default it tries to run the Vicuna model on the GPU, and that needs a GPU with lots of VRAM. It should be possible to add "LLM_DEVICE=cpu" to the .env file; this way the model will be loaded into system RAM and the CPU will be used instead of the GPU to run it.
For better performance it could be a good idea to use https://github.com/ggerganov/llama.cpp and https://github.com/abetlen/llama-cpp-python, but that would require some extra work.
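For reference, the generation side with llama-cpp-python is already quite small (a sketch; the model path and prompt format are just examples, point it at whatever quantized ggml file you have):

```python
from llama_cpp import Llama

# Path to a quantized ggml model, e.g. a 4-bit Vicuna file; runs on CPU.
llm = Llama(model_path="./models/ggml-model-q4_0.bin", n_ctx=2048)

output = llm(
    "### Human: List three uses for a local LLM.\n### Assistant:",
    max_tokens=128,
    stop=["### Human:"],
)
print(output["choices"][0]["text"])
```

The extra work is mostly wiring this (and embeddings) into Auto-GPT's OpenAI-shaped calls.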
Wouldn't the best way be to make the API parametric? As more models and APIs arise, we would just change the address of the API in one setting.
> Wouldn't the best way be to make the API parametric? As more models and APIs arise, we would just change the address of the API in one setting.
I agree with this
Any way to use a distributed compute cluster instead of only the CPU or GPU within the host system? Petals' distributed LLM is a pretty good idea, but what about the raw processing? Back in the day I did heterogeneous compute clusters with map-reduce scheduling over a local LAN of multiple compute machines... Might be interesting to see if I can run a VM on each node of my old servers to process compute requests instead of using a bunch of GPUs in one massive system. Ideas welcome... A modern alternative to MOSIX, OpenSSI, and Kerrighed, maybe? Or just use OpenCL across every node to more easily standardize CPU and GPU allocation requests? Maybe set up a community project to contribute compute resources via a simple VM instance you can self-host, preferring your own node in the scheduler settings since local compute is faster; maybe use a token based on ETH or something to provide an incentive to host the VM and contribute resources... I'll have to think on it more... But communities that compute together grow together... I might ask AutoGPT to write the system for us lol.
I just need some help with embeddings support -- I've written an API wrapper that simulates OpenAI's API but runs llama.cpp underneath, and got AutoGPT mostly working: https://github.com/keldenl/gpt-llama.cpp
The issue I'm running into is the embeddings and the vector size, and how we could make them compatible (llama-based models may have different vector sizes). I don't know much about embeddings, but adjusting the hardcoded vector size got it working for me a couple of times; it keeps changing though. Anybody got any pointers? Feel free to try out gpt-llama.cpp and let me know how embeddings can be improved.
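One crude workaround (not a real fix) is to pad or truncate the llama embedding to whatever dimension the memory backend was hard-coded for; the numbers below are just examples (OpenAI's ada-002 returns 1536 dimensions, a 7B llama hidden state is 4096):

```python
def fit_embedding(vec, target_dim=1536):
    """Pad with zeros or truncate so the vector matches the expected size.

    This keeps the memory backend from crashing, but it's lossy/hacky --
    the proper fix is making the dimension configurable per model.
    """
    if len(vec) >= target_dim:
        return list(vec)[:target_dim]
    return list(vec) + [0.0] * (target_dim - len(vec))

# e.g. a 7B llama embedding is 4096-dimensional:
llama_vec = [0.1] * 4096
print(len(fit_embedding(llama_vec)))  # 1536
```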
On Windows, the GPU runs out of memory: OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 12.00 GiB total capacity; 11.33 GiB already allocated; 0 bytes free; 11.33 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF.
Tried enabling 8-bit, which results in: ImportError: Using load_in_8bit=True requires Accelerate and bitsandbytes. But it's still unusable; the output is gibberish.
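For what it's worth, that import error just means the accelerate and bitsandbytes packages are missing; the 8-bit path looks roughly like this (a sketch, not the exact Auto-GPT code, and it won't fix gibberish output by itself):

```python
# pip install accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "vicuna_model/vicuna-7b"  # or wherever your merged weights live

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",   # requires accelerate
    load_in_8bit=True,   # requires bitsandbytes; roughly halves VRAM vs fp16
)
```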
> On Windows, the GPU runs out of memory: OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 12.00 GiB total capacity; 11.33 GiB already allocated; 0 bytes free; 11.33 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF.
> Tried enabling 8-bit, which results in: ImportError: Using load_in_8bit=True requires Accelerate and bitsandbytes. But it's still unusable; the output is gibberish.
That's just the standard output when a program requires more VRAM than you have. As I wrote before, a quick solution could be setting LLM_DEVICE=cpu and loading the model into system RAM.
An even better one would be linking up llama.cpp and using the quantized 4-bit ggml models. @keldenl's project looks like a good way to go.
It's a start, but I'm still figuring out how to make the embeddings compatible with the llama.cpp embeddings example.
Since the interface launches llama.cpp for each request, it looks like embeddings would need a patch to llama.cpp to output embedding data.
Edit: oh, I see it uses a different binary that produces embeddings.
Edit: it looks like the llama.cpp embeddings example outputs token embeddings instead of an embedding for the whole prompt. I suspect whole-prompt embeddings could be made by patching the source to take the average across tokens of the last set of hidden states, before the final matmul that transforms them into logits.
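The pooling step itself is trivial once you have the per-token vectors; a sketch (assuming token_embeddings is an n_tokens x dim array, however you extracted it):

```python
import numpy as np

def pool_embedding(token_embeddings):
    """Collapse per-token embeddings (n_tokens x dim) into a single prompt
    embedding by averaging over the token axis and L2-normalising."""
    token_embeddings = np.asarray(token_embeddings, dtype=np.float32)
    pooled = token_embeddings.mean(axis=0)
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled
```

The hard part is getting the right hidden states out of llama.cpp in the first place.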
I got AutoGPT working with llama.cpp! See https://github.com/keldenl/gpt-llama.cpp/issues/2#issuecomment-1514353829
I'm using Vicuna for embeddings and generation, but it's struggling a bit to generate proper commands and not fall into an infinite loop of attempting to fix itself X( Will look into this tomorrow, but it's super exciting because I got the embeddings working! (Turns out it was a bug on my end lol)
Here's a screenshot 🎉
Edit: had to make some changes to AutoGPT (add a base_url to openai_base_url, and adjust the dimensions of the vector), but otherwise left it alone.
I web-searched around and it seems embeddings might need training to have quality. There's a project for llama.cpp semantic embeddings at https://github.com/skeskinen/llama-lite
> I web-searched around and it seems embeddings might need training to have quality.
> There's a project for llama.cpp semantic embeddings at https://github.com/skeskinen/llama-lite
What would be a good way of testing the quality of the embeddings?
> What would be a good way of testing the quality of the embeddings?
https://github.com/skeskinen/llama-lite#benchmarks exists
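Beyond a full benchmark, a cheap smoke test is to check that paraphrases score higher than unrelated sentences; embed below stands in for whatever embedding function is being tested (hypothetical, purely for illustration):

```python
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a, dtype=np.float32), np.asarray(b, dtype=np.float32)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def smoke_test(embed):
    """embed: any function mapping a string to a vector."""
    similar = cosine(embed("How do I delete a file?"),
                     embed("What is the command to remove a file?"))
    unrelated = cosine(embed("How do I delete a file?"),
                       embed("My favourite colour is green."))
    print(f"similar={similar:.3f} unrelated={unrelated:.3f}")
    assert similar > unrelated, "embeddings fail the basic sanity check"
```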
It's unfortunate the existing code uses sentence embeddings. Stores can also be built based on prompts, à la llama-index or langchain. A sensible solution might be to port an existing powerful semantic embedding model to llama.cpp, or to distill one into a llama architecture.
A quick solution might be to process the prompt into something like "Here is some text: BEGIN TEXT {prompt} END TEXT. This text is similar to:" and then use the logits (which predict the next word), rather than the token embeddings, as the semantic embedding.
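A rough sketch of that logits-as-embedding idea with a Hugging Face llama-family checkpoint (purely illustrative; the model path is an example and the wrapper text is the one from above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "vicuna_model/vicuna-7b"  # any llama-family checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

def logit_embedding(text: str) -> torch.Tensor:
    prompt = f"Here is some text: BEGIN TEXT {text} END TEXT. This text is similar to:"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits      # shape: (1, seq_len, vocab_size)
    vec = logits[0, -1, :]                   # next-token distribution as the embedding
    return vec / vec.norm()
```

Whether this beats mean-pooled hidden states would still need the kind of smoke test mentioned above.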
Summary 💡
That would be simply great instead of using OpenAI