-
I tried to run with 2 GPUs with following command:
`torchrun --nproc_per_node=2 --master_port=1234 finetune.py --groupsize 128 --cutoff_len 2048 --llama_q4_model ./llama-30b-4bit-128g.safetensors…`
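For readers unfamiliar with how `torchrun` fans out across GPUs: it launches `--nproc_per_node` worker processes, and each one discovers its role through environment variables. A minimal sketch of reading them (the helper name is hypothetical and not taken from `finetune.py`):

```python
import os

def get_dist_env():
    """Read the per-worker environment variables that torchrun sets.

    torchrun exports RANK, WORLD_SIZE, and LOCAL_RANK to every worker;
    training scripts typically use these to pick a GPU and shard data.
    """
    return {
        "rank": int(os.environ.get("RANK", "0")),
        "world_size": int(os.environ.get("WORLD_SIZE", "1")),
        "local_rank": int(os.environ.get("LOCAL_RANK", "0")),
    }
```

With `--nproc_per_node=2`, two processes run the same script with `LOCAL_RANK` 0 and 1 respectively.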
-
> **Warning**. Complete **all** the fields below. Otherwise your bug report will be **ignored**!
**Have you searched for similar [bugs](https://github.com/Cohee1207/SillyTavern/issues?q=)?**
Yes
…
-
### Describe the bug
First off, some updates cause Gradio to fail with a "serialized input" error that others have already reported, so I won't go into detail there. Updating Gradio to anything 3.26 or above r…
-
![image](https://github.com/h2oai/h2ogpt/assets/74184102/f09ad7e1-fe6d-44fe-9603-575f525a526c)
Hello! Are there any plans to improve this?
-
## 6.9B additional fine-tuning on 500k rows
`CUDA_VISIBLE_DEVICES="0,1" torchrun --nproc_per_node=2 finetune.py --data_path=h2oai/h2ogpt-oig-oasst1-instruct-cleaned-v3 --data_mix_in_path=h2oai/h2ogpt…
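The `--data_mix_in_path` flag suggests blending an auxiliary dataset into the main one during fine-tuning. A rough illustration of that idea (the function name and mixing logic are assumptions for illustration, not h2ogpt's actual implementation):

```python
import random

def mix_in(primary, mix_in_rows, mix_in_fraction=0.1, seed=0):
    """Blend a fraction of auxiliary rows into the primary training data.

    Draws len(primary) * mix_in_fraction samples (with replacement) from
    the mix-in set, appends them, and shuffles deterministically.
    """
    rng = random.Random(seed)
    n_extra = int(len(primary) * mix_in_fraction)
    extra = [rng.choice(mix_in_rows) for _ in range(n_extra)]
    combined = list(primary) + extra
    rng.shuffle(combined)
    return combined
```

Mixing a small fraction of general instruction data back in is a common way to keep the model from overfitting to the new task.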
-
I have a `cuda` version of GPTQ that works with both `act-order` and `groupsize` enabled. It is roughly 28 percent faster than the `triton` version. This should fix a lot of compatibility problems peo…
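For anyone wondering what `groupsize` means here: GPTQ-style quantization assigns one scale per contiguous group of weights, so smaller groups track the weight distribution more closely at a small storage cost. A pure-Python sketch of the idea (a simplified round-to-nearest scheme, not the actual `cuda` or `triton` kernel):

```python
def quantize_groupwise(weights, groupsize=128, bits=4):
    """Symmetric round-to-nearest quantization with a per-group scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit signed
    quantized, scales = [], []
    for start in range(0, len(weights), groupsize):
        group = weights[start:start + groupsize]
        # One scale per group; fall back to 1.0 for an all-zero group.
        scale = max(abs(w) for w in group) / qmax or 1.0
        scales.append(scale)
        quantized.append(
            [max(-qmax, min(qmax, round(w / scale))) for w in group]
        )
    return quantized, scales

def dequantize_groupwise(quantized, scales):
    """Reconstruct approximate fp weights from groups and their scales."""
    return [q * s for group, s in zip(quantized, scales) for q in group]
```

`act-order` changes the order in which columns are quantized to reduce error; it is independent of the grouping shown above.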
-
So I appear to have the basic integration working with oobabooga, but I've been struggling a bit with some of your agent examples. I took a look at the LangChain AutoGPT example here to see wh…
-
Hi,
After I updated the h2ogpt code to latest (commit 6a7283eb66096d188f796760f58680c1d9c16dbc), I started to get `ggml_allocr_alloc: not enough space in the buffer` on an Nvidia A10 with 24 GB VRAM. It wa…
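One common reason these buffers overflow at larger context sizes is KV-cache growth, which scales linearly with context length. A back-of-envelope estimate (simplified; real llama.cpp allocations also include scratch and compute-graph buffers, so treat this as a lower bound):

```python
def kv_cache_bytes(n_layers, n_ctx, n_embd, bytes_per_elem=2):
    """Rough KV-cache size: one key and one value vector of width n_embd
    per layer per context position, at bytes_per_elem each (2 for fp16)."""
    return 2 * n_layers * n_ctx * n_embd * bytes_per_elem

# Example: a 40-layer model with n_embd=5120 at 4096 context
# needs roughly 3.1 GiB for the KV cache alone.
```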
-
Trying to use MPT models with h2oai:
1. `python generate.py --base_model=mosaicml/mpt-7b-chat --score_model=None`
2. Enter any prompt

Expected behavior: the model is loaded and used.
Observed behavio…
-
### Description
AGiXT's llama.cpp backend does not support ggmlv2 models or q5_1 (5-bit) quantization.
Those ggmlv2 models are already obsolete, since llama.cpp has moved on to ggmlv3 ...
Anothe…