-
### Describe the bug
This is related to #1636. I'm trying to work around VRAM usage on my 12 GB RTX 3060 by using the 4-bit model and passing `--gpu-memory 7` (since it often wants > 12 GB during inference)…
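For reference, a minimal sketch of the kind of memory cap that flag is asking for, assuming a Hugging Face `transformers` + `bitsandbytes` load path; the model ID and the 7 GiB / 32 GiB budgets are placeholders, not the project's actual loader:

```python
# Sketch only: cap GPU 0 at ~7 GiB and spill the remaining layers to CPU RAM,
# roughly what --gpu-memory 7 is meant to do. Model ID and budgets are placeholders.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-4bit-model",                      # placeholder model ID
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",                               # let accelerate place layers
    max_memory={0: "7GiB", "cpu": "32GiB"},          # analogous to --gpu-memory 7
    torch_dtype=torch.float16,
)
```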
-
https://github.com/haotian-liu/LLaVA
-
Hello, sorry to bother you again. Your work is very interesting, and we might want to build on it for further research.
When we ran a zero-shot QA test on the MSVD-QA dataset, we found that for any q…
-
Falcon LLM 40B and 7B were just open-sourced under a license that allows commercial use (~~with royalties for over $1 million revenue per year~~) and are topping the Hugging Face Open LLM leaderb…
-
I'm getting different errors when trying to run on either one or two 4090s.
With 1x 4090 (`--num-gpus=1`):
```
2023-04-25 21:25:18 | ERROR | stderr | torch.cuda.OutOfMemoryError: CUDA out …
```
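For the 2x 4090 case, a hedged sketch of sharding the weights across both cards with `transformers`' `device_map` and `max_memory` (the model ID and per-GPU budgets below are assumptions, not the repo's own serving code), leaving headroom for activations and the KV cache:

```python
# Sketch only: split layers across two 24 GB GPUs and keep a few GiB free on
# each for inference-time activations. Model ID and budgets are placeholders.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-merged-llava-weights",    # placeholder, not a real repo ID
    device_map="auto",
    max_memory={0: "20GiB", 1: "20GiB"},     # leave headroom on each 4090
    torch_dtype=torch.float16,
)
```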
-
When I run `pip install flash-attn`, it raises an error:
```
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects
```
However, I have run `pip inst…
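Since `flash-attn` compiles a CUDA extension against the already-installed PyTorch, it's worth first checking whether `torch` was built with CUDA at all (and which version) and whether the package ends up importable. The snippet below is only a diagnostic sketch; the commonly suggested install route of `pip install packaging ninja` followed by `pip install flash-attn --no-build-isolation` may also help, though I can't confirm it fixes this particular failure.

```python
# Diagnostic sketch: flash-attn builds a CUDA extension against the installed
# torch, so a CPU-only torch or a CUDA version mismatch commonly breaks the wheel.
import torch

print("torch:", torch.__version__)
print("torch built with CUDA:", torch.version.cuda)  # None means a CPU-only build
print("GPU visible:", torch.cuda.is_available())

# Guarded import so downstream code can detect whether the build succeeded.
try:
    from flash_attn import flash_attn_func  # flash-attn 2.x entry point
    HAS_FLASH_ATTN = True
except ImportError:
    HAS_FLASH_ATTN = False
print("flash-attn available:", HAS_FLASH_ATTN)
```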
-
### When did you clone our code?
I cloned the code base after 5/1/23
### Describe the issue
Issue:
I ran into the following problems while fine-tuning the llava-7B model on two RTX 3090s. …
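Since the specific errors are truncated here, the following is only a hedged sketch of the memory-conscious `transformers.TrainingArguments` settings that two 24 GB 3090s typically force; the values are placeholders, not LLaVA's published fine-tuning hyperparameters:

```python
# Sketch only: small micro-batch with gradient accumulation, gradient
# checkpointing, and fp16 to fit a 7B fine-tune on 24 GB cards. Placeholder values.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./checkpoints",
    per_device_train_batch_size=1,      # tiny micro-batch per 3090
    gradient_accumulation_steps=16,     # effective batch of 32 across 2 GPUs
    gradient_checkpointing=True,        # trade compute for activation memory
    fp16=True,
)
```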
-
When I launched the Gradio web server, I could open my browser and chat with the model. However, the model's answers are garbled.
How can I fix this problem? There is no error information.
-
I'm trying to run LLaVA on two RTX 4090 GPUs for inference. The model loads onto the GPUs without any issues, but an error occurs at inference time when I run the sample example from the Gradio we…