-
Is there a way to train novel concepts into your BLIP model, the way textual inversions work for Stable Diffusion image generation? If so, is there a training script provided, or would one nee…
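For intuition: textual inversion keeps every model weight frozen and learns only the embedding vector of one new placeholder token. Whether BLIP exposes hooks for this is not confirmed here; the sketch below is a framework-free illustration of the core idea, with all names and the toy objective purely illustrative.

```python
# Hedged, framework-free sketch of textual-inversion-style concept learning:
# all existing embedding rows stay frozen; only the new token's vector is
# updated by gradient descent toward a stand-in target objective.

def train_new_token(embeddings, new_id, target, lr=0.1, steps=200):
    """embeddings: dict token_id -> vector (list of floats).
    Only embeddings[new_id] is updated; every other row is frozen."""
    e = list(embeddings[new_id])
    for _ in range(steps):
        # gradient of mean squared error ||e - target||^2 / d w.r.t. e
        grad = [2 * (ei - ti) / len(e) for ei, ti in zip(e, target)]
        e = [ei - lr * gi for ei, gi in zip(e, grad)]
    embeddings[new_id] = e
    return embeddings

# token 2 is the new concept; tokens 0 and 1 are pre-existing (frozen)
emb = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 0.0]}
train_new_token(emb, 2, target=[0.5, -0.5])
print(emb[0])  # frozen row, unchanged
print(emb[2])  # new row, pulled toward the target
```

In a real setup the "target" would be replaced by an image-conditioned loss, but the optimization structure (one trainable embedding row, everything else frozen) is the same.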
-
Subscribe to this issue and stay notified about new [daily trending repos in Jupyter Notebook](https://github.com/trending/jupyter-notebook?since=daily).
-
### Describe the issue
Issue/Error:
Loading 1.5 models works fine, but loading 1.6 models yields the error below. Note that the 1.6 models do load (despite the error) and inference works. However, tr…
-
## Problem statement
1. Despite the impressive capabilities of large-scale language models, their potential in modalities other than text has not been fully demonstrated.
2. Aligning parameters of vi…
-
- [ ] [LLaVA/README.md at main · haotian-liu/LLaVA](https://github.com/haotian-liu/LLaVA/blob/main/README.md?plain=1)
## 🌋 LLaVA: Large Language and Vi…
-
- I am trying to run inference with Cambrian-1-34B.
- I have RTX 6000 GPUs with 48 GB of VRAM each.
- I am following [this inference script](https://github.com/cambrian-mllm/cambrian/blob/main/inference.py).
The…
-
I just followed the steps, but when I run the following code:

```python
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Efficient-Large-Model/Llama-3-VILA1.5-8B")
```
…
-
It seems GPT-style models like LLaMA-2 are more popular, but the paper still uses T5.
Compared to GPT, does T5 have any particular advantages?
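One concrete architectural difference behind this question: T5 is an encoder-decoder model whose encoder attends bidirectionally over the input, while GPT/LLaMA-style models are decoder-only and use a causal mask, so each token sees only earlier positions. A minimal sketch of the two attention-mask patterns:

```python
# 1 = query position i may attend to key position j, 0 = masked out

def causal_mask(n):
    """Decoder-only (GPT/LLaMA-style): token i attends only to j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    """T5 encoder: every token attends to the whole input."""
    return [[1] * n for _ in range(n)]

print(causal_mask(3))         # [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(bidirectional_mask(3))  # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```

Bidirectional encoding can help tasks that condition on a full input (e.g. an image caption or a question), which may be one reason an encoder-decoder was kept; this is a structural observation, not a claim about the paper's motivation.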
-
### 🚀 Feature
NeMo's NeVa (LLaVA) is a multimodal language model.
Initial `examine`:
`Found 49 distinct operations, of which 39 (79.6%) are supported`
### Work items
- #145 (but looks like #…
-
How to fine-tune a large vision-language model such as LLaVA on the generated prompts only? The current [code](https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py) is fine-tuni…
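The usual mechanism for training on only part of each sequence is label masking: set the label of every token you do not want to learn from to -100, which cross-entropy losses in PyTorch-style frameworks ignore (TRL's `DataCollatorForCompletionOnlyLM` applies this idea to train on completions only). A framework-free sketch of the masking step, with illustrative token ids:

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch-style cross-entropy

def mask_before_template(input_ids, response_template_ids):
    """Return labels where everything up to and including the first match of
    the template is IGNORE_INDEX, so loss falls only on the tokens after it."""
    n = len(response_template_ids)
    for start in range(len(input_ids) - n + 1):
        if input_ids[start:start + n] == response_template_ids:
            cut = start + n
            return [IGNORE_INDEX] * cut + list(input_ids[cut:])
    # template absent: mask the whole sequence rather than train on garbage
    return [IGNORE_INDEX] * len(input_ids)

# toy ids: [7, 8] stands in for a tokenized "ASSISTANT:" marker
print(mask_before_template([1, 2, 3, 7, 8, 9, 10], [7, 8]))
# → [-100, -100, -100, -100, -100, 9, 10]
```

To restrict the loss to the prompt side instead, the same trick applies with the mask inverted (keep labels before the marker, set everything after it to -100).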