-
Hi,
I'm trying to constrain the generation of my VLMs using this repo; however, I can't figure out how to customize the pipeline to handle inputs (query + image). Whereas it is documented as …
-
Hello again!
Would it be possible to modify the GMP fine-tune script to train a LoRA with PEFT for the CLIP ViT-G model, and then merge the LoRA into the model to get a new CLIP-G model?
ChatGPT se…
-
- https://arxiv.org/abs/2107.02192
- 2021
Transformers have been successful in both the language and vision domains.
However, scaling them to long sequences, such as long documents or high-resolution images, is prohibitively expensive, because the self-attention mechanism has quadratic time and memory complexity with respect to the input sequence length.
In this paper, for both language and vision tasks, long se…
e4exp updated
2 years ago
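The quadratic cost the abstract refers to comes from the L × L attention-score matrix; a toy demonstration (identity Q/K projections, just to show the size growth):

```python
import torch

def attention_scores(x):
    # Naive self-attention scores: an L x L matrix, so memory and time
    # grow quadratically in the sequence length L.
    return x @ x.transpose(-2, -1)

for L in (256, 1024):
    s = attention_scores(torch.randn(L, 64))
    print(L, s.numel())  # 4x the length -> 16x the score entries
```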
-
@songhappy / @shane-huang: Could you please share the code or steps for how you ran LanguageBind/Video-LLaVA-7B-hf on IPEX-LLM a few months back?
As we have a customer who wants to use video-llava runni…
-
```
in load_pretrained_model
    model = CambrianLlamaForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3531, in from_pretrained
) =…
```
-
Do you have any plans to support multimodal LLMs, such as MiniGPT-4/MiniGPT v2 (https://github.com/Vision-CAIR/MiniGPT-4/) and LLaVA (https://github.com/haotian-liu/LLaVA/)? That would be a significan…
-
MAGVLT: based on **non-autoregressive** mask prediction.
- enables bidirectional context encoding and fast decoding via parallel token prediction in an iterative refinement loop
- extended editing capabilit…
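The parallel-prediction-with-iterative-refinement idea above can be sketched as a toy mask-predict decoder (a generic sketch of the non-autoregressive style, not the MAGVLT implementation; the mask id and dummy logits function are assumptions):

```python
import torch

MASK_ID = 0  # assumed mask token id

def mask_predict(logits_fn, length, steps=4):
    # Start fully masked, predict every position in parallel, then re-mask
    # the least confident tokens and refine them over a few rounds.
    tokens = torch.full((length,), MASK_ID, dtype=torch.long)
    for step in range(steps):
        logits = logits_fn(tokens)                # (length, vocab)
        probs, preds = logits.softmax(-1).max(-1)
        tokens = preds                            # parallel prediction
        # linearly shrink the number of re-masked positions each round
        n_mask = int(length * (1 - (step + 1) / steps))
        if n_mask > 0:
            low = probs.topk(n_mask, largest=False).indices
            tokens[low] = MASK_ID                 # refine these next round
    return tokens
```

Because every position is predicted in one forward pass per round, decoding takes `steps` passes instead of one pass per token, which is where the fast bidirectional decoding comes from.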
-
```
======================================================================
ERROR: test_shape_0 (tests.test_transchex.TestTranschex)
-----------------------------------------------------------------…
```
-
### The model to consider.
The llava-next-video project has already been released, and the test results are quite good. Are there any plans to support this project?
`https://github.com/LLaVA-VL/LLaV…
-
Why are 3× RTX 4090 GPUs still out of memory (24 GB × 3 = 72 GB > 52 GB)?
```
0 NVIDIA GeForce RTX 4090 Off | 00000000:31:00.0 Off | Off |
| 66% 24C P8 22W / 450W | 42MiB / 24…
```
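A back-of-the-envelope check with the numbers from the question (assumed: ~52 GB of weights, 3 × 24 GB cards) shows the likely cause: total capacity is sufficient, but only if the loader actually shards the model across cards (e.g. with `device_map="auto"` in transformers); otherwise everything is placed on GPU 0 alone.

```python
# Assumed figures from the question: ~52 GB model, three 24 GB GPUs.
model_gb = 52
per_gpu_gb = 24
n_gpus = 3

print(model_gb <= n_gpus * per_gpu_gb)  # True: 72 GB total is enough
print(model_gb <= per_gpu_gb)           # False: one card alone cannot hold it
```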