vision-language-transformer Search Results

guidance-ai/guidance #880

Input template for Transformers vision language models ?

Hi, I'm trying to constrain the generation of my VLMs using this repo; however i can't figure out the way to personalize the pipeline for handling inputs (query+image). Whereas it is documented as …

vpellegrain updated 3 weeks ago

zer0int/CLIP-fine-tune #4

PEFT fine tune CLIP VIT-G?

Hello again! Would it be possible to modify the GMP fine tune script to train a LoRA with PEFT for the CLIP VIT-G model? Then merge the LoRA with the model to get a new CLIP-G model? Chat-GPT se…

bash-j updated 3 weeks ago

e4exp/paper_manager_abstract #577

Long-Short Transformer: Efficient Transformers for Language …

- https://arxiv.org/abs/2107.02192 - 2021 トランスフォーマーは、言語領域と視覚領域の両方で成功を収めている。しかし、長い文書や高解像度の画像のような長いシーケンスに拡張するには、自己保持機構が入力シーケンスの長さに対して二次的な時間とメモリの複雑さを持つため、法外なコストがかかります。本論文では、言語タスクと視覚タスクの両方において、長いシ…

e4exp updated 2 years ago

intel-analytics/ipex-llm #11509

Unable to run LanguageBind/Video-LLaVA-7B-hf using ipex-llm

@songhappy / @shane-huang : Please could you share the code or steps how you ran LanguageBind/Video-LLaVA-7B-hf on IPEX-LLM few months back. As we have a customer who wants to use video-llava runni…

shailesh837 updated 20 hours ago

cambrian-mllm/cambrian #12

【bug】can not load cambrian-34b

in load_pretrained_model model = CambrianLlamaForCausalLM.from_pretrained( File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3531, in from_pretrained ) =…

CSEEduanyu updated 1 week ago

huggingface/candle #1947

Do you have any plans to support multimodal LLMs, such as MiniGPT-4/MiniGPT v2 (https://github.com/Vision-CAIR/MiniGPT-4/) and LLaVA (https://github.com/haotian-liu/LLaVA/)? That would be a significan…

guoqingbao updated 1 month ago

letme-hj/dl-papers #7

[7] MAGVLT: Masked Generative Vision-and-Language Transforme…

MAGVLT: based on **non-autoregressive** mask prediction. - enables bidirectional context encoding, fast decoding by parallel token predictions in an iterative refinement - extended editing capabilit…

letme-hj updated 1 year ago

Project-MONAI/MONAI #7781

AttributeError: type object 'obj' has no attribute '_attn_im…

``` ====================================================================== ERROR: test_shape_0 (tests.test_transchex.TestTranschex) -----------------------------------------------------------------…

KumoLiu updated 1 month ago

vllm-project/vllm #5124

[New Model]: LLaVA-NeXT-Video support

### The model to consider. The llava-next-video project has already been released, and the test results are quite good. Are there any plans to support this project? `https://github.com/LLaVA-VL/LLaV…

AmazDeng updated 3 days ago

OpenGVLab/InternVL #177

3* 4090GPU OOM

Why do 3* 4090GPUs still out of memory (24*3>52GB) 0 NVIDIA GeForce RTX 4090 Off | 00000000:31:00.0 Off | Off | | 66% 24C P8 22W / 450W | 42MiB / 24…

orderer0001 updated 1 month ago

969 results
for vision-language-transformer