-
The default visual encoder appears to be 'eva_clip_g'. I am curious about the process of transitioning it to CLIP-L/14. Also, does the CLIP-L/14 encoder utilize the same Q-Former weights as the EVA…
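For reference, one concrete reason the Q-Former weights may not transfer directly is that the two encoders have different output widths. A minimal PyTorch sketch (widths taken from the public EVA ViT-g, CLIP ViT-L/14, and Q-Former configs; `kv_proj` is a hypothetical stand-in for the Q-Former's cross-attention projection, not SEED's actual code):

```python
import torch
import torch.nn as nn

eva_feats = torch.randn(1, 257, 1408)   # EVA-CLIP-g: [CLS] + 16x16 patches at 224px, width 1408
clip_feats = torch.randn(1, 257, 1024)  # CLIP ViT-L/14: same token count, width 1024

# Hypothetical cross-attention projection trained against EVA-CLIP-g features
kv_proj = nn.Linear(1408, 768)          # 768 = Q-Former hidden size
print(kv_proj(eva_feats).shape)         # works: torch.Size([1, 257, 768])
# kv_proj(clip_feats)                   # would fail: input width 1024 != expected 1408
```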
-
Impressive work!
I noticed that SEED utilizes a visual encoder pre-trained with EVA-CLIP-G. The original EVA-CLIP-G has 40 blocks, but SEED omits the last block (https://github.com/AILab-CVC/SEED/blo…
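For illustration, a minimal sketch of what omitting the final block looks like, assuming a timm-style encoder that stores its transformer layers in a `blocks` ModuleList (a toy stand-in, not SEED's actual encoder):

```python
import torch.nn as nn

# Toy stand-in for an EVA-style encoder with 40 transformer blocks
class DummyEncoder(nn.Module):
    def __init__(self, depth=40, dim=1408):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])

encoder = DummyEncoder()
# Keep only the first 39 blocks, mirroring SEED's truncation of EVA-CLIP-G
encoder.blocks = encoder.blocks[: len(encoder.blocks) - 1]
print(len(encoder.blocks))  # 39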
-
Hi,
Thank you so much for these very cool models. In your docs, you compute the probabilities with `text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)`.
I am wondering about t…
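For context, here is that line from the docs in self-contained form, with random features standing in for real encoder outputs (the feature dimension is illustrative; the 100.0 is the exponentiated learned logit scale, i.e. an inverse temperature):

```python
import torch
import torch.nn.functional as F

# Stand-ins for real encoder outputs: 1 image, 3 candidate captions
image_features = F.normalize(torch.randn(1, 768), dim=-1)
text_features = F.normalize(torch.randn(3, 768), dim=-1)

# Scaling the cosine similarities by 100.0 before softmax sharpens the
# distribution over candidate captions
text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
print(text_probs)          # shape (1, 3)
print(text_probs.sum(-1))  # tensor([1.]) -- rows sum to 1
```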
-
How do I load the `EVA-CLIP-18B` model?
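A minimal sketch of one way to load it, assuming the checkpoint is published on Hugging Face under `BAAI/EVA-CLIP-18B` with custom remote code (check the model card for the exact id and the recommended dtype/device mapping):

```python
import torch
from transformers import AutoModel

# Assumption: the 18B checkpoint is hosted as BAAI/EVA-CLIP-18B with custom code.
# At ~18B parameters, fp16 weights alone take roughly 36 GB, so multi-GPU or
# CPU offload is usually required.
model = AutoModel.from_pretrained(
    "BAAI/EVA-CLIP-18B",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).eval()
```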
-
I am using bge-m3:
candi_emb_1 = model.encode(text="The Mid-Hudson Bridge, spanning the Hudson River between Poughkeepsie and Highland.", image="./imgs/wiki_candi_1.jpg")
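For completeness, scoring a query against that candidate is typically just a dot product of the embeddings. A sketch that continues from the snippet above, assuming `model` is the Visualized-BGE-style encoder whose `.encode(...)` returns an L2-normalized tensor, and with a hypothetical query text:

```python
import torch

with torch.no_grad():
    # Hypothetical query; the candidate path is the placeholder from above
    query_emb = model.encode(text="Where does the Mid-Hudson Bridge cross the Hudson River?")
    candi_emb_1 = model.encode(
        text="The Mid-Hudson Bridge, spanning the Hudson River between Poughkeepsie and Highland.",
        image="./imgs/wiki_candi_1.jpg",
    )

# Assuming L2-normalized outputs, the dot product equals cosine similarity
sim_1 = query_emb @ candi_emb_1.T
print(sim_1)
```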
-
In the paper, wd (weight decay) was 0, while in the codebase wd is left at its default value, which is 0.02.
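To reproduce the paper setting rather than the code default, it should suffice to pass weight decay explicitly to the optimizer. A generic PyTorch sketch (the model is a stand-in; this is not the repo's actual training entry point):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 10)  # stand-in for the real model

# Paper setting: wd = 0 (the codebase otherwise defaults to 0.02)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.0)
print(optimizer.defaults["weight_decay"])  # 0.0
```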
-
The following figure from [kvalitetsatlaset](https://www.skde.no/helseatlas/v2/kvalitet/) is wanted on skde.no with hospitals instead of catchment areas.
It should be possible to use a filter menu to filter out a selection…
-
I used the code you provided to train LLaVA based on InternViT-6B. With the script you provided, the first stage (pretraining) runs normally, but when using the fine-tuning script for trainin…
-
The tensor output by ViT-Lens is 1×768 for each modality, right? So where in InstructBLIP do I plug it in? Could you please answer? Thanks!
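Not an authoritative answer, but the usual integration point is where InstructBLIP feeds image embeddings into the Q-Former's cross-attention. A hedged sketch that projects a 1×768 embedding to an assumed encoder width and treats it as a length-1 token sequence (the adapter and width are hypothetical, not ViT-Lens or LAVIS code):

```python
import torch
import torch.nn as nn

vitlens_emb = torch.randn(1, 768)  # one embedding per modality, as described
encoder_width = 1408               # assumption: Q-Former trained against EVA-CLIP-g features

# Hypothetical adapter: map 768 -> encoder width and add a sequence dimension,
# so the result can stand in for `image_embeds` in the Q-Former cross-attention
adapter = nn.Linear(768, encoder_width)
pseudo_image_embeds = adapter(vitlens_emb).unsqueeze(1)  # (1, 1, 1408)
attention_mask = torch.ones(pseudo_image_embeds.shape[:-1], dtype=torch.long)
print(pseudo_image_embeds.shape, attention_mask.shape)
```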
-
Hi, thanks for open-sourcing this! I have a question about the training settings. I first trained with 8×A100 and set gradient_accumulation_steps to 8, the same as yours, but it runs out of memory immediately…
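One generic workaround for per-GPU memory limits is to shrink the per-device batch and raise gradient accumulation so the effective batch size stays the same. A sketch of the arithmetic (the batch sizes are illustrative, not the repo's actual config):

```python
# Effective batch size = n_gpus * per_device_batch * grad_accum_steps
n_gpus = 8

# Hypothetical original (OOM): larger per-device batch, accumulation 8
orig = n_gpus * 16 * 8   # 1024

# Same effective batch size, but half the per-device activation memory
new = n_gpus * 8 * 16    # 1024
assert orig == new
print(orig, new)
```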