-
Thank you for documenting everything you learned. It is very helpful. I have been trying to find a pre-coded QLoRA implementation for BiomedCLIP, but I couldn't, so I have to write it on my own. BioMedCLIP uses a BERT m…
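In case it helps while you write your own, here is a minimal sketch of attaching QLoRA-style adapters to a BERT text tower with peft. It assumes the text encoder can be loaded as a standalone PubMedBERT checkpoint through transformers; the checkpoint name and the target module names below are assumptions you would need to verify against the actual BiomedCLIP text tower.

```python
import torch
from transformers import AutoModel, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the BERT text tower in 4-bit (QLoRA-style). The checkpoint name is an
# assumption; BiomedCLIP's text encoder is derived from PubMedBERT.
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
text_encoder = AutoModel.from_pretrained(
    "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract",
    quantization_config=bnb_cfg,
)
text_encoder = prepare_model_for_kbit_training(text_encoder)

# Attach LoRA adapters to the attention projections. "query"/"key"/"value" are
# the usual BERT module names; check them against the modules in your model.
lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query", "key", "value"],
)
text_encoder = get_peft_model(text_encoder, lora_cfg)
text_encoder.print_trainable_parameters()  # only the adapters should be trainable
```

The vision tower and the contrastive training loop would not need to change; only the text-side forward pass goes through the wrapped encoder.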
-
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
https://arxiv.org/pdf/2404.16006
CONTEXTUAL: Evaluating Context-Sensitive Text-Ric…
-
![image](https://github.com/paperswithlove/papers-we-read/assets/100809463/602058a1-017f-4f10-91fc-fab580e54c5b)
- Low-res encoding of the whole image plus its splits, all the way up to a high-res dual encoder!!!
![image](https://github.com…
-
The model is loaded with:
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map='auto').eval()
I'd like to ask how the input tensors should be handled. No matter whether I put the data onto…
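A minimal sketch of the usual handling, assuming a standard Hugging Face tokenizer and that the question is about getting the inputs onto the right device when device_map='auto' shards the weights: tokenize on CPU, then move the input tensors to model.device before the forward pass.

```python
import torch
from transformers import AutoModel, AutoTokenizer

path = "your/model-path"  # placeholder, same path as above
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map='auto',
).eval()

# Tokenize on CPU, then move every input tensor to the model's device.
inputs = tokenizer("a test prompt", return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)
```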
-
**Original article:** Teaching CLIP to Count to Ten (https://arxiv.org/pdf/2302.12066)
**PDF URL:** https://github.com/SforAiDl/CountCLIP/blob/main/resc/ReScience.pdf
**Metadata URL:** https://g…
-
Hello, I would like to ask why you use ViT-B/16 as the text encoder. Why not use an NLP model as the text encoder? Thank you very much.
-
Please let us know which model architectures you would like to see added!
**Up-to-date todo list below. Please feel free to contribute any model; a PR without device mapping, ISQ, etc. will still be …
-
```
pybullet build time: May 16 2024 23:57:18
WARNING - 2024-05-17 02:28:39,256 - rigid_transformations - Failed to import geometry msgs in rigid_transformations.py.
WARNING - 2024-05-17 02:28:39,2…
-
Thanks for your awesome work on model merging! I'm excited about the improvements you achieved compared to other merging methods. However, I saw that the individually fine-tuned models still outperform WEM…
-
The Phi-3 vision model is excellent and does a great job at extracting text. I am using the CPU version via the C# DirectML package.
1. What is the max image file size in KB that can be sent to the mode…