multimodal-pre-trained-model Search Results

csce585-mlsystems/Phishing-Detection #1

Instructions for Designing Your Experiments and Creating a M…

#### Specific Task: For this project, your main challenge is improving phishing detection by developing a real-time, multimodal system that combines transformers with other features such as URLs and metadata.…

pooyanjamshidi updated 2 weeks ago

JongSuk1/EquiAV #1

Regarding the issue of reproducing the results of the paper.

I have read your paper with great interest and tried to use your pre-trained model to train for 50 epochs with the provided script. However, I couldn't reproduce the results reported in the paper for…

xujinchang updated 3 weeks ago

LLaVA-VL/LLaVA-NeXT #216

Running LLaVA_OneVision_Tutorials.ipynb reports a ValueErro…

![LLaVA_OneVision_Tutorials_ValueError](https://github.com/user-attachments/assets/22d03b60-37ad-4898-82a0-f18ac20863e1) V…

MXC66ai updated 3 weeks ago

OpenBMB/MiniCPM-V #506

💡 [REQUEST] - How to Create the Multimodal embedding…

### Start Date _No response_ ### Implementation PR _No response_ ### Reference Issues _No response_ ### Summary I want to create embeddings for text, image and vid…

vimal00r updated 1 week ago

GaetanLepage/compound-figure-separator #1

About pretrained weights and dataset for panel segmentation.

Hi, I am Yijun Pan, currently an upcoming senior at the University of Michigan majoring in Data Science. I am currently conducting research into training multimodal models, and it requires me to segment medica…

charles-pyj updated 3 months ago

quic/ai-hub-models #58

[MODEL REQUEST] Kosmos-2.5

Kosmos-2.5 is a relatively small (1.37B params) generative model for machine reading of text-intensive images. **Details of model being requested** - Model name: Kosmos-2.5 - Source repo link: …

EwoutH updated 3 months ago

baaivision/tokenize-anything #17

Caption branch

Thanks for your great work! In your project, the caption branch is trained only on VG data, so its captioning ability may be weaker than that of models trained on large caption datasets with a large language model. Have you…

jetyingjia updated 3 months ago

EricGuo5513/momask-codes #27

Replicating the results of the pre-trained models.

Thanks for releasing this amazing work! However, I cannot replicate the results of the pre-trained models using the provided code. The results after training with the provided code are - RVQ Recon…

weihaosky updated 2 months ago

sghong977/Daily_AIML #40

[Survey, Paper Review] ViT-Adapter, flash attention, ......

# Vision Transformer Adapter for Dense Predictions Info. - ICLR 2023 spotlight - https://github.com/czczup/ViT-Adapter - https://arxiv.org/abs/2205.08534 ### Summary - plain ViT - whi…

sghong977 updated 3 months ago

tzjtatata/Myriad #7

Inquiry on Addressing Performance Issues in Zero-Shot Settin…

Thanks for your work in anomaly detection domain. I am reaching out to discuss an aspect of your work that caught my attention, specifically regarding the experiments conducted in a zero-shot setting.…

yjtlab updated 5 months ago

241 results for multimodal-pre-trained-model
