-
#### Specific Task:
For this project, your main challenge is improving phishing detection by developing a real-time, multimodal system that combines transformers with other features, such as URLs and metadata.…
-
I have read your paper with great interest and tried training from your pre-trained model for 50 epochs using the provided script.
However, I couldn't reproduce the results reported in the paper for…
-
![LLaVA_OneVision_Tutorials_ValueError](https://github.com/user-attachments/assets/22d03b60-37ad-4898-82a0-f18ac20863e1)
V…
-
### Start Date
_No response_
### Implementation PR
_No response_
### Reference Issues
_No response_
### Summary
I want to create embeddings for text, image and vid…
-
Hi, I am Yijun Pan, an upcoming senior at the University of Michigan majoring in Data Science. I am currently conducting research on training multimodal models, which requires me to segment medica…
-
Kosmos-2.5 is a relatively small (1.37B parameters) generative model for machine reading of text-intensive images.
**Details of model being requested**
- Model name: Kosmos-2.5
- Source repo link: …
-
Thanks for your great work!
In your project, the caption branch is trained only on VG data, so its captioning ability may be weaker than that of models trained on large-scale caption data with a large language model. Have you…
-
Thanks for releasing this amazing work!
However, I cannot replicate the results of the pre-trained models using the provided code.
The results after training with the provided code are:
- RVQ Recon…
-
# Vision Transformer Adapter for Dense Predictions
Info.
- ICLR 2023 spotlight
- https://github.com/czczup/ViT-Adapter
- https://arxiv.org/abs/2205.08534
### Summary
- plain ViT
- whi…
-
Thanks for your work in the anomaly detection domain. I am reaching out to discuss an aspect of your work that caught my attention, specifically the experiments conducted in a zero-shot setting.…