deepseek-ai / DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding
https://huggingface.co/spaces/deepseek-ai/DeepSeek-VL-7B
MIT License

Does the sam-branch use the Vary initialization for dense OCR? #3

Closed: Ucas-HaoranWei closed this issue 8 months ago

Ucas-HaoranWei commented 8 months ago

Hi, I read your report, and the pipeline looks very similar to Vary. I have a question: does the sam-branch use the Vary initialization for dense OCR? In my experiments, the vision latents output by the original SAM are noisy for a text-latent-based LLM.