Does the SAM branch use the Vary initialization for dense OCR?

deepseek-ai / DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

https://huggingface.co/spaces/deepseek-ai/DeepSeek-VL-7B

MIT License


Does the SAM branch use the Vary initialization for dense OCR? #3

Closed. Ucas-HaoranWei closed this issue 8 months ago.

Ucas-HaoranWei commented 8 months ago

Hi, I read your report, and I think the pipeline is very similar to Vary. I have a question: does the SAM branch use the Vary initialization for dense OCR? In my experiments, the vision latents output by the original SAM are noisy for a text-latent-based LLM.