zzhanghub closed this issue 1 year ago.
Shikra is an MLLM designed to kick off referential dialogue by handling spatial coordinate inputs and outputs directly in natural language, without additional vocabularies, position encoders, pre-/post-detection, or external plug-in models.
arXiv: https://arxiv.org/abs/2306.15195
Code: https://github.com/shikras/shikra
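To illustrate what "coordinates in natural language" can look like, here is a minimal sketch. It assumes a bounding box is written as plain text `[x1,y1,x2,y2]`, normalized to the image size and rounded to three decimals. That is our reading of the paper, so check the Shikra repo for the exact format. The helper names `box_to_text` and `text_to_boxes` are hypothetical and are not part of the Shikra codebase.

```python
import re
from typing import List, Tuple

# A box in pixel space: (x1, y1, x2, y2).
Box = Tuple[float, float, float, float]

# Matches a plain-text box such as "[0.100,0.100,0.500,0.500]" (spaces allowed).
_NUM = r"\s*(\d*\.?\d+)\s*"
_BOX_RE = re.compile(r"\[" + ",".join([_NUM] * 4) + r"\]")


def box_to_text(box: Box, width: float, height: float, ndigits: int = 3) -> str:
    """Normalize a pixel-space box to [0, 1] and render it as ordinary text.

    Hypothetical helper: the numbers are just characters the LLM reads/writes,
    so no extra vocabulary or position encoder is needed.
    """
    x1, y1, x2, y2 = box
    norm = (x1 / width, y1 / height, x2 / width, y2 / height)
    return "[" + ",".join(f"{v:.{ndigits}f}" for v in norm) + "]"


def text_to_boxes(text: str, width: float, height: float) -> List[Box]:
    """Extract every plain-text box from model output and map it back to pixels."""
    boxes: List[Box] = []
    for m in _BOX_RE.finditer(text):
        x1, y1, x2, y2 = (float(g) for g in m.groups())
        boxes.append((x1 * width, y1 * height, x2 * width, y2 * height))
    return boxes


if __name__ == "__main__":
    coords = box_to_text((64, 48, 320, 240), 640, 480)
    print(coords)
    print(text_to_boxes(f"The dog {coords} is on the sofa.", 640, 480))
```

Because the coordinates are plain text, the same string can appear in a user question (input) or in the model's answer (output). Only a small parser like `text_to_boxes` is needed to turn an answer back into pixel boxes.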
Thanks! This excellent work has been added to our repo.