BradyFU / Awesome-Multimodal-Large-Language-Models

:sparkles::sparkles: Latest Advances on Multimodal Large Language Models
12.48k stars 797 forks

New method submission #19

Closed (zzhanghub closed this issue 1 year ago)

zzhanghub commented 1 year ago

Shikra is an MLLM designed to kick off referential dialogue. It handles spatial coordinate inputs and outputs directly in natural language, without additional vocabularies, position encoders, pre-/post-detection modules, or external plug-in models.

arXiv: https://arxiv.org/abs/2306.15195
code: https://github.com/shikras/shikra
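To illustrate what "spatial coordinate inputs and outputs in natural language" can look like, here is a minimal Python sketch that writes a bounding box into a prompt as ordinary text and parses boxes back out of a reply. The helper names (`box_to_text`, `text_to_boxes`) and the format (normalized `[x1,y1,x2,y2]` with three decimals) are assumptions for illustration only; check the paper and repository for the exact format Shikra uses.

```python
import re

# Hypothetical text format (an assumption, not taken from the Shikra code):
# a box is written as [x1,y1,x2,y2], each value normalized to [0, 1].
BOX_PATTERN = re.compile(r"\[(\d*\.?\d+),(\d*\.?\d+),(\d*\.?\d+),(\d*\.?\d+)\]")


def box_to_text(box, width, height, ndigits=3):
    """Encode a pixel-space box (x1, y1, x2, y2) as plain text with normalized coords."""
    x1, y1, x2, y2 = box
    norm = [x1 / width, y1 / height, x2 / width, y2 / height]
    # Fixed precision keeps the string short; it also quantizes the coordinates.
    return "[" + ",".join(f"{v:.{ndigits}f}" for v in norm) + "]"


def text_to_boxes(text, width, height):
    """Recover pixel-space boxes from every [x1,y1,x2,y2] span found in the text."""
    boxes = []
    for match in BOX_PATTERN.finditer(text):
        x1, y1, x2, y2 = (float(g) for g in match.groups())
        boxes.append((x1 * width, y1 * height, x2 * width, y2 * height))
    return boxes


if __name__ == "__main__":
    w, h = 640, 480
    prompt = f"What is the person {box_to_text((64, 48, 320, 240), w, h)} holding?"
    print(prompt)  # What is the person [0.100,0.100,0.500,0.500] holding?
    print(text_to_boxes(prompt, w, h))
```

Because the box is just digits and punctuation, an existing tokenizer can read and emit it like any other text, which is consistent with the claim that no extra vocabulary or position encoder is needed. The trade-off in this sketch is that rounding to a fixed number of decimals limits localization precision.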

BradyFU commented 1 year ago

Thanks! This excellent work has been added to our repo.