Alpha-VLLM / LLaMA2-Accessory

An Open-source Toolkit for LLM Development
https://llama2-accessory.readthedocs.io/

Could you tell me how to combine this with the directional model Shikra? For example, I want to take a screenshot of a computer screen and ask the model to show me the location of a specific UI element. #7

Closed haxx12113 closed 1 year ago

haxx12113 commented 1 year ago

Could you tell me how to combine this with the directional model Shikra? For example, I want to take a screenshot of a computer screen and ask the model to show me the location of a specific UI element.

gaopengpjlab commented 1 year ago

First, start with a multimodal LLM. Then fine-tune that multimodal LLM on the Shikra dataset. A multimodal LLM with Shikra will be released in a few weeks.
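
For the screenshot use case, here is a rough sketch of the post-processing step once a model fine-tuned this way is available. It assumes the model writes each box into its answer as plain text in the form [x1,y1,x2,y2], with coordinates normalized to the range 0 to 1; that format is an assumption for illustration and may differ in the released model. The function name boxes_to_pixels is hypothetical and not part of LLaMA2-Accessory.

```python
import re

# ASSUMPTION: boxes appear in the answer text as "[x1,y1,x2,y2]",
# with each coordinate normalized to [0, 1]. Adjust if the model differs.
_NUM = r"(\d+(?:\.\d+)?)"
BOX_PATTERN = re.compile(r"\[\s*" + r"\s*,\s*".join([_NUM] * 4) + r"\s*\]")


def boxes_to_pixels(answer, width, height):
    """Extract normalized boxes from the answer and scale them to pixels.

    Hypothetical helper: returns a list of (left, top, right, bottom)
    tuples in screenshot pixel coordinates.
    """
    boxes = []
    for match in BOX_PATTERN.finditer(answer):
        # Clamp each value to [0, 1] in case the model overshoots slightly.
        x1, y1, x2, y2 = (min(max(float(v), 0.0), 1.0) for v in match.groups())
        boxes.append((
            round(x1 * width),
            round(y1 * height),
            round(x2 * width),
            round(y2 * height),
        ))
    return boxes


if __name__ == "__main__":
    # Made-up answer text, only to show the call.
    reply = "The Save button [0.25,0.5,0.75,1.0] is in the toolbar."
    print(boxes_to_pixels(reply, width=1920, height=1080))
```

The resulting pixel boxes can then be drawn on the screenshot with any image library to highlight the UI element.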