How can this be combined with the directional model Shikra? For example, taking a screenshot of a computer screen and asking the model to locate a specific UI element. #7
Could you tell me how to combine this with the directional model Shikra? For example, I want to take a screenshot of a computer screen and ask the model to show me the location of a specific UI element.
First, you need to start with a multimodal LLM. Then fine-tune that multimodal LLM on the Shikra dataset. A multimodal LLM with Shikra will be released in a few weeks.
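Until that release, here is a rough sketch of the last step of the screenshot use case: turning the model's text reply into pixel coordinates. It assumes the fine-tuned model writes each box inline as `[x1,y1,x2,y2]` with values normalized to the 0..1 range, which is my understanding of how Shikra-style models report locations; the actual output format may differ. The function name `boxes_to_pixels` and the sample reply are hypothetical and only for illustration.

```python
import re

# Assumption: boxes appear in the reply text as "[x1,y1,x2,y2]" with
# coordinates normalized to 0..1. Adjust the pattern if the released
# model uses a different format.
NUM = r"\s*(\d+(?:\.\d+)?)\s*"
BOX_PATTERN = re.compile(r"\[" + NUM + "," + NUM + "," + NUM + "," + NUM + r"\]")


def boxes_to_pixels(reply, width, height):
    """Return (left, top, right, bottom) pixel boxes found in a model reply.

    `width` and `height` are the screenshot size in pixels.
    """
    boxes = []
    for match in BOX_PATTERN.finditer(reply):
        x1, y1, x2, y2 = (float(value) for value in match.groups())
        # Skip bracketed numbers that cannot be normalized coordinates.
        if not all(0.0 <= value <= 1.0 for value in (x1, y1, x2, y2)):
            continue
        boxes.append(
            (round(x1 * width), round(y1 * height),
             round(x2 * width), round(y2 * height))
        )
    return boxes


if __name__ == "__main__":
    # Hypothetical reply for a 1920x1080 screenshot.
    sample_reply = "The Save button [0.25,0.5,0.75,1.0] is in the toolbar."
    print(boxes_to_pixels(sample_reply, 1920, 1080))
```

The resulting pixel boxes can then be drawn over the screenshot or passed to whatever tool needs the UI element's location.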