Are other downstream tasks available? For example, Visual Reasoning, which requires the model to predict whether a sentence describes a pair of images

X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

https://www.modelscope.cn/studios/damo/mPLUG-Owl

MIT License


Issue #204

Open · fansticOne opened this issue 9 months ago

LukeForeverYoung commented 9 months ago

The Owl series supports multiple image inputs. You can build a downstream pipeline by passing a list of images and placing the same number of "<|image|>" tokens in your prompt.
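The sketch below shows how a two-image Visual Reasoning query could be assembled following that advice: one image list plus a prompt containing exactly one "<|image|>" placeholder per image. It is a minimal illustration under stated assumptions. The helper names (`build_nlvr_sample`, `check_placeholders`, `parse_yes_no`) and the question wording are hypothetical, not part of any mPLUG-Owl API. The actual model call is omitted because the loading and inference code differs between Owl versions.

```python
from dataclasses import dataclass
from typing import List

# Placeholder token named in the reply; one is needed per input image.
IMAGE_TOKEN = "<|image|>"


@dataclass
class MultiImageSample:
    image_paths: List[str]  # images in the order their placeholders appear
    prompt: str             # text prompt containing one IMAGE_TOKEN per image


def check_placeholders(image_paths: List[str], prompt: str) -> None:
    """Raise if the number of image placeholders does not match the image list."""
    n_tokens = prompt.count(IMAGE_TOKEN)
    if n_tokens != len(image_paths):
        raise ValueError(
            f"prompt has {n_tokens} {IMAGE_TOKEN} tokens "
            f"but {len(image_paths)} images were given"
        )


def build_nlvr_sample(left_image: str, right_image: str, sentence: str) -> MultiImageSample:
    """Hypothetical helper: build a yes/no query about a pair of images."""
    images = [left_image, right_image]
    # One placeholder per image, in the same order as the image list.
    placeholders = "".join(IMAGE_TOKEN for _ in images)
    # Assumed question wording; not an official Owl prompt template.
    prompt = (
        f"{placeholders}\n"
        f'Does the following sentence correctly describe the pair of images? "{sentence}" '
        "Answer yes or no."
    )
    check_placeholders(images, prompt)
    return MultiImageSample(image_paths=images, prompt=prompt)


def parse_yes_no(answer: str) -> bool:
    """Map the model's free-text answer to a True/False label for scoring."""
    text = answer.strip().lower()
    if text.startswith("yes"):
        return True
    if text.startswith("no"):
        return False
    raise ValueError(f"could not parse a yes/no answer from: {answer!r}")


if __name__ == "__main__":
    sample = build_nlvr_sample("left.jpg", "right.jpg", "There are two dogs in total.")
    print(sample.image_paths)
    print(sample.prompt)
```

The placeholder check makes a mismatch between the image list and the prompt fail early, before anything reaches the model. Parsing the free-text answer into a boolean lets the predictions be compared directly against true/false labels in a pair-of-images dataset.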