Are there any plans to integrate image input into LMQL? With the new GPT-4V and lightweight open-source vision-language models such as MPlug-Owl, it would be incredibly useful. I work with MPlug-Owl quite a lot, so I would be happy to investigate this if someone could point me in the right direction on where to start. A discussion of how multi-modal prompts are typically constructed for open-source models could help get this working for more than just GPT-4.