Bhargav-Ravinuthala opened 1 month ago
WebLLM has an example with Phi-3.5-Vision here.
Currently the playground filters out vision models here: https://github.com/cfahlgren1/webllm-playground/blob/main/src/utils/llm.ts#L97-L99. Including a couple of sample images and enabling vision models would be a very neat addition to the demo. Feel free to take a stab at it if interested!
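One possible direction, instead of dropping vision models from the list, is to keep them and tag them so the UI knows when to show an image picker. This is only a sketch: the `ModelOption` shape and the "vision" substring heuristic are assumptions based on WebLLM's model-ID naming (e.g. `Phi-3.5-vision-instruct-q4f16_1-MLC`), not the playground's actual types in `llm.ts`.

```typescript
// Hypothetical helper: keep vision models in the list and mark them,
// rather than filtering them out entirely.
interface ModelOption {
  id: string;
  supportsVision: boolean;
}

function toModelOptions(modelIds: string[]): ModelOption[] {
  return modelIds.map((id) => ({
    id,
    // Heuristic assumption: WebLLM vision model IDs contain "vision".
    supportsVision: id.toLowerCase().includes("vision"),
  }));
}
```

The playground UI could then render an image-upload control only when the selected model has `supportsVision: true`.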
Kudos for the wonderful work... This looks amazing!
As a newbie, I wonder whether some models like Qwen or LLaVA support images as input. How useful would it be to develop a feature for this application that accepts images along with the prompt?
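To the question above: WebLLM exposes an OpenAI-style chat-completions API, so an image prompt for a vision model is expressed as a multi-part user message mixing text and `image_url` parts. The sketch below only constructs that message payload; the type names and the placeholder data URL are assumptions for illustration, not the playground's code.

```typescript
// Sketch of an OpenAI-style multimodal message. A vision model such as
// Phi-3.5-Vision would receive a user message whose content is an array
// of text and image_url parts instead of a plain string.
interface TextPart {
  type: "text";
  text: string;
}
interface ImageUrlPart {
  type: "image_url";
  image_url: { url: string };
}
type ContentPart = TextPart | ImageUrlPart;

interface UserMessage {
  role: "user";
  content: ContentPart[];
}

function buildImagePrompt(text: string, imageDataUrl: string): UserMessage[] {
  return [
    {
      role: "user",
      content: [
        { type: "text", text },
        // In the browser, imageDataUrl would typically come from
        // FileReader.readAsDataURL on an uploaded file.
        { type: "image_url", image_url: { url: imageDataUrl } },
      ],
    },
  ];
}
```

The resulting array would be passed as the `messages` field of a chat-completion request to the engine.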