andrewnguonly / Lumos

A RAG LLM co-pilot for browsing the web, powered by local LLMs
MIT License
1.36k stars 98 forks

Add support for image upload to multimodal models #59

Closed andrewnguonly closed 7 months ago

andrewnguonly commented 7 months ago

Summary

This PR resolves https://github.com/andrewnguonly/Lumos/issues/27.

llava and bakllava are multimodal models available through Ollama. With this change, images present on the current tab are downloaded and bound to the model.
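For illustration, here is a minimal sketch of how base64 image data might be attached to a request for a multimodal model. It targets the shape of Ollama's REST generate endpoint, which accepts an `images` array of base64 strings; Lumos itself may attach images through its model wrapper instead. The names `isMultimodal` and `buildGenerateRequest` and the model allowlist are hypothetical and not taken from the PR.

```typescript
// Request body shape for Ollama's /api/generate endpoint (subset of fields).
interface OllamaGenerateRequest {
  model: string;
  prompt: string;
  images?: string[]; // base64-encoded image data, no "data:" prefix
  stream: boolean;
}

// Assumption: a hard-coded allowlist of the multimodal models named in this PR.
const MULTIMODAL_MODELS = ["llava", "bakllava"];

// Returns true when the model name (ignoring any ":tag" suffix) is multimodal.
function isMultimodal(model: string): boolean {
  const name = model.split(":")[0];
  return MULTIMODAL_MODELS.indexOf(name) !== -1;
}

// Builds the request body, attaching images only for multimodal models.
function buildGenerateRequest(
  model: string,
  prompt: string,
  base64Images: string[],
): OllamaGenerateRequest {
  const request: OllamaGenerateRequest = { model, prompt, stream: false };
  if (isMultimodal(model) && base64Images.length > 0) {
    request.images = base64Images;
  }
  return request;
}
```

With this sketch, a text-only model receives the same prompt without an `images` field, so the call site does not need to branch on the model type.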

Implementation

  1. Move getHtmlContent() to a separate scripts/content.ts file.
  2. Update getHtmlContent() to retrieve src URLs from <img> elements inside the elements returned by the selectors and selectorsAll queries.
  3. Update the background script to download images and bind the base64-encoded image data to the model.
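The steps above can be sketched roughly as follows. This is not the PR's actual code: `collectImageUrls`, `toBase64`, and `downloadAsBase64` are hypothetical helper names, and the small structural interfaces stand in for real DOM elements so the sketch stays self-contained. It assumes `fetch` and `btoa` are available, as they are in extension content and background scripts.

```typescript
// Minimal structural stand-ins for DOM types (assumption, for self-containment).
interface ImgLike {
  src: string;
}
interface ElementLike {
  querySelectorAll(selector: string): ArrayLike<ImgLike>;
}

// Content-script side: gather unique, non-empty <img> src URLs
// from the elements matched by the selector queries.
function collectImageUrls(elements: ElementLike[]): string[] {
  const urls: string[] = [];
  for (let i = 0; i < elements.length; i++) {
    const imgs = elements[i].querySelectorAll("img");
    for (let j = 0; j < imgs.length; j++) {
      const src = imgs[j].src;
      if (src && urls.indexOf(src) === -1) {
        urls.push(src);
      }
    }
  }
  return urls;
}

// Encode raw bytes as base64 by building a binary string for btoa().
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Background-script side: download one image and return its base64 data.
async function downloadAsBase64(url: string): Promise<string> {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error("Failed to download " + url + ": " + response.status);
  }
  return toBase64(new Uint8Array(await response.arrayBuffer()));
}
```

In this sketch the content script would pass the collected URLs to the background script, which would call `downloadAsBase64` for each one before binding the results to the model. Deduplicating URLs avoids downloading the same image twice when it appears in several matched elements.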