AstraBert / qdurllm

Search your favorite websites and chat with them, on your desktop🌐
https://astrabert.github.io/qdurllm/
MIT License
docker-compose gemma gradio langchain llamacpp llm local-ai python qdrant search-engine

qdurllm


Flowchart

Flowchart for qdurllm

qdurllm (Qdrant URLs and Large Language Models) is a local search engine that lets you select and upload URL content to a vector database: after that, you can search, retrieve and chat with this content.

This is provisioned through a multi-container Docker application, leveraging Qdrant, Langchain, llama.cpp, quantized Gemma and Gradio.

Demo!

Head over to the demo space on HuggingFace🦀

Requirements

The only requirements are Docker and Docker Compose.

If you don't have them, install them first by following the official Docker installation instructions.

Installation

You can install the application by cloning the GitHub repository:

git clone https://github.com/AstraBert/qdurllm.git
cd qdurllm

Or you can simply paste the following text into a compose.yaml file:

networks:
  mynet:
    driver: bridge
services:
  local-search-application:
    image: astrabert/local-search-application
    networks:
      - mynet
    ports:
      - "7860:7860"
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
    volumes:
      - "./qdrant_storage:/qdrant/storage"
    networks:
      - mynet
  llama_server:
    image: astrabert/llama.cpp-gemma
    ports:
      - "8000:8000"
    networks:
      - mynet

Place the file in any directory on your file system.

Before running the application, you can optionally pull all the required images from Docker Hub:

docker pull qdrant/qdrant
docker pull astrabert/llama.cpp-gemma
docker pull astrabert/local-search-application

How does it work?

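When launched (see Usage), the application runs three containers, matching the services in the compose file above: qdrant (port 6333), the vector database that stores and retrieves the URL content; llama_server (port 8000), which serves the quantized Gemma model through llama.cpp; and local-search-application (port 7860), the Gradio interface you interact with.

The overall computational burden is light enough for the application to run without a GPU and with limited RAM (>=8GB, although Gemma can take up to 10 minutes to respond on 8GB of RAM).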

Usage

Run it

Start the application with the following command, run from the directory that contains your compose.yaml file:

docker compose up -d

If you've already pulled all the images, you'll find the application running at http://localhost:7860 or http://0.0.0.0:7860 in less than a minute.

If you have not pulled the images, you'll have to wait until they finish downloading before you can use the application.
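Because startup time depends on whether the images are already present, it can help to check when the services are actually reachable. The following is a minimal sketch, not part of qdurllm itself: it polls the three host ports published in the compose.yaml above until each accepts a TCP connection. The function names, the timeout values and the use of localhost are assumptions made for illustration. An open port only shows that a container is listening, not that Gemma has finished loading.

```python
import socket
import time

# Host ports published in the compose.yaml shown in the Installation section.
SERVICES = {
    "local-search-application": 7860,
    "qdrant": 6333,
    "llama_server": 8000,
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_services(services, host="localhost", max_wait=300.0, interval=2.0):
    """Poll until every port accepts connections or max_wait seconds elapse.

    Returns a dict mapping each service name to True (reachable) or False.
    """
    deadline = time.monotonic() + max_wait
    status = {name: False for name in services}
    while True:
        for name, port in services.items():
            if not status[name]:
                status[name] = port_is_open(host, port)
        if all(status.values()) or time.monotonic() >= deadline:
            return status
        time.sleep(interval)


# Example usage (run after `docker compose up -d`):
#     for name, ok in wait_for_services(SERVICES).items():
#         print(f"{name}: {'up' if ok else 'not reachable'}")
```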

Use it

Once the app is loaded, you'll find a first tab in which you can write the URLs whose content you want to interact with:

upload_URLs
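To give a rough idea of what uploading URL content to a vector database can involve, here is a hedged sketch that talks to Qdrant's REST API on the published port 6333. It is not the application's actual code. The chunk sizes, the payload field names (`url`, `text`) and the helper names are assumptions, the embedding vectors are assumed to be computed elsewhere, and the target collection is assumed to exist already.

```python
import json
import urllib.request
import uuid

QDRANT_URL = "http://localhost:6333"  # host port published in compose.yaml


def chunk_text(text, size=500, overlap=50):
    """Split text into overlapping character windows (illustrative sizes)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def build_upsert_body(url, chunks, vectors):
    """Build the JSON body for Qdrant's PUT /collections/{name}/points.

    Point ids are deterministic UUIDs derived from the URL and chunk index,
    so chunks from different URLs do not overwrite each other.
    """
    if len(chunks) != len(vectors):
        raise ValueError("need exactly one vector per chunk")
    return {
        "points": [
            {
                "id": str(uuid.uuid5(uuid.NAMESPACE_URL, f"{url}#{i}")),
                "vector": list(vector),
                "payload": {"url": url, "text": chunk},  # hypothetical field names
            }
            for i, (chunk, vector) in enumerate(zip(chunks, vectors))
        ]
    }


def upsert(collection, body, base_url=QDRANT_URL):
    """Send the points to an existing Qdrant collection and return the reply."""
    request = urllib.request.Request(
        f"{base_url}/collections/{collection}/points?wait=true",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Only `upsert` needs the qdrant container to be running; the two helpers can be tried without it.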

Now that your URLs are uploaded, you can either chat with their content through llama.cpp-gemma:

chat_with_URLs

Note that you can also set parameters such as maximum output tokens, temperature, repetition penalty and generation seed.

Or you can use double-layered-retrieval semantic search to query your URL content directly:
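For readers curious how these four parameters might reach the model, the sketch below maps them onto the field names used by llama.cpp's native server endpoint (`n_predict`, `temperature`, `repeat_penalty`, `seed`). This is an illustration under assumptions: it presumes the llama_server container exposes the `/completion` route on port 8000, and the default values are placeholders rather than the application's own defaults.

```python
import json
import urllib.request

# Assumption: the llama_server container serves llama.cpp's native
# /completion endpoint on the host port published in compose.yaml.
LLAMA_URL = "http://localhost:8000/completion"


def build_completion_payload(prompt, max_tokens=512, temperature=0.7,
                             repeat_penalty=1.1, seed=-1):
    """Translate the UI parameters into llama.cpp server field names.

    The defaults are illustrative; in llama.cpp a seed of -1 requests a random seed.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    return {
        "prompt": prompt,
        "n_predict": max_tokens,
        "temperature": temperature,
        "repeat_penalty": repeat_penalty,
        "seed": seed,
    }


def complete(prompt, url=LLAMA_URL, timeout=600, **params):
    """POST the payload and return the generated text.

    The long timeout reflects that responses can take minutes on 8GB of RAM.
    """
    payload = build_completion_payload(prompt, **params)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["content"]
```

Fixing the seed while keeping the other parameters unchanged is the usual way to make a generation repeatable.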

direct_search
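The README does not spell out what the two retrieval layers are. One plausible reading, sketched below with toy vectors, is a two-stage search: a first pass shortlists candidates using one set of embeddings, and a second pass re-ranks only that shortlist using a second set. The function names, the `fast` and `precise` keys and the cut-off values are all hypothetical, and plain Python cosine similarity stands in for the queries the application sends to Qdrant.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_layer_search(query_fast, query_precise, docs, first_k=3, final_k=1):
    """Shortlist with the first embedding, then re-rank with the second.

    Each doc is a dict with "text", "fast" and "precise" keys, where the two
    vectors come from two different (hypothetical) embedding models.
    """
    # Layer 1: keep the first_k documents closest to the query.
    shortlist = sorted(
        docs, key=lambda d: cosine(query_fast, d["fast"]), reverse=True
    )[:first_k]
    # Layer 2: re-rank only the shortlist with the second embedding.
    reranked = sorted(
        shortlist, key=lambda d: cosine(query_precise, d["precise"]), reverse=True
    )
    return [d["text"] for d in reranked[:final_k]]
```

A document that fails the first layer is never seen by the second, so `first_k` trades recall against the cost of the re-ranking step.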

License and rights of usage

The software is (and will always be) open-source, provided under the MIT license.

Anyone can use, modify and redistribute any portion of it, as long as the author, Astra Clelia Bertelli, is cited.

Contributions and funding

Contributions are always more than welcome! Feel free to flag issues, open PRs or contact the author to suggest changes, request features or improve the code.

If you find the application useful, please consider funding it to support further improvements!