ThetaCursed / clean-ui

Simple UI for Llama-3.2-11B-Vision & Molmo-7B-D
Apache License 2.0

Clean-UI for Multi-Modal Vision Models

This project offers a user-friendly interface for working with the Llama-3.2-11B-Vision and Molmo-7B-D models.

Both the Llama-3.2-11B-Vision-bnb-4bit and Molmo-7B-D-bnb-4bit models need 12GB of VRAM to run.
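To check a machine against the 12GB figure before installing anything, a minimal sketch along these lines may help. The names `meets_vram_requirement` and `detect_total_vram_bytes` are illustrative and not part of clean-ui, and the threshold assumes the 12GB figure refers to the card's total memory in decimal gigabytes.

```python
# Rough VRAM check against the 12GB figure quoted in this README.
# All names here are hypothetical helpers, not part of clean-ui.

REQUIRED_VRAM_GB = 12  # figure stated in the README


def meets_vram_requirement(total_bytes, required_gb=REQUIRED_VRAM_GB):
    """Return True if total_bytes is at least required_gb decimal gigabytes.

    Decimal GB (10**9 bytes) is an assumption: cards sold as "12GB" often
    report slightly less than 12 GiB of usable memory.
    """
    return total_bytes >= required_gb * 10**9


def detect_total_vram_bytes(device_index=0):
    """Return the total memory of a CUDA device in bytes, or None if
    torch is not installed or no CUDA device is visible."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device_index).total_memory
```

Passing the result of `detect_total_vram_bytes()` to `meets_vram_requirement()` gives a quick yes/no answer once Torch is installed. A `None` result means no CUDA device could be queried.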

The model is selected via the command line.

Installation

To set up and run this project on your local machine, follow the steps below:

1. Clone the Repository

Copy the repository to a convenient location on your computer:

git clone <repository-url>
cd <repository-directory>

2. Create a Virtual Environment

Inside the cloned repository, create a virtual environment using the following command:

python -m venv venv-ui

3. Activate the Virtual Environment

On Windows, activate the virtual environment using:

.\venv-ui\Scripts\activate
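To confirm that the activation took effect, one option is to ask the interpreter itself. This is a small sketch using only the standard library; `in_virtual_environment` is an illustrative name, and the check assumes the environment was created with `python -m venv` as in step 2.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtual_environment())
    print("Interpreter prefix:", sys.prefix)
```

If the first line prints `False`, the `pip install` commands in the next step would install into the base Python installation rather than `venv-ui`.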

4. Install Dependencies

After activating the virtual environment, install the necessary dependencies from requirements.txt:

pip install -r requirements.txt

Then install the CUDA 12.1 builds of Torch and TorchVision using separate commands:

pip install torch==2.4.1+cu121 --index-url https://download.pytorch.org/whl/cu121

and

pip install torchvision==0.19.1+cu121 --index-url https://download.pytorch.org/whl/cu121
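After both commands finish, it can be useful to confirm that the pinned CUDA builds are the ones actually installed. The sketch below reads the installed package metadata with the standard library; `split_version` and `check_installed` are illustrative helper names, not part of clean-ui.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the install commands above.
EXPECTED = {
    "torch": "2.4.1+cu121",
    "torchvision": "0.19.1+cu121",
}


def split_version(version_string):
    """Split '2.4.1+cu121' into ('2.4.1', 'cu121').

    The part after '+' is the local version label, which identifies the
    CUDA build. It is an empty string when no label is present.
    """
    base, _, local = version_string.partition("+")
    return base, local


def check_installed(expected=EXPECTED):
    """Map each package name to (installed_version_or_None, matches_expected)."""
    results = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        results[name] = (found, found == wanted)
    return results


if __name__ == "__main__":
    for name, (found, ok) in check_installed().items():
        print(f"{name}: installed={found} expected={EXPECTED[name]} match={ok}")
```

A mismatch in the part after `+` (for example a CPU-only build) usually means the package was installed without the `--index-url` option shown above.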

Usage

To start the UI, you can either:

Features

License

This project is licensed under the MIT License. See the LICENSE file for more details.