Blaizzy / mlx-vlm

MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
MIT License
143 stars 12 forks

Batch Processing Feature #40

Open Blaizzy opened 2 weeks ago

Blaizzy commented 2 weeks ago

Overview

The goal is to add efficient batch processing of inputs to the MLX-VLM library. Users will be able to submit multiple images and text prompts together and get the corresponding outputs from a single batch, which should improve throughput compared with processing inputs one at a time.
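As a rough illustration of what batching means here, the sketch below groups (image, prompt) pairs into fixed-size chunks and passes each chunk to a model callable in one call. The names `chunked`, `batched_generate`, `generate_fn`, and `batch_size` are hypothetical and are not part of the MLX-VLM API; `fake_generate` is a stand-in that only echoes its inputs.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Pair = Tuple[str, str]  # (image path, text prompt)


def chunked(items: Sequence[Pair], batch_size: int) -> Iterator[List[Pair]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def batched_generate(
    pairs: Sequence[Pair],
    generate_fn: Callable[[List[Pair]], List[str]],
    batch_size: int = 4,
) -> List[str]:
    """Call `generate_fn` once per batch and return outputs in input order."""
    outputs: List[str] = []
    for batch in chunked(pairs, batch_size):
        results = generate_fn(batch)
        # Guard against a model call that drops or duplicates outputs.
        if len(results) != len(batch):
            raise RuntimeError("generate_fn must return one output per input")
        outputs.extend(results)
    return outputs


def fake_generate(batch: List[Pair]) -> List[str]:
    # Stand-in for a real model call (assumption, for illustration only).
    return [f"{image}: {prompt}" for image, prompt in batch]
```

Keeping the model call behind a plain callable means the chunking logic can be tested without loading a model, and the batch size can be tuned to the available memory.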

Use cases:

  1. Generating captions for a large dataset of images.
  2. Localizing objects or regions in a batch of images based on textual descriptions.
  3. Classifying a large number of images into predefined categories, considering accompanying text information.
  4. Answering questions based on a batch of images (single and multiple question prompts).
  5. Video processing.
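Use case 4 covers both a single question and multiple questions for a batch of images. One possible way to normalize the two forms is sketched below, assuming that a single prompt applies to every image and that a list of prompts must match the images one to one. `pair_inputs` is a hypothetical helper, not an MLX-VLM function.

```python
from typing import List, Sequence, Tuple, Union


def pair_inputs(
    images: Sequence[str],
    prompts: Union[str, Sequence[str]],
) -> List[Tuple[str, str]]:
    """Pair each image with a prompt.

    A single string is reused for every image; a sequence must contain
    exactly one prompt per image.
    """
    if isinstance(prompts, str):
        prompts = [prompts] * len(images)
    if len(prompts) != len(images):
        raise ValueError(
            f"got {len(images)} images but {len(prompts)} prompts"
        )
    return list(zip(images, prompts))
```

Rejecting mismatched lengths up front, instead of silently truncating with `zip`, keeps each output traceable to the image and prompt that produced it.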

Note: Tag @Blaizzy for code reviews and questions.

Requirements

Support batched inputs:

Perform batch processing:

Generate batched outputs:

Error handling:

API design:

Documentation and examples:

Implementation

Testing

Delivery

With this batch processing feature, MLX-VLM will let users process multiple inputs efficiently in a single call, improving the library's performance and usability across vision-language tasks.

eDeveloperOZ commented 2 weeks ago

I'll take this for implementation! Hope to meet the standards :)