Blaizzy / mlx-vlm

MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.

Batch Processing Feature #40

Open Blaizzy opened 5 months ago

Blaizzy commented 5 months ago

Overview

The goal is to add support for efficient batch processing of inputs to the MLX-VLM library. This will allow users to process multiple images and text prompts simultaneously to generate corresponding outputs in a single batch, improving performance.

Use cases:

  1. Generating captions for a large dataset of images.
  2. Localizing objects or regions in a batch of images based on textual descriptions.
  3. Classifying a large number of images into predefined categories, considering accompanying text information.
  4. Answering questions based on a batch of images (single and multiple question prompts).
  5. Video processing.
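A core mechanism any of these use cases needs is collating variable-length prompts into a single batch. As a minimal sketch (names like `pad_batch` and `PAD_ID` are illustrative, not part of mlx-vlm), token sequences of different lengths can be right-padded to a common length with an attention mask marking the real tokens:

```python
# Hypothetical batch collation step: right-pad variable-length token
# sequences and build an attention mask. Illustrative only; not the
# mlx-vlm API.

PAD_ID = 0  # assumed padding token id

def pad_batch(sequences):
    """Right-pad a list of token-id lists to equal length.

    Returns (padded, mask), where mask[i][j] is 1 for real tokens
    and 0 for padding.
    """
    max_len = max(len(s) for s in sequences)
    padded, mask = [], []
    for s in sequences:
        pad = max_len - len(s)
        padded.append(list(s) + [PAD_ID] * pad)
        mask.append([1] * len(s) + [0] * pad)
    return padded, mask

batch = [[5, 7, 9], [3, 4], [8]]
padded, mask = pad_batch(batch)
# padded -> [[5, 7, 9], [3, 4, 0], [8, 0, 0]]
# mask   -> [[1, 1, 1], [1, 1, 0], [1, 0, 0]]
```

The padded arrays can then be stacked into a single tensor for one forward pass, with the mask preventing padding tokens from influencing attention.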

Note: Tag @Blaizzy for code reviews and questions.

Requirements

- Support batched inputs: accept lists of images and text prompts rather than a single image/prompt pair.
- Perform batch processing: run the model over the whole batch at once instead of looping over items one at a time.
- Generate batched outputs: return one output per input, in the same order as the inputs.
- Error handling: report per-item failures (e.g. an unreadable image) without aborting the rest of the batch.
- API design: keep the batched interface consistent with the existing single-input `generate` API.
- Documentation and examples: document the new API and provide usage examples covering the use cases above.
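To make the error-handling and API-design requirements concrete, here is one possible shape for the interface, sketched as a naive item-by-item baseline (all names here, `batch_generate`, `BatchResult`, `generate_one`, are hypothetical, not the implemented API; a real implementation would stack the inputs and run a single forward pass per decode step):

```python
# Hypothetical API sketch: per-item error collection with a sequential
# fallback. Not the mlx-vlm implementation.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class BatchResult:
    outputs: list          # one entry per input; None where an item failed
    errors: list           # (index, exception) pairs for failed items

def batch_generate(generate_one: Callable[[Any, Any], Any],
                   prompts: list, images: list) -> BatchResult:
    """Run generate_one over paired (prompt, image) inputs.

    Failures are recorded per item instead of aborting the whole
    batch, and outputs preserve input order.
    """
    outputs, errors = [], []
    for i, (prompt, image) in enumerate(zip(prompts, images)):
        try:
            outputs.append(generate_one(prompt, image))
        except Exception as exc:
            outputs.append(None)
            errors.append((i, exc))
    return BatchResult(outputs, errors)
```

Returning a result object rather than raising on the first failure lets callers caption a large dataset and retry only the failed indices afterwards.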

Implementation

Testing

Delivery

By implementing this batch processing feature, MLX-VLM will let users process multiple inputs simultaneously, improving the library's performance and usability across a range of vision-language tasks.

eDeveloperOZ commented 5 months ago

Will take it for implementation! Hope to meet the standards :)

Blaizzy commented 4 months ago

Here are some extra details: @willccbb

willccbb commented 3 months ago

Sorry, just saw this -- will take a swing when #53 is merged.

Blaizzy commented 3 months ago

@willccbb done ✅

#53 is merged.

Benjoyo commented 2 days ago

Hey @willccbb, any update on this? Would be super helpful to have

Blaizzy commented 2 days ago

@willccbb doesn't have the bandwidth.

This feature is now open and back in the backlog.