Blaizzy commented 5 months ago

Overview

The goal is to add support for efficient batch processing of inputs to the MLX-VLM library. This will allow users to process multiple images and text prompts simultaneously to generate corresponding outputs in a single batch, improving performance.

Use cases:

Generating captions for a large dataset of images.
Localizing objects or regions in a batch of images based on textual descriptions.
Classifying a large number of images into predefined categories, considering accompanying text information.
Answering questions based on a batch of images (single and multiple question prompts).
Video processing.

Note: Tag @Blaizzy for code reviews and questions.

Requirements

Support batched inputs:

Accept a batch of images as input, provided as a list or array of image objects.
Accept a batch of text prompts as input, provided as a list or array of strings.
Accept a single text prompt as input, provided as a string.
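A minimal sketch of the input handling described above, not the library's actual API. The function name `normalize_batch` and its parameters are hypothetical; it only shows one way a single prompt could be broadcast across a batch of images while a list of prompts is matched one-to-one.

```python
from typing import Any, List, Sequence, Tuple, Union


def normalize_batch(
    images: Sequence[Any],
    prompts: Union[str, Sequence[str]],
) -> List[Tuple[Any, str]]:
    """Pair every image with a prompt (hypothetical helper, not part of MLX-VLM)."""
    # A single string is reused for every image in the batch.
    if isinstance(prompts, str):
        prompts = [prompts] * len(images)
    # A list of prompts must line up one-to-one with the images.
    if len(prompts) != len(images):
        raise ValueError(
            f"Got {len(images)} images but {len(prompts)} prompts; "
            "pass one prompt per image or a single string."
        )
    return list(zip(images, prompts))
```

Rejecting mismatched lengths up front keeps the "invalid inputs" error close to the caller instead of surfacing later inside the model.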

Perform batch processing:

Process the batch of images and text prompts simultaneously (async) using the MLX-VLM model.
Utilize parallel processing or GPU acceleration to optimize batch processing performance.
Ensure that the processing of one input in the batch does not affect the processing of other inputs.

Generate batched outputs:

Return the generated outputs for each input in the batch.
Maintain the order of the outputs corresponding to the order of the inputs.
Support different output formats such as text, embeddings, or visual representations based on the specific task.

Error handling:

Handle errors gracefully during batch processing.
Provide informative error messages for invalid inputs or processing failures.
Continue processing the remaining inputs in the batch if an error occurs for a specific input.
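One possible shape for the error-handling requirement, sketched under assumptions: `BatchResult`, `run_batch`, and `process_one` are hypothetical names, and the loop processes items one at a time for clarity. A truly batched forward pass would need a different isolation strategy (for example, validating inputs before batching).

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence


@dataclass
class BatchResult:
    """Outcome for one input; exactly one of output/error is set."""
    index: int
    output: Optional[Any] = None
    error: Optional[str] = None


def run_batch(
    items: Sequence[Any],
    process_one: Callable[[Any], Any],
) -> List[BatchResult]:
    """Process every item, recording failures instead of aborting the batch."""
    results: List[BatchResult] = []
    for i, item in enumerate(items):
        try:
            results.append(BatchResult(index=i, output=process_one(item)))
        except Exception as exc:
            # Keep an informative message and move on to the next input.
            results.append(BatchResult(index=i, error=f"{type(exc).__name__}: {exc}"))
    return results
```

Because results are appended in input order and carry their index, the output order matches the input order even when some items fail.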

API design:

Provide a clear and intuitive API for users to perform batch processing.
Allow users to specify the maximum batch size supported by their system.
Provide options to control the batch processing behavior, such as enabling/disabling parallel processing.
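To illustrate the maximum batch size option, here is a small sketch that splits the inputs into chunks no larger than a user-supplied limit. The name `chunked` and the `max_batch_size` parameter are assumptions for illustration, not an existing MLX-VLM interface.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], max_batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each at most `max_batch_size` long."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(items), max_batch_size):
        # Slicing preserves the original order within and across chunks.
        yield list(items[start:start + max_batch_size])
```

Each chunk could then be passed to the model in turn, so memory use is bounded by the limit the user chose for their system.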

Documentation and examples:

Update the library documentation to include information about the batch processing feature.
Provide code examples demonstrating how to use the batch processing API effectively.
Include performance benchmarks and guidelines for optimal batch sizes based on system resources.

Implementation

Modify the existing input handling logic to accept batches of images and text prompts.
Implement batch processing functionality using parallel processing techniques or GPU acceleration libraries.
Optimize memory usage and performance for efficient batch processing.
Update the output generation logic to handle batched outputs and maintain the correct order.
Implement error handling mechanisms to gracefully handle and report errors during batch processing.
Design and expose a user-friendly API for performing batch processing.
Write unit tests to verify the correctness and performance of the batch processing implementation.
Update the library documentation and provide code examples for using the batch processing feature.

Testing

Prepare a comprehensive test suite to validate the batch processing functionality.
Test with different batch sizes and input variations to ensure robustness.
Verify that the generated outputs match the expected results for each input in the batch.
Measure the performance improvement gained by batch processing compared to individual processing.
Conduct error handling tests to ensure graceful handling of invalid inputs and processing failures.
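For the performance comparison, a rough timing harness might look like the sketch below. `compare_throughput`, `process_one`, and `process_batch` are hypothetical placeholders for whatever single-input and batched entry points the implementation ends up exposing.

```python
import time
from typing import Any, Callable, Dict, List, Sequence


def compare_throughput(
    items: Sequence[Any],
    process_one: Callable[[Any], Any],
    process_batch: Callable[[Sequence[Any]], List[Any]],
    repeats: int = 3,
) -> Dict[str, float]:
    """Time per-item processing against batched processing (best of `repeats`)."""

    def best_time(fn: Callable[[], Any]) -> float:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            fn()
            timings.append(time.perf_counter() - start)
        return min(timings)

    individual = best_time(lambda: [process_one(x) for x in items])
    batched = best_time(lambda: process_batch(items))
    # Guard against a zero reading on very fast calls.
    speedup = individual / batched if batched > 0 else float("inf")
    return {"individual_s": individual, "batched_s": batched, "speedup": speedup}
```

Taking the best of several runs reduces noise from warm-up and caching, though real benchmarks would also want to vary batch size and report memory use.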

Delivery

Integrate the batch processing feature into the existing MLX-VLM library codebase.
Ensure backward compatibility with previous versions of the library.
Provide release notes highlighting the new batch processing capability and any breaking changes.
Update the library version number following semantic versioning conventions.
Publish the updated library package to the relevant package repositories or distribution channels.

With this batch processing feature, MLX-VLM will let users process multiple inputs simultaneously, improving the performance and usability of the library for a range of vision-language tasks.

eDeveloperOZ commented 5 months ago

Will take it for implementation! Hope to meet the standards :)

Blaizzy commented 4 months ago

Here are some extra details: @willccbb

willccbb commented 3 months ago

Sorry, just saw this -- will take a swing when #53 is merged.

Blaizzy commented 3 months ago

@willccbb done ✅

#53 is merged.

Benjoyo commented 2 days ago

Hey @willccbb, any update on this? Would be super helpful to have

Blaizzy commented 2 days ago

@willccbb doesn't have the bandwidth.

This feature is now open and back in the backlog.

Blaizzy / mlx-vlm

Batch Processing Feature #40