Welcome to our project where we build a comprehensive Image Analysis API using Microsoft Florence-2-large, Chainlit, and Docker. This API allows you to perform various image analysis tasks such as captioning, object detection, expression segmentation, and OCR.
We utilized the pre-trained Florence-2-large model from Hugging Face Transformers for image analysis. This powerful model has been trained on a vast dataset of images and can perform multiple tasks such as image captioning, object detection, and OCR.
ModelManager
class, responsible for loading and managing the Florence-2-large model.We defined several task prompts to leverage the Florence-2-large model's capabilities. These prompts include:
<CAPTION>
: Generates a simple caption for the image.<DETAILED_CAPTION>
: Provides a detailed description of the image.<OD>
: Detects and locates objects within the image.<OCR>
: Performs Optical Character Recognition.<CAPTION_TO_PHRASE_GROUNDING>
: Locates specific phrases or objects mentioned in the caption within the image.<DENSE_REGION_CAPTION>
: Generates captions for specific regions within the image.<REGION_PROPOSAL>
: Suggests regions of interest within the image without labeling them.<MORE_DETAILED_CAPTION>
: Generates a very comprehensive description of the image.<REFERRING_EXPRESSION_SEGMENTATION>
: Segments the image based on a textual description of a specific object or region.<REGION_TO_SEGMENTATION>
: Generates a segmentation mask for a specified region in the image.<OPEN_VOCABULARY_DETECTION>
: Detects objects in the image based on user-specified categories.<REGION_TO_CATEGORY>
: Classifies a specific region of the image into a category.<REGION_TO_DESCRIPTION>
: Generates a detailed description of a specific region in the image.<OCR_WITH_REGION>
: OCR on specific regions of the image.Clone the repository:
git clone https://github.com/yourusername/MS-FLORENCE2.git
cd MS-FLORENCE2
(Optional) - File now included in the repo - Create a .env file with the following content: env Copy code MODEL_ID=microsoft/Florence-2-large RATE_LIMIT=5
Build the Docker container: docker build -t florence2-image-analysis .
Run the Docker container: docker run --gpus all -p 8010:8010 florence2-image-analysis
Once the container is running, access the Chainlit interface at http://localhost:8010. Follow the instructions in the chat to perform various image analysis tasks.
Contributions are welcome! Please fork the repository and submit a pull request for any enhancements or bug fixes.
This project is licensed under the MIT License. See the LICENSE file for details.