Open dino65-dev opened 1 day ago
Thank you for creating this issue! We'll look into it as soon as possible. In the meantime, please make sure to provide all the necessary details and context. If you have any questions or additional information, feel free to add them here. Your contributions are highly appreciated!
You can also check our CONTRIBUTING.md for guidelines on contributing to this project.
Title
Enhanced Image Understanding with CLIP
Enhancement Aim
Integrate CLIP into the Computer Vision project to enable sophisticated image-text understanding, enhancing visual recognition and classification tasks through advanced language-vision models.
Changes
Features
Asynchronous Image Processing: Utilizes Python's asyncio to load and classify images concurrently, improving performance when handling large batches.
Dynamic Prompt Generation: Automatically generates classification prompts based on user-defined base terms, allowing for flexible and contextual image queries.
Confidence Thresholding: Filters classification results based on a user-defined confidence threshold, improving precision by omitting less certain predictions.
Multi-Modal Retrieval: Enables users to retrieve images based on textual descriptions and vice versa, offering a versatile tool for various multi-modal tasks.
Robust Error Handling: Includes comprehensive error handling and logging to help diagnose issues related to image loading and processing.
Batch Processing: Supports processing multiple images from a specified folder, making it suitable for large datasets.
GPU Acceleration: Automatically utilizes GPU for faster model inference if available, significantly improving processing times.
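A minimal sketch of how the prompt generation, confidence thresholding, error handling, and asyncio-based batch classification described above could fit together. It is not the proposed implementation: score_image is a hypothetical stand-in for CLIP inference (it only matches the base term against the file name), and the function names, prompt templates, and default threshold are all assumptions made for illustration.

```python
import asyncio
import logging
import math
from pathlib import Path

logger = logging.getLogger("clip_sketch")

# Hypothetical templates; the issue does not specify the actual prompt wording.
PROMPT_TEMPLATES = ("a photo of a {}", "a picture of a {}")


def generate_prompts(base_terms, templates=PROMPT_TEMPLATES):
    """Expand each user-defined base term into one prompt per template."""
    return [t.format(term) for term in base_terms for t in templates]


def softmax(scores):
    """Turn raw similarity scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def filter_by_confidence(prompts, scores, threshold=0.5):
    """Keep (prompt, probability) pairs at or above the threshold, best first."""
    probs = softmax(scores)
    kept = [(p, pr) for p, pr in zip(prompts, probs) if pr >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def score_image(path, prompts):
    """Stand-in for CLIP inference (assumption): scores a prompt 5.0 when its
    base term appears in the file name, else 0.0. A real version would encode
    the image and the prompts with a CLIP model and compare the embeddings."""
    name = Path(path).stem.lower()
    return [5.0 if prompt.split()[-1] in name else 0.0 for prompt in prompts]


async def classify_image(path, prompts, threshold):
    """Classify one image; log and return an empty result on failure."""
    try:
        # Run the blocking scorer in a worker thread so images overlap.
        scores = await asyncio.to_thread(score_image, path, prompts)
    except Exception:
        logger.exception("Failed to process %s", path)
        return path, []
    return path, filter_by_confidence(prompts, scores, threshold)


async def classify_batch(paths, base_terms, threshold=0.5):
    """Classify all images concurrently and map each path to its kept labels."""
    prompts = generate_prompts(base_terms)
    tasks = (classify_image(p, prompts, threshold) for p in paths)
    return dict(await asyncio.gather(*tasks))
```

A call such as asyncio.run(classify_batch(["cat_01.jpg", "dog_02.jpg"], ["cat", "dog"], threshold=0.4)) returns a dict keyed by image path. asyncio.to_thread requires Python 3.9 or later. With a real model, the softmax would normally be applied to the image-text similarity logits the model returns, and the threshold would need tuning, since the probabilities are split across all generated prompts.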
Screenshots
No response
Guidelines
Full Name
Dinmay Kumar Brahma
Participant Role