[FEATURE] Image Bounding Box Annotations for vision & multimodal workflows

JonnyTran commented 7 months ago

Hi team! Given the awesome new SpanQuestion feature that was recently released, I'm tempted to ask whether the same could be done for annotating regions of interest in images. It would be marvelous if there were a way to draw and label rectangular bounding boxes on an ImageField, similar to Label Studio's bounding boxes. I love Argilla and hope this feature can make it onto the roadmap!

Is your feature request related to a problem? Please describe.
As LLMs become increasingly multimodal, and as many data workflows mix text, document, and image data, a very common task is highlighting specific regions of an image for annotation or other downstream processing. As a concrete use case, suppose an LLM is tasked with detecting objects and returning the bounding boxes of the detected objects as JSON. It would be great to take this JSON output and have humans add, edit, or label the bboxes, which could then be used either to fine-tune multimodal LLMs or for downstream tasks.
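To make the use case above more concrete, here is a minimal Python sketch of the clean-up step that could run before human review. It assumes the LLM was prompted to return a JSON list such as `[{"label": "cat", "bbox": [x_min, y_min, x_max, y_max]}]` in pixel coordinates; that schema and the `BBox` / `parse_llm_bboxes` names are hypothetical and not part of Argilla. The function fixes swapped corners, clamps boxes to the image, and drops boxes that end up with zero area, so annotators start from valid boxes.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """A labeled axis-aligned box in pixel coordinates (hypothetical schema)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_llm_bboxes(raw: str, width: int, height: int) -> list[BBox]:
    """Parse an LLM's JSON detections into clamped, non-degenerate boxes.

    Assumes a JSON list like [{"label": "cat", "bbox": [x_min, y_min, x_max, y_max]}]
    with coordinates in pixels of the original image.
    """
    boxes = []
    for det in json.loads(raw):
        x0, y0, x1, y1 = (float(v) for v in det["bbox"])
        # Normalize corner order in case the model swapped them.
        x0, x1 = sorted((x0, x1))
        y0, y1 = sorted((y0, y1))
        # Clamp to the image so annotators never see off-canvas boxes.
        x0, x1 = max(0.0, x0), min(float(width), x1)
        y0, y1 = max(0.0, y0), min(float(height), y1)
        # Skip boxes that collapsed to zero area after clamping.
        if x1 > x0 and y1 > y0:
            boxes.append(BBox(str(det["label"]), x0, y0, x1, y1))
    return boxes
```

For a 640 x 480 image, a box at `[650, 0, 700, 50]` lies entirely outside the image and is dropped, while in-bounds boxes pass through unchanged.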

Describe the solution you'd like
Currently, images are displayed by using image_to_html to encode them as HTML in a TextField. We could create an ImageField containing an image URL and other metadata (width, height, scale, offsets), and an ImageSpanQuestion where users can create RectangleLabels and PointLabels stored as responses or suggestions.
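A rough sketch of how the proposed data model might look, written as plain Python dataclasses. None of this is an existing Argilla API: `ImageField`, `RectangleLabel`, `PointLabel`, `ImageSpanResponse`, and `rectangle_from_drag` are hypothetical names, and the sketch assumes `scale` means display pixels per image pixel and that the offsets give the canvas position of the image's top-left corner. Storing labels in original image pixels keeps responses and suggestions independent of how the image was zoomed or panned in the UI.

```python
from dataclasses import dataclass, field


@dataclass
class ImageField:
    """Hypothetical field: an image URL plus how it is drawn on the canvas."""
    url: str
    width: int             # original image width in pixels
    height: int            # original image height in pixels
    scale: float = 1.0     # assumed: display pixels per image pixel
    offset_x: float = 0.0  # assumed: canvas x of the image's top-left corner
    offset_y: float = 0.0  # assumed: canvas y of the image's top-left corner

    def to_image_coords(self, cx: float, cy: float) -> tuple[float, float]:
        """Map a canvas point (cx, cy) back to original image pixels."""
        return (cx - self.offset_x) / self.scale, (cy - self.offset_y) / self.scale


@dataclass
class RectangleLabel:
    label: str
    x: float       # top-left corner, image pixels
    y: float
    width: float
    height: float


@dataclass
class PointLabel:
    label: str
    x: float       # image pixels
    y: float


@dataclass
class ImageSpanResponse:
    """Hypothetical response/suggestion payload for an ImageSpanQuestion."""
    rectangles: list[RectangleLabel] = field(default_factory=list)
    points: list[PointLabel] = field(default_factory=list)


def rectangle_from_drag(img: ImageField, label: str,
                        start: tuple[float, float],
                        end: tuple[float, float]) -> RectangleLabel:
    """Build a RectangleLabel from a mouse drag given in canvas coordinates."""
    x0, y0 = img.to_image_coords(*start)
    x1, y1 = img.to_image_coords(*end)
    # A drag can go in any direction, so normalize to a top-left corner.
    return RectangleLabel(label, min(x0, x1), min(y0, y1), abs(x1 - x0), abs(y1 - y0))
```

For example, with `scale=0.5` and offsets `(10, 20)`, a drag from canvas point `(110, 120)` to `(60, 70)` maps to a 100 x 100 pixel box whose top-left corner is at `(100, 100)` in the original image.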

Describe alternatives you've considered
There isn't really a workaround for annotating regions of interest without drawing on top of the image.


github-actions[bot] commented 4 months ago

This issue is stale because it has been open for 90 days with no activity.

nataliaElv commented 2 months ago

Hey @JonnyTran! Are you still interested in this? We're currently designing this feature and would love to chat with you about it.

JonnyTran commented 2 months ago

Hi @nataliaElv, yes I'm still interested in this feature. Would a call work for you? I'm generally available anytime between 0900-1800 PDT.

nataliaElv commented 2 months ago

Hi @JonnyTran! Here is my scheduling link: https://app.reclaim.ai/m/natalia-elvira-huggingface/office-hours If none of the times work for you, send me an email at natalia.elvira@huggingface.co. I'm also reachable in the Discord community 😃

nataliaElv commented 2 months ago

@JonnyTran I just realized that none of the slots fall at a reasonable time in your timezone. Please send me an email at the address above and we can agree on a slot later in the evening for me. Let me know if Monday or Wednesday next week might work for you. Thanks!

sean-hickey-wf commented 1 month ago

@nataliaElv, may I ask whether this is still going ahead?

nataliaElv commented 1 month ago

@sean-hickey-wf The development team isn't working on this yet, but it is definitely in our backlog.

argilla-io / argilla

[FEATURE] Image Bounding Box Annotations for vision & multimodal workflows #4739