eto-ai / rikai

Parquet-based ML data format optimized for working with unstructured data
https://rikai.readthedocs.io/en/latest/
Apache License 2.0
137 stars 19 forks

Visualization convenience #550

Closed changhiskhan closed 2 years ago

changhiskhan commented 2 years ago

Goal

We want to deliver an easy experience for developers visualizing images and annotations. One key criterion is supporting users who want to display annotations like `Box2d` on `Image` types without having to deal with lower-level APIs like PIL or OpenCV.

Current gap

As demonstrated in https://github.com/eto-ai/rikai/blob/main/python/tests/types/test_vision.py#L261, drawing a `Label` is currently inconvenient in two ways:

  1. the user is required to specify the exact position, which requires knowing the associated bounding box (or other annotation)
  2. the user must wrap the label text in a `Text` constructor call

A few examples:

Currently, to draw labeled bounding boxes on images, you have to do the following: `Image | bbox1 | bbox2 | Text("label1", (x1, y1)) | Text("label2", (x2, y2))`. This is inconvenient because computing `(x1, y1)` and `(x2, y2)` requires both:

  1. Getting some anchor position out of `bbox1` and `bbox2` (from `xmin`, `xmax`, `ymin`, `ymax`)
  2. Some trial and error to figure out what to add/subtract from the anchor position so the label does not overlap the bounding box. And this is all just for one image and two boxes/labels.
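The manual bookkeeping described above can be sketched in plain Python (all names here are illustrative, not rikai APIs): for every box, the user derives an anchor from the corners and then nudges it by hand-tuned offsets so the text clears the box outline.

```python
# Illustrative only: the kind of anchor math users currently do by hand.
# A box is (xmin, ymin, xmax, ymax); dx/dy are trial-and-error offsets.

def manual_label_anchor(box, dx=0, dy=-12):
    """Pick a point near the box's top-left corner for the label text.

    dx/dy are the hand-tuned offsets the issue complains about: they
    typically need adjusting per image so the label does not overlap
    the box outline.
    """
    xmin, ymin, xmax, ymax = box
    return (xmin + dx, ymin + dy)

bbox1 = (10, 30, 110, 130)
bbox2 = (150, 40, 250, 140)

# One anchor per box, each potentially needing its own tweak:
anchor1 = manual_label_anchor(bbox1)
anchor2 = manual_label_anchor(bbox2, dy=-16)  # re-tweaked for box 2
print(anchor1, anchor2)  # (10, 18) (150, 24)
```

The point is that none of this arithmetic has anything to do with what the user actually wants to express, which is just "this label belongs to this box".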

See https://github.com/eto-ai/rikai/blob/main/notebooks/Visualize.ipynb for how the existing mechanism looks in Jupyter.

Desired behavior

`Image | (bbox1 + "label1") | (bbox2 + "label2")` or `Image | (bbox1, "label1") | (bbox2, "label2")` (or something along those lines), and rikai will automatically put `"label1"` at, say, the top-left of `bbox1` and `"label2"` at the top-left of `bbox2`. To customize the behavior, the user could do something like `rikai.options.viz.label_anchor_position = "bottom-right"` or `rikai.options.viz.label_position_func = {Box2d: lambda bbox: (bbox.xmin - 10, bbox.ymin + 10)}`.
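The `bbox + "label"` association could be implemented with operator overloading. Here is a minimal sketch under that assumption; the `Box2d` and `LabeledBox` classes below are hypothetical stand-ins, not rikai's actual types.

```python
# Hypothetical sketch: `box + "label"` returns a composite object that
# carries the label text plus an automatically chosen anchor position.

class Box2d:
    def __init__(self, xmin, ymin, xmax, ymax):
        self.xmin, self.ymin, self.xmax, self.ymax = xmin, ymin, xmax, ymax

    def __add__(self, label):
        # Associating a string with a box yields a LabeledBox whose
        # label position is derived from the box, not supplied by hand.
        if isinstance(label, str):
            return LabeledBox(self, label)
        return NotImplemented

class LabeledBox:
    def __init__(self, box, text):
        self.box = box
        self.text = text
        # Default anchor: the box's top-left corner. A customization
        # hook (as proposed above) could override this.
        self.anchor = (box.xmin, box.ymin)

labeled = Box2d(10, 30, 110, 130) + "label1"
print(labeled.text, labeled.anchor)  # label1 (10, 30)
```

The renderer would then draw `LabeledBox` as a box plus a `Text` at `anchor`, so the user never touches coordinates directly.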

Some thoughts on what to do

  1. Allow users to express the association between an annotation and its label (e.g. via `+`)
  2. Give each association default behavior for automatically configuring the labels (most importantly, the position)
  3. Allow the user to customize the label configuration
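Point 3 could be sketched as a per-type registry of position functions, in the spirit of the `label_position_func` option proposed above. Everything here is a hypothetical illustration, not rikai's implementation.

```python
# Hypothetical sketch of per-type label-position customization, modeled
# on the proposed rikai.options.viz.label_position_func option.
from collections import namedtuple

Box2d = namedtuple("Box2d", "xmin ymin xmax ymax")

def default_position(box):
    # Library-wide default: anchor labels at the top-left corner.
    return (box.xmin, box.ymin)

# User-supplied overrides, keyed by annotation type.
label_position_func = {
    Box2d: lambda box: (box.xmin - 10, box.ymin + 10),
}

def label_anchor(annotation):
    """Resolve the label anchor: user override if present, else default."""
    fn = label_position_func.get(type(annotation), default_position)
    return fn(annotation)

print(label_anchor(Box2d(50, 60, 150, 160)))  # (40, 70)
```

A dict keyed by type keeps the customization open-ended: new annotation types (e.g. masks, keypoints) can register their own position logic without changes to the renderer.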
changhiskhan commented 2 years ago

Some references:

  1. Viz interface defined here: https://github.com/eto-ai/rikai/blob/main/python/rikai/viz.py
  2. Each Rikai type has rendering hooks like: https://github.com/eto-ai/rikai/blob/main/python/rikai/types/geometry.py#L257
Renkai commented 2 years ago

I think that's a good idea and I'm interested in it, but I'm not sure what the last section, "Some thoughts on what to do", is about. I'd like to talk about it in the weekly meeting to get more information.

da-liii commented 2 years ago

For rikai-ocr, this feature is really helpful.

Using Spark SQL, I get `array<struct<text:string, mask:mask>>` as the result. It would not be correct to return `struct<text: text, mask: mask>`, because `Text` is a viz concept and a `ModelType` should not bundle viz logic into it.

But `Box2d + "label"` is not sufficient; I also need `Mask + "label"` in the following notebook: https://github.com/da-tubi/rikai-ocr/blob/4c136c559870db7469b4b3adec81eacf594957cb/notebooks/KerasOCR.ipynb
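Extending the `+` association to masks could reuse the same pattern, deriving the anchor from the mask's bounding box. The sketch below uses a hypothetical dense-pixel `Mask`; rikai's actual `Mask` type may store polygons or RLE instead, in which case only the `bbox()` computation would differ.

```python
# Hypothetical sketch: Mask + "label" anchors the label at the top-left
# of the mask's tight bounding box.

class Mask:
    def __init__(self, pixels):
        # pixels: list of rows of 0/1 values (dense binary mask)
        self.pixels = pixels

    def bbox(self):
        """Tight bounding box (xmin, ymin, xmax, ymax) of the set pixels."""
        xs = [x for row in self.pixels
              for x, v in enumerate(row) if v]
        ys = [y for y, row in enumerate(self.pixels) if any(row)]
        return (min(xs), min(ys), max(xs), max(ys))

    def __add__(self, label):
        if isinstance(label, str):
            xmin, ymin, _, _ = self.bbox()
            return (label, (xmin, ymin))  # (text, anchor) pair
        return NotImplemented

mask = Mask([
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
])
print(mask + "word")  # ('word', (1, 1))
```

Since both boxes and masks reduce to "compute an anchor, attach text", one shared association mechanism could cover the OCR use case as well.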