albumentations-team / albumentations

Fast and flexible image augmentation library. Paper about the library: https://www.mdpi.com/2078-2489/11/2/125
https://albumentations.ai
MIT License

Can I specify the initial bounding box visibilities? #1767

Open gui-miotto opened 3 months ago

gui-miotto commented 3 months ago

My Question

Is it possible to tell a transform what the initial visibility of each bounding box is?

As far as I know, the transforms always assume that the objects are 100% visible before the transformation. But in real life, that is not always the case.

Additional Context

I work with object detection on very high-resolution images. As a preprocessing step, the images of the training dataset have to be sliced before they can be used by the model. During this preprocessing, the visibility of many bounding boxes becomes less than 100%. Of course, I can calculate those values, but is there a way to use them with albumentations?

gui-miotto commented 3 months ago

If this is not possible, a workaround would be for albumentations to return the "perceived" visibility after the transformation. That way I could calculate the "real" visibility as the product of the initial and the perceived visibilities.
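To illustrate with made-up numbers (hypothetical, since albumentations does not currently return such a value):

```python
# Hypothetical numbers: the object was cut in half during slicing, and the
# augmentation then keeps 80% of the (already truncated) box.
initial_visibility = 0.5
perceived_visibility = 0.8  # would need to be returned by albumentations
real_visibility = initial_visibility * perceived_visibility  # 0.4
```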

ternaus commented 3 months ago

I do not understand the question yet.

As I understand, you crop parts from the image and bounding boxes that are not 100% contained in the image get truncated, right? And this becomes an issue.

Or not?

Could you provide some code?

gui-miotto commented 3 months ago

Hi @ternaus , thanks for the reply.

Yes, you are correct. They get truncated. Therefore their visibility is not 100% to start with.

Unfortunately, I don't think providing code will make things any clearer, because this is more of a workflow problem. So let me give a hypothetical situation:

1 - Imagine my dataset has 2000x2000 px images.
2 - My model only works with 500x500 images.
3 - Since I need the full resolution to identify the objects, I should not shrink the images. What I do instead is slice the full-res image (2000x2000) into 16 non-overlapping 500x500 patches.
4 - Now, imagine there is a 2000x2000 image with two objects. During the slicing process, one object gets cut in half. The other stays fully visible in a single patch. (A sketch of this slicing step is below.)
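To make the slicing step concrete, here is a rough sketch (boxes as `(x_min, y_min, x_max, y_max)` pixel coordinates; the function is my own illustration, not albumentations API):

```python
import numpy as np

def slice_with_visibility(image, boxes, patch=500):
    """Yield non-overlapping patch x patch tiles plus their boxes,
    each box clipped to the tile and tagged with its visibility."""
    h, w = image.shape[:2]
    for y0 in range(0, h, patch):
        for x0 in range(0, w, patch):
            tile = image[y0:y0 + patch, x0:x0 + patch]
            tile_boxes = []
            for x_min, y_min, x_max, y_max in boxes:
                # Intersect the box with the tile.
                cx_min, cy_min = max(x_min, x0), max(y_min, y0)
                cx_max, cy_max = min(x_max, x0 + patch), min(y_max, y0 + patch)
                inter = max(0, cx_max - cx_min) * max(0, cy_max - cy_min)
                area = (x_max - x_min) * (y_max - y_min)
                if inter > 0:
                    # visibility < 1.0 for boxes cut by the slicing
                    tile_boxes.append((cx_min - x0, cy_min - y0,
                                       cx_max - x0, cy_max - y0, inter / area))
            yield tile, tile_boxes

# A 2000x2000 image yields 16 tiles; the first box straddles two tiles and
# shows up in both with visibility 0.5, the second stays fully visible.
image = np.zeros((2000, 2000, 3), dtype=np.uint8)
boxes = [(400, 100, 600, 300), (1100, 1100, 1200, 1200)]
tiles = list(slice_with_visibility(image, boxes))
```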

Everything up to this point happens before training the model. It's dataset pre-processing and has nothing to do with Albumentations.

5 - Now I'll start training the model using albumentations. Then comes the question: _Given that I want to work with a minimum visibility of 40%, which value of min_visibility should I give to albumentations?_
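For context, this is how min_visibility is passed today (standard albumentations usage; as far as I can tell, the threshold is applied relative to the box as it enters the transform, not relative to the original object):

```python
import albumentations as A

transform = A.Compose(
    [A.RandomCrop(height=400, width=400)],
    bbox_params=A.BboxParams(
        format="pascal_voc",
        # min_visibility compares a box's post-transform area to its
        # pre-transform area, i.e. to the already-truncated box produced
        # by the slicing step, not to the original object.
        min_visibility=0.4,
    ),
)
```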