Question about asserting the upper bound of inst_id

beyse commented 2 years ago

Hello there 👋

As described here in #38 one should specify the instance id as follows: object["inst_id"] = categories_id * 1000 + index. The following example demonstrates that uint16 should be used to store the instance segmentation information in an image:

https://github.com/DIYer22/bpycv/blob/074f49b6494c9784a12067b3174e93f0f52dddc8/example/demo.py#L44

I am assuming this is done to comply with the Cityscapes dataset, which is perfectly fine.

Since 2^16 = 65536 and the instance id is categories_id * 1000 + index, the category id cannot exceed 65, and that last category (65) has room for only about 500 instances (indices 0 to 535), if my math is right.
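The bound above can be checked with a short sketch. Only the formula categories_id * 1000 + index comes from #38; the helper name `uint16_capacity` and the constant names are made up for illustration.

```python
# Sketch: how far does inst_id = categories_id * 1000 + index reach in uint16?
UINT16_MAX = 2**16 - 1   # 65535, the largest value a uint16 pixel can hold
STRIDE = 1000            # multiplier from the formula in #38


def uint16_capacity(stride=STRIDE, max_value=UINT16_MAX):
    """Return (largest usable category id, largest index in that category)."""
    return divmod(max_value, stride)


max_category, max_index_in_last = uint16_capacity()
print(max_category, max_index_in_last)
```

Categories below the last one are limited by the stride itself, so each of them can hold indices 0 to 999.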

But I would like to go well beyond this. I have noticed that 32-bit integers are used internally, so I could use

cv2.imwrite('/out/put/image.tiff', np.float32(result["inst"]))

to save a 32-bit floating point image, which obviously gives me many more possibilities when it comes to the number of categories and instances that can be annotated.
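One caveat worth checking before relying on a float32 image: float32 has a 24-bit significand, so it stores integers exactly only up to 2^24 = 16,777,216. A minimal sketch using only the standard library (the helper `to_float32` is a made-up name that rounds a Python number to float32 and back):

```python
import struct


def to_float32(value):
    """Round a Python number to the nearest float32 and return it as a float."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


exact_limit = 2**24  # 16777216

# Integers up to 2**24 survive the conversion unchanged.
print(to_float32(exact_limit) == exact_limit)

# The next integer cannot be represented and is rounded to a neighbour.
print(to_float32(exact_limit + 1) == exact_limit + 1)
```

Instance ids around 1e6 are well inside the exact range, so they are not affected by this limit.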

However, I noticed the following assertion in the code:

https://github.com/DIYer22/bpycv/blob/074f49b6494c9784a12067b3174e93f0f52dddc8/bpycv/pose_utils.py#L147

which again limits the instance id. I could not find an explanation for it, which brings me to my question:

Why must the inst_id be <= 100e4 here ?

Many thanks in advance 😺

DIYer22 commented 2 years ago

According to this, bpycv encodes the inst id as a float32 RGB value in the range [0, 1], which Blender can read as a color when rendering the output. The code is here:

https://github.com/DIYer22/bpycv/blob/074f49b6494c9784a12067b3174e93f0f52dddc8/bpycv/utils.py#L13

We chose an encoding scheme that provides more than 1e6 inst_id values and keeps instances distinguishable in the float32 RGB image.

beyse commented 2 years ago

Hi,

thanks for the reply.

I have carefully read this but I could not find any evidence that

bpycv encodes the inst id as a float32 RGB value in the range [0, 1], which Blender can read as a color when rendering the output.

Maybe I am missing something? Perhaps you can point me to the exact paragraph or sentence that you mean. In any case, it is not disputed that Blender uses floating point numbers to represent RGB values.

I have also read

https://github.com/DIYer22/bpycv/blob/074f49b6494c9784a12067b3174e93f0f52dddc8/bpycv/utils.py#L13

which clearly shows the inst_id is mapped to floating point RGB values and that is perfectly in line with what I wrote in my initial comment:

I could use
cv2.imwrite('/out/put/image.tiff', np.float32(result["inst"]))
to save a 32-bit floating point image, which obviously gives me many more possibilities when it comes to the number of categories and instances that can be annotated.

When you write

We chose an encoding scheme that provides more than 1e6 inst_id values and keeps instances distinguishable in the float32 RGB image.

I think that perfectly fits my observation that the inst_id can be bigger than 1e6. However, I noticed the following assertion in the code:

https://github.com/DIYer22/bpycv/blob/074f49b6494c9784a12067b3174e93f0f52dddc8/bpycv/pose_utils.py#L147

This assertion checks whether the inst_id is less than or equal to 100e4 (which is 1e6) and raises an error at runtime if it is larger.

I was not able to find an explanation in your recent comment, so I would like to repeat my question:

Why must the inst_id be <= 100e4 here ?

Thank you for the picture with the two green cubes, here is one with a blue sphere 😄

DIYer22 commented 2 years ago

Why must the inst_id be <= 100e4 here ?

If inst_id is too big, bpycv's encode/decode scheme cannot accurately recover the inst id from the RGB value.

For example:

inst_id = 1100000
recover = encode_inst_id.rgb_to_id(encode_inst_id.id_to_rgb(inst_id))
# recover will be 274999

beyse commented 2 years ago

Alright, that is a reason.

I was interested in finding the first number which does not work and wanted to find out why, so I wrote this test code:

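from loguru import logger  # the log output below is in loguru's default format

from bpycv.utils import encode_inst_id  # assumed import path, based on the utils.py link above


def convert_and_check(original_id):
    """Round-trip an id through the RGB encoding and log whether it survives."""
    try:
        logger.info(f'input = {original_id}')

        rgb = encode_inst_id.id_to_rgb(original_id)
        logger.debug(f'rgb = {rgb}')
        converted_id = encode_inst_id.rgb_to_id(rgb)

        if original_id != converted_id:
            logger.error(f'Failure: {original_id} != {converted_id}')
        else:
            logger.success(f'{original_id} works fine.')
    except Exception as ex:
        # Exception rather than BaseException, so that Ctrl-C still interrupts the loop
        logger.error(f'Runtime Error when using {original_id}: {ex}')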

and I ran it like this:

    for i in range(1048574, 1048577):
        convert_and_check(i)

and I get:

2022-06-17 23:54:42.300 | DEBUG    | __main__:convert_and_check:79 - rgb = [0.         0.99999905 0.        ]
2022-06-17 23:54:42.301 | SUCCESS  | __main__:convert_and_check:85 - 1048574 works fine.
2022-06-17 23:54:42.307 | DEBUG    | __main__:convert_and_check:79 - rgb = [0.00000000e+00 4.76837158e-07 0.00000000e+00]
RuntimeWarning: divide by zero encountered in long_scalars
2022-06-17 23:54:42.315 | DEBUG    | __main__:convert_and_check:79 - rgb = [0.00000000e+00 1.43051147e-06 0.00000000e+00]
2022-06-17 23:54:42.316 | ERROR    | __main__:convert_and_check:83 - Failure: 1048576 != 262143

Summing it up in a more readable way:

Number	Outcome
1048574	works fine ✅
1048575	divide by zero error 💥
1048576	can't recover ❌

So the first number which does not work just so happens to be 2^20 - 1. The divide by zero problem happens in this line and comes from the fact that max_denominator is set to 2^20.
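The search for the first failing id can also be automated. A minimal sketch, assuming only that a codec exposes an encode and a decode callable; `first_roundtrip_failure` and the toy codec below are made-up names for illustration and do not reproduce bpycv's actual encoding.

```python
def first_roundtrip_failure(encode, decode, start, stop):
    """Return the first id in [start, stop) that does not survive
    decode(encode(id)), or None if every id round-trips."""
    for inst_id in range(start, stop):
        try:
            if decode(encode(inst_id)) != inst_id:
                return inst_id
        except ArithmeticError:
            # e.g. ZeroDivisionError or OverflowError raised inside the codec
            return inst_id
    return None


# Toy codec (NOT bpycv's): scales ids into [0, 1) with a fixed denominator,
# so ids at or above the denominator wrap around and cannot be recovered.
DENOMINATOR = 2**20


def toy_encode(inst_id):
    return (inst_id % DENOMINATOR) / DENOMINATOR


def toy_decode(value):
    return round(value * DENOMINATOR)


print(first_roundtrip_failure(toy_encode, toy_decode, 2**20 - 4, 2**20 + 4))
```

The toy codec first fails at exactly 2^20, one later than the real run above, which is expected since it is a different scheme. Note also that a numpy-based codec may report a divide by zero as a RuntimeWarning rather than an exception, as in the log above, in which case the `except` branch would not trigger.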

So I was wondering what happens if I set max_depth to 21 instead of 20. Sure enough, this time the numbers I test work fine:

2022-06-18 00:10:36.940 | SUCCESS  | __main__:convert_and_check:86 - 1048573 works fine.
2022-06-18 00:10:36.946 | DEBUG    | __main__:convert_and_check:80 - rgb = [0.         0.99999905 0.        ]
2022-06-18 00:10:36.948 | SUCCESS  | __main__:convert_and_check:86 - 1048574 works fine.
2022-06-18 00:10:36.954 | DEBUG    | __main__:convert_and_check:80 - rgb = [0.00000000e+00 4.76837158e-07 0.00000000e+00]
2022-06-18 00:10:36.957 | SUCCESS  | __main__:convert_and_check:86 - 1048575 works fine.
2022-06-18 00:10:36.962 | DEBUG    | __main__:convert_and_check:80 - rgb = [0.00000000e+00 1.43051147e-06 0.00000000e+00]
2022-06-18 00:10:36.964 | SUCCESS  | __main__:convert_and_check:86 - 1048576 works fine.
2022-06-18 00:10:36.970 | DEBUG    | __main__:convert_and_check:80 - rgb = [0.00000000e+00 2.38418579e-06 0.00000000e+00]
2022-06-18 00:10:36.972 | SUCCESS  | __main__:convert_and_check:86 - 1048577 works fine.
2022-06-18 00:10:36.978 | DEBUG    | __main__:convert_and_check:80 - rgb = [0.00000000e+00 3.33786011e-06 0.00000000e+00]
2022-06-18 00:10:36.979 | SUCCESS  | __main__:convert_and_check:86 - 1048578 works fine.
2022-06-18 00:10:36.985 | DEBUG    | __main__:convert_and_check:80 - rgb = [0.00000000e+00 4.29153442e-06 0.00000000e+00]
2022-06-18 00:10:36.987 | SUCCESS  | __main__:convert_and_check:86 - 1048579 works fine.

So now I am wondering:

What is the reason that max_depth is set to 20?

I get that mapping integer numbers to floating point numbers in the range [0, 1] is not trivial. But given that there are 1,056,964,608 distinct single-precision floating point numbers between 0 and 1, I do not see a reason why it has to stop here at 2^20 (1,048,576) numbers. Is it a limitation of Blender?
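The figure of 1,056,964,608 matches the count of normalized float32 values in (0, 1): 126 exponents times 2^23 mantissas. A minimal sketch that derives it from the IEEE 754 bit patterns, using only the standard library (`float32_bits` is a made-up helper name):

```python
import struct


def float32_bits(value):
    """Return the IEEE 754 single-precision bit pattern of value as an int."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


# For non-negative floats the bit patterns are ordered like the values,
# so the pattern of 1.0 equals the number of float32 values in [0, 1).
values_below_one = float32_bits(1.0)      # counts 0.0 and the subnormals too
normal_values = values_below_one - 2**23  # drop 0.0 and the subnormals

print(values_below_one, normal_values, 2**20)
```

Either count is roughly a thousand times larger than 2^20, and that is per channel, before the three RGB channels are combined.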

Let me know if I have missed something.

DIYer22 commented 2 years ago

Is it a limitation of Blender?

No, Blender is OK

I just chose one encoding scheme that:

1. Maps integer numbers to three floating point numbers in the range [0, 1].
2. Supports ids up to 1e6, which is big enough. BTW, this encoding supports floats too, which means inst_id could be 1.2 or 3.14.
3. Makes it easy for human eyes to distinguish instances in the encoded float32 RGB image, as in the figure below:

The inst_id values of those two cubes are 2 and 3 respectively, and they have different encoded RGB colors (3 float32 values each).

beyse commented 2 years ago

I see, that seems to make a lot of sense. Thank you for the explanation.

DIYer22 / bpycv

Question about asserting the upper bound of inst_id #41