Open ahundt opened 6 years ago
You're right about the issue, thanks. However, I've tried a couple of ways to represent the ground truth. One of them is an NxHxWxC tensor with each label in a separate channel, as you describe. In that case C is 80 classes plus one extra channel for background, 81 in total.
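For anyone following along, here is a rough NumPy sketch of that layout. It is not the code from this repo: `masks_to_channels` is a made-up helper, putting background in channel 0 is my assumption, and the class ids are assumed to be already remapped to a contiguous 1..80 range (the raw COCO category ids are not contiguous).

```python
import numpy as np

NUM_CLASSES = 80                 # object categories, as stated in the comment above
NUM_CHANNELS = NUM_CLASSES + 1   # plus one background channel -> 81
BACKGROUND = 0                   # assumption: background lives in channel 0


def masks_to_channels(masks, class_ids, height, width):
    """Stack per-object binary masks into one HxWxC multi-hot label (hypothetical helper)."""
    label = np.zeros((height, width, NUM_CHANNELS), dtype=np.uint8)
    for mask, class_id in zip(masks, class_ids):
        # OR the mask in, so several objects of one class share a channel.
        label[..., class_id] |= mask.astype(np.uint8)
    # Background is every pixel that no object channel claims.
    label[..., BACKGROUND] = label[..., 1:].sum(axis=-1) == 0
    return label


# Tiny example: one 2x2 object of (remapped) class 5 in a 4x4 image.
mask = np.zeros((4, 4), dtype=bool)
mask[0:2, 0:2] = True
label = masks_to_channels([mask], [5], height=4, width=4)

# Stacking per-image labels gives the NxHxWxC batch.
batch = np.stack([label, label])
```

With two images in the batch, `batch.shape` is `(2, 4, 4, 81)`.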
It's in here.
Cool thanks! I just wanted to confirm, sorry I missed those lines.
I wonder, should the data be put in sparse tensors?
I would probably need to test to see if it is a faster or slower approach.
Do you represent each label as separate channels in the dataset loader?
I ask because there is a lot of class overlap in COCO and the z-order isn't always correct. For example, the table category often blocks out all the objects on top of the table if you put everything into a single categorical channel rather than a one-hot (multi-hot?) encoding.
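To make the overlap problem concrete, here is a toy NumPy sketch. The class ids, the image size, and the mask shapes are all invented for illustration, and the draw order is chosen deliberately so that the table is painted last.

```python
import numpy as np

H, W = 4, 6
TABLE, CUP = 1, 2   # hypothetical class ids; 0 is background

# A table covering the lower three rows, and a cup sitting on top of it.
table = np.zeros((H, W), dtype=bool)
table[1:4, 0:6] = True
cup = np.zeros((H, W), dtype=bool)
cup[1:3, 2:4] = True

# Single categorical channel: whatever is written last wins each pixel.
categorical = np.zeros((H, W), dtype=np.int64)
categorical[cup] = CUP
categorical[table] = TABLE   # wrong z-order: the table overwrites the cup

# Multi-hot: one channel per class, so overlapping pixels keep both labels.
multi_hot = np.zeros((H, W, 3), dtype=np.uint8)
multi_hot[..., TABLE] = table
multi_hot[..., CUP] = cup
multi_hot[..., 0] = ~(table | cup)

cup_pixels_categorical = int((categorical == CUP).sum())
cup_pixels_multi_hot = int(multi_hot[..., CUP].sum())
overlap_pixels = int((multi_hot[..., TABLE] & multi_hot[..., CUP]).sum())
```

Here the cup vanishes entirely from the categorical label (`cup_pixels_categorical` is 0), while the multi-hot label keeps all 4 cup pixels and marks them as table too.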