NVlabs / ODISE

Official PyTorch implementation of ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [CVPR 2023 Highlight]
https://arxiv.org/abs/2303.04803

a weird bug #17

Open lihenglin opened 1 year ago

lihenglin commented 1 year ago

Thanks for the nice work! I was playing with some images in the Hugging Face demo and found that the model detects the coffee maker in the scene when I use the LVIS categories. However, if I use only a single custom category, "coffee maker,coffee machine", the model fails to detect the coffee maker in the image. Do you know what might be causing this? BTW, I can provide the image if you want.

GiscardBiamby commented 1 year ago

I'm having a similar issue.

In some cases it partly works if I add a trailing comma, e.g., "coffee maker, coffee machine,". But this is not a real fix, because it can add false positives to the result (i.e., some regions get segmented with the label 'coffee maker' even though they have nothing to do with coffee machines).
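One possible reason the trailing comma changes anything: if the vocabulary string is split on commas without dropping empty entries, the trailing comma produces an extra empty-string entry. The sketch below only shows what Python's `str.split` does. `parse_vocab` is a hypothetical helper written for illustration, not ODISE's actual parsing code, and I have not confirmed that the demo parses the string this way.

```python
def parse_vocab(vocab: str) -> list[str]:
    """Hypothetical parser (NOT ODISE's code): naive comma split, keeps empty entries."""
    return [name.strip() for name in vocab.split(",")]


without_comma = parse_vocab("coffee maker, coffee machine")
with_comma = parse_vocab("coffee maker, coffee machine,")

print(without_comma)  # ['coffee maker', 'coffee machine']
print(with_comma)     # ['coffee maker', 'coffee machine', '']
```

If something like this happens in the demo, the empty string would act as an extra name in the vocabulary, which might explain why the output changes at all.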

The false positive problem is reduced if I include the COCO or ADE classes (as opposed to using only a custom vocab with an empty label_list), because those seem to take precedence.
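A toy illustration of why extra classes could reduce false positives, assuming region labels are chosen by a softmax over per-class similarity scores (an assumption about open-vocabulary classifiers in general, not a confirmed description of ODISE's internals). The scores and the `label_probs` helper are made up for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_probs(scores_by_label):
    """Map each candidate label to its softmax probability (hypothetical helper)."""
    probs = softmax(list(scores_by_label.values()))
    return dict(zip(scores_by_label.keys(), probs))


# Made-up similarity scores for one region that is NOT a coffee maker.
custom_only = {"coffee maker": 0.1}
with_extra = {"coffee maker": 0.1, "wall": 2.0, "table": 1.5}

p_single = label_probs(custom_only)
p_multi = label_probs(with_extra)

print(p_single["coffee maker"])          # 1.0: the only candidate always wins
print(max(p_multi, key=p_multi.get))     # wall: a better-matching class takes the region
```

With a single candidate label, every region is pushed toward that label no matter how weak the match is. Once better-matching classes are available, they absorb the unrelated regions.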