How do you obtain the visualization result in Figure 3?

cvlab-kaist / CAT-Seg

Official Implementation of "CAT-Seg🐱: Cost Aggregation for Open-Vocabulary Semantic Segmentation"

https://ku-cvlab.github.io/CAT-Seg/

MIT License


How do you obtain the visualization result in Figure 3? #6

Closed yxchng closed 1 year ago

yxchng commented 1 year ago

Screenshot from 2023-07-19 21-16-08

The CLIP cost volume should have a resolution of 24x24. How do you get a cost volume of such high resolution? The one in the figure does not look like 24x24. Would you mind sharing the code to reproduce this figure?

Thanks.

hsshin98 commented 1 year ago

The aggregated cost volume in the figure is the output of our model, so it has a higher resolution of 96x96. We simply apply bilinear upsampling to it before overlaying it on the image.
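For readers who want to try the upsampling step, here is a minimal NumPy sketch. It is not the authors' code: the function name `bilinear_upsample`, the half-pixel-center sampling convention, and the 384x384 target size in the usage line are all assumptions made for illustration.

```python
import numpy as np

def bilinear_upsample(cost, out_h, out_w):
    """Bilinearly resize a 2-D map to (out_h, out_w).

    Hypothetical helper; uses half-pixel centers and clamps at the borders.
    """
    cost = np.asarray(cost, dtype=np.float64)
    in_h, in_w = cost.shape
    # Map each output pixel center back to input coordinates.
    ys = (np.arange(out_h) + 0.5) * in_h / out_h - 0.5
    xs = (np.arange(out_w) + 0.5) * in_w / out_w - 0.5
    ys = np.clip(ys, 0, in_h - 1)
    xs = np.clip(xs, 0, in_w - 1)
    # Indices of the neighbouring input rows and columns.
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    # Fractional distances used as interpolation weights.
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = cost[y0][:, x0] * (1 - wx) + cost[y0][:, x1] * wx
    bottom = cost[y1][:, x0] * (1 - wx) + cost[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

# Usage: a stand-in 96x96 map resized to an assumed 384x384 image size.
cost_96 = np.random.rand(96, 96)
cost_full = bilinear_upsample(cost_96, 384, 384)
```

If PyTorch is already in use, `torch.nn.functional.interpolate` with `mode="bilinear"` would be the more usual way to do the same resize.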

I don't have the code at the moment, but the visualized figures are min-max normalized, with some additional scaling for visual clarity, because the scale of the model output does not necessarily match that of the initial cost volume. This should be enough to reproduce the figure, but please let me know if you need more details.
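As a rough guide to the normalization and overlay described above, here is a minimal NumPy sketch. It is a guess at the procedure rather than the authors' code: the function names, the `gain` factor standing in for the unspecified "scaling", the `alpha` blend weight, and the blue-to-red color ramp are all assumptions.

```python
import numpy as np

def minmax_normalize(cost, gain=1.0, eps=1e-8):
    """Rescale a map to [0, 1], then apply a hypothetical contrast gain."""
    cost = np.asarray(cost, dtype=np.float64)
    lo, hi = cost.min(), cost.max()
    norm = (cost - lo) / (hi - lo + eps)  # eps avoids division by zero
    # `gain` is an assumed stand-in for the extra scaling; clip back to [0, 1].
    return np.clip(norm * gain, 0.0, 1.0)

def overlay_heatmap(image, heat, alpha=0.5):
    """Blend a (H, W) heat map in [0, 1] onto a (H, W, 3) RGB image in [0, 1]."""
    image = np.asarray(image, dtype=np.float64)
    heat = np.asarray(heat, dtype=np.float64)
    # Assumed color ramp: low values are blue, high values are red.
    color = np.stack([heat, np.zeros_like(heat), 1.0 - heat], axis=-1)
    return (1.0 - alpha) * image + alpha * color

# Usage with stand-in data; the map is assumed to be at image resolution already.
image = np.random.rand(384, 384, 3)
cost = np.random.randn(384, 384)
blended = overlay_heatmap(image, minmax_normalize(cost, gain=1.5), alpha=0.5)
```

Normalizing each map separately means the colors are only comparable within one map, not between the initial and the aggregated cost volume.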