facebookresearch / ov-seg

This is the official PyTorch implementation of the paper Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP.

Per-pixel feature extraction #31

Open soumajm opened 5 months ago

soumajm commented 5 months ago

Hello, thanks for making the code available. I have a question: is it possible to obtain per-pixel features (e.g., 512-D or 768-D) instead of the N_mask x W x H and N_mask x feat_dim outputs that the encoder provides?

On a similar note, is my understanding correct that the mask proposal generation and subsequent classification stages do not have an intermediate per-pixel feature representation?
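
To make the question concrete, here is a minimal sketch of one way per-pixel features might be approximated from those two outputs. It assumes the N_mask x H x W tensor holds mask logits and the N_mask x feat_dim tensor holds one embedding per mask. The function name `per_pixel_features` is hypothetical and not part of the ov-seg code, and NumPy arrays stand in for the model's tensors. Each pixel receives a softmax-weighted sum of the mask embeddings, so this is a projection of mask-level features onto pixels, not a true intermediate per-pixel representation.

```python
import numpy as np

def per_pixel_features(mask_logits, mask_embeds, eps=1e-8):
    """Project per-mask embeddings onto pixels (hypothetical helper).

    mask_logits: (N_mask, H, W) raw mask scores (assumed layout)
    mask_embeds: (N_mask, D) one embedding per mask (assumed layout)
    returns:     (H, W, D) features, L2-normalized at every pixel
    """
    # Softmax over the mask axis so each pixel's weights sum to 1.
    shifted = mask_logits - mask_logits.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)
    # Weighted sum of the mask embeddings at every pixel.
    feats = np.einsum("nhw,nd->hwd", weights, mask_embeds)
    # Normalize so the features can be compared with cosine similarity.
    norms = np.linalg.norm(feats, axis=-1, keepdims=True)
    return feats / np.maximum(norms, eps)

# Random stand-ins for the real model outputs.
rng = np.random.default_rng(0)
masks = rng.normal(size=(5, 4, 6))    # N_mask=5, H=4, W=6
embeds = rng.normal(size=(5, 512))    # feat_dim=512
feats = per_pixel_features(masks, embeds)
print(feats.shape)  # (4, 6, 512)
```

Pixels covered by the same dominant mask end up with nearly identical features under this scheme, which is why I am asking whether a finer-grained per-pixel representation exists anywhere in the pipeline.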