You used prompt engineering. Reading the code, I found that during training you use the prompt template to generate ov_classifier_weight in san.py, lines 161-164. Is prompt engineering used anywhere else?
At inference time, where are prompt engineering and class information used? Is the class information treated as known for each individual image during inference?
Can this work be considered zero-shot semantic segmentation, and why?
These questions have confused me a lot! I'm looking forward to your reply.
Hi, thank you for your great work! I am confused about the loss computation.