YudeWang / SEAM

Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation, CVPR 2020 (Oral)
MIT License

Question about the classification loss #15

Closed NeuZhangQiang closed 3 years ago

NeuZhangQiang commented 3 years ago

SEAM is really an excellent piece of work. After reading the paper, I have a question:

  1. How do we get the final segmentation mask? In my understanding, SEAM finally outputs a CAM, and then a random walk is used to produce the final mask. Am I right?

  2. How do we calculate the classification loss? For example, the final output is [image], and we can also calculate the background as [image], but how can we use the two results to calculate the loss? How can we generate the ground truth? Is img(m, n) = c (the true label) the ground truth?

Any suggestion is appreciated!

YudeWang commented 3 years ago

Hi @NeuZhangQiang ,

  1. SEAM+AffinityNet generates pixel-level pseudo labels, and there is a further retrain step that trains a segmentation model on these pseudo labels in a fully supervised manner.
  2. The classification loss is calculated on the foreground categories. https://github.com/YudeWang/SEAM/blob/2a06992d6515424c62f8a6cc0ca0e2e42aab5822/train_SEAM.py#L125 The background activation is calculated here. https://github.com/YudeWang/SEAM/blob/2a06992d6515424c62f8a6cc0ca0e2e42aab5822/train_SEAM.py#L131
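A minimal sketch of those two steps (shapes are illustrative; the exact normalization of the background channel is my assumption, mirroring how the training script fills channel 0 from the foreground maxima):

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: the CAM has C+1 channels, channel 0 = background.
N, C, H, W = 2, 20, 56, 56
cam = torch.randn(N, C + 1, H, W)                      # raw CAM from the network
label = torch.randint(0, 2, (N, C + 1, 1, 1)).float()  # image-level multi-hot labels

# Classification loss on foreground channels only (channel 0 is skipped):
# global average pooling turns the CAM into per-class scores.
scores = F.adaptive_avg_pool2d(cam, (1, 1))
loss_cls = F.multilabel_soft_margin_loss(scores[:, 1:, :, :], label[:, 1:, :, :])

# Background activation, estimated as 1 - max over normalized foreground
# channels (an assumption sketching what the linked line computes).
cam_fg = torch.relu(cam[:, 1:, :, :])
cam_fg = cam_fg / (F.adaptive_max_pool2d(cam_fg, (1, 1)) + 1e-5)
bg = 1 - torch.max(cam_fg, dim=1, keepdim=True)[0]
```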
NeuZhangQiang commented 3 years ago

Dear @YudeWang What do you mean by "train a segmentation model on these pseudo labels in a fully supervised manner"? Do you mean: use the output of CAM or PCM as the input image and the manually labeled mask as the target to train a model (such as U-Net)? But how can we obtain the manual labels, since SEAM is designed for weakly supervised segmentation?

In addition, the paper says: [image] Does it mean: mask = CAM > threshold? Actually, the code in infer_SEAM.py is:

bg_score = [np.ones_like(norm_cam[0])*args.out_cam_pred_alpha]
pred = np.argmax(np.concatenate((bg_score, norm_cam)), 0)

It also means: mask = CAM > threshold? Thus, "train a segmentation model on these pseudo labels in a fully supervised manner" really puzzles me.

In addition, in the figure in the paper: [image]

the Cls Loss is calculated using the feature from CAM. However, the code in train_SEAM.py is:

cam1, cam_rv1 = model(img1)
label1 = F.adaptive_avg_pool2d(cam1, (1,1))
...
loss_cls1 = F.multilabel_soft_margin_loss(label1[:,1:,:,:], label[:,1:,:,:])

In this code, the classification loss is calculated using the feature from PCM (cam_rv1). It also makes me a little confused.

YudeWang commented 3 years ago

@NeuZhangQiang SEAM+AffinityNet generates pixel-level pseudo labels for each image in the train set. These pseudo labels can be used as targets to train a segmentation model (such as DeepLab) in place of manually labeled segmentation annotations, which are not available in the WSSS setting.
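A minimal sketch of that retrain step, where a tiny stand-in network plays the role of the real segmentation model (e.g. DeepLab) and the dataset wiring is assumed:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-in for a segmentation network: image -> per-pixel logits.
class TinySegNet(nn.Module):
    def __init__(self, num_classes=21):
        super().__init__()
        self.conv = nn.Conv2d(3, num_classes, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)

model = TinySegNet()
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# One training step: the pseudo label produced by SEAM+AffinityNet plays the
# role of a ground-truth mask, so the loss is ordinary per-pixel cross-entropy.
img = torch.randn(2, 3, 32, 32)                  # batch of input images
pseudo_mask = torch.randint(0, 21, (2, 32, 32))  # pseudo labels, class per pixel
logits = model(img)
loss = F.cross_entropy(logits, pseudo_mask, ignore_index=255)
opt.zero_grad()
loss.backward()
opt.step()
```

This is exactly "fully supervised" training, only the target mask comes from the pseudo-labeling pipeline rather than human annotation.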

As for cls loss, the code you given has shown that cls loss is calculated by cam1 instead of cam_rv1......