Closed NeuZhangQiang closed 3 years ago
Hi @NeuZhangQiang ,
Dear @YudeWang, what do you mean by "train a segmentation model on these pseudo labels in a fully supervised manner"? Do you mean: use the output of CAM or PCM as the input image, and the manually labeled mask as the target, to train a model (such as UNet)? But how can we obtain the manually labeled masks, since SEAM is designed for weakly supervised segmentation?
In addition, the paper said: does it mean mask = CAM > threshold? Actually, the code in infer_SEAM.py is:
bg_score = [np.ones_like(norm_cam[0])*args.out_cam_pred_alpha]
pred = np.argmax(np.concatenate((bg_score, norm_cam)), 0)
Doesn't it also mean mask = CAM > threshold? Thus, I am really puzzled by "train a segmentation model on these pseudo labels in fully supervised manner".
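For what it's worth, the two quoted lines can be made self-contained to show what they compute: the argmax against a constant background score is indeed equivalent to thresholding the CAMs. A minimal NumPy sketch (the `norm_cam` layout and the alpha value are assumptions matching the conventions in infer_SEAM.py):

```python
import numpy as np

def cam_to_pseudo_mask(norm_cam, out_cam_pred_alpha=0.26):
    # norm_cam: per-class CAMs normalized to [0, 1], shape (C, H, W).
    # A constant background score is prepended; any pixel whose best class
    # score falls below alpha is therefore assigned to background (label 0).
    bg_score = np.ones_like(norm_cam[:1]) * out_cam_pred_alpha
    scores = np.concatenate((bg_score, norm_cam), axis=0)  # (C+1, H, W)
    return np.argmax(scores, axis=0)  # 0 = background, c = class c

# Toy example: 2 classes on a 2x2 image.
norm_cam = np.array([
    [[0.9, 0.1],
     [0.1, 0.1]],   # class 1
    [[0.1, 0.1],
     [0.8, 0.1]],   # class 2
])
mask = cam_to_pseudo_mask(norm_cam, out_cam_pred_alpha=0.26)
# (0,0) -> class 1, (1,0) -> class 2, remaining pixels -> background
```

So "mask = CAM > threshold" is one way to read it: pixels above the background score take the winning class, everything else is background.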
In addition, in the figure in the paper:
the Cls Loss is calculated using the feature from CAM. However, the code in train_SEAM.py is:
cam1, cam_rv1 = model(img1)
label1 = F.adaptive_avg_pool2d(cam1, (1,1))
...
loss_cls1 = F.multilabel_soft_margin_loss(label1[:,1:,:,:], label[:,1:,:,:])
In this code, the classification loss is calculated using the feature from PCM (cam_rv1). It also makes me a little confused.
@NeuZhangQiang SEAM+AffinityNet generates pixel-level pseudo labels for each image in the train set. These pseudo labels can be used as targets to train a segmentation model (such as DeepLab) in place of manually labeled segmentation annotations, which are not available in the WSSS problem.
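In other words, the second stage looks exactly like ordinary supervised training, just with pseudo masks in the target slot. A minimal PyTorch sketch (the 1x1 conv is a hypothetical stand-in for DeepLab, and the shapes/values are made up for illustration):

```python
import torch
import torch.nn as nn

num_classes = 21  # e.g. PASCAL VOC: 20 foreground classes + background
model = nn.Conv2d(3, num_classes, kernel_size=1)   # placeholder segmentation head
criterion = nn.CrossEntropyLoss(ignore_index=255)  # 255 often marks unreliable pixels
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

img = torch.randn(2, 3, 8, 8)                           # batch of images
pseudo_mask = torch.randint(0, num_classes, (2, 8, 8))  # pseudo labels, not human labels

logits = model(img)                    # (N, num_classes, H, W)
loss = criterion(logits, pseudo_mask)  # same loss as fully supervised training
loss.backward()
optimizer.step()
```

The only difference from fully supervised training is where `pseudo_mask` comes from (SEAM+AffinityNet instead of annotators).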
As for the cls loss, the code you gave shows that it is calculated from cam1, not cam_rv1.
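To make that concrete, here is a self-contained sketch mirroring the quoted train_SEAM.py lines: the classification logits come from global-average-pooling cam1, and the background channel (index 0) is excluded from the loss. The shapes and the toy image-level label are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

N, C, H, W = 2, 21, 8, 8        # e.g. background at index 0 + 20 foreground classes
cam1 = torch.randn(N, C, H, W)  # stand-in for the CAM branch output

label1 = F.adaptive_avg_pool2d(cam1, (1, 1))  # (N, C, 1, 1) per-class logits from cam1
label = torch.zeros(N, C, 1, 1)
label[:, 3] = 1                               # toy image-level label: class 3 present

# Background channel is sliced off, exactly as in the quoted code.
loss_cls1 = F.multilabel_soft_margin_loss(label1[:, 1:, :, :], label[:, 1:, :, :])
```

Nothing here touches cam_rv1; it only enters through the other loss terms.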
SEAM is really an excellent work. After reading the paper, I have a question:
How do we get the final segmentation mask? In my understanding, SEAM finally outputs a CAM map, and then the random walk is used to segment the final mask. Am I right?
How is the classification loss calculated? For example, the final output is … and we can also calculate the background as: … but how can we use the two results to calculate the loss? How can we generate the ground truth? Is img(m, n) = c (the true label) the ground truth?
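For context on the ground-truth question: in WSSS the classification target is typically just the image-level multi-hot label vector (one entry per foreground class), not a per-pixel map img(m, n) = c. A minimal PyTorch sketch (the class indices are hypothetical):

```python
import torch

num_fg_classes = 20   # e.g. PASCAL VOC foreground classes
present = [2, 7]      # hypothetical: this image contains classes 2 and 7

target = torch.zeros(num_fg_classes)
target[present] = 1.0  # multi-hot ground truth for a multilabel classification loss
```

This vector is what the pooled CAM logits are compared against in the classification loss.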
Any suggestion is appreciated!