Several Questions on TubeFormer-CVPR-2022.

lxtGH commented 2 years ago

We have several detailed questions since we cannot find the code.

1. For the VSS task, is the global memory the prediction kernel of the last convolution? Did you use bipartite matching?

2. For the VPS task on KITTI-STEP, in the section "Global memory with split thing and stuff": did you use bipartite matching for the stuff memory, or directly use a cross-entropy loss?

3. For both the VPS and DVPS tasks, we are also confused about the prediction label range in the section "Global memory with split thing and stuff". Is the mask classification for thing and stuff performed jointly or individually? (A joint classification head for C{thing}+C{stuff}, or two heads for C{thing} and C{stuff} to handle each.)

4. For the DVPS task, how did you handle the unlabeled regions on KITTI-DVPS, since the labels are very sparse?

5. Will the code be released for reference? Thanks a lot!

lxtGH commented 2 years ago

Hi! We are big fans of your work. Could you help us to better understand your work? @mcahny @aquariusjay Thanks a lot !!!!!

mcahny commented 2 years ago

Hi, thanks for asking.

1. Yes, the global memory (after the last two FC layers) is the prediction kernel. We use fixed assignment instead of bipartite matching.
2. We do not use bipartite matching for stuff classes. The stuff classes are trained with cross-entropy and VPQ-style losses.
3. Since we use a fixed assignment between the stuff memory and the stuff classes, the mask classification for thing and stuff is performed individually.
4. We simply labeled the unlabeled regions as 'unlabeled', and the loss ignores those regions.
5. We are not sure yet when the code will be released.

Thanks.
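
For readers re-implementing this, the reply above can be illustrated with a minimal sketch. It is not the authors' code. It assumes that "fixed assignment" means stuff-memory entry k always predicts stuff class k, so the stacked mask logits can be trained with a per-pixel cross-entropy, and that pixels carrying an 'unlabeled' id are skipped. The function name `masked_cross_entropy` and the id `IGNORE_LABEL = 255` are hypothetical, and the VPQ-style loss is not covered.

```python
import math

IGNORE_LABEL = 255  # hypothetical id for 'unlabeled' pixels (assumption)


def masked_cross_entropy(logits, labels, ignore_label=IGNORE_LABEL):
    """Mean per-pixel cross-entropy that skips unlabeled pixels.

    logits: one score list per pixel; logits[p][k] is the score that
        memory entry k (assumed fixed to stuff class k) gives pixel p.
    labels: one class id per pixel, or ignore_label for unlabeled pixels.
    """
    total, count = 0.0, 0
    for scores, label in zip(logits, labels):
        if label == ignore_label:
            continue  # unlabeled pixels contribute nothing to the loss
        m = max(scores)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[label]
        count += 1
    return total / count if count else 0.0


# Two labeled pixels and one unlabeled pixel, two stuff classes.
logits = [[2.0, 0.0], [0.0, 2.0], [5.0, -5.0]]
labels = [0, 1, IGNORE_LABEL]
loss = masked_cross_entropy(logits, labels)
```

Because entry k is tied to class k, no matching step is needed before computing the loss. In a PyTorch port, the same masking is usually expressed through the `ignore_index` argument of the cross-entropy loss.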

lxtGH commented 2 years ago

Thanks for your reply, Dr. Dahun @mcahny! As I prepare to re-implement TubeFormer in PyTorch (mmdet), I would like to know the details of the mask-based tracking part. Did you use ViP-like mask-based tracking in an offline manner? Looking forward to your reply!
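
This question is not answered in the thread. For context, mask-based tracking of this kind is commonly done by stitching instance ids across overlapping clips according to mask IoU. The sketch below shows only that general idea, under stated assumptions: the two clips share one frame, id 0 is background, matching is greedy, and class agreement between matched masks is not checked. The function name `stitch_ids` and the 0.5 threshold are hypothetical, and nothing here is confirmed as TubeFormer's procedure.

```python
from collections import Counter


def stitch_ids(prev_ids, curr_ids, iou_threshold=0.5):
    """Greedily map instance ids of the current clip to the previous clip.

    prev_ids, curr_ids: flat lists of per-pixel instance ids predicted for
        the SAME frame by two overlapping clips; 0 means background.
    Returns {curr_id: prev_id} for pairs whose mask IoU >= iou_threshold.
    """
    area_prev = Counter(prev_ids)
    area_curr = Counter(curr_ids)
    inter = Counter()
    for p, c in zip(prev_ids, curr_ids):
        if p and c:  # count overlap only between foreground instances
            inter[(p, c)] += 1

    # Rank candidate pairs by IoU, highest first.
    pairs = sorted(
        ((n / (area_prev[p] + area_curr[c] - n), p, c)
         for (p, c), n in inter.items()),
        reverse=True,
    )

    mapping, used_prev = {}, set()
    for iou, p, c in pairs:
        if iou < iou_threshold:
            break
        if c in mapping or p in used_prev:
            continue  # each id is matched at most once
        mapping[c] = p
        used_prev.add(p)
    return mapping


prev_ids = [1, 1, 1, 0, 2, 2]
curr_ids = [7, 7, 0, 0, 9, 9]
mapping = stitch_ids(prev_ids, curr_ids)
```

Ids in the current clip that receive no match would keep a fresh id, which is how new objects enter the track list in this kind of offline stitching.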

google-research / deeplab2