How to test the PWC-Net module

littlespray commented 4 years ago

Hi, thank you for your great work.

I have used the VCN module and it works well. But when I tried to train the PWC-Net module for comparison, I found that simply changing this line (line 201 in main.py)

model = VCN([batch_size//ngpus]+data_inuse.datasets[0].shape[::-1], md=[int(4*(args.maxdisp/256)), 4,4,4,4], fac=args.fac)

to

model = PWCDCNet([batch_size//ngpus]+data_inuse.datasets[0].shape[::-1])

does not work. Training prints:

Iter 2 training loss = -7931635.000 , AEPE = 234449312.000 , time = 0.16
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1579027003190/work/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=77 : an illegal memory access was encountered

I think this is because the forward function in models/PWCNet.py (line 300) returns

return flow2*20,flow3*20,flow4*20,flow5*20,flow6*20,flow2, flow2[:,0]

It seems that output[-2] and output[-1] are not the loss and oor that the training code expects. Should I rewrite this part the way VCN does in order to train PWCNet? Namely, by adding

            oor2 = F.upsample(oor2[:,np.newaxis], [im.size()[2],im.size()[3]], mode='bilinear')[:,0]
            oor3 = F.upsample(oor3[:,np.newaxis], [im.size()[2],im.size()[3]], mode='bilinear')[:,0]
            oor4 = F.upsample(oor4[:,np.newaxis], [im.size()[2],im.size()[3]], mode='bilinear')[:,0]
            oor5 = F.upsample(oor5[:,np.newaxis], [im.size()[2],im.size()[3]], mode='bilinear')[:,0]
            oor6 = F.upsample(oor6[:,np.newaxis], [im.size()[2],im.size()[3]], mode='bilinear')[:,0]
            loss += self.get_oor_loss(flowl0[:,:2]-0,        oor6, (64* self.flow_reg64.flowx.max()),occ_mask)
            loss += self.get_oor_loss(flowl0[:,:2]-up_flow6, oor5, (32* self.flow_reg32.flowx.max()),occ_mask)
            loss += self.get_oor_loss(flowl0[:,:2]-up_flow5, oor4, (16* self.flow_reg16.flowx.max()),occ_mask)
            loss += self.get_oor_loss(flowl0[:,:2]-up_flow4, oor3, (8* self.flow_reg8.flowx.max())  ,occ_mask)
            loss += self.get_oor_loss(flowl0[:,:2]-up_flow3, oor2, (4* self.flow_reg4.flowx.max())  ,occ_mask)

            return flow2*20, flow3*20,flow4*20,flow5*20,flow6*20,loss, oor2

Thank you very much for any help!

gengshan-y commented 4 years ago

The VCN loss has two parts: 1) the flow estimation loss, and 2) the loss of the oor (out-of-range detection) module. To train PWCNet, only 1) is needed.

            flowl0 = disc_aux[0].permute(0,3,1,2).clone()
            mask = disc_aux[1].clone()
            loss = 1.0 *torch.norm((flow2*20-flowl0[:,:2]),2,1)[mask].mean() + \
                   0.5 *torch.norm((flow3*20-flowl0[:,:2]),2,1)[mask].mean() + \
                   0.25*torch.norm((flow4*20-flowl0[:,:2]),2,1)[mask].mean() + \
                   0.25*torch.norm((flow5*20-flowl0[:,:2]),2,1)[mask].mean() + \
                   0.25*torch.norm((flow6*20-flowl0[:,:2]),2,1)[mask].mean()
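The same weighted sum can be written as a small standalone function, which makes it easier to check in isolation. This is only a sketch: the name `multiscale_epe_loss` is hypothetical, and it assumes every predicted flow has already been brought to the ground-truth resolution and scaled to pixel units (the `*20` above), and that `mask` is a boolean tensor.

```python
import torch

def multiscale_epe_loss(flows, flow_gt, mask,
                        weights=(1.0, 0.5, 0.25, 0.25, 0.25)):
    """Weighted sum of masked end-point errors (hypothetical helper).

    flows:   sequence of predicted flows, each (B, 2, H, W), assumed to be
             at ground-truth resolution and in pixel units.
    flow_gt: ground-truth flow, (B, 2, H, W).
    mask:    boolean mask of valid pixels, (B, H, W).
    """
    loss = flow_gt.new_zeros(())
    for w, flow in zip(weights, flows):
        # L2 norm over the channel dimension gives the per-pixel end-point error
        epe = torch.norm(flow - flow_gt, p=2, dim=1)  # (B, H, W)
        loss = loss + w * epe[mask].mean()
    return loss

# Toy check: every prediction is off by (3, 4) pixels, so each EPE is 5.
gt = torch.zeros(1, 2, 4, 4)
pred = torch.zeros(1, 2, 4, 4)
pred[:, 0] = 3.0
pred[:, 1] = 4.0
mask = torch.ones(1, 4, 4, dtype=torch.bool)
loss = multiscale_epe_loss([pred] * 5, gt, mask)
print(loss.item())  # 5 * (1.0 + 0.5 + 0.25 * 3) = 11.25
```

Note that if `mask` selects no pixels, `.mean()` of an empty tensor is NaN, so a batch with no valid ground truth would need to be skipped or guarded separately.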

I think you also need to change parts of the PWCNet interface, for example, pass disc_aux (the auxiliary training variables) to the forward function.

This PWCNet code was not meant to be used for training, although I believe a few simple modifications would make it work.
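One possible shape for such a modification is sketched below. It is not the repository's code: `TinyFlowNet` is a hypothetical stand-in for PWCDCNet, `TrainableFlowWrapper` is a hypothetical name, and the input layout (two frames stacked along channels) is an assumption. The only things taken from this thread are the `disc_aux` layout used in the loss snippet and the convention that the training code reads the loss from `output[-2]` and oor from `output[-1]`.

```python
import torch
import torch.nn as nn

class TinyFlowNet(nn.Module):
    """Hypothetical stand-in for PWCDCNet: returns five 2-channel flow maps."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(6, 2, kernel_size=3, padding=1)

    def forward(self, im):
        # im: (B, 6, H, W), assumed to be two RGB frames stacked along channels
        flow = self.conv(im)
        return flow, flow, flow, flow, flow

class TrainableFlowWrapper(nn.Module):
    """Hypothetical wrapper that computes the flow loss inside forward()."""
    weights = (1.0, 0.5, 0.25, 0.25, 0.25)

    def __init__(self, net):
        super().__init__()
        self.net = net

    def forward(self, im, disc_aux=None):
        flows = [20 * f for f in self.net(im)]
        if disc_aux is None:
            # Inference: no ground truth, return the finest flow only
            return flows[0]
        flowl0 = disc_aux[0].permute(0, 3, 1, 2)  # (B, H, W, C) -> (B, C, H, W)
        mask = disc_aux[1]                        # boolean, (B, H, W)
        loss = sum(w * torch.norm(f - flowl0[:, :2], 2, 1)[mask].mean()
                   for w, f in zip(self.weights, flows))
        # PWC-Net has no oor head; a zero map keeps the output tuple layout
        oor = torch.zeros_like(flows[0][:, 0])
        return (*flows, loss, oor)

torch.manual_seed(0)
model = TrainableFlowWrapper(TinyFlowNet())
im = torch.randn(2, 6, 8, 8)
flow_gt = torch.randn(2, 8, 8, 3)
valid = torch.ones(2, 8, 8, dtype=torch.bool)
out = model(im, [flow_gt, valid])
out[-2].backward()  # the scalar loss sits at output[-2]
```

Computing the loss inside `forward` keeps the call site in the training loop unchanged, at the cost of mixing training logic into the model.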

gengshan-y commented 4 years ago

Another thing to note: in my experience, PWCNet is not easy to train. You may want to take a look at the supplement of VCN and change a few hyper-parameters.

littlespray commented 4 years ago

Got it! Thank you very much for your help!
