As described in Figure 3 of the CVPR paper, the input to the frame synthesis network consists of five components: the raw interpolation kernels, the projected flows, the warped depth maps, the warped frames, and the warped context features. However, in lines 177 to 181 of DAIN_slowmotion.py, the input to rectifyNet does not seem to match that description:

```python
rectify_input = torch.cat((cur_output_temp, ref0, ref2,
                           cur_offset_output[0], cur_offset_output[1],
                           cur_filter_output[0], cur_filter_output[1],
                           ctx0, ctx2), dim=1)
```
It seems that the actual input to the frame synthesis network does not include the warped depth maps, and uses a blended result of the warped frames instead.
So which one is the correct setting for the proposed method? Could you please provide a quantitative comparison of these two settings?