feinanshan / TDNet

Temporally Distributed Networks for Fast Video Semantic Segmentation
http://cs-people.bu.edu/pinghu/TDNet
MIT License

Question about the PID #3

Closed theevann closed 4 years ago

theevann commented 4 years ago

Hello! Thank you for providing the code for this nice paper!

In the file ptsemseg/models/td4_psp/td4_psp.py, I see that the PIDs at lines 73 and 74 are 0 and 1:

self.psp1 =  PyramidPooling(512*self.expansion, norm_layer, self._up_kwargs, path_num=self.path_num//2, pid=0)
self.psp2 =  PyramidPooling(512*self.expansion, norm_layer, self._up_kwargs, path_num=self.path_num//2, pid=1)
self.psp3 =  PyramidPooling(512*self.expansion, norm_layer, self._up_kwargs, path_num=self.path_num//2, pid=0)
self.psp4 =  PyramidPooling(512*self.expansion, norm_layer, self._up_kwargs, path_num=self.path_num//2, pid=1)

According to the paper, I would expect path_num to stay at 4 and pid to range from 0 to 3. Here, 2 TDNets will process the same channel portion. Why is that so?

feinanshan commented 4 years ago

Hi, thanks for your comments!
Yes, it can be done that way. The output dimension of the sub-networks is just a design choice and can be defined differently. In the case of TD4-PSP18, the output of PSP18 is 1024-D, so splitting it into four parts (256-D each) decreases the representation ability. We therefore use 512-D for each sub-network instead.
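
To make the arithmetic in this reply concrete, here is a minimal sketch of the two ways of dividing a 1024-channel output. It assumes the channels are split into equal contiguous slices indexed by pid. The helper `channel_slice` is hypothetical and is not taken from the repository, so the actual `PyramidPooling` code may select its channels differently.

```python
def channel_slice(total_channels, path_num, pid):
    """Return (start, end) of the channel range for path `pid`.

    Hypothetical helper: assumes equal, contiguous slices per path.
    """
    if total_channels % path_num != 0:
        raise ValueError("total_channels must be divisible by path_num")
    if not 0 <= pid < path_num:
        raise ValueError("pid must be in [0, path_num)")
    width = total_channels // path_num
    return pid * width, (pid + 1) * width


TOTAL = 1024  # output dimension of PSP18, as stated in the reply

# Split expected from the paper: path_num=4, pid 0..3, 256 channels each.
paper_style = [channel_slice(TOTAL, 4, pid) for pid in range(4)]

# Split used in the quoted code: path_num=4//2, pid 0,1,0,1, 512 channels each.
repo_style = [channel_slice(TOTAL, 4 // 2, pid) for pid in (0, 1, 0, 1)]

print(paper_style)  # [(0, 256), (256, 512), (512, 768), (768, 1024)]
print(repo_style)   # [(0, 512), (512, 1024), (0, 512), (512, 1024)]
```

Under this assumption, sub-networks 1 and 3 cover the same 512 channels, as do sub-networks 2 and 4. This is the overlap the question points out, accepted in exchange for wider features per sub-network.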