zhang-zx opened 5 years ago
Hi, sorry to bother you again. I really look forward to reading your new paper on speeding up the LK part of this work.
Sorry for the late reply, I'm facing some deadlines. The manuscript should be public in about one month, since it needs some time for Facebook's internal review.
Hi, I wonder whether this model is suitable for mobile devices running Android; I am trying to do this.
@xfj81525 Hi, thanks for your interest. This algorithm is used for training models with additional temporal supervision. The training procedure might be difficult to adapt to mobile devices, but the learned model should be able to run on Android.
Hi, thanks for your kind reply, I am very interested in what you have done. Ultimately I want to detect face landmarks on top of my SSD-MobileNet-based face detector, so I made some tiny modifications to your eval.py and passed the face detection boxes into it. The landmark inference (`batch_heatmaps, batch_locs, batch_scos = net(inputs)`) takes almost 300 ms per call. Is there any way to improve the performance? Thanks.
To improve efficiency, you can change the backbone of our SBR, which is VGG-16 (see https://github.com/facebookresearch/supervision-by-registration/blob/master/lib/models/cpm_vgg16.py). Besides, we use an input size of 224; you can downsample the input to 128, 96, or 64.
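A minimal sketch of the second suggestion, resizing the crop before the forward pass. The tensor shapes here are illustrative assumptions, not taken from eval.py:

```python
import torch
import torch.nn.functional as F

# A batch of one 224x224 RGB crop in NCHW layout (illustrative).
inputs = torch.randn(1, 3, 224, 224)

# Downsample to a smaller resolution, e.g. 128x128, before inference.
small = F.interpolate(inputs, size=(128, 128), mode="bilinear",
                      align_corners=False)
print(small.shape)  # torch.Size([1, 3, 128, 128])
```

Since convolution cost scales roughly with the number of input pixels, going from 224 to 128 cuts the feature-extraction work by about 3x.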
Ok, I will have a try, thanks
Hi, I noticed your paper says that only the first 4 layers of VGG-16 are used to extract features, but in the code (`self.features = nn.Sequential`), 13 layers are actually used. Performance analysis indicates that feature extraction costs nearly 74% of the time. I wonder whether it is OK to use just the first 4 layers to extract features while keeping the final landmark detection accuracy and stability, because performance is crucial for me. I want to speed it up as much as possible. Thanks, and I look forward to your reply.
Hi @xfj81525, sorry for the confusion. It should be the first four convolutional stages, which contain 10 convolutional layers. It is not OK to use just 4 conv layers. To speed up, you can try to replace the VGG backbone with a MobileNet backbone.
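For reference, a minimal sketch of one MobileNet-style depthwise-separable block, the building unit such a backbone swap would stack. The channel sizes and shapes here are illustrative assumptions, not the repo's actual configuration:

```python
import torch
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, stride=1):
    return nn.Sequential(
        # Depthwise 3x3: one filter per input channel (groups=in_ch).
        nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                  padding=1, groups=in_ch, bias=False),
        nn.ReLU(inplace=True),
        # Pointwise 1x1: mixes channels; padding must stay 0 here so the
        # spatial size is preserved.
        nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=1,
                  padding=0, bias=False),
        nn.ReLU(inplace=True),
    )

block = depthwise_separable(32, 64)
x = torch.randn(1, 32, 56, 56)
print(block(x).shape)  # torch.Size([1, 64, 56, 56])
```

The depthwise + pointwise pair does roughly the work of one standard 3x3 convolution at a fraction of the multiply-adds, which is where the speedup comes from.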
OK, @D-X-Y, thanks for your kind reply. I will try to replace the feature extractor.
Hi @D-X-Y, sorry to bother you again. I have now replaced the VGG-16 backbone with MobileNet, using its first 13 layers, as in the following code:

```python
nn.Conv2d(3, 32, kernel_size=3, stride=2, dilation=1, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(32, 32, kernel_size=3, stride=1, dilation=1, groups=32, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(32, 64, kernel_size=1, stride=1, dilation=1, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(64, 64, kernel_size=3, stride=2, dilation=1, groups=64, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(64, 128, kernel_size=1, stride=1, dilation=1, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(128, 128, kernel_size=3, stride=1, dilation=1, groups=128, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(128, 128, kernel_size=1, stride=1, dilation=1, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(128, 128, kernel_size=3, stride=2, dilation=1, groups=128, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(128, 256, kernel_size=1, stride=1, dilation=1, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(256, 256, kernel_size=3, stride=1, dilation=1, groups=256, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(256, 256, kernel_size=1, stride=1, dilation=1, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(256, 256, kernel_size=3, stride=2, dilation=1, groups=256, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(256, 512, kernel_size=1, stride=1, dilation=1, padding=1),
nn.ReLU(inplace=True),
```
But unfortunately, an error happens in `stage_loss = criterion(output, target)`:

```
RuntimeError: The size of tensor a (215200) must match the size of tensor b (421792) at non-singleton dimension
```

The input image is 224x224x3. I will dig deeper into the error; it would be deeply appreciated if you could give me some clues. Thanks.
Note that we only have 3 downsample layers in https://github.com/facebookresearch/supervision-by-registration/blob/master/lib/models/cpm_vgg16.py#L27, while your code has 4 downsample layers.
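The arithmetic behind that constraint can be sketched as follows (illustrative only; the exact heatmap and target shapes depend on the loss code, which this does not reproduce). Each stride-2 layer halves the spatial resolution, so the number of downsample layers fixes the feature-map size that the supervision target must match:

```python
# 224 input with 3 stride-2 layers vs. 4 stride-2 layers.
input_size = 224
for n_downsamples in (3, 4):
    feat = input_size // (2 ** n_downsamples)
    print(n_downsamples, "downsamples ->", feat, "x", feat)
# 3 downsamples -> 28 x 28
# 4 downsamples -> 14 x 14
```

A target built for 28x28 heatmaps cannot be matched against 14x14 predictions, which is one way to get this kind of element-count mismatch in the loss.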
Hi @D-X-Y, thanks for your help. The problem is finally fixed; it was caused by wrong padding. Now I use the first 11 layers of MobileNet to extract features, but the final channel count is 256, which is half of your original network's. Maybe this will decrease the accuracy. Do you think so?
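For other readers hitting the same mismatch, a small sketch of the padding issue (shapes are illustrative): a 1x1 convolution with `padding=1`, as in the listing above, grows the feature map by 2 pixels per side instead of preserving it.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 32, 56, 56)

wrong = nn.Conv2d(32, 64, kernel_size=1, padding=1)  # 56 -> 58
right = nn.Conv2d(32, 64, kernel_size=1, padding=0)  # 56 -> 56

print(wrong(x).shape)  # torch.Size([1, 64, 58, 58])
print(right(x).shape)  # torch.Size([1, 64, 56, 56])
```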
Yes, it might cause decreased accuracy. In addition, MobileNet was originally trained with a different strategy than VGG, so you may want to try some different training hyper-parameters.
Hi @D-X-Y, after replacing the feature extractor, inference is dramatically sped up, but the accuracy has also decreased. I noticed that the loss is hard to converge; it is still almost 127 when training ends. Is there any advice on improving the accuracy? Thanks in advance.
Hi @xfj81525, are you still working on speeding up this project?
Thanks for your wonderful work. In an earlier reply, you mentioned that your work on speeding up the LK part was about to be published. Is it published now? Can't wait to read your next wonderful paper!