zhang-zx opened 5 years ago
Hi, sorry to bother you again. I really look forward to reading your new paper on speeding up the LK part of this work.
Sorry for the late reply, I'm facing some deadlines. The manuscript should be public in about one month, since it needs some time for Facebook's internal review.
Hi, I wonder whether this model is suitable for mobile devices running Android; I am trying to do this.
@xfj81525 Hi, thanks for your interest. This algorithm is used for training models with additional temporal supervision. The training procedure might be difficult to adapt to mobile devices, but the learned model should be able to run on Android.
Hi, thanks for your kind reply, I am very interested in what you have done. Ultimately I want to detect face landmarks on top of my SSD-MobileNet-based face detector, so I made some tiny modifications to your eval.py and passed the face detection boxes into it. The landmark inference (`batch_heatmaps, batch_locs, batch_scos = net(inputs)`) takes almost 300 ms per call. Is there any way to improve the performance? Thanks.
To improve efficiency, you can change the backbone of our SBR, which is VGG-16 (see https://github.com/facebookresearch/supervision-by-registration/blob/master/lib/models/cpm_vgg16.py). Besides, we use an input size of 224; you can downsample the input to 128, 96, or 64.
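A minimal sketch of the second suggestion, resizing the crop before the forward pass. The tensor shapes here are illustrative assumptions, not taken from eval.py:

```python
import torch
import torch.nn.functional as F

# A batch of one 224x224 RGB crop in NCHW layout (illustrative).
inputs = torch.randn(1, 3, 224, 224)

# Downsample to a smaller resolution, e.g. 128x128, before inference.
small = F.interpolate(inputs, size=(128, 128), mode="bilinear",
                      align_corners=False)
print(small.shape)  # torch.Size([1, 3, 128, 128])
```

Since convolution cost scales roughly with the number of input pixels, going from 224 to 128 cuts the feature-extraction work by about 3x.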
Ok, I will have a try, thanks
Hi, I noticed your paper says that only the first 4 layers of VGG-16 are used to extract features, but in the code (`self.features = nn.Sequential`), 13 layers are actually used. Performance analysis indicates that feature extraction costs nearly 74% of the time. I wonder whether it is OK to use just the first 4 layers to extract features while keeping the final landmark detection accuracy and stability, because performance is crucial for me. I want to speed it up as much as possible. Thanks, and I look forward to your reply.
Hi @xfj81525, sorry for the confusion. It should be the first four convolutional stages, which contain 10 convolutional layers. It is not OK to use just 4 conv layers. To speed up, you can try to replace the VGG backbone with a MobileNet backbone.
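For reference, a minimal sketch of one MobileNet-style depthwise-separable block, the building unit such a backbone swap would stack. The channel sizes and shapes here are illustrative assumptions, not the repo's actual configuration:

```python
import torch
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, stride=1):
    return nn.Sequential(
        # Depthwise 3x3: one filter per input channel (groups=in_ch).
        nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                  padding=1, groups=in_ch, bias=False),
        nn.ReLU(inplace=True),
        # Pointwise 1x1: mixes channels; padding must stay 0 here so the
        # spatial size is preserved.
        nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=1,
                  padding=0, bias=False),
        nn.ReLU(inplace=True),
    )

block = depthwise_separable(32, 64)
x = torch.randn(1, 32, 56, 56)
print(block(x).shape)  # torch.Size([1, 64, 56, 56])
```

The depthwise + pointwise pair does roughly the work of one standard 3x3 convolution at a fraction of the multiply-adds, which is where the speedup comes from.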
OK, @D-X-Y, thanks for your kind reply. I will try to replace the feature extractor.
Hi @D-X-Y, sorry to bother you again. I have now replaced the VGG-16 backbone with MobileNet, using its first 13 layers, as in the following code:

```python
nn.Conv2d(3, 32, kernel_size=3, stride=2, dilation=1, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(32, 32, kernel_size=3, stride=1, dilation=1, groups=32, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(32, 64, kernel_size=1, stride=1, dilation=1, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(64, 64, kernel_size=3, stride=2, dilation=1, groups=64, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(64, 128, kernel_size=1, stride=1, dilation=1, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(128, 128, kernel_size=3, stride=1, dilation=1, groups=128, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(128, 128, kernel_size=1, stride=1, dilation=1, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(128, 128, kernel_size=3, stride=2, dilation=1, groups=128, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(128, 256, kernel_size=1, stride=1, dilation=1, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(256, 256, kernel_size=3, stride=1, dilation=1, groups=256, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(256, 256, kernel_size=1, stride=1, dilation=1, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(256, 256, kernel_size=3, stride=2, dilation=1, groups=256, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(256, 512, kernel_size=1, stride=1, dilation=1, padding=1),
nn.ReLU(inplace=True),
```
But unfortunately, an error happens in `stage_loss = criterion(output, target)`:

```
RuntimeError: The size of tensor a (215200) must match the size of tensor b (421792) at non-singleton dimension
```

The input image is 224x224x3. I will dig deeper into the error; it would be deeply appreciated if you could give me some clues. Thanks.
Note that we only have 3 downsample layers in https://github.com/facebookresearch/supervision-by-registration/blob/master/lib/models/cpm_vgg16.py#L27, while your code has 4 downsample layers.
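The arithmetic behind that constraint can be sketched as follows (illustrative only; the exact heatmap and target shapes depend on the loss code, which this does not reproduce). Each stride-2 layer halves the spatial resolution, so the number of downsample layers fixes the feature-map size that the supervision target must match:

```python
# 224 input with 3 stride-2 layers vs. 4 stride-2 layers.
input_size = 224
for n_downsamples in (3, 4):
    feat = input_size // (2 ** n_downsamples)
    print(n_downsamples, "downsamples ->", feat, "x", feat)
# 3 downsamples -> 28 x 28
# 4 downsamples -> 14 x 14
```

A target built for 28x28 heatmaps cannot be matched against 14x14 predictions, which is one way to get this kind of element-count mismatch in the loss.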
Hi @D-X-Y, thanks for your help. The problem is finally fixed; it was caused by wrong padding. Now I use the first 11 layers of MobileNet to extract features, but the final channel count is 256, which is half of your original network's. Maybe this will decrease the accuracy. Do you think so?
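For other readers hitting the same mismatch, a small sketch of the padding issue (shapes are illustrative): a 1x1 convolution with `padding=1`, as in the listing above, grows the feature map by 2 pixels per side instead of preserving it.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 32, 56, 56)

wrong = nn.Conv2d(32, 64, kernel_size=1, padding=1)  # 56 -> 58
right = nn.Conv2d(32, 64, kernel_size=1, padding=0)  # 56 -> 56

print(wrong(x).shape)  # torch.Size([1, 64, 58, 58])
print(right(x).shape)  # torch.Size([1, 64, 56, 56])
```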
Yes, it might cause decreased accuracy. In addition, MobileNet was originally trained with a different strategy than VGG, so you may want to try some different training hyper-parameters.
Hi @D-X-Y, after replacing the feature extractor, inference is dramatically sped up, but the accuracy has also decreased. I noticed that the loss is hard to converge; it is still almost 127 when training ends. Is there any advice on improving the accuracy? Thanks in advance.
Hi @xfj81525, are you still working on speeding up this project?
Thanks for your wonderful work. In an earlier reply, you mentioned that your work on speeding up the LK part was about to be published. Is it published now? Can't wait to read your next wonderful paper!