KeyKy opened this issue 7 years ago
I am also receiving an error when trying to convert a ResNet-50:
KeyError: 'No translator registered for layer: name: "bn_1"\ntype: "BatchNorm"\nbottom: "conv_1"\ntop: "conv_1"\nparam {\n lr_mult: 0.0\n decay_mult: 0.0\n}\nparam {\n lr_mult: 0.0\n decay_mult: 0.0\n}\nparam {\n lr_mult: 0.0\n decay_mult: 0.0\n}\nbatch_norm_param {\n use_global_stats: true\n}\n yet.'
I found a PR that seems to address this issue: https://github.com/caffe2/caffe2/pull/430
With that PR the translation finishes successfully. Hope it works!
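(For background on what that PR's translator has to handle: Caffe's BatchNorm layer stores three blobs - accumulated mean, accumulated variance, and a moving-average factor - and the first two have to be divided by that factor before they become the running statistics that Caffe2's SpatialBN expects. A rough numpy sketch of just that blob arithmetic, not the PR's actual code; names and shapes here are assumptions:)

```python
import numpy as np

# Rough sketch of the Caffe BatchNorm -> Caffe2 SpatialBN blob arithmetic.
# mean_blob, var_blob, factor_blob are the three BatchNorm blobs loaded
# from the .caffemodel.
def caffe_bn_running_stats(mean_blob, var_blob, factor_blob):
    factor = float(factor_blob.flatten()[0])
    scale = 1.0 / factor if factor != 0 else 0.0
    running_mean = (mean_blob.flatten() * scale).astype(np.float32)
    running_var = (var_blob.flatten() * scale).astype(np.float32)
    return running_mean, running_var
```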
@littleowl have you successfully run it? I get the following error:
Input index 0 and output idx 0 (conv1) are set to be in-place but this is actually not supported by op SpatialBN. [enforce fail at operator.cc:69] schema->Verify(operator_def). Operator def did not pass schema checking: input: "conv1" input: "conv1_scale" input: "conv1_bias" input: "conv1_mean" input: "conv1_var" output: "conv1" type: "SpatialBN" arg { name: "is_test" i: 1 } arg { name: "epsilon" f: 1e-05 } arg { name: "order" s: "NCHW" } device_option { device_type: 1 cuda_gpu_id: 3 }
I fixed it in my prototxt, because SpatialBN in Caffe2 is not an in-place layer. However, when I run it, I get this warning:
W0502 17:55:53.081574 13078 conv_pool_op_base.h:554] You are hitting a case where Caffe's legacy padding calculation is hit. This leads to inefficient and sometimes incorrect results. We are keeping this behavior for backward compatibility, but you are strongly recommended to move away from it.
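(For context on that warning: Caffe computes pooling output sizes with a ceil in the formula, while Caffe2's default uses floor, so the same layer can end up with a different spatial size unless the legacy-padding path is emulated. A quick sketch of the two formulas as I understand them:)

```python
import math

# Pooling output height: Caffe's legacy formula rounds up, Caffe2's default
# rounds down, which is the difference the warning is about.
def caffe_pool_out(h, kernel, stride, pad):
    return int(math.ceil((h + 2 * pad - kernel) / float(stride))) + 1

def caffe2_pool_out(h, kernel, stride, pad):
    return (h + 2 * pad - kernel) // stride + 1

print(caffe_pool_out(112, 3, 2, 0), caffe2_pool_out(112, 3, 2, 0))  # 56 vs 55
```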
@KeyKy I do actually get the same [enforce fail at operator.cc:69] error as you. I'll check out your prototxt - Thx!
Analyzing your prototxt file - it seems as though for every BatchNorm layer you set the name to also be the top. I have done the same and everything seems OK so far. Not able to try running it just yet as I'm trying to get this working on iOS.
@littleowl very helpful. I also tried that PR, and it works for my translation. (Confirmed)
@KeyKy, thanks for exploring this. However, there are a lot of BatchNorm layers in ResNet - did you change the prototxt by hand or with a script? If the latter, could you share it? Thanks!
@Primus-zhao I changed the prototxt by hand with Netscope.
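(For anyone who doesn't want to edit a 150-layer prototxt by hand, something along these lines should do the renaming automatically - an untested sketch assuming pycaffe's caffe_pb2 is importable and the prototxt uses the modern `layer` field; the function name is made up:)

```python
from caffe.proto import caffe_pb2
from google.protobuf import text_format

def make_batchnorm_not_inplace(in_path, out_path):
    """Give each in-place BatchNorm layer its own top and rewire consumers."""
    net = caffe_pb2.NetParameter()
    with open(in_path) as f:
        text_format.Merge(f.read(), net)

    renamed = {}  # old blob name -> new name produced by the last BatchNorm
    for layer in net.layer:
        # Point bottoms at the renamed blob if an earlier BatchNorm changed it.
        for i, bottom in enumerate(layer.bottom):
            if bottom in renamed:
                layer.bottom[i] = renamed[bottom]
        if layer.type == "BatchNorm" and list(layer.top) == list(layer.bottom):
            new_top = layer.name  # e.g. top "bn_1" instead of "conv_1"
            renamed[layer.bottom[0]] = new_top
            layer.top[0] = new_top
        else:
            # Any other layer that writes a blob ends that blob's renaming.
            for top in layer.top:
                renamed.pop(top, None)

    with open(out_path, "w") as f:
        f.write(text_format.MessageToString(net))
```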
@KeyKy I used your method to convert Caffe to Caffe2 for SSD (ref: https://github.com/KeyKy/caffe2/blob/master/caffe2/python/examples/ssd/). However, during detection, I am facing the same warning:
You are hitting a case where Caffe's legacy padding calculation is hit. This leads to inefficient and sometimes incorrect results. We are keeping this behavior for backward compatibility, but you are strongly recommended to move away from it.
This gives me false detection bounding boxes. Could you help me with that if you were able to solve the issue?
Thanks!
@rams16592 could you send me an image that produces false detection bounding boxes? Did you try the image with the original Caffe SSD and compare the results? My email is 370846270@qq.com.
@rams16592 I have received your email. I found detection_out_op is slow because I implemented it on the CPU, while SSD Caffe has a GPU implementation. I will work on it in a few days; hopefully it will be an improvement.
@KleinYuan based on the script https://github.com/caffe2/caffe2/pull/430/files, I can translate my model with no errors, but when I test my new Caffe2 model, I find the feature output is wrong - the numbers are NaN or zero. Have you tested the model you transferred? Is it correct?
@nyyznyyz1991 yes, I have the same issue and am looking into it.
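(In case it helps to reproduce that check quickly - a small sketch assuming the translator wrote out init_net.pb and predict_net.pb; the file names and input shape are placeholders:)

```python
import numpy as np
from caffe2.python import workspace

# Load the translated nets and probe the outputs for NaN/zero values.
with open("init_net.pb", "rb") as f:
    init_net = f.read()
with open("predict_net.pb", "rb") as f:
    predict_net = f.read()

p = workspace.Predictor(init_net, predict_net)
img = np.random.rand(1, 3, 224, 224).astype(np.float32)  # dummy NCHW input
outputs = p.run([img])
for i, out in enumerate(outputs):
    print(i, "NaN:", np.isnan(out).any(), "all zero:", not np.any(out))
```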
Hi @rams16592, after some hard work I implemented the GPU detection_out_op and the benchmark. See the latest commit. It should be faster than before!
@KeyKy Thank you! I just saw that and tried it. The cost of the detection output has decreased on the Jetson TX1 too. However, when I was benchmarking, I saw that the difference between the conv op in Caffe and in Caffe2 is not much. Is that still the case for you? I saw you faced the same problem. (ref: https://github.com/caffe2/caffe2/issues/534)
@rams16592 Yes. This kind of difference also exists between MXNet and Caffe (https://github.com/msracver/Deformable-ConvNets), and analogously between Caffe2 and Caffe. Now, what's your detection speed on the Jetson TX1?
@KeyKy I see now. Thanks for the update. I understood the reason.
@nyyznyyz1991 @KleinYuan - I too have this issue with the NaN after patching the caffe_translator.py file and adapting the net structure with Netscope. One thing I noticed was that some of the layers of data contain really small numbers like 0.0122e-6 or something like that. I have no idea if that matters, but it got me thinking that maybe there is something wrong with protobuf.
Looking at my setup: I was using protobuf 3.2 to do the translation and 3.1 for the implementation, and the original files were probably made with 2.x. Not sure how to properly update protobuf binaries from 2.x to 3.x, or even whether it's a big deal. Anyone know if they are compatible? So I wondered if maybe there were some incompatibilities going on.
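(If it helps to narrow that down, the protobuf runtime each environment actually uses can be printed from Python:)

```python
import google.protobuf
# api_implementation is an internal module, but it reports whether the C++
# or pure-Python protobuf backend is in use.
from google.protobuf.internal import api_implementation

print(google.protobuf.__version__, api_implementation.Type())
```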
I then tried to do the translation using protobuf 2.6.1 from Docker and tried that on iOS. Surprisingly, I no longer get NaN at all. That's good news, but not by much, since instead I just get the incorrect values of 1.0 and 0.0 no matter what in my final layer (which only has a length of 2). Obviously I'm assuming there are going to be problems going from 2.x to 3.x, which means there are two things I can try. Totally not sure if I'm going down the right rabbit hole or not, but I thought I would share my insights.
I want to translate ResNet-152 into Caffe2. However, I get this error:
The following is how I run caffe_translator.py: