Open ilkarman opened 6 years ago
Yeah, so Caffe2 right now computes all the operators as instructed, and one would need to manually remove the unneeded operators. There are a few ad hoc scripts we use to do so for common product use cases.
For proper tooling - we are prioritizing this in the upcoming months.
This could also be a possible good path for predictor auto-optimization - cc @salexspb .
Thanks for the detailed note!
Yeah, we probably can just omit calculation of things that are not needed for an output calculation.
model_helper.ExtractPredictorNet() would allow you to extract only the relevant portion of the net. It allows you to define inputs and outputs, so in this case the outputs would be 'pool5'.
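Conceptually, extracting a predictor net like this walks the graph backwards from the requested outputs and keeps only the operators needed to compute them. A minimal, framework-free sketch of that pruning idea (the op list and blob names below are made up for illustration, not Caffe2's actual internals):

```python
# Prune a linear list of ops to those needed for the requested outputs.
# Each op is (name, inputs, outputs); walk backwards from the wanted
# outputs, keeping an op only if it produces a blob we still need.

def extract_predictor_ops(ops, input_blobs, output_blobs):
    needed = set(output_blobs)
    kept = []
    for name, ins, outs in reversed(ops):
        if needed & set(outs):  # this op produces something we still need
            kept.append((name, ins, outs))
            needed -= set(outs)  # its outputs are now accounted for...
            needed |= set(ins) - set(input_blobs)  # ...but its inputs are needed
    return list(reversed(kept))

# Toy ResNet-style tail: everything after pool5 is dropped when we ask
# for 'pool5' only.
ops = [
    ("conv1", ["data"], ["conv1"]),
    ("res_blocks", ["conv1"], ["res5"]),
    ("pool5", ["res5"], ["pool5"]),
    ("fc1000", ["pool5"], ["fc1000"]),
    ("softmax", ["fc1000"], ["prob"]),
]
pruned = extract_predictor_ops(ops, ["data"], ["pool5"])
print([name for name, _, _ in pruned])  # ['conv1', 'res_blocks', 'pool5']
```

The fc and softmax operators never run, which is exactly the saving being asked about here.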
Thanks for the comments, I will try model_helper (and also add an arg_scope to the model). @Yangqing I'm not sure if this is a different topic, but I haven't been able to get good inference speeds using Caffe2 on CPU or GPU. For example, with MXNet I get 150 images/s on GPU and 12 images/s on CPU (forward passes up to the avg_pooling layer in ResNet50). However, with Caffe2 I get 69/s and 5/s respectively.
I don't think this is because of computing an additional softmax layer. My ideas were:
The training time of resnet50 on Caffe2 for me is the fastest amongst all other frameworks, so I'm not sure what I'm doing wrong with inference.
@ilkarman Hi, I have the same problem. In my case OpenMP does not work properly. Did you check whether OpenMP is working? I think OpenMP is not working.
What's interesting for me is when I run inference on Caffe2 using onnx_caffe2.backend, then Caffe2 is quite a bit faster (but still slower than MXNet)
DL Library | Images/s GPU | Images/s CPU
---|---|---
ONNX_Caffe2 | 74.9 | 6.4
Caffe2 | 68.1 | 5.8
MXNet | 139.1 | 35.0
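For context, images/s figures like these are typically obtained from a simple timing loop over repeated forward passes. A minimal sketch of such a harness (the batch size, iteration counts, and the stand-in forward function are placeholders, not the benchmark actually used here):

```python
import time

def throughput(forward, num_batches=20, batch_size=32, warmup=2):
    """Return images/s for a forward-pass callable."""
    for _ in range(warmup):  # warm-up runs excluded from timing
        forward()
    start = time.perf_counter()
    for _ in range(num_batches):
        forward()
    elapsed = time.perf_counter() - start
    return num_batches * batch_size / elapsed

# Example with a trivial stand-in for a model forward pass:
rate = throughput(lambda: sum(range(1000)))
print(f"{rate:.1f} images/s")
```

Warm-up iterations matter for a fair comparison, since the first pass often includes lazy initialization (memory allocation, cuDNN autotuning) that should not count against steady-state throughput.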
@ilkarman Do you have the notebooks for those tests somewhere?
I'm trying to generate a 2048 feature-vector from the penultimate layer of a trained ResNet50 model; like this Keras example:
```python
model = ResNet50(include_top=False)
```
However, I can't see a way to do this in Caffe2 without running the full forward pass (including the final softmax) and then extracting the relevant blob. My issue is that running the full network seems to hurt inference speed badly: my test example (on CPU) takes as long as the Keras model I tried (and is slower than CNTK or TensorFlow). Is there a way to remove the last few layers?
I'm basically trying to get the fastest inference time possible (up to the feature vector) on CPU.
Edit:
I've tried another method of loading and running inference but it is still not any faster: