Issue summary
I have been training a Siamese variant of ResNet-50 inside a Docker container, using CUDA 8 and cuDNN 5, starting from pre-trained ImageNet weights, with the following command:
/opt/caffe/build/tools/caffe train -solver $SOLVER -weights $WEIGHTS
This worked until today, when I rebuilt my Docker image. The Dockerfile clones the latest version of Caffe from the official GitHub repository and builds it as part of the image build. Since the rebuild, running the above command gives the following error during Caffe's initialization (some lines of the preceding output are included for context).
layer {
name: "res4c_branch2a_relu"
type: "ReLU"
bottom: "res4c_branch2a"
top: "res4c_branch2a"
}
layer {
name: "res4c_branch2b"
I0818 05:53:52.053457 48 layer_factory.hpp:77] Creating layer Data
Traceback (most recent call last):
File "/srv/same_car_classifier/caffe_layers/SiameseDataLayer.py", line 1, in <module>
import caffe
File "/opt/caffe/python/caffe/__init__.py", line 1, in <module>
from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver, NCCL, Timer
File "/opt/caffe/python/caffe/pycaffe.py", line 15, in <module>
import caffe.io
File "/opt/caffe/python/caffe/io.py", line 8, in <module>
from caffe.proto import caffe_pb2
File "/opt/caffe/python/caffe/proto/caffe_pb2.py", line 11, in <module>
from google.protobuf import descriptor_pb2
File "/usr/local/lib/python2.7/dist-packages/google/protobuf/descriptor_pb2.py", line 256, in <module>
options=None),
File "/usr/local/lib/python2.7/dist-packages/google/protobuf/descriptor.py", line 501, in __new__
return _message.default_pool.FindFieldByName(full_name)
KeyError: 'Unknown descriptor pool'
No changes were made to the data layer between the last working run and now, nor to solver.prototxt or any of the net-definition prototxt files. Just to be sure, I rolled back my code (including the Dockerfile) to a revision from a time when this was definitely working and rebuilt the Docker image; the same error still occurs.
I'm not 100% certain that this is a Caffe issue, but it appears to be.
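One way to narrow this down might be to repeat the imports from the traceback above one at a time in a plain Python shell inside the container, starting from the innermost one. This is only a rough sketch: the helper name `try_import` is made up for illustration, and it assumes the same Python 2.7 environment that the data layer runs in. If `google.protobuf.descriptor_pb2` already fails on its own, the problem is probably in the protobuf installation rather than in Caffe itself.

```python
import importlib


def try_import(module_name):
    """Import a module by name and report success or the raised error.

    Hypothetical helper for diagnosis only; it is not part of Caffe.
    """
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:
        # The failure in the traceback is a KeyError, not an ImportError,
        # so catch broadly and record the exception type.
        return {"module": module_name, "ok": False,
                "error": "%s: %s" % (type(exc).__name__, exc)}
    return {"module": module_name, "ok": True,
            "version": getattr(module, "__version__", None)}


if __name__ == "__main__":
    # Ordered from the innermost import in the traceback outwards.
    for name in ("google.protobuf",
                 "google.protobuf.descriptor_pb2",
                 "caffe.proto.caffe_pb2",
                 "caffe"):
        print(try_import(name))
```

The first entry that reports `ok: False` shows where the import chain breaks, and the reported `google.protobuf` version can be compared between the old and the rebuilt image.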
Steps to reproduce
Your system configuration
Operating system: Ubuntu 14.04
Compiler: gcc
CUDA version (if applicable): 8
CUDNN version (if applicable): 5
BLAS: atlas
Python version: 2.7