facebookarchive / caffe2

Caffe2 is a lightweight, modular, and scalable deep learning framework.
https://caffe2.ai
Apache License 2.0

caffe2 resnet50_train.py error #1623

Open thyeros opened 6 years ago

thyeros commented 6 years ago

I updated to the latest caffe2 and rebuilt it with CUDA 9 and cuDNN 7. Since then, I get a shape inference error from the following command, which ran fine before the update. How do I fix this error?

mpirun -n 1 -host localhost python resnet50_trainer.py --train_data=/imagenet/ilsvrc12_train_lmdb --num_shards=1 --shard=0 --file_store_path=/tmp --batch_size=64 --epoch_size=256 --base_learning_rate=0.02 --image_size 224 --num_epochs 1

INFO:data_parallel_model:Add gradient all-reduces for SyncSGD
INFO:data_parallel_model:Post-iteration operators for updating params
INFO:data_parallel_model:Calling optimizer builder function
INFO:data_parallel_model:Add initial parameter sync
E1214 09:44:18.621101 14658 operator.cc:461] Shape inference error: [enforce fail at conv_pool_op_base.h:554] in_size + pad_head + pad_tail >= dkernel. 2 vs 3
E1214 09:44:18.621856 14658 operator.cc:462] Operator: input: "gpu_0/conv1_spatbn_relu" output: "gpu_0/pool1" name: "" type: "MaxPool" arg { name: "order" s: "NCHW" } arg { name: "kernel" i: 3 } arg { name: "stride" i: 2 } arg { name: "ws_nbytes_limit" i: 67108864 } arg { name: "cudnn_exhaustive_search" i: 1 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN"
E1214 09:44:18.621883 14658 operator.cc:463] Returning empty results.
WARNING:memonger:NOTE: Executing memonger to optimize gradient memory
INFO:memonger:Memonger memory optimization took 0.0969099998474 secs
INFO:resnet50_trainer:Starting epoch 0/1
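
For reference, the enforce message in the log above can be read as simple arithmetic: shape inference for the MaxPool operator saw a spatial input size of 2, which is smaller than the kernel size of 3 with no padding. Below is a minimal Python sketch of that check. It is an illustration of the condition printed in the log, not the actual Caffe2 source, and the function name `pooled_output_size` and its parameters are hypothetical. It does not explain why shape inference arrived at an input size of 2.

```python
def pooled_output_size(in_size, kernel, stride, pad_head=0, pad_tail=0, dilation=1):
    """Sketch of the size check reported in the log (hypothetical helper,
    not a Caffe2 API). Returns the output size along one spatial dimension."""
    # Effective kernel extent once dilation is taken into account.
    dkernel = dilation * (kernel - 1) + 1
    padded = in_size + pad_head + pad_tail
    # This mirrors the condition in the log:
    # in_size + pad_head + pad_tail >= dkernel
    if padded < dkernel:
        raise ValueError(
            f"in_size + pad_head + pad_tail >= dkernel. {padded} vs {dkernel}"
        )
    return int((padded - dkernel) / stride + 1)


# Values taken from the log: kernel=3, stride=2, no padding, inferred input size 2.
try:
    pooled_output_size(in_size=2, kernel=3, stride=2)
except ValueError as err:
    print("Shape inference error:", err)
```

With a larger input size, for example the feature map that a 224-pixel image would normally produce at this layer, the same check passes, so the "2 vs 3" suggests that shape inference was given an unexpectedly small input shape rather than that the pooling parameters themselves are wrong.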

blateyang commented 6 years ago

I hit the same error. @thyeros, have you resolved it? By the way, I also get the following warnings before the error:

Ignoring @/caffe2/caffe2/contrib/nccl:nccl_ops as it is not a valid file.
Ignoring @/caffe2/caffe2/contrib/gloo:gloo_ops as it is not a valid file.
Ignoring @/caffe2/caffe2/contrib/gloo:gloo_ops_gpu as it is not a valid file.
Ignoring @/caffe2/caffe2/distributed:file_store_handler_ops as it is not a valid file.
Ignoring @/caffe2/caffe2/distributed:redis_store_handler_ops as it is not a valid file.

Do you see similar warnings?