When I run "python2 tools/train_net.py --cfg configs/DensePose_ResNet50_FPN_s1x-e2e.yaml OUTPUT_DIR tmp/detectron-output", I get the following error:
INFO train.py: 123: Building model: generalized_rcnn
WARNING cnn.py: 25: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
WARNING model_helper.py: 442: You are creating an op that the ModelHelper does not recognize: PoolPointsInterp.
WARNING model_helper.py: 442: You are creating an op that the ModelHelper does not recognize: PoolPointsInterp.
WARNING model_helper.py: 442: You are creating an op that the ModelHelper does not recognize: PoolPointsInterp.
WARNING model_helper.py: 442: You are creating an op that the ModelHelper does not recognize: PoolPointsInterp.
WARNING model_helper.py: 442: You are creating an op that the ModelHelper does not recognize: PoolPointsInterp.
WARNING model_helper.py: 442: You are creating an op that the ModelHelper does not recognize: PoolPointsInterp.
WARNING memonger.py: 55: NOTE: Executing memonger to optimize gradient memory
[I memonger.cc:236] Remapping 110 using 26 shared blobs.
INFO memonger.py: 97: Memonger memory optimization took 0.0266439914703 secs
WARNING memonger.py: 55: NOTE: Executing memonger to optimize gradient memory
[I memonger.cc:236] Remapping 110 using 26 shared blobs.
INFO memonger.py: 97: Memonger memory optimization took 0.036064863205 secs
WARNING workspace.py: 218: Original python traceback for operator 181 in network generalized_rcnn_init in exception above (most recent call last):
WARNING workspace.py: 223: File "tools/train_net.py", line 119, in <module>
WARNING workspace.py: 223: File "tools/train_net.py", line 101, in main
WARNING workspace.py: 223: File "/home/sunjunyao/code/Model/densepose/detectron/utils/train.py", line 45, in train_model
WARNING workspace.py: 223: File "/home/sunjunyao/code/Model/densepose/detectron/utils/train.py", line 124, in create_model
WARNING workspace.py: 223: File "/home/sunjunyao/code/Model/densepose/detectron/modeling/model_builder.py", line 119, in create
WARNING workspace.py: 223: File "/home/sunjunyao/code/Model/densepose/detectron/modeling/model_builder.py", line 84, in generalized_rcnn
WARNING workspace.py: 223: File "/home/sunjunyao/code/Model/densepose/detectron/modeling/model_builder.py", line 233, in build_generic_detection_model
WARNING workspace.py: 223: File "/home/sunjunyao/code/Model/densepose/detectron/modeling/optimizer.py", line 32, in build_data_parallel_model
WARNING workspace.py: 223: File "/home/sunjunyao/code/Model/densepose/detectron/modeling/optimizer.py", line 55, in _build_forward_graph
WARNING workspace.py: 223: File "/home/sunjunyao/code/Model/densepose/detectron/modeling/model_builder.py", line 200, in _single_gpu_build_func
WARNING workspace.py: 223: File "/home/sunjunyao/code/Model/densepose/detectron/modeling/model_builder.py", line 258, in _add_fast_rcnn_head
WARNING workspace.py: 223: File "/home/sunjunyao/code/Model/densepose/detectron/modeling/fast_rcnn_heads.py", line 105, in add_roi_2mlp_head
WARNING workspace.py: 223: File "/home/sunjunyao/anaconda3/envs/python2/lib/python2.7/site-packages/caffe2/python/cnn.py", line 134, in FC
WARNING workspace.py: 223: File "/home/sunjunyao/anaconda3/envs/python2/lib/python2.7/site-packages/caffe2/python/brew.py", line 108, in scope_wrapper
WARNING workspace.py: 223: File "/home/sunjunyao/anaconda3/envs/python2/lib/python2.7/site-packages/caffe2/python/helpers/fc.py", line 58, in fc
WARNING workspace.py: 223: File "/home/sunjunyao/anaconda3/envs/python2/lib/python2.7/site-packages/caffe2/python/helpers/fc.py", line 37, in _FC_or_packed_FC
WARNING workspace.py: 223: File "/home/sunjunyao/anaconda3/envs/python2/lib/python2.7/site-packages/caffe2/python/model_helper.py", line 216, in create_param
WARNING workspace.py: 223: File "/home/sunjunyao/anaconda3/envs/python2/lib/python2.7/site-packages/caffe2/python/modeling/initializers.py", line 30, in create_param
Traceback (most recent call last):
File "tools/train_net.py", line 119, in <module>
main()
File "tools/train_net.py", line 101, in main
checkpoints = detectron.utils.train.train_model()
File "/home/sunjunyao/code/Model/densepose/detectron/utils/train.py", line 45, in train_model
model, weights_file, start_iter, checkpoints, output_dir = create_model()
File "/home/sunjunyao/code/Model/densepose/detectron/utils/train.py", line 128, in create_model
workspace.RunNetOnce(model.param_init_net)
File "/home/sunjunyao/anaconda3/envs/python2/lib/python2.7/site-packages/caffe2/python/workspace.py", line 232, in RunNetOnce
StringifyProto(net),
File "/home/sunjunyao/anaconda3/envs/python2/lib/python2.7/site-packages/caffe2/python/workspace.py", line 211, in CallWithExceptionIntercept
return func(*args, **kwargs)
RuntimeError: [enforce fail at math_gpu.cu:1592] status == CURAND_STATUS_SUCCESS. 102 vs 0. Error at: /opt/conda/conda-bld/pytorch_1556653000816/work/caffe2/utils/math_gpu.cu:1592: CURAND_STATUS_ALLOCATION_FAILED
Error from operator:
output: "gpu_0/fc6_w" name: "" type: "XavierFill" arg { name: "shape" ints: 1024 ints: 12544 } device_option { device_type: 1 device_id: 0 }
frame #0: c10::ThrowEnforceNotMet(char const, int, char const, std::string const&, void const) + 0x59 (0x7f53d9258409 in /home/sunjunyao/anaconda3/envs/python2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: void caffe2::math::RandUniform<float, caffe2::CUDAContext>(unsigned long, float, float, float, caffe2::CUDAContext) + 0x4a7 (0x7f536b05beb7 in /home/sunjunyao/anaconda3/envs/python2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #2: + 0x2a3933a (0x7f536b1ef33a in /home/sunjunyao/anaconda3/envs/python2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #3: + 0x2a3a943 (0x7f536b1f0943 in /home/sunjunyao/anaconda3/envs/python2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: + 0x13cb9a5 (0x7f5369b819a5 in /home/sunjunyao/anaconda3/envs/python2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: caffe2::SimpleNet::Run() + 0x161 (0x7f53c7a22aa1 in /home/sunjunyao/anaconda3/envs/python2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #6: caffe2::Workspace::RunNetOnce(caffe2::NetDef const&) + 0x2b (0x7f53c7a593ab in /home/sunjunyao/anaconda3/envs/python2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #7: + 0x5227f (0x7f53d995627f in /home/sunjunyao/anaconda3/envs/python2/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #8: + 0x8d2e8 (0x7f53d99912e8 in /home/sunjunyao/anaconda3/envs/python2/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #26: __libc_start_main + 0xf0 (0x7f53e3685830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #27: + 0x107f (0x55e91785b07f in python2)
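For context, here is a rough size estimate for the blob that was being filled when the error was raised. This is only a sketch: it assumes "gpu_0/fc6_w" is stored as 32-bit floats, which the RandUniform<float, ...> frame suggests, and it takes the shape straight from the operator dump above.

```python
# Shape of "gpu_0/fc6_w" as reported in the XavierFill operator dump.
shape = (1024, 12544)

# Assumption: 4 bytes per element (float32), based on RandUniform<float, ...>.
bytes_per_element = 4

num_elements = shape[0] * shape[1]
size_bytes = num_elements * bytes_per_element
size_mib = size_bytes / float(2 ** 20)

print("elements: %d" % num_elements)
print("size: %.1f MiB" % size_mib)
```

If this estimate is right, the weight blob itself is only about 49 MiB, so the CURAND_STATUS_ALLOCATION_FAILED error may point to GPU 0 having very little free memory at that moment rather than to the model being too large. That is a guess, not something the log confirms.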
How can I solve this problem?