SIGABRT while running extract features

abhilashi commented 6 years ago

ubuntu@ip-172-31-14-53:~/R2Plus1D$ python tools/extract_features.py --test_data=dupes_data --model_name=r2plus1d --model_depth=34 --clip_length_rgb=32 --gpus=0,1 --batch_size=4 --load_model_path=./trained/r2.5d_d34_l32_ft_sports1m.pkl --output_path=my_features.pkl --features=softmax,final_avg,video_id --sanity_check=0 --get_video_id=1 --use_local_file=1 --num_labels=400
/home/ubuntu/.local/lib/python2.7/site-packages/scipy/sparse/lil.py:19: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from . import _csparsetools
/home/ubuntu/.local/lib/python2.7/site-packages/scipy/sparse/csgraph/__init__.py:165: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from ._shortest_path import shortest_path, floyd_warshall, dijkstra,\
/home/ubuntu/.local/lib/python2.7/site-packages/scipy/sparse/csgraph/_validation.py:5: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from ._tools import csgraph_to_dense, csgraph_from_dense,\
/home/ubuntu/.local/lib/python2.7/site-packages/scipy/sparse/csgraph/__init__.py:167: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from ._traversal import breadth_first_order, depth_first_order, \
/home/ubuntu/.local/lib/python2.7/site-packages/scipy/sparse/csgraph/__init__.py:169: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from ._min_spanning_tree import minimum_spanning_tree
/home/ubuntu/.local/lib/python2.7/site-packages/scipy/sparse/csgraph/__init__.py:170: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from ._reordering import reverse_cuthill_mckee, maximum_bipartite_matching, \
/home/ubuntu/.local/lib/python2.7/site-packages/scipy/spatial/__init__.py:95: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .ckdtree import *
/home/ubuntu/.local/lib/python2.7/site-packages/scipy/spatial/__init__.py:96: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .qhull import *
/home/ubuntu/.local/lib/python2.7/site-packages/scipy/spatial/_spherical_voronoi.py:18: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from . import _voronoi
/home/ubuntu/.local/lib/python2.7/site-packages/scipy/spatial/distance.py:122: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from . import _hausdorff
/home/ubuntu/.local/lib/python2.7/site-packages/scipy/linalg/basic.py:17: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from ._solve_toeplitz import levinson
/home/ubuntu/.local/lib/python2.7/site-packages/scipy/linalg/__init__.py:207: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from ._decomp_update import *
/home/ubuntu/.local/lib/python2.7/site-packages/scipy/special/__init__.py:640: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from ._ufuncs import *
/home/ubuntu/.local/lib/python2.7/site-packages/scipy/special/_ellip_harm.py:7: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from ._ellip_harm_2 import _ellipsoid, _ellipsoid_norm
Ignoring @/caffe2/caffe2/contrib/nccl:nccl_ops as it is not a valid file.
Ignoring @/caffe2/caffe2/contrib/gloo:gloo_ops as it is not a valid file.
Ignoring @/caffe2/caffe2/contrib/gloo:gloo_ops_gpu as it is not a valid file.
E0813 11:33:53.224505 16124 init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0813 11:33:53.224771 16124 init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0813 11:33:53.224789 16124 init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
Namespace(batch_size=4, clip_length_of=8, clip_length_rgb=32, clip_per_video=1, crop_size=112, db_type='pickle', decode_type=2, do_flow_aggregation=0, features='softmax,final_avg,video_id', flow_data_type=0, frame_gap_of=2, get_video_id=1, gpus='0,1', input_type=0, load_model_path='./trained/r2.5d_d34_l32_ft_sports1m.pkl', model_depth=34, model_name='r2plus1d', num_channels=3, num_decode_threads=4, num_iterations=-1, num_labels=400, output_path='my_features.pkl', sampling_rate_of=2, sampling_rate_rgb=1, sanity_check=0, scale_h=128, scale_w=171, test_data='dupes_data', use_cudnn=1, use_local_file=1)
INFO:feature_extractor:Namespace(batch_size=4, clip_length_of=8, clip_length_rgb=32, clip_per_video=1, crop_size=112, db_type='pickle', decode_type=2, do_flow_aggregation=0, features='softmax,final_avg,video_id', flow_data_type=0, frame_gap_of=2, get_video_id=1, gpus='0,1', input_type=0, load_model_path='./trained/r2.5d_d34_l32_ft_sports1m.pkl', model_depth=34, model_name='r2plus1d', num_channels=3, num_decode_threads=4, num_iterations=-1, num_labels=400, output_path='my_features.pkl', sampling_rate_of=2, sampling_rate_rgb=1, sanity_check=0, scale_h=128, scale_w=171, test_data='dupes_data', use_cudnn=1, use_local_file=1)
INFO:model_builder:Validated: r2plus1d with 34 layers
INFO:model_builder:with input 32x112x112
Running on GPUs: [0, 1]
INFO:feature_extractor:Running on GPUs: [0, 1]
WARNING:root:[====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
WARNING:data_parallel_model:** Only 1 GPUs available, GPUs [0, 1] requested
INFO:data_parallel_model:Parallelizing model for devices: [0, 1]
INFO:data_parallel_model:Create input and model training operators
WARNING:data_parallel_model:
WARNING:data_parallel_model:############# WARNING #############
WARNING:data_parallel_model:Model Extract Features/<caffe2.python.cnn.CNNModelHelper object at 0x7fded641ff90> is used for testing/validation but
WARNING:data_parallel_model:has init_params=True!
WARNING:data_parallel_model:This can conflict with model training.
WARNING:data_parallel_model:Please ensure model = ModelHelper(init_params=False)
WARNING:data_parallel_model:####################################
WARNING:data_parallel_model:
INFO:data_parallel_model:Model for GPU : 0
INFO:model_helper:outputing rgb data
INFO:model_builder:creating r2plus1d, depth=34...
INFO:video_model:Number of middle filters: 144
INFO:video_model:Number of middle filters: 144
INFO:video_model:Number of middle filters: 144
INFO:video_model:Number of middle filters: 144
INFO:video_model:Number of middle filters: 144
INFO:video_model:Number of middle filters: 144
INFO:video_model:Number of middle filters: 230
INFO:video_model:Number of middle filters: 288
INFO:video_model:Number of middle filters: 288
INFO:video_model:Number of middle filters: 288
INFO:video_model:Number of middle filters: 288
INFO:video_model:Number of middle filters: 288
INFO:video_model:Number of middle filters: 288
INFO:video_model:Number of middle filters: 288
INFO:video_model:Number of middle filters: 460
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 921
INFO:video_model:Number of middle filters: 1152
INFO:video_model:Number of middle filters: 1152
INFO:video_model:Number of middle filters: 1152
INFO:video_model:Number of middle filters: 1152
INFO:video_model:Number of middle filters: 1152
INFO:data_parallel_model:Model for GPU : 1
INFO:model_helper:outputing rgb data
INFO:model_builder:creating r2plus1d, depth=34...
INFO:video_model:Number of middle filters: 144
INFO:video_model:Number of middle filters: 144
INFO:video_model:Number of middle filters: 144
INFO:video_model:Number of middle filters: 144
INFO:video_model:Number of middle filters: 144
INFO:video_model:Number of middle filters: 144
INFO:video_model:Number of middle filters: 230
INFO:video_model:Number of middle filters: 288
INFO:video_model:Number of middle filters: 288
INFO:video_model:Number of middle filters: 288
INFO:video_model:Number of middle filters: 288
INFO:video_model:Number of middle filters: 288
INFO:video_model:Number of middle filters: 288
INFO:video_model:Number of middle filters: 288
INFO:video_model:Number of middle filters: 460
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 921
INFO:video_model:Number of middle filters: 1152
INFO:video_model:Number of middle filters: 1152
INFO:video_model:Number of middle filters: 1152
INFO:video_model:Number of middle filters: 1152
INFO:video_model:Number of middle filters: 1152
INFO:data_parallel_model:Parameter update function not defined --> only forward
terminate called without an active exception
*** Aborted at 1534160033 (unix time) try "date -d @1534160033" if you are using GNU date ***
PC: @     0x7fdf2acba428 gsignal
*** SIGABRT (@0x3e800003efc) received by PID 16124 (TID 0x7fdf2b473700) from PID 16124; stack trace: ***
    @     0x7fdf2b060390 (unknown)
    @     0x7fdf2acba428 gsignal
    @     0x7fdf2acbc02a abort
    @     0x7fdf1cb4684d __gnu_cxx::__verbose_terminate_handler()
    @     0x7fdf1cb446b6 (unknown)
    @     0x7fdf1cb44701 std::terminate()
    @     0x7fdf19267b00 caffe2::CUDAContext::~CUDAContext()
    @     0x7fdf19724c6e caffe2::FillerOp<>::~FillerOp()
    @     0x7fdf19770227 caffe2::MSRAFillOp<>::~MSRAFillOp()
    @     0x7fdf1b41df99 std::vector<>::~vector()
    @     0x7fdf1b43053f caffe2::SimpleNet::SimpleNet()
    @     0x7fdf1b431c8e _ZN6caffe210RegistererINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt10unique_ptrINS_7NetBaseESt14default_deleteIS8_EEJRKSt10shared_ptrIKNS_6NetDefEEPNS_9WorkspaceEEE14DefaultCreatorINS_9SimpleNetEEESB_SH_SJ_
    @     0x7fdf1b3a9203 std::_Function_handler<>::_M_invoke()
    @     0x7fdf1b3d9bf2 caffe2::CreateNet()
    @     0x7fdf1b3da61d caffe2::CreateNet()
    @     0x7fdf1b3f9f12 caffe2::Workspace::RunNetOnce()
    @     0x7fdf1c0d4d08 _ZZN6caffe26python16addGlobalMethodsERN8pybind116moduleEENKUlRKNS1_5bytesEE28_clES6_.isra.3053.constprop.3166
    @     0x7fdf1c0d4eb4 _ZZN8pybind1112cpp_function10initializeIZN6caffe26python16addGlobalMethodsERNS_6moduleEEUlRKNS_5bytesEE28_bJS8_EJNS_4nameENS_5scopeENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4_FUNESQ_
    @     0x7fdf1c10781e pybind11::cpp_function::dispatcher()
    @           0x4c30ce PyEval_EvalFrameEx
    @           0x4b9ab6 PyEval_EvalCodeEx
    @           0x4c1e6f PyEval_EvalFrameEx
    @           0x4b9ab6 PyEval_EvalCodeEx
    @           0x4c1e6f PyEval_EvalFrameEx
    @           0x4b9ab6 PyEval_EvalCodeEx
    @           0x4c1e6f PyEval_EvalFrameEx
    @           0x4b9ab6 PyEval_EvalCodeEx
    @           0x4c1e6f PyEval_EvalFrameEx
    @           0x4b9ab6 PyEval_EvalCodeEx
    @           0x4eb30f (unknown)
    @           0x4e5422 PyRun_FileExFlags
    @           0x4e3cd6 PyRun_SimpleFileExFlags
Aborted (core dumped)

Has anyone else seen this issue? CC @dutran

abhilashi commented 6 years ago

It works well when I specify a single GPU.

python tools/extract_features.py --test_data=dupes_data --model_name=r2plus1d --model_depth=34 --clip_length_rgb=32 --gpus=0 --batch_size=4 --load_model_path=./trained/r2.5d_d34_l32_ft_sports1m.pkl --output_path=my_features.pkl --features=softmax,final_avg,video_id --sanity_check=0 --get_video_id=1 --use_local_file=1 --num_labels=400

...
[swscaler @ 0x7e90e4162300] Warning: data is not aligned! This can lead to a speedloss
0/1 iterations
INFO:feature_extractor:0/1 iterations
Read 'softmax' with shape (4, 400)
INFO:feature_extractor:Read 'softmax' with shape (4, 400)
Read 'final_avg' with shape (4, 512, 1, 1, 1)
INFO:feature_extractor:Read 'final_avg' with shape (4, 512, 1, 1, 1)
Read 'video_id' with shape (4,)
INFO:feature_extractor:Read 'video_id' with shape (4,)
Writing to my_features.pkl
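
The failing run above logged `Only 1 GPUs available, GPUs [0, 1] requested` before the abort, so the `--gpus=0,1` request may be related to the crash. Below is a minimal sketch of a pre-flight check that rejects GPU ids the machine does not have. `parse_gpu_ids` and `validate_gpus` are hypothetical helper names, not part of VMZ or Caffe2, and the device count is passed in by hand rather than queried, so that is an assumption as well.

```python
def parse_gpu_ids(gpus_arg):
    """Parse a --gpus style string such as '0,1' into a list of ints."""
    return [int(tok) for tok in gpus_arg.split(",") if tok.strip()]


def validate_gpus(gpus_arg, num_available):
    """Return the requested GPU ids, or raise ValueError if any id is out of range.

    num_available is supplied by the caller (assumption: in a Caffe2 setup it
    could come from caffe2.python.workspace.NumCudaDevices(); that call is not
    made here so the sketch stays self-contained).
    """
    ids = parse_gpu_ids(gpus_arg)
    if not ids:
        raise ValueError("no GPU ids given")
    # Valid device ids are 0 .. num_available - 1.
    bad = [i for i in ids if i < 0 or i >= num_available]
    if bad:
        raise ValueError(
            "GPUs %s requested but only %d available" % (bad, num_available)
        )
    return ids


if __name__ == "__main__":
    # Hypothetical values mirroring this thread: one GPU present.
    print(validate_gpus("0", 1))
    try:
        validate_gpus("0,1", 1)
    except ValueError as err:
        print("rejected: %s" % err)
```

Calling something like this before building the model would turn the mismatch into an ordinary Python error instead of a SIGABRT deep inside `caffe2::CreateNet()`. It does not rule out the abort having another cause.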

dutran commented 5 years ago

Seems like a memory issue.

facebookresearch / VMZ

SIGABRT while running extract features #36