Tencent / PocketFlow

An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications.
https://pocketflow.github.io

failed to run cuBLAS routine cublasSgemv_v2: CUBLAS_STATUS_EXECUTION_FAILED #295

Open betterhalfwzm opened 5 years ago

betterhalfwzm commented 5 years ago

Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10166 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:06:00.0, compute capability: 7.5)
INFO:tensorflow:Restoring parameters from ./models/model.ckpt-97656
INFO:tensorflow:layer #0: pr = 0.00 (target)
INFO:tensorflow:kernel name = pruned_model/resnet_model/conv2d/kernel/read:0
INFO:tensorflow:kernel shape = (3, 3, 3, 16)
INFO:tensorflow:sampling inputs & outputs through multiple mini-batches
INFO:tensorflow:time elapsed (sampling): 1.9460 (s)
INFO:tensorflow:choosing channels via solving the sparsity-constrained regression problem
INFO:tensorflow:[sparse regression]
INFO:tensorflow: inputs: (50000, 9) / outputs: (50000, 16) / conv_krnl: (3, 3, 3, 16) / pr: 0.0 / nnz: 3
INFO:tensorflow:computing the feature matrix & response vector
INFO:tensorflow:secondary sampling: 50000 -> 31250
INFO:tensorflow:time elapsed: 0.0278 (s)
INFO:tensorflow:computing <X^T X> & <X^T y> in advance
INFO:tensorflow:time elapsed: 0.0098 (s)
INFO:tensorflow:determining 's upper bound
2019-06-12 11:07:52.615793: E tensorflow/stream_executor/cuda/cuda_blas.cc:647] failed to run cuBLAS routine cublasSgemv_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
    return fn(*args)
  File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMV launch failed : a.shape=[1,3,3], b.shape=[1,3,1], m=3, n=1, k=3
  [[Node: meta_lasso/MatMul = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](meta_lasso/xt_x/read, meta_lasso/mask/read)]]
  [[Node: meta_lasso/Assign/_219 = _Recv[_start_time=0, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_39_meta_lasso/Assign", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 69, in <module>
    tf.app.run()
  File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "main.py", line 55, in main
    learner.train()
  File "/home/wangzhaoming/PocketFlow/learners/channel_pruning_rmt/learner.py", line 152, in train
    self.__choose_channels()
  File "/home/wangzhaoming/PocketFlow/learners/channel_pruning_rmt/learner.py", line 637, in __choose_channels
    inputs_np_list, outputs_np, conv_krnl_prnd, prune_ratio)
  File "/home/wangzhaoming/PocketFlow/learners/channel_pruning_rmt/learner.py", line 792, in solve_sparse_regression
    mask_np, nb_chns_nnz = __solve_lasso(ubnd)
  File "/home/wangzhaoming/PocketFlow/learners/channel_pruning_rmt/learner.py", line 781, in __solve_lasso
    self.sess_prune.run(self.meta_lasso['train_op'], feed_dict={self.meta_lasso['gamma']: x})
  File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 877, in run
    run_metadata_ptr)
  File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1100, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
    run_metadata)
  File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMV launch failed : a.shape=[1,3,3], b.shape=[1,3,1], m=3, n=1, k=3
  [[Node: meta_lasso/MatMul = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](meta_lasso/xt_x/read, meta_lasso/mask/read)]]
  [[Node: meta_lasso/Assign/_219 = _Recv[_start_time=0, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_39_meta_lasso/Assign", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'meta_lasso/MatMul', defined at:
  File "main.py", line 69, in <module>
    tf.app.run()
  File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "main.py", line 51, in main
    learner = create_learner(sm_writer, model_helper)
  File "/home/wangzhaoming/PocketFlow/learners/learner_utils.py", line 54, in create_learner
    learner = ChannelPrunedRmtLearner(sm_writer, model_helper)
  File "/home/wangzhaoming/PocketFlow/learners/channel_pruning_rmt/learner.py", line 144, in __init__
    self.__build_prune()
  File "/home/wangzhaoming/PocketFlow/learners/channel_pruning_rmt/learner.py", line 385, in __build_prune
    self.meta_lasso = self.build_meta_lasso()
  File "/home/wangzhaoming/PocketFlow/learners/channel_pruning_rmt/learner.py", line 451, in build_meta_lasso
    mask_gd = mask - FLAGS.cpr_ista_lrn_rate * (tf.matmul(xt_x, mask) - xt_y)
  File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1980, in matmul
    a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
  File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1236, in batch_mat_mul
    "BatchMatMul", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
  File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
    op_def=op_def)
  File "/home/wangzhaoming/anaconda3/envs/Pocketflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
    self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): Blas xGEMV launch failed : a.shape=[1,3,3], b.shape=[1,3,1], m=3, n=1, k=3
  [[Node: meta_lasso/MatMul = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](meta_lasso/xt_x/read, meta_lasso/mask/read)]]
  [[Node: meta_lasso/Assign/_219 = _Recv[_start_time=0, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_39_meta_lasso/Assign", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
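
For reference, the failing node is the batched matrix product inside the update at learner.py line 451, `mask_gd = mask - FLAGS.cpr_ista_lrn_rate * (tf.matmul(xt_x, mask) - xt_y)`, with operand shapes [1,3,3] and [1,3,1]. The following is only a minimal NumPy sketch of that one expression on the CPU, useful for checking that the shapes and arithmetic are valid on their own. The function name `lasso_gradient_step`, the random inputs, and the learning rate value are assumptions for illustration and are not taken from PocketFlow.

```python
import numpy as np


def lasso_gradient_step(xt_x, xt_y, mask, lrn_rate):
    """Mirror of the traceback expression: mask - lrn_rate * (xt_x @ mask - xt_y).

    All arguments are batched along axis 0, as in the BatchMatMul node.
    """
    return mask - lrn_rate * (np.matmul(xt_x, mask) - xt_y)


# Hypothetical data with the same operand shapes as in the error message.
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8, 3)).astype(np.float32)
y = rng.standard_normal((1, 8, 1)).astype(np.float32)

xt_x = np.matmul(x.transpose(0, 2, 1), x)    # shape (1, 3, 3)
xt_y = np.matmul(x.transpose(0, 2, 1), y)    # shape (1, 3, 1)
mask = np.ones((1, 3, 1), dtype=np.float32)  # shape (1, 3, 1)

# lrn_rate is an arbitrary illustrative value, not the PocketFlow default.
mask_gd = lasso_gradient_step(xt_x, xt_y, mask, lrn_rate=0.01)
print(mask_gd.shape)
```

The product itself is a plain 3x3 by 3x1 multiplication per batch element, so the shapes reported in the error are consistent with each other.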