chengyangfu / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

CUBLAS_STATUS_EXECUTION_FAILED #6

Open Lucien7786 opened 7 years ago

Lucien7786 commented 7 years ago

I followed your instructions and ran into a problem at this step:

# Train the SSD-ResNet-101 321x321
python examples/ssd/ssd_pascal_resnet_321.py

The error shown:

I0420 22:48:18.898177 7233 sgd_solver.cpp:138] Iteration 2520, lr = 0.001
F0421 09:02:45.161278 7233 math_functions.cu:52] Check failed: status == CUBLAS_STATUS_SUCCESS (13 vs. 0) CUBLAS_STATUS_EXECUTION_FAILED
Check failure stack trace:
    @ 0x7f2f6e971daa (unknown)
    @ 0x7f2f6e971ce4 (unknown)
    @ 0x7f2f6e9716e6 (unknown)
    @ 0x7f2f6e974687 (unknown)
    @ 0x7f2f6f1d36a5 caffe::caffe_gpu_gemv<>()
    @ 0x7f2f6f17a51a caffe::BiasLayer<>::Backward_gpu()
    @ 0x7f2f6f193a47 caffe::ScaleLayer<>::Backward_gpu()
    @ 0x7f2f6f15c817 caffe::Net<>::BackwardFromTo()
    @ 0x7f2f6f15c981 caffe::Net<>::Backward()
    @ 0x7f2f6f0b4c8b caffe::Solver<>::Step()
    @ 0x7f2f6f0b538e caffe::Solver<>::Solve()
    @ 0x40b568 train()
    @ 0x40899c main
    @ 0x7f2f6d0f1f45 (unknown)
    @ 0x4092a3 (unknown)
    @ (nil) (unknown)
Aborted (core dumped)
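For readers triaging similar logs, the "(13 vs. 0)" part of the check line is the numeric cublasStatus_t value compared against CUBLAS_STATUS_SUCCESS. The following is a minimal sketch that decodes it. The helper name `decode_cublas_check` is hypothetical and not part of Caffe; the numeric values are taken from the cublasStatus_t enum in the cuBLAS header.

```python
import re

# Numeric values of the cublasStatus_t enum (cuBLAS header).
CUBLAS_STATUS = {
    0: "CUBLAS_STATUS_SUCCESS",
    1: "CUBLAS_STATUS_NOT_INITIALIZED",
    3: "CUBLAS_STATUS_ALLOC_FAILED",
    7: "CUBLAS_STATUS_INVALID_VALUE",
    8: "CUBLAS_STATUS_ARCH_MISMATCH",
    11: "CUBLAS_STATUS_MAPPING_ERROR",
    13: "CUBLAS_STATUS_EXECUTION_FAILED",
    14: "CUBLAS_STATUS_INTERNAL_ERROR",
    15: "CUBLAS_STATUS_NOT_SUPPORTED",
    16: "CUBLAS_STATUS_LICENSE_ERROR",
}

# Matches the glog check message that Caffe prints for a failed cuBLAS call.
CHECK_RE = re.compile(
    r"Check failed: status == CUBLAS_STATUS_SUCCESS \((\d+) vs\. \d+\)"
)


def decode_cublas_check(line):
    """Hypothetical helper: return (code, name) for a failed-check line, else None."""
    match = CHECK_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, CUBLAS_STATUS.get(code, "unknown cublasStatus_t value")


if __name__ == "__main__":
    log_line = (
        "F0421 09:02:45.161278 7233 math_functions.cu:52] Check failed: "
        "status == CUBLAS_STATUS_SUCCESS (13 vs. 0) CUBLAS_STATUS_EXECUTION_FAILED"
    )
    print(decode_cublas_check(log_line))
```

The decoded name only confirms what the log already prints. It does not say whether the cause is the library, the driver, or the hardware.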

I have tried several times, and every run ends with the same core dump, but at a seemingly random iteration count (the most recent run failed at "Iteration 9920"). The current caffemodel is not saved automatically when the failure occurs, so training starts again from iteration 0. Time wasted!

I looked up the error code in the CUDA Toolkit documentation (2.2.2. cublasStatus_t, CUBLAS_STATUS_EXECUTION_FAILED). It seems to be a cuBLAS library or driver issue.

The weird thing is that I successfully trained an SSD caffemodel (weiliu89's version, voc0712, vgg16, iter=12000) on this computer several days ago. Waiting for your reply.

I am using tx1, and the CUDA version is the latest one from NVIDIA: cuda_8.0.61_375.26_linux.run
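On the lost progress: Caffe can write periodic snapshots when the solver's `snapshot` interval is set, and `caffe train` accepts a `--snapshot` flag pointing at a `.solverstate` file to resume from it. Below is a minimal sketch for locating the most recent snapshot in a directory. The function name `latest_solverstate` and the directory layout are assumptions for illustration; the sketch relies only on Caffe's `<prefix>_iter_<N>.solverstate` naming and does not cover HDF5 snapshots (`.solverstate.h5`).

```python
import os
import re

# Caffe names binary-proto snapshots "<prefix>_iter_<N>.solverstate".
SNAPSHOT_RE = re.compile(r"_iter_(\d+)\.solverstate$")


def latest_solverstate(snapshot_dir):
    """Hypothetical helper: path of the highest-iteration .solverstate, or None."""
    best_iter = -1
    best_path = None
    for name in os.listdir(snapshot_dir):
        match = SNAPSHOT_RE.search(name)
        if match is None:
            continue
        iteration = int(match.group(1))
        if iteration > best_iter:
            best_iter = iteration
            best_path = os.path.join(snapshot_dir, name)
    return best_path
```

If a snapshot is found, training could be resumed with something like `caffe train --solver=<solver.prototxt> --snapshot=<path returned above>`. This only helps when a snapshot was written before the crash, so the snapshot interval would need to be smaller than the iteration at which the failure occurs.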

Here is my computer information:

Fri Apr 21 10:49:22 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 0000:01:00.0      On |                  N/A |
| 23%   33C    P8    15W / 250W |    186MiB / 12188MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61