AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

CUDA status Error file: ./src/dark_cuda.c #5884

Closed IbrahimBond closed 4 years ago

IbrahimBond commented 4 years ago

Hi,

I am getting a "CUDA Error: misaligned address" in the following traceback, as you can see:

CUDA status Error: file: ./src/dark_cuda.c : () : line: 477 : build time: Jun 8 2020 - 09:58:01

CUDA Error: misaligned address python3: : Unknown error -2044391453

cuDNN status Error in: file: ./src/convolutional_kernels.cu : () : line: 470 : build time: Jun 8 2020 - 09:58:17

cuDNN Error: CUDNN_STATUS_EXECUTION_FAILED python3: : Unknown error -2044391486
^CError in atexit._run_exitfuncs:
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/zbox/Desktop/test2/tracker.py", line 61, in start_tracking
    while vs.running():
  File "/home/zbox/Desktop/test2/utils/video_stream.py", line 120, in running
    return self.more() or not self.stopped
  File "/home/zbox/Desktop/test2/utils/video_stream.py", line 126, in more
    time.sleep(0.1)
KeyboardInterrupt

I only get this error after the detector has been running for some time (some tests ran for hours without a crash).

I also think it is important to note that I am running multiple models in different Python processes and threads.

I encountered this error on an RTX 2060 GPU and on a Jetson TX2.

Any help is appreciated. @AlexeyAB

AlexeyAB commented 4 years ago
  1. What command do you use?
  2. Screenshots with such information:
    ./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights data/dog.jpg
    CUDA-version: 10000 (10000), cuDNN: 7.4.2, CUDNN_HALF=1, GPU count: 1
    CUDNN_HALF=1
    OpenCV version: 4.2.0
    0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2070
    net.optimized_memory = 0
    mini_batch = 1, batch = 8, time_steps = 1, train = 0
    layer   filters  size/strd(dil)      input                output
    0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
IbrahimBond commented 4 years ago

I am not using darknet commands. I do inference in Python, using the wrapper provided in your repo as a base.

Another thing I have in mind: I have two threads doing inference with the same darknet model instance. Could this be the cause of the issue?
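
For reference, the pattern described above is roughly the following sketch. It assumes the repo's darknet.py Python wrapper; the function names (load_network, detect_image, load_image, free_image), their signatures, and the file paths are illustrative and vary between wrapper versions, so treat them as assumptions rather than the exact code in use.

    import threading
    import darknet  # darknet.py wrapper from this repo

    # One shared model instance (paths are placeholders)
    network, class_names, _ = darknet.load_network(
        "ocr-net.cfg", "ocr-net.data", "ocr-net_last.weights")

    def infer(path):
        # load_image expects a bytes path in most wrapper versions
        img = darknet.load_image(path.encode("ascii"), 0, 0)
        # Two threads entering detect_image on the SAME network handle at the
        # same time is the unsafe part (the maintainer confirms below that
        # darknet is not thread-safe).
        dets = darknet.detect_image(network, class_names, img)
        darknet.free_image(img)
        return dets

    threads = [threading.Thread(target=infer, args=(p,))
               for p in ("frame_a.jpg", "frame_b.jpg")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()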

model info:

Try to load cfg: /home/zbox/Desktop/darknet_models/ocr-config/ocr-net.cfg, weights: /home/zbox/Desktop/darknet_models/ocr-config/ocr-net_last.weights, clear = 0 
 0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2060 
net.optimized_memory = 0 
mini_batch = 1, batch = 64, time_steps = 1, train = 0 
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    240 x 160 x   3 ->  240 x 160 x  32 0.066 BF
   1 max                2x 2/ 2    240 x 160 x  32 ->  120 x  80 x  32 0.001 BF
   2 conv     64       3 x 3/ 1    120 x  80 x  32 ->  120 x  80 x  64 0.354 BF
   3 max                2x 2/ 2    120 x  80 x  64 ->   60 x  40 x  64 0.001 BF
   4 conv    128       3 x 3/ 1     60 x  40 x  64 ->   60 x  40 x 128 0.354 BF
   5 conv     64       1 x 1/ 1     60 x  40 x 128 ->   60 x  40 x  64 0.039 BF
   6 conv    128       3 x 3/ 1     60 x  40 x  64 ->   60 x  40 x 128 0.354 BF
   7 max                2x 2/ 2     60 x  40 x 128 ->   30 x  20 x 128 0.000 BF
   8 conv    256       3 x 3/ 1     30 x  20 x 128 ->   30 x  20 x 256 0.354 BF
   9 conv    128       1 x 1/ 1     30 x  20 x 256 ->   30 x  20 x 128 0.039 BF
  10 conv    256       3 x 3/ 1     30 x  20 x 128 ->   30 x  20 x 256 0.354 BF
  11 conv    512       3 x 3/ 1     30 x  20 x 256 ->   30 x  20 x 512 1.416 BF
  12 conv    256       3 x 3/ 1     30 x  20 x 512 ->   30 x  20 x 256 1.416 BF
  13 conv    512       3 x 3/ 1     30 x  20 x 256 ->   30 x  20 x 512 1.416 BF
  14 conv     30       1 x 1/ 1     30 x  20 x 512 ->   30 x  20 x  30 0.018 BF
  15 detection
mask_scale: Using default '1.000000'
Total BFLOPS 6.182 
avg_outputs = 271050 
 Allocate additional workspace_size = 13.11 MB 

 Warning: width=240 and height=160 in cfg-file must be divisible by 32 for default networks Yolo v1/v2/v3!!! 

 Try to load weights: /home/zbox/Desktop/darknet_models/ocr-config/ocr-net_last.weights 
Loading weights from /home/zbox/Desktop/darknet_models/ocr-config/ocr-net_last.weights...
 seen 64, trained: 8003 K-images (125 Kilo-batches_64) 
Done! Loaded 16 layers from weights-file 
Loaded - names_list: /home/zbox/Desktop/darknet_models/ocr-config/ocr-net.names, classes = 10
AlexeyAB commented 4 years ago

Another thing I have in mind: I have two threads doing inference with the same darknet model instance. Could this be the cause of the issue?

Yes. It isn't thread-safe currently.

IbrahimBond commented 4 years ago

What do you recommend in this case? I need this because my use case depends on concurrency. Should I init two instances of the same model?

Also, for my benefit and maybe that of others, could you please (if you have the time) elaborate on why it is not thread-safe and how that leads to this error? And what resources do you recommend for debugging these kinds of issues?
Thanks.

AlexeyAB commented 4 years ago

Use a mutex, so only one thread can run the detector at a time.
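
A minimal sketch of this workaround in the Python wrapper, using the same darknet.py-style names as the sketch above (the exact function names and signatures are assumptions and depend on the wrapper version):

    import threading
    import darknet

    detector_lock = threading.Lock()
    network, class_names, _ = darknet.load_network(
        "ocr-net.cfg", "ocr-net.data", "ocr-net_last.weights")

    def detect_safe(image):
        # Serialize every call into the shared network: only one thread at a
        # time enters the darknet C/CUDA code, which avoids the concurrent
        # access that was crashing in dark_cuda.c.
        with detector_lock:
            return darknet.detect_image(network, class_names, image)

The lock gives up parallel detection on the shared instance, but each thread's surrounding pipeline (video capture, tracking, post-processing) can still run concurrently.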