Closed IbrahimBond closed 4 years ago
./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights data/dog.jpg
CUDA-version: 10000 (10000), cuDNN: 7.4.2, CUDNN_HALF=1, GPU count: 1
CUDNN_HALF=1
OpenCV version: 4.2.0
0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2070
net.optimized_memory = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
layer filters size/strd(dil) input output
0 conv 32 3 x 3/ 1 608 x 608 x 3 -> 608 x 608 x 32 0.639 BF
I am not using the darknet commands. I do inference in Python using the wrapper provided in your repo as a base.
Another thing I have in mind: I have two threads doing inference on the same darknet model instance. Could this be the cause of the issue?
model info:
Try to load cfg: /home/zbox/Desktop/darknet_models/ocr-config/ocr-net.cfg, weights: /home/zbox/Desktop/darknet_models/ocr-config/ocr-net_last.weights, clear = 0
0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2060
net.optimized_memory = 0
mini_batch = 1, batch = 64, time_steps = 1, train = 0
layer filters size/strd(dil) input output
0 conv 32 3 x 3/ 1 240 x 160 x 3 -> 240 x 160 x 32 0.066 BF
1 max 2x 2/ 2 240 x 160 x 32 -> 120 x 80 x 32 0.001 BF
2 conv 64 3 x 3/ 1 120 x 80 x 32 -> 120 x 80 x 64 0.354 BF
3 max 2x 2/ 2 120 x 80 x 64 -> 60 x 40 x 64 0.001 BF
4 conv 128 3 x 3/ 1 60 x 40 x 64 -> 60 x 40 x 128 0.354 BF
5 conv 64 1 x 1/ 1 60 x 40 x 128 -> 60 x 40 x 64 0.039 BF
6 conv 128 3 x 3/ 1 60 x 40 x 64 -> 60 x 40 x 128 0.354 BF
7 max 2x 2/ 2 60 x 40 x 128 -> 30 x 20 x 128 0.000 BF
8 conv 256 3 x 3/ 1 30 x 20 x 128 -> 30 x 20 x 256 0.354 BF
9 conv 128 1 x 1/ 1 30 x 20 x 256 -> 30 x 20 x 128 0.039 BF
10 conv 256 3 x 3/ 1 30 x 20 x 128 -> 30 x 20 x 256 0.354 BF
11 conv 512 3 x 3/ 1 30 x 20 x 256 -> 30 x 20 x 512 1.416 BF
12 conv 256 3 x 3/ 1 30 x 20 x 512 -> 30 x 20 x 256 1.416 BF
13 conv 512 3 x 3/ 1 30 x 20 x 256 -> 30 x 20 x 512 1.416 BF
14 conv 30 1 x 1/ 1 30 x 20 x 512 -> 30 x 20 x 30 0.018 BF
15 detection
mask_scale: Using default '1.000000'
Total BFLOPS 6.182
avg_outputs = 271050
Allocate additional workspace_size = 13.11 MB
Warning: width=240 and height=160 in cfg-file must be divisible by 32 for default networks Yolo v1/v2/v3!!!
Try to load weights: /home/zbox/Desktop/darknet_models/ocr-config/ocr-net_last.weights
Loading weights from /home/zbox/Desktop/darknet_models/ocr-config/ocr-net_last.weights...
seen 64, trained: 8003 K-images (125 Kilo-batches_64)
Done! Loaded 16 layers from weights-file
Loaded - names_list: /home/zbox/Desktop/darknet_models/ocr-config/ocr-net.names, classes = 10
> Another thing I have in mind: I have two threads doing inference on the same darknet model instance. Could this be the cause of the issue?
Yes. It isn't thread-safe currently.
What do you recommend in this case? I need this because my use case depends on concurrency. Should I init two instances of the same model?
Also, for my benefit and maybe others', could you please, if you have the time, elaborate on why it is not thread-safe and how that causes this issue? And what kind of resources do you recommend for debugging these kinds of issues?
Thanks.
Use a mutex, so only one thread can run the detector at a time.
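A minimal sketch of that advice: wrap the shared detector behind a `threading.Lock` so calls into the non-thread-safe CUDA code are serialized. `detect_fn` here is a placeholder for whatever call your darknet Python wrapper exposes (e.g. something like `darknet.detect(net, meta, img)`); the locking pattern is the point, not the exact API.

```python
import threading

class SerializedDetector:
    """Wraps a detector callable so only one thread runs it at a time."""

    def __init__(self, detect_fn):
        self._detect_fn = detect_fn
        self._lock = threading.Lock()

    def detect(self, image):
        # Every thread must acquire the same lock, so calls into the
        # underlying (non-thread-safe) detector are fully serialized.
        with self._lock:
            return self._detect_fn(image)

# Usage: replace the stand-in lambda with your real darknet call, e.g.
#   detector = SerializedDetector(lambda img: darknet.detect(net, meta, img))
detector = SerializedDetector(lambda img: [("dog", 0.99, (10, 10, 50, 50))])

results = []
threads = [
    threading.Thread(target=lambda: results.append(detector.detect("frame")))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Two separate network instances (one per thread) would also avoid the shared-state problem, at the cost of roughly double the GPU memory.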
Hi,
I am getting "CUDA Error: misaligned address" in the following traceback, as you can see:
CUDA status Error: file: ./src/dark_cuda.c : () : line: 477 : build time: Jun 8 2020 - 09:58:01
CUDA Error: misaligned address python3: : Unknown error -2044391453
cuDNN status Error in: file: ./src/convolutional_kernels.cu : () : line: 470 : build time: Jun 8 2020 - 09:58:17
cuDNN Error: CUDNN_STATUS_EXECUTION_FAILED python3: : Unknown error -2044391486
^CError in atexit._run_exitfuncs:
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/zbox/Desktop/test2/tracker.py", line 61, in start_tracking
    while vs.running():
  File "/home/zbox/Desktop/test2/utils/video_stream.py", line 120, in running
    return self.more() or not self.stopped
  File "/home/zbox/Desktop/test2/utils/video_stream.py", line 126, in more
    time.sleep(0.1)
KeyboardInterrupt
I only get this error after I have run the detector for some time (some tests ran for hours without a crash).
I also think it is important to note that I am running multiple models in different Python processes and threads.
I encountered this error on an RTX 2060 GPU and a Jetson TX2.
Any help is appreciated. @AlexeyAB