LdDl / go-darknet

Go bindings for Darknet (YOLO v4 / v7-tiny / v3)
Apache License 2.0

[BUG] yolov4 tiny example produces fatal error: unexpected signal during runtime execution #30

Closed Evert-Arends closed 1 year ago

Evert-Arends commented 1 year ago

Running the example as described in the README produces "fatal error: unexpected signal during runtime execution".

To Reproduce

cd cmd/examples
./download_data_v4_tiny.sh
go build -o base_example/main base_example/main.go && ./base_example/main --configFile=yolov4-tiny.cfg --weightsFile=yolov4-tiny.weights --imageFile=sample.jpg

go version: go version go1.19 linux/amd64

Expected behavior

Output as expected

Describe the solution you'd like and provide pseudocode examples if you can

Would love to see it working again, a lot of love for this project.

Evert-Arends commented 1 year ago

Stack trace:

`go build -o base_example/main base_example/main.go && ./base_example/main --configFile=yolov4-tiny.cfg --weightsFile=yolov4-tiny.weights --imageFile=sample.jpg
go: downloading github.com/edsrzf/mmap-go v1.1.0
Try to load cfg: yolov4-tiny.cfg, clear = 0
0 : compute_capability = 860, cudnn_half = 0, GPU: NVIDIA GeForce RTX 3090
net.optimized_memory = 0
mini_batch = 64, batch = 64, time_steps = 1, train = 1
layer filters size/strd(dil) input output
0 Create CUDA-stream - 0
Create cudnn-handle 0
conv 32 3 x 3/ 2 416 x 416 x 3 -> 208 x 208 x 32 0.075 BF
1 conv 64 3 x 3/ 2 208 x 208 x 32 -> 104 x 104 x 64 0.399 BF
2 conv 64 3 x 3/ 1 104 x 104 x 64 -> 104 x 104 x 64 0.797 BF
3 route 2 1/2 -> 104 x 104 x 32
4 conv 32 3 x 3/ 1 104 x 104 x 32 -> 104 x 104 x 32 0.199 BF
5 conv 32 3 x 3/ 1 104 x 104 x 32 -> 104 x 104 x 32 0.199 BF
6 route 5 4 -> 104 x 104 x 64
7 conv 64 1 x 1/ 1 104 x 104 x 64 -> 104 x 104 x 64 0.089 BF
8 route 2 7 -> 104 x 104 x 128
9 max 2x 2/ 2 104 x 104 x 128 -> 52 x 52 x 128 0.001 BF
10 conv 128 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 128 0.797 BF
11 route 10 1/2 -> 52 x 52 x 64
12 conv 64 3 x 3/ 1 52 x 52 x 64 -> 52 x 52 x 64 0.199 BF
13 conv 64 3 x 3/ 1 52 x 52 x 64 -> 52 x 52 x 64 0.199 BF
14 route 13 12 -> 52 x 52 x 128
15 conv 128 1 x 1/ 1 52 x 52 x 128 -> 52 x 52 x 128 0.089 BF
16 route 10 15 -> 52 x 52 x 256
17 max 2x 2/ 2 52 x 52 x 256 -> 26 x 26 x 256 0.001 BF
18 conv 256 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 256 0.797 BF
19 route 18 1/2 -> 26 x 26 x 128
20 conv 128 3 x 3/ 1 26 x 26 x 128 -> 26 x 26 x 128 0.199 BF
21 conv 128 3 x 3/ 1 26 x 26 x 128 -> 26 x 26 x 128 0.199 BF
22 route 21 20 -> 26 x 26 x 256
23 conv 256 1 x 1/ 1 26 x 26 x 256 -> 26 x 26 x 256 0.089 BF
24 route 18 23 -> 26 x 26 x 512
25 max 2x 2/ 2 26 x 26 x 512 -> 13 x 13 x 512 0.000 BF
26 conv 512 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x 512 0.797 BF
27 conv 256 1 x 1/ 1 13 x 13 x 512 -> 13 x 13 x 256 0.044 BF
28 conv 512 3 x 3/ 1 13 x 13 x 256 -> 13 x 13 x 512 0.399 BF
29 conv 255 1 x 1/ 1 13 x 13 x 512 -> 13 x 13 x 255 0.044 BF
30 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
31 route 27 -> 13 x 13 x 256
32 conv 128 1 x 1/ 1 13 x 13 x 256 -> 13 x 13 x 128 0.011 BF
33 upsample 2x 13 x 13 x 128 -> 26 x 26 x 128
34 route 33 23 -> 26 x 26 x 384
35 conv 256 3 x 3/ 1 26 x 26 x 384 -> 26 x 26 x 256 1.196 BF
36 conv 255 1 x 1/ 1 26 x 26 x 256 -> 26 x 26 x 255 0.088 BF
37 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
Unused field: 'names = coco.names'
Total BFLOPS 6.910
avg_outputs = 310203
Allocate additional workspace_size = 535.18 MB
Try to load weights: yolov4-tiny.weights
Loading weights from yolov4-tiny.weights...
seen 64, trained: 0 K-images (0 Kilo-batches_64)
Done! Loaded 38 layers from weights-file
Loaded - names_list: coco.names, classes = 80
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x797c8340 pc=0x7f9e7a6eddb0]

runtime stack:
runtime.throw({0x506543?, 0x0?})
    /usr/lib/go/src/runtime/panic.go:1047 +0x5d fp=0x7ffc3bd4ca78 sp=0x7ffc3bd4ca48 pc=0x4353bd
runtime.sigpanic()
    /usr/lib/go/src/runtime/signal_unix.go:819 +0x369 fp=0x7ffc3bd4cac8 sp=0x7ffc3bd4ca78 pc=0x449529

goroutine 1 [syscall]:
runtime.cgocall(0x4d1360, 0xc000085b98)
    /usr/lib/go/src/runtime/cgocall.go:158 +0x5c fp=0xc000085b70 sp=0xc000085b38 pc=0x40565c
github.com/LdDl/go-darknet._Cfunc_perform_network_detect(0x2652500, 0xc000126010, 0x50, 0x3e800000, 0x3f000000, 0x3ee66666, 0x0)
    _cgo_gotypes.go:920 +0x4e fp=0xc000085b98 sp=0xc000085b70 pc=0x4b434e
github.com/LdDl/go-darknet.(*YOLONetwork).Detect.func1(0x219?, 0xc000126000)
    /tmp/go-darknet/network.go:151 +0xe9 fp=0xc000085c18 sp=0xc000085b98 pc=0x4b5e69
github.com/LdDl/go-darknet.(*YOLONetwork).Detect(0xc000085e30, 0xc000120080?) `
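
For anyone reading the trace: Go prints cgo call arguments as raw machine words, so float arguments appear as hex bit patterns. The minimal sketch below decodes the three words passed to _Cfunc_perform_network_detect as IEEE 754 float32 values. The helper name decodeFloat32Arg is hypothetical, and treating these three words as floats (for example detection thresholds) is an assumption; the thread does not confirm which parameter each one is. The 0x50 word is 80 in decimal, which matches "classes = 80" in the log.

```go
package main

import (
	"fmt"
	"math"
)

// decodeFloat32Arg reinterprets a raw 32-bit word from a Go stack trace
// as an IEEE 754 single-precision float. Hypothetical helper, for illustration only.
func decodeFloat32Arg(bits uint32) float32 {
	return math.Float32frombits(bits)
}

func main() {
	// Raw words copied from the _Cfunc_perform_network_detect frame in the trace.
	// Assumption: these three are float32 arguments.
	words := []uint32{0x3e800000, 0x3f000000, 0x3ee66666}
	for _, w := range words {
		fmt.Printf("%#x -> %g\n", w, decodeFloat32Arg(w))
	}

	// 0x50 is a plain integer argument; it equals the class count from the log.
	fmt.Println("0x50 ->", 0x50)
}
```

The decoded values look like ordinary, valid inputs, which suggests the arguments on the Go side were sane when the call crossed into C.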

LdDl commented 1 year ago

Hi! I'll take a look soon

LdDl commented 1 year ago

@Evert-Arends I can confirm the fatal error.


It happens for me on a CUDA-based Darknet installation, but not on a CPU-only build (make install_darknet).

I need more time to investigate and try different CUDA versions. Can you test the CPU version to confirm that this is a GPU-only problem?

P.S. My current CUDA version is 11.7. The first thing I'll try is reinstalling CUDA or downgrading it.

Fri Feb  3 11:56:34 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.86.01    Driver Version: 515.86.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   46C    P8    14W / 170W |    917MiB / 12288MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
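
When comparing setups across machines, it can help to pull the driver and CUDA versions out of the nvidia-smi banner programmatically. The sketch below parses the banner line shown above with a regular expression. The function name parseSMIHeader and the pattern are illustrative assumptions, written against this one banner format and not guaranteed for other nvidia-smi releases. Note that the CUDA version in this banner is the highest version the driver supports, not necessarily the version of the installed toolkit.

```go
package main

import (
	"fmt"
	"regexp"
)

// smiHeader matches the "Driver Version: X    CUDA Version: Y" part of the banner.
// Assumption: the banner layout is the one shown in this thread.
var smiHeader = regexp.MustCompile(`Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)`)

// parseSMIHeader extracts the driver and CUDA version strings from one banner line.
// ok is false when the line does not contain both fields.
func parseSMIHeader(line string) (driver, cuda string, ok bool) {
	m := smiHeader.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	// Banner line copied from the nvidia-smi output above.
	line := "| NVIDIA-SMI 515.86.01    Driver Version: 515.86.01    CUDA Version: 11.7     |"
	driver, cuda, ok := parseSMIHeader(line)
	if !ok {
		fmt.Println("banner line not recognized")
		return
	}
	fmt.Printf("driver=%s cuda=%s\n", driver, cuda)
}
```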

LdDl commented 1 year ago

  1. I've downgraded the NVIDIA driver.

  2. Fully removed CUDA/cuDNN

  3. Installed CUDA 11.7.0 (note: NOT v11.7.1) without the driver update, by disabling that option in the *.run installer, since I prefer not to install CUDA with a package manager.

  4. Installed cuDNN: cudnn-linux-x86_64-8.6.0.163

Now it works.

Can you run the same experiment?

Evert-Arends commented 1 year ago

I'll have a look tonight or Saturday. I do get my CUDA from a package manager, but I suppose I can easily change that. Thanks for responding so quickly!