Closed lovehuanhuan closed 2 years ago
When I use your demo `./test_yolo4tiny`, I also get a segmentation fault after `export TKDNN_MODE=FP16`.
Currently, we do not support Windows, which is why nowhere is it explained how to solve all the issues with that OS.
tkDNN is now supported on Windows on the current master branch, at an experimental level.
Closing for inactivity. Feel free to reopen.
My system is Win10 with nvidia-docker and an RTX 2080 Ti. When I set `export TKDNN_MODE=FP16` there is a segmentation fault, but FP32 is OK. I have the layers and debug dirs, and I have only one YOLO layer.
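For reference, the comparison described above boils down to the two runs below (a sketch; it assumes tkDNN was built in its build directory and the layers/debug dumps for the network are already exported, as in this thread):

```shell
# tkDNN reads the TKDNN_MODE environment variable to choose the
# inference precision; ./test_yolo4tiny is the demo binary from this thread.

export TKDNN_MODE=FP16   # FP16 engine build -> segmentation fault on this setup
./test_yolo4tiny

export TKDNN_MODE=FP32   # FP32 engine build -> works
./test_yolo4tiny
```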
```
Not supported field: batch=4
Not supported field: subdivisions=6
Not supported field: momentum=0.9
Not supported field: decay=0.0005
Not supported field: angle=0
Not supported field: saturation = 1.5
Not supported field: exposure = 1.5
Not supported field: hue=.1
Not supported field: letter_box=1
Not supported field: learning_rate=0.001
Not supported field: burn_in=1000
Not supported field: max_batches = 50020
Not supported field: policy=steps
Not supported field: steps=10000,20000
Not supported field: scales=.1,.1
New NETWORK (tkDNN v0.5, CUDNN v7.605)
!! FP16 INFERENCE ENABLED !!
Reading weights: I=3 O=32 KERNEL=3x3x1
Reading weights: I=32 O=64 KERNEL=3x3x1
Reading weights: I=64 O=64 KERNEL=3x3x1
Reading weights: I=32 O=32 KERNEL=3x3x1
Reading weights: I=32 O=32 KERNEL=3x3x1
Reading weights: I=32 O=32 KERNEL=3x3x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=64 O=64 KERNEL=3x3x1
Reading weights: I=64 O=64 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=512 O=512 KERNEL=3x3x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=384 O=256 KERNEL=3x3x1
Reading weights: I=256 O=64 KERNEL=1x1x1
Reading weights: I=192 O=128 KERNEL=3x3x1
Reading weights: I=256 O=18 KERNEL=1x1x1
Not supported field: anchors =5,7, 12,14, 19, 25
Not supported field: jitter=.3
Not supported field: cls_normalizer=1.0
Not supported field: iou_normalizer=0.07
Not supported field: iou_loss=ciou
Not supported field: ignore_thresh = .7
Not supported field: truth_thresh = 1
Not supported field: random=1
```
```
====================== NETWORK MODEL ======================
N.  Layer type       input (HW,CH)        output (HW,CH)
 0  Conv2d           608 x 608,   3  ->  304 x 304,  32
 1  ActivationLeaky  304 x 304,  32  ->  304 x 304,  32
 2  Conv2d           304 x 304,  32  ->  152 x 152,  64
 3  ActivationLeaky  152 x 152,  64  ->  152 x 152,  64
 4  Conv2d           152 x 152,  64  ->  152 x 152,  64
 5  ActivationLeaky  152 x 152,  64  ->  152 x 152,  64
 6  Route            152 x 152,  32  ->  152 x 152,  32
 7  Conv2d           152 x 152,  32  ->  152 x 152,  32
 8  ActivationLeaky  152 x 152,  32  ->  152 x 152,  32
 9  Conv2d           152 x 152,  32  ->  152 x 152,  32
10  ActivationLeaky  152 x 152,  32  ->  152 x 152,  32
11  Conv2d           152 x 152,  32  ->  152 x 152,  32
12  ActivationLeaky  152 x 152,  32  ->  152 x 152,  32
13  Route            152 x 152,  64  ->  152 x 152,  64
14  Conv2d           152 x 152,  64  ->  152 x 152,  64
15  ActivationLeaky  152 x 152,  64  ->  152 x 152,  64
16  Route            152 x 152, 128  ->  152 x 152, 128
17  Pooling          152 x 152, 128  ->   76 x  76, 128
18  Conv2d            76 x  76, 128  ->   76 x  76, 128
19  ActivationLeaky   76 x  76, 128  ->   76 x  76, 128
20  Route             76 x  76,  64  ->   76 x  76,  64
21  Conv2d            76 x  76,  64  ->   76 x  76,  64
22  ActivationLeaky   76 x  76,  64  ->   76 x  76,  64
23  Conv2d            76 x  76,  64  ->   76 x  76,  64
24  ActivationLeaky   76 x  76,  64  ->   76 x  76,  64
25  Route             76 x  76, 128  ->   76 x  76, 128
26  Conv2d            76 x  76, 128  ->   76 x  76, 128
27  ActivationLeaky   76 x  76, 128  ->   76 x  76, 128
28  Route             76 x  76, 256  ->   76 x  76, 256
29  Pooling           76 x  76, 256  ->   38 x  38, 256
30  Conv2d            38 x  38, 256  ->   38 x  38, 256
31  ActivationLeaky   38 x  38, 256  ->   38 x  38, 256
32  Route             38 x  38, 128  ->   38 x  38, 128
33  Conv2d            38 x  38, 128  ->   38 x  38, 128
34  ActivationLeaky   38 x  38, 128  ->   38 x  38, 128
35  Conv2d            38 x  38, 128  ->   38 x  38, 128
36  ActivationLeaky   38 x  38, 128  ->   38 x  38, 128
37  Conv2d            38 x  38, 128  ->   38 x  38, 128
38  ActivationLeaky   38 x  38, 128  ->   38 x  38, 128
39  Route             38 x  38, 256  ->   38 x  38, 256
40  Conv2d            38 x  38, 256  ->   38 x  38, 256
41  ActivationLeaky   38 x  38, 256  ->   38 x  38, 256
42  Route             38 x  38, 512  ->   38 x  38, 512
43  Pooling           38 x  38, 512  ->   19 x  19, 512
44  Conv2d            19 x  19, 512  ->   19 x  19, 512
45  ActivationLeaky   19 x  19, 512  ->   19 x  19, 512
46  Conv2d            19 x  19, 512  ->   19 x  19, 256
47  ActivationLeaky   19 x  19, 256  ->   19 x  19, 256
48  Route             19 x  19, 256  ->   19 x  19, 256
49  Conv2d            19 x  19, 256  ->   19 x  19, 128
50  ActivationLeaky   19 x  19, 128  ->   19 x  19, 128
51  Upsample          19 x  19, 128  ->   38 x  38, 128
52  Route             38 x  38, 384  ->   38 x  38, 384
53  Conv2d            38 x  38, 384  ->   38 x  38, 256
54  ActivationLeaky   38 x  38, 256  ->   38 x  38, 256
55  Conv2d            38 x  38, 256  ->   38 x  38,  64
56  ActivationLeaky   38 x  38,  64  ->   38 x  38,  64
57  Upsample          38 x  38,  64  ->   76 x  76,  64
58  Route             76 x  76, 192  ->   76 x  76, 192
59  Upsample          76 x  76, 192  ->  152 x 152, 192
60  Conv2d           152 x 152, 192  ->  152 x 152, 128
61  ActivationLeaky  152 x 152, 128  ->  152 x 152, 128
62  Route            152 x 152, 256  ->  152 x 152, 256
63  Conv2d           152 x 152, 256  ->  152 x 152,  18
64  Yolo             152 x 152,  18  ->  152 x 152,  18
```
```
GPU free memory: 9864.02 mb.
net->print() run over
New NetworkRT (TensorRT v7)
Segmentation fault
```
But when I set FP32, the output is OK:

```
GPU free memory: 9879.75 mb.
net->print() run over
New NetworkRT (TensorRT v7)
Float16 support: 1
Int8 support: 1
DLAs: 0
Selected maxBatchSize: 4
GPU free memory: 9773.71 mb.
Building tensorRT cuda engine...
serialize net
create execution context
Input/outputs numbers: 2
input index = 0 -> output index = 1
Data dim: 1 3 608 608 1
Data dim: 1 18 152 152 1
RtBuffer 0 dim: Data dim: 1 3 608 608 1
RtBuffer 1 dim: Data dim: 1 18 152 152 1
```