Closed lovehuanhuan closed 2 years ago
When I use your demo `./test_yolo4tiny`, I also get a segmentation fault after `export TKDNN_MODE=FP16`.
Currently, we do not support Windows, which is why nowhere is it explained how to solve all the issues with that OS.
tkDNN is now supported on Windows on the current master branch, at an experimental level.
Closing for inactivity. Feel free to reopen.
My system is Win10 with nvidia-docker and an RTX 2080 Ti. When I set `export TKDNN_MODE=FP16` there is a segmentation fault, but FP32 is OK. I have the layers and debug dirs, and I have only one YOLO layer.
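For reference, the comparison described above boils down to the two runs below (a sketch; it assumes tkDNN was built in its build directory and the layers/debug dumps for the network are already exported, as in this thread):

```shell
# tkDNN reads the TKDNN_MODE environment variable to choose the
# inference precision; ./test_yolo4tiny is the demo binary from this thread.

export TKDNN_MODE=FP16   # FP16 engine build -> segmentation fault on this setup
./test_yolo4tiny

export TKDNN_MODE=FP32   # FP32 engine build -> works
./test_yolo4tiny
```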
```
Not supported field: batch=4
Not supported field: subdivisions=6
Not supported field: momentum=0.9
Not supported field: decay=0.0005
Not supported field: angle=0
Not supported field: saturation = 1.5
Not supported field: exposure = 1.5
Not supported field: hue=.1
Not supported field: letter_box=1
Not supported field: learning_rate=0.001
Not supported field: burn_in=1000
Not supported field: max_batches = 50020
Not supported field: policy=steps
Not supported field: steps=10000,20000
Not supported field: scales=.1,.1
New NETWORK (tkDNN v0.5, CUDNN v7.605)
!! FP16 INFERENCE ENABLED !!
Reading weights: I=3 O=32 KERNEL=3x3x1
Reading weights: I=32 O=64 KERNEL=3x3x1
Reading weights: I=64 O=64 KERNEL=3x3x1
Reading weights: I=32 O=32 KERNEL=3x3x1
Reading weights: I=32 O=32 KERNEL=3x3x1
Reading weights: I=32 O=32 KERNEL=3x3x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=64 O=64 KERNEL=3x3x1
Reading weights: I=64 O=64 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=512 O=512 KERNEL=3x3x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=384 O=256 KERNEL=3x3x1
Reading weights: I=256 O=64 KERNEL=1x1x1
Reading weights: I=192 O=128 KERNEL=3x3x1
Reading weights: I=256 O=18 KERNEL=1x1x1
Not supported field: anchors =5,7, 12,14, 19, 25
Not supported field: jitter=.3
Not supported field: cls_normalizer=1.0
Not supported field: iou_normalizer=0.07
Not supported field: iou_loss=ciou
Not supported field: ignore_thresh = .7
Not supported field: truth_thresh = 1
Not supported field: random=1
```
```
====================== NETWORK MODEL ======================
N.  Layer type       input (HW,CH)        output (HW,CH)
 0  Conv2d           608 x 608,   3  ->  304 x 304,  32
 1  ActivationLeaky  304 x 304,  32  ->  304 x 304,  32
 2  Conv2d           304 x 304,  32  ->  152 x 152,  64
 3  ActivationLeaky  152 x 152,  64  ->  152 x 152,  64
 4  Conv2d           152 x 152,  64  ->  152 x 152,  64
 5  ActivationLeaky  152 x 152,  64  ->  152 x 152,  64
 6  Route            152 x 152,  32  ->  152 x 152,  32
 7  Conv2d           152 x 152,  32  ->  152 x 152,  32
 8  ActivationLeaky  152 x 152,  32  ->  152 x 152,  32
 9  Conv2d           152 x 152,  32  ->  152 x 152,  32
10  ActivationLeaky  152 x 152,  32  ->  152 x 152,  32
11  Conv2d           152 x 152,  32  ->  152 x 152,  32
12  ActivationLeaky  152 x 152,  32  ->  152 x 152,  32
13  Route            152 x 152,  64  ->  152 x 152,  64
14  Conv2d           152 x 152,  64  ->  152 x 152,  64
15  ActivationLeaky  152 x 152,  64  ->  152 x 152,  64
16  Route            152 x 152, 128  ->  152 x 152, 128
17  Pooling          152 x 152, 128  ->   76 x  76, 128
18  Conv2d            76 x  76, 128  ->   76 x  76, 128
19  ActivationLeaky   76 x  76, 128  ->   76 x  76, 128
20  Route             76 x  76,  64  ->   76 x  76,  64
21  Conv2d            76 x  76,  64  ->   76 x  76,  64
22  ActivationLeaky   76 x  76,  64  ->   76 x  76,  64
23  Conv2d            76 x  76,  64  ->   76 x  76,  64
24  ActivationLeaky   76 x  76,  64  ->   76 x  76,  64
25  Route             76 x  76, 128  ->   76 x  76, 128
26  Conv2d            76 x  76, 128  ->   76 x  76, 128
27  ActivationLeaky   76 x  76, 128  ->   76 x  76, 128
28  Route             76 x  76, 256  ->   76 x  76, 256
29  Pooling           76 x  76, 256  ->   38 x  38, 256
30  Conv2d            38 x  38, 256  ->   38 x  38, 256
31  ActivationLeaky   38 x  38, 256  ->   38 x  38, 256
32  Route             38 x  38, 128  ->   38 x  38, 128
33  Conv2d            38 x  38, 128  ->   38 x  38, 128
34  ActivationLeaky   38 x  38, 128  ->   38 x  38, 128
35  Conv2d            38 x  38, 128  ->   38 x  38, 128
36  ActivationLeaky   38 x  38, 128  ->   38 x  38, 128
37  Conv2d            38 x  38, 128  ->   38 x  38, 128
38  ActivationLeaky   38 x  38, 128  ->   38 x  38, 128
39  Route             38 x  38, 256  ->   38 x  38, 256
40  Conv2d            38 x  38, 256  ->   38 x  38, 256
41  ActivationLeaky   38 x  38, 256  ->   38 x  38, 256
42  Route             38 x  38, 512  ->   38 x  38, 512
43  Pooling           38 x  38, 512  ->   19 x  19, 512
44  Conv2d            19 x  19, 512  ->   19 x  19, 512
45  ActivationLeaky   19 x  19, 512  ->   19 x  19, 512
46  Conv2d            19 x  19, 512  ->   19 x  19, 256
47  ActivationLeaky   19 x  19, 256  ->   19 x  19, 256
48  Route             19 x  19, 256  ->   19 x  19, 256
49  Conv2d            19 x  19, 256  ->   19 x  19, 128
50  ActivationLeaky   19 x  19, 128  ->   19 x  19, 128
51  Upsample          19 x  19, 128  ->   38 x  38, 128
52  Route             38 x  38, 384  ->   38 x  38, 384
53  Conv2d            38 x  38, 384  ->   38 x  38, 256
54  ActivationLeaky   38 x  38, 256  ->   38 x  38, 256
55  Conv2d            38 x  38, 256  ->   38 x  38,  64
56  ActivationLeaky   38 x  38,  64  ->   38 x  38,  64
57  Upsample          38 x  38,  64  ->   76 x  76,  64
58  Route             76 x  76, 192  ->   76 x  76, 192
59  Upsample          76 x  76, 192  ->  152 x 152, 192
60  Conv2d           152 x 152, 192  ->  152 x 152, 128
61  ActivationLeaky  152 x 152, 128  ->  152 x 152, 128
62  Route            152 x 152, 256  ->  152 x 152, 256
63  Conv2d           152 x 152, 256  ->  152 x 152,  18
64  Yolo             152 x 152,  18  ->  152 x 152,  18
```
```
GPU free memory: 9864.02 mb.
net->print() run over
New NetworkRT (TensorRT v7)
Segmentation fault
```
But when I set FP32, the output is OK:

```
GPU free memory: 9879.75 mb.
net->print() run over
New NetworkRT (TensorRT v7)
Float16 support: 1
Int8 support: 1
DLAs: 0
Selected maxBatchSize: 4
GPU free memory: 9773.71 mb.
Building tensorRT cuda engine...
serialize net
create execution context
Input/outputs numbers: 2
input index = 0 -> output index = 1
Data dim: 1 3 608 608 1
Data dim: 1 18 152 152 1
RtBuffer 0 dim: Data dim: 1 3 608 608 1
RtBuffer 1 dim: Data dim: 1 18 152 152 1
```