AlexeyAB / yolo2_light

Light version of convolutional neural network Yolo v3 & v2 for object detection with a minimum of dependencies (INT8-inference, BIT1-XNOR-inference)
MIT License

Error int8 support for jetson tx2 #26

Open engineer1109 opened 5 years ago

engineer1109 commented 5 years ago

Error: CUDNN_STATUS_ARCH_MISMATCH - This GPU doesn't support DP4A (INT8 weights and input)

cudnnstat = 6

So the Jetson TX2 doesn't support INT8 quantization?
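
For reference, status 6 in cuDNN's cudnnStatus_t enum is CUDNN_STATUS_ARCH_MISMATCH, matching the message above. A tiny sketch (my addition, not code from this repo) that prints the human-readable message with cuDNN's own helper:

```c
#include <stdio.h>
#include <cudnn.h>

int main(void) {
    /* 6 is the raw value the log printed as "cudnnstat = 6" */
    cudnnStatus_t st = (cudnnStatus_t)6;
    printf("%s\n", cudnnGetErrorString(st));  /* CUDNN_STATUS_ARCH_MISMATCH */
    return 0;
}
```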

AlexeyAB commented 5 years ago

It seems yes: the Jetson TX2 doesn't support INT8 quantization (DP4A) here. This is strange, because the Jetson TX2 is Pascal architecture with compute capability (CC) 6.2, which is above the CC 6.1 where DP4A first appears: https://en.wikipedia.org/wiki/CUDA#GPUs_supported
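
For what it's worth, here is a minimal sketch (my addition, not code from this repo) that queries the device's compute capability through the plain CUDA runtime API; on a TX2 it should report 6.2, which hints that the mismatch comes from cuDNN's INT8 path rather than from missing DP4A hardware:

```c
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);  /* device 0 */
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("%s: CC %d.%d\n", prop.name, prop.major, prop.minor);
    /* The DP4A instruction (4-way int8 dot product with 32-bit accumulate)
       exists in hardware on CC >= 6.1, but cuDNN 7.x can still reject INT8
       convolutions on some devices, which is what the error above reports. */
    int dp4a = (prop.major > 6) || (prop.major == 6 && prop.minor >= 1);
    printf("DP4A in hardware: %s\n", dp4a ? "yes" : "no");
    return 0;
}
```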

Which CUDA and cuDNN versions do you use? Can you show the output of the command nvidia-smi?

engineer1109 commented 5 years ago

nvidia-smi doesn't support the Jetson TX2? I use CUDA 9.0 and cuDNN 7.1.5. Is there any way to make the Jetson TX2 support fp16?

AlexeyAB commented 5 years ago

> nvidia-smi doesn't support the Jetson TX2?

Any desktop GPU supports nvidia-smi.

> Is there any way to make the Jetson TX2 support fp16?

The Jetson TX2 supports fp16, but it doesn't have Tensor Cores, so fp16 will not be faster than fp32 on the TX2.
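
To make "supports fp16" concrete: Pascal chips expose half-precision arithmetic through the cuda_fp16.h intrinsics. A minimal sketch of mine (not repo code; assumes nvcc with -arch=sm_62 for a TX2) using packed __half2 math; without Tensor Cores this runs on the ordinary fp16 pipeline, so it mainly saves memory and bandwidth rather than guaranteeing a speedup:

```c
#include <stdio.h>
#include <cuda_fp16.h>

/* Each thread performs one packed fp16 fused multiply-add, y = a*x + y,
   on a pair of half values. These intrinsics work from CC 5.3 upward,
   so the TX2 (CC 6.2) runs them fine. */
__global__ void fp16_demo(int n, float *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        __half2 a = __floats2half2_rn(3.0f, 3.0f);
        __half2 x = __floats2half2_rn(1.0f, 2.0f);
        __half2 y = __floats2half2_rn(0.5f, 0.5f);
        y = __hfma2(a, x, y);        /* (3*1+0.5, 3*2+0.5) = (3.5, 6.5) */
        out[i] = __low2float(y);     /* store the low half as float */
    }
}

int main(void) {
    float *out;
    cudaMallocManaged(&out, 256 * sizeof(float));  /* unified memory on Tegra */
    fp16_demo<<<1, 256>>>(256, out);
    cudaDeviceSynchronize();
    printf("out[0] = %f\n", out[0]);  /* expect 3.5 */
    cudaFree(out);
    return 0;
}
```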

engineer1109 commented 5 years ago

The Jetson TX2 really doesn't support nvidia-smi; it is not a discrete GPU, and its GPU memory is shared with the CPU memory.

AlexeyAB commented 5 years ago

Oh yeah, nvidia-smi doesn't work on Tegra (Jetson TX2), so I think it doesn't support DP4A (INT8).

You can only try to use XNOR (1-bit) quantization by training these models (the idea is sketched just below):
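
For context, the core of XNOR (1-bit) inference is that, once weights and activations are binarized to ±1 and bit-packed, a dot product turns into XNOR plus popcount. A minimal CPU sketch (my illustration, not code from yolo2_light; uses the GCC/Clang popcount builtin):

```c
#include <stdio.h>
#include <stdint.h>

/* Dot product of two binary vectors of length 64*n, packed into uint64_t
   words. Bit 1 encodes +1, bit 0 encodes -1: matching bits contribute +1,
   differing bits contribute -1, hence 2*matches - total_bits. */
static int xnor_dot(const uint64_t *a, const uint64_t *b, int n) {
    int matches = 0;
    for (int i = 0; i < n; i++)
        matches += __builtin_popcountll(~(a[i] ^ b[i]));  /* XNOR + popcount */
    return 2 * matches - 64 * n;
}

int main(void) {
    uint64_t a[1] = { 0xFFFFFFFFFFFFFFFFull };  /* all +1 */
    uint64_t b[1] = { 0x0000000000000000ull };  /* all -1 */
    printf("%d\n", xnor_dot(a, b, 1));          /* expect -64 */
    return 0;
}
```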

lynnw123 commented 5 years ago

I met the same issue: it compiles successfully with GPU support, but INT8 inference fails with the same architecture mismatch. I also use CUDA 9.0 and cuDNN 7.1.5 on the TX2. On the TX2 I use "sudo ~/tegrastats" to monitor GPU usage, since nvidia-smi is not working.

AlexeyAB commented 5 years ago

@Yinling-123 TX2 doesn't support INT8 optimizations.

aniketvartak commented 5 years ago

> Oh yeah, nvidia-smi doesn't work on Tegra (Jetson TX2), so I think it doesn't support DP4A (INT8).
>
> You can only try to use XNOR (1-bit) quantization by training these models:

@AlexeyAB will you share models trained with XNOR quantization with us?