Hi @marvision-ai,
Can you show us the .cpp of the test? The problem is probably just with the output_bins used to check the correctness of the results.
This is the code for yolov4-tiny (which has 2 YOLO layers):
std::vector<std::string> output_bins = {
    bin_path + "/debug/layer30_out.bin",
    bin_path + "/debug/layer37_out.bin"
};
For the model you are using, three are required.
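For example (the exact layer indices depend on your cfg; layer 44 is an assumption here for the third YOLO head):

// one debug bin per YOLO layer; the indices must match the
// YOLO layers of your cfg (44 is assumed for the third head)
std::vector<std::string> output_bins = {
    bin_path + "/debug/layer30_out.bin",
    bin_path + "/debug/layer37_out.bin",
    bin_path + "/debug/layer44_out.bin"
};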
Hi @mive93,
Yes, you are correct! That makes sense. Here is the current one I am using.
#include <iostream>
#include <vector>

#include "tkdnn.h"
#include "test.h"
#include "DarknetParser.h"

int main() {
    std::string bin_path = "/home/nvidia/ai/tkDNN/build/yolo4tiny-3l-shaft";
    std::vector<std::string> input_bins = {
        bin_path + "/layers/input.bin"
    };
    std::vector<std::string> output_bins = {
        bin_path + "/debug/layer30_out.bin",
        bin_path + "/debug/layer37_out.bin"
    };
    std::string wgs_path  = bin_path + "/layers";
    std::string cfg_path  = std::string(TKDNN_PATH) + "/tests/darknet/cfg/yolov4-tiny-shaft-3l-rnd.cfg";
    std::string name_path = std::string(TKDNN_PATH) + "/tests/darknet/names/shaft.names";

    // parse darknet network
    tk::dnn::Network *net = tk::dnn::darknetParser(cfg_path, wgs_path, name_path);
    net->print();

    // convert network to TensorRT
    tk::dnn::NetworkRT *netRT = new tk::dnn::NetworkRT(net, net->getNetworkRTName(bin_path.c_str()));

    int ret = testInference(input_bins, output_bins, net, netRT);

    net->releaseLayers();
    delete net;
    delete netRT;
    return ret;
}
I will update this to include layer 44 and report back my results.
Thank you!
@mive93 I have fixed the issue.
std::vector<std::string> output_bins = {
    bin_path + "/debug/layer30_out.bin",
    bin_path + "/debug/layer37_out.bin",
    bin_path + "/debug/layer44_out.bin"
};
Compiles and runs nicely. Thanks for the heads up.
@marvision-ai I see you are still active, so I hope you don't mind me bumping this. I am trying to convert a yolov4-tiny-3l model for use with tkDNN. The weights are exported, but moving on to the TensorRT conversion, I run into this issue:
....
GPU free memory: 8502.9 mb.
New NetworkRT (TensorRT v7.23)
Float16 support: 1
Int8 support: 1
DLAs: 0
create execution context
Input/outputs numbers: 3
input index = 0 -> output index = 2
Data dim: 1 3 416 416 1
Data dim: 1 255 26 26 1
RtBuffer 0 dim: Data dim: 1 3 416 416 1
RtBuffer 1 dim: Data dim: 1 255 13 13 1
RtBuffer 2 dim: Data dim: 1 255 26 26 1
====== CUDNN inference ======
Data dim: 1 3 320 320 1
new_coords0
new_coords0
new_coords0
new_coords0
new_coords0
new_coords0
new_coords0
new_coords0
new_coords0
Data dim: 1 18 40 40 1
===== TENSORRT inference ====
Data dim: 1 3 320 320 1
Cuda failure: invalid argument
C:\Users\admin\Source\Repos\tkDNN\src\NetworkRT.cpp:205
I've made some modifications to get to this point, including the one you posted above. I thought this had something to do with batch size, but that doesn't seem to be the case... any ideas?
Can you comment more on your MSVC, CUDA, and NVIDIA driver versions?
MSVC: 19.28.29336
CUDA: 11.2
NVIDIA: 460.90
ARCH: 86 (Ampere) (3080)
I see the mention of using 465+, so I'll update that now and get back to you.
@perseusdg Edit: No change after upgrading to 466.77.
I wonder if the network size is being miscalculated: it seems to default to 416x416, while our custom model uses 320x320 (the log above shows the RtBuffers sized for 416x416 while the CUDNN inference runs at 320x320). Is it possible that the project expects dimensions in specific intervals? After validating each argument, the size is my biggest suspect right now.
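As a sanity check, here is a minimal sketch to compare the byte size of the exported input.bin against what each resolution implies, assuming the bin stores raw float32 values in NCHW order (the path is just an example):

#include <cstddef>
#include <fstream>
#include <iostream>

int main() {
    // dims from the log above: batch 1, 3 channels
    // note: 320 and 416 are both multiples of 32, so the usual darknet
    // constraint on input dimensions is satisfied either way
    const std::size_t cfg_bytes    = std::size_t(1) * 3 * 320 * 320 * sizeof(float); // cfg: 320x320
    const std::size_t engine_bytes = std::size_t(1) * 3 * 416 * 416 * sizeof(float); // engine: 416x416

    // example path; adjust to your build directory
    std::ifstream f("build/yolov4-tiny-3l/layers/input.bin",
                    std::ios::binary | std::ios::ate);
    if (!f) { std::cerr << "cannot open input.bin\n"; return 1; }
    const std::size_t actual = static_cast<std::size_t>(f.tellg());

    std::cout << "input.bin:    " << actual       << " bytes\n"
              << "cfg (320):    " << cfg_bytes    << " bytes\n"
              << "engine (416): " << engine_bytes << " bytes\n";
    // if input.bin matches 320x320 while the RtBuffers were allocated for
    // 416x416, a fixed-size cudaMemcpy would fail with "invalid argument"
    return 0;
}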
I managed to get the TensorRT net built and serialized. Unfortunately, now I don't seem to get any detections, similar to this issue: https://github.com/ceccocats/tkDNN/issues/228
=== OUTPUT 0 CHECK RESULTS ==
CUDNN vs correct
| [ 0 ]: nan 0.425675
| [ 1 ]: nan 0.304462
| [ 2 ]: nan 0.121382
| [ 3 ]: nan 0.0980753
| [ 4 ]: nan 0.113222
| [ 5 ]: nan 0.0964858
| [ 6 ]: nan 0.149539
| [ 7 ]: nan 0.184255
| [ 8 ]: nan 0.117385
| Wrongs: 1800 ~0.02
TRT vs correct
| [ 0 ]: nan 0.425675
| [ 1 ]: nan 0.304462
| [ 2 ]: nan 0.121382
| [ 3 ]: nan 0.0980753
| [ 4 ]: nan 0.113222
| [ 5 ]: nan 0.0964858
| [ 6 ]: nan 0.149539
| [ 7 ]: nan 0.184255
| [ 8 ]: nan 0.117385
| Wrongs: 1800 ~0.02
CUDNN vs TRT
| [ 0 ]: nan nan
| [ 1 ]: nan nan
| [ 2 ]: nan nan
| [ 3 ]: nan nan
| [ 4 ]: nan nan
| [ 5 ]: nan nan
| [ 6 ]: nan nan
| [ 7 ]: nan nan
| [ 8 ]: nan nan
| Wrongs: 1800 ~0.02
=== OUTPUT 1 CHECK RESULTS ==
CUDNN vs correct
| [ 0 ]: nan 0.0156295
| [ 1 ]: nan 0.0647845
| [ 2 ]: nan 0.02962
| [ 3 ]: nan 0.0509262
| [ 4 ]: nan 0.088236
| [ 5 ]: nan 0.0784189
| [ 6 ]: nan 0.0819883
| [ 7 ]: nan 0.0894271
| [ 8 ]: nan 0.0896965
| Wrongs: 7200 ~0.02
TRT vs correct
| [ 0 ]: nan 0.0156295
| [ 1 ]: nan 0.0647845
| [ 2 ]: nan 0.02962
| [ 3 ]: nan 0.0509262
| [ 4 ]: nan 0.088236
| [ 5 ]: nan 0.0784189
| [ 6 ]: nan 0.0819883
| [ 7 ]: nan 0.0894271
| [ 8 ]: nan 0.0896965
| Wrongs: 7200 ~0.02
CUDNN vs TRT
| [ 0 ]: nan nan
| [ 1 ]: nan nan
| [ 2 ]: nan nan
| [ 3 ]: nan nan
| [ 4 ]: nan nan
| [ 5 ]: nan nan
| [ 6 ]: nan nan
| [ 7 ]: nan nan
| [ 8 ]: nan nan
| Wrongs: 7200 ~0.02
=== OUTPUT 2 CHECK RESULTS ==
CUDNN vs correct
| [ 0 ]: nan 0.244062
| [ 1 ]: nan 0.750405
| [ 2 ]: nan 0.826826
| [ 3 ]: nan 0.665474
| [ 4 ]: nan 0.499885
| [ 5 ]: nan 0.402975
| [ 6 ]: nan 0.391236
| [ 7 ]: nan 0.352832
| [ 8 ]: nan 0.354092
| Wrongs: 28800 ~0.02
TRT vs correct
| [ 0 ]: nan 0.244062
| [ 1 ]: nan 0.750405
| [ 2 ]: nan 0.826826
| [ 3 ]: nan 0.665474
| [ 4 ]: nan 0.499885
| [ 5 ]: nan 0.402975
| [ 6 ]: nan 0.391236
| [ 7 ]: nan 0.352832
| [ 8 ]: nan 0.354092
| Wrongs: 28800 ~0.02
CUDNN vs TRT
| [ 0 ]: nan nan
| [ 1 ]: nan nan
| [ 2 ]: nan nan
| [ 3 ]: nan nan
| [ 4 ]: nan nan
| [ 5 ]: nan nan
| [ 6 ]: nan nan
| [ 7 ]: nan nan
| [ 8 ]: nan nan
| Wrongs: 28800 ~0.02
That explains it; unfortunately, I'm not sure how to proceed from here.
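For reference, a minimal sketch of a float32 .bin NaN counter, useful for ruling out a bad export on the reference side or for scanning any raw dump of the inference output (the path is just an example):

#include <cmath>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <vector>

int main() {
    // example path; point this at any exported float32 bin
    std::ifstream f("build/yolov4-tiny-3l/debug/layer30_out.bin",
                    std::ios::binary | std::ios::ate);
    if (!f) { std::cerr << "cannot open bin\n"; return 1; }

    const std::size_t bytes = static_cast<std::size_t>(f.tellg());
    f.seekg(0);

    const std::size_t n = bytes / sizeof(float);
    std::vector<float> vals(n);
    f.read(reinterpret_cast<char*>(vals.data()),
           static_cast<std::streamsize>(n * sizeof(float)));

    std::size_t nans = 0;
    for (float v : vals)
        if (std::isnan(v)) ++nans;

    std::cout << nans << " / " << n << " values are NaN\n";
    return 0;
}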
Additional findings:
The above output is with TKDNN_MODE equal to FP32.
With TKDNN_MODE equal to FP16, TensorRT generation fails at https://github.com/ceccocats/tkDNN/blob/c306b368608893e92925bf143e7cf14f19525aeb/src/NetworkRT.cpp#L146.
With TKDNN_MODE equal to INT8, it fails at https://github.com/ceccocats/tkDNN/blob/c306b368608893e92925bf143e7cf14f19525aeb/src/Int8BatchStream.cpp#L69.
Additionally (and I'm not quite sure what I did to change this behavior), I'm no longer receiving zero detections, but rather consistently 6300 detections on a validation image, all of them invalid/null.
I have been testing many versions of yolov4-tiny. Recently, Alex released a yolov4-tiny-3l cfg: https://github.com/AlexeyAB/darknet/blob/de68e19cc627f642023f09513ac2306fbcbc1e4b/cfg/yolov4-tiny-3l.cfg
I have to switch the cfg to width=1120 and height=960. I have trained to great accuracy and would like to use this model. When I attempt to export, I get the following output:
Any help would be greatly appreciated! Congrats on the great repo!