Open uelordi01 opened 1 year ago
Hi,
The FPS is calculated as the number of images divided by the total time, as in your first approach.
The yolov4.cfg file that you mentioned is not the configuration file we used for the final result. Please check Table 5 in our paper.
Also, we used the MAXN power mode with the clock frequencies fixed by the jetson_clocks command. In addition, you need to explore the technique parameters on your board with Jetpack 4.6. Since your Xavier board's Jetpack version is different from ours, it may be difficult to reproduce the same result. As far as I remember, the result was slow in Jetpack 4.6.
The video below shows the result of our experiment (I ran it just now). https://github.com/cap-lab/jedi/assets/20039661/3eb58a84-80e1-4aa1-9c0c-b07a51d6bf8f The result shows an inference time of about 40 seconds, which corresponds to about 124 FPS (4952/40).
Thanks
OK, understood. Thank you for the clarification :) I changed the configuration based on Table 5 and the PND-A setup in the paper (2 pipeline stages; PEs: 2 DLA, GPU), and I now get roughly 80 FPS. The idea was not to reproduce the experiments exactly; I just wanted to be sure that my FPS calculation was right and that the configuration files I was creating were correct. Here is the configuration that I used, following the Table 5 results in the paper (for YOLOV4). If I am missing something, please tell me. Otherwise you can close the issue.
configs = {
instance_num = "1"
instances = (
{
network_name="yolo4";
model_dir = "/media/jetson/SD/uelordi_experiments/jedi/data/bin/yolo4";
bin_path = "/media/jetson/SD/uelordi_experiments/jedi/data/bin/yolo4";
cfg_path = "/media/jetson/SD/uelordi_experiments/jedi/data/cfg/yolo4_relu.cfg";
image_path = "/media/jetson/SD/uelordi_experiments/jedi/paper_experiments/experiment_images.txt"
calib_image_path = "/media/jetson/SD/uelordi_experiments/jedi/data/all_images.txt";
calib_images_num = "100";
calib_table = "/sdcard/chjej202/models2/yolov4/model416x416_0.268_DLA_INT8_1-calibration.table";
name_path = "/media/jetson/SD/uelordi_experiments/jedi/data/coco.names";
batch = "1"
offset = "0";
sample_size = "4952"
device_num = "2"
pre_thread_num = "1"
post_thread_num = "1"
buffer_num = "5"
cut_points = "82,268"
streams = "4,2"
devices = "DLA,GPU"
dla_cores = "0,1"
data_type = "FP16"
}
)
}
Hi,
Can you change dla_cores = "0,1" to dla_cores = "2,1"? dla_cores = "0,1" means that DLA 0 is used for the first stage. To use the PND technique, dla_cores needs to be changed to dla_cores = "2,1". If a value is greater than or equal to the number of cores (the number of DLAs), then the PND technique is applied.
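The rule above can be sketched as a small check. This is only an illustration of the rule as stated in this comment, not code from jedi: the function name uses_pnd and the default of 2 DLAs (the number of DLA engines on a Jetson AGX Xavier) are assumptions made for the example.

```python
def uses_pnd(dla_cores: str, num_dla: int = 2) -> bool:
    """Hypothetical helper: True if any dla_cores entry is >= the number of DLAs.

    dla_cores is the comma-separated string from the config, e.g. "0,1".
    num_dla defaults to 2, an assumption matching a Jetson AGX Xavier.
    """
    values = [int(v) for v in dla_cores.split(",")]
    return any(v >= num_dla for v in values)


if __name__ == "__main__":
    print(uses_pnd("0,1"))  # False: both entries name a real DLA core
    print(uses_pnd("2,1"))  # True: 2 >= number of DLAs
```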
Thanks
Hi @cap-lab, I am trying to reproduce the FPS results of the table (FP16). I took yolov4-cfg with FP16 weight precision as an example. My question is about how these FPS values are calculated. I compiled the code using the [tensorrt8_support branch](https://github.com/cap-lab/jedi/tree/tensorrt8_support) and your modified tkdnn with tensorrt8_experiment. My Jetson AGX configuration is Jetpack 4.6.2 with TensorRT 8.2.1.8.
Running jedi/build/bin/proc gives me the following output.
Digging into the code, I see that the inference time of 209.684 is the time elapsed between the start and end timestamps of the program execution (proc).
The other value is the average latency, calculated in microseconds by getAverageLatency(iter, &config_data, latencies[iter]).
So I guessed that the FPS could be calculated from the inference time: 1/(209.684/(4952 images)) -> about 23 FPS
The other approach I thought of is to take 1/(average latency). The average latency is 79.68 ms, so FPS -> 1/0.07968 -> 12.55 FPS
The table reports 128 FPS for YoloV4, while my two estimates are 23 FPS and 12.55 FPS, so they are far from what I should expect.
For this reason, could you give me a hint on how to calculate the FPS in order to reproduce the results? Thank you in advance. Unai.
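For reference, the two estimates above can be reproduced with a short sketch. The function names are made up for this example and are not part of jedi; the inputs are the numbers quoted in this post. Note that in a pipelined run several images are in flight at once, so 1/latency would generally be expected to understate the throughput.

```python
def throughput_fps(num_images: int, total_seconds: float) -> float:
    """Images processed divided by the total wall-clock time."""
    return num_images / total_seconds


def latency_fps(avg_latency_ms: float) -> float:
    """Reciprocal of the average per-image latency, converted from ms."""
    return 1000.0 / avg_latency_ms


if __name__ == "__main__":
    # Numbers taken from this post: 4952 images, 209.684 total, 79.68 ms latency.
    print(f"{throughput_fps(4952, 209.684):.2f} FPS")  # 23.62 FPS
    print(f"{latency_fps(79.68):.2f} FPS")             # 12.55 FPS
```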