[x] Implement TensorRT engine inference in C++, following NVIDIA examples and other tutorials
[ ] Optimize the inference with batching, CUDA contexts, and multithreading/asynchronous streams
[ ] Apply NVIDIA TensorRT best practices, including defining interfaces or abstract classes for the common functionality required for neural network processing
[ ] Compare against industry examples (NVIDIA, ZED, GitHub) to adapt our implementation to the Jetson hardware
[ ] Add metric logging for benchmarking, following NVIDIA tutorials
[ ] Test different batch sizes to maximize GPU utilization
[ ] Investigate any other potential areas for multiprocessing or parallel execution to speed up task processing
Links Checklist (some of these may overlap w/ Jerome's, so meet with him to discuss progress and what needs to be revisited or reviewed):