Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0
4.53k stars 490 forks

Custom Training Yolo_Nas Tensorrt #1814

Closed waleedZaghloul closed 7 months ago

waleedZaghloul commented 7 months ago

🐛 Describe the bug

When I used notebooks/YoloNAS_Inference_using_TensorRT.ipynb to convert my model to TensorRT, everything worked. But after I restarted the kernel and tried to load the generated engine, I got the following error:

The console stream is logged into /home/waleed/sg_logs/console.log
[2024-02-04 10:31:19] INFO - crash_tips_setup.py - Crash tips is enabled. You can set your environment variable to CRASH_HANDLER=FALSE to disable it
Reading engine from file /home/waleed/Desktop/ALPR_Repo/Yolo_training/YoloNas/server_yolo_nas/ckpt_best.trt
[02/04/2024-10:31:21] [TRT] [I] Loaded engine size: 28 MiB
[02/04/2024-10:31:21] [TRT] [V] Local registry did not find EfficientNMS_TRT creator. Will try parent registry if enabled.
[02/04/2024-10:31:21] [TRT] [V] Global registry did not find EfficientNMS_TRT creator. Will try parent registry if enabled.
[02/04/2024-10:31:21] [TRT] [E] 3: getPluginCreator could not find plugin: EfficientNMS_TRT version: 1
[02/04/2024-10:31:21] [TRT] [E] 1: [pluginV2Runner.cpp::load::303] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
Traceback (most recent call last):
  File "/home/waleed/Desktop/ALPR_Repo/Yolo_training/YoloNas/server_yolo_nas/nas_trt.py", line 139, in <module>
    with InferenceSession(engine_file="/home/waleed/Desktop/ALPR_Repo/Yolo_training/YoloNas/server_yolo_nas/ckpt_best.trt", inference_shape=(640, 640)) as session:
  File "/home/waleed/Desktop/ALPR_Repo/Yolo_training/YoloNas/server_yolo_nas/nas_trt.py", line 71, in __enter__
    self.context = self.engine.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'

Versions

--2024-02-04 16:19:11-- https://raw.githubusercontent.com/pytorch/pytorch/main/torch/utils/collect_env.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 22068 (22K) [text/plain]
Saving to: ‘collect_env.py’

collect_env.py 100%[===================>] 21.55K --.-KB/s in 0.009s

2024-02-04 16:19:11 (2.22 MB/s) - ‘collect_env.py’ saved [22068/22068]

Collecting environment information...
PyTorch version: 2.2.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35

Python version: 3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.5.0-15-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060
Nvidia driver version: 545.23.08
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.7
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
CPU family: 6
Model: 158
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 10
CPU max MHz: 4600.0000
CPU min MHz: 800.0000
BogoMIPS: 6399.96
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi md_clear flush_l1d arch_capabilities
Virtualization: VT-x
L1d cache: 192 KiB (6 instances)
L1i cache: 192 KiB (6 instances)
L2 cache: 1.5 MiB (6 instances)
L3 cache: 12 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-11
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS, IBPB conditional, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Mitigation; Microcode
Vulnerability Tsx async abort: Mitigation; TSX disabled

Versions of relevant libraries:
[pip3] numpy==1.23.0
[pip3] onnx==1.13.0
[pip3] onnx-graphsurgeon==0.3.27
[pip3] onnxruntime==1.13.1
[pip3] onnxsim==0.4.35
[pip3] pytorch-quantization==2.1.2
[pip3] torch==2.2.0
[pip3] torchmetrics==0.8.0
[pip3] torchvision==0.17.0
[pip3] triton==2.2.0
[conda] numpy 1.23.0 pypi_0 pypi
[conda] pytorch-quantization 2.1.2 pypi_0 pypi
[conda] torch 2.2.0 pypi_0 pypi
[conda] torchmetrics 0.8.0 pypi_0 pypi
[conda] torchvision 0.17.0 pypi_0 pypi
[conda] triton 2.2.0 pypi_0 pypi

waleedZaghloul commented 7 months ago

When I use https://github.com/thaitc-hust/Yolo-TensorRT/tree/main, following https://www.youtube.com/watch?v=JVCtx7-4qxE, the model loads perfectly.

nsabir2011 commented 7 months ago

See #1816. It should work after adding trt.init_libnvinfer_plugins(trt_logger, "").
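
For anyone landing here with the same error, this is a minimal sketch of where that call could go. It is not the notebook's code and not necessarily what #1816 does. The helper name `load_trt_engine` is hypothetical, and it assumes the `tensorrt` Python package is installed. The plugin registration has to happen before the engine is deserialized, since that is the step that looks up `EfficientNMS_TRT`.

```python
from pathlib import Path


def load_trt_engine(engine_path):
    """Deserialize a TensorRT engine after registering the bundled plugins.

    Hypothetical helper for illustration; the name is not from super-gradients.
    """
    # Imported lazily so this file can still be imported without TensorRT.
    import tensorrt as trt

    trt_logger = trt.Logger(trt.Logger.WARNING)

    # Register TensorRT's bundled plugins (EfficientNMS_TRT among them) in the
    # plugin registry. A fresh process that skips this cannot find the creator.
    trt.init_libnvinfer_plugins(trt_logger, "")

    runtime = trt.Runtime(trt_logger)
    engine = runtime.deserialize_cuda_engine(Path(engine_path).read_bytes())

    # deserialize_cuda_engine returns None on failure. Raise here with a clear
    # message instead of hitting an AttributeError on the None object later.
    if engine is None:
        raise RuntimeError(f"Failed to deserialize TensorRT engine: {engine_path}")
    return engine
```

Usage would be along the lines of `engine = load_trt_engine("ckpt_best.trt")` followed by `engine.create_execution_context()`. The explicit `None` check is a design choice: it surfaces the deserialization failure at the point where it happens, rather than as the `'NoneType' object has no attribute 'create_execution_context'` error from the traceback above.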