Open KanikaAdik opened 1 year ago
@KanikaAdik, try replacing DetectionCollateFN
with CrowdDetectionCollateFN
as pointed out in #1194 and let us know if something else comes up.
This seems like a bug. Even if iscrowd is set to 0 (not crowded) for an annotation, it still requires CrowdDetectionCollateFN and throws an error otherwise?
If it is some variable that needs to be changed somewhere to remove crowd. Please do let me know.
I also think it may be best to it to assume non-crowded labels by default, as most labels in COCO and even the ones generated by users for custom datasets usually have iscrowd 0.
Also note that DetectionCollateFN() is the default one used in YoloNAS QAT training tutorial notebook, which throws the above error.
💡 Your Question
I have above trainer to execute and train my roboflow dataset similar to quantization aware fine tuning YoloNAS on custom dataset notebook
But I get error which i suppose is internal to super-gradient package and not due to my input parameters? Same is the error i receive on executing the example for QAT warare fine tuned YoloNAS custom dataset notebook. I need some assitance on what should be the root cause to resolve this issue?
Thanks in Advance!
Versions
--2023-08-13 20:43:23-- https://raw.githubusercontent.com/pytorch/pytorch/main/torch/utils/collect_env.py Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ... Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 21653 (21K) [text/plain] Saving to: ‘collect_env.py.1’
collect_env.py.1 100%[===================>] 21.15K --.-KB/s in 0.001s
2023-08-13 20:43:23 (24.0 MB/s) - ‘collect_env.py.1’ saved [21653/21653]
Collecting environment information... PyTorch version: 1.11.0+cu113 Is debug build: False CUDA used to build PyTorch: 11.3 ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.2 LTS (x86_64) GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 Clang version: 14.0.0-1ubuntu1.1 CMake version: version 3.27.1 Libc version: glibc-2.35
Python version: 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] (64-bit runtime) Python platform: Linux-5.15.109+-x86_64-with-glibc2.35 Is CUDA available: True CUDA runtime version: 11.8.89 CUDA_MODULE_LOADING set to: LAZY GPU models and configuration: GPU 0: Tesla T4 Nvidia driver version: 525.105.17 cuDNN version: Probably one of the following: /usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.0 /usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.0 /usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.0 /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.0 /usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.0 /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.0 /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.0 HIP runtime version: N/A MIOpen runtime version: N/A Is XNNPACK available: True
CPU: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 46 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 2 On-line CPU(s) list: 0,1 Vendor ID: GenuineIntel Model name: Intel(R) Xeon(R) CPU @ 2.20GHz CPU family: 6 Model: 79 Thread(s) per core: 2 Core(s) per socket: 1 Socket(s): 1 Stepping: 0 BogoMIPS: 4399.99 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities Hypervisor vendor: KVM Virtualization type: full L1d cache: 32 KiB (1 instance) L1i cache: 32 KiB (1 instance) L2 cache: 256 KiB (1 instance) L3 cache: 55 MiB (1 instance) NUMA node(s): 1 NUMA node0 CPU(s): 0,1 Vulnerability Itlb multihit: Not affected Vulnerability L1tf: Mitigation; PTE Inversion Vulnerability Mds: Vulnerable; SMT Host state unknown Vulnerability Meltdown: Vulnerable Vulnerability Mmio stale data: Vulnerable Vulnerability Retbleed: Vulnerable Vulnerability Spec store bypass: Vulnerable Vulnerability Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers Vulnerability Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected Vulnerability Srbds: Not affected Vulnerability Tsx async abort: Vulnerable
Versions of relevant libraries: [pip3] numpy==1.23.0 [pip3] pytorch-quantization==2.1.2 [pip3] torch==1.11.0+cu113 [pip3] torchaudio==0.11.0+cu113 [pip3] torchdata==0.6.1 [pip3] torchmetrics==0.8.0 [pip3] torchsummary==1.5.1 [pip3] torchtext==0.15.2 [pip3] torchvision==0.12.0+cu113 [pip3] triton==2.0.0 [conda] Could not collect