Getting issue `DetectionCollateFN` only supports Datasets that return a tuple on training yolo_nas_m with coco dataset

KanikaAdik commented 1 year ago

💡 Your Question

from super_gradients.training import Trainer
from super_gradients.common.object_names import Models
from super_gradients.training import models
from super_gradients.training.processing import ComposeProcessing

trainer = Trainer(experiment_name=EXPERIMENT_NAME, ckpt_root_dir=CHECKPOINT_DIR)
net = models.get('yolo_nas_m', num_classes=5, pretrained_weights="coco")
trainer.train(model=net, training_params=train_params, train_loader=train_loader, valid_loader=valid_loader)

I am running the trainer above to train on my Roboflow dataset, similar to the quantization-aware fine-tuning YoloNAS on custom dataset notebook.

But I get an error which I suppose is internal to the super-gradients package and not caused by my input parameters. I get the same error when executing the QAT-aware fine-tuned YoloNAS custom dataset example notebook. I need some assistance identifying the root cause so that I can resolve this issue.

[2023-08-13 20:28:42] INFO - checkpoint_utils.py - License Notification: YOLO-NAS pre-trained weights are subjected to the specific license terms and conditions detailed in 
https://github.com/Deci-AI/super-gradients/blob/master/LICENSE.YOLONAS.md
By downloading the pre-trained weight files you agree to comply with these terms.
[2023-08-13 20:28:42] WARNING - sg_trainer.py - Train dataset size % batch_size != 0 and drop_last=False, this might result in smaller last batch.
The console stream is now moved to /content/checkpoints/testefficientdet_trainer/console_Aug13_20_28_42.txt
[2023-08-13 20:28:42] INFO - sg_trainer.py - Using EMA with params {'decay': 0.9, 'decay_type': 'threshold'}
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/super_gradients/training/utils/detection_utils.py in __call__(self, data)
    700         try:
--> 701             images_batch, labels_batch = list(zip(*data))
    702         except (ValueError, TypeError):

ValueError: too many values to unpack (expected 2)

During handling of the above exception, another exception occurred:

DatasetItemsException                     Traceback (most recent call last)
5 frames
/usr/local/lib/python3.10/dist-packages/super_gradients/training/utils/detection_utils.py in __call__(self, data)
    701             images_batch, labels_batch = list(zip(*data))
    702         except (ValueError, TypeError):
--> 703             raise DatasetItemsException(data_sample=data[0], collate_type=type(self), expected_item_names=self.expected_item_names)
    704 
    705         return self._format_images(images_batch), self._format_targets(labels_batch)

DatasetItemsException: `DetectionCollateFN` only supports Datasets that return a tuple ('image', 'targets'), but got a tuple of len=3

Thanks in Advance!

Versions

--2023-08-13 20:43:23-- https://raw.githubusercontent.com/pytorch/pytorch/main/torch/utils/collect_env.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21653 (21K) [text/plain]
Saving to: ‘collect_env.py.1’

collect_env.py.1 100%[===================>] 21.15K --.-KB/s in 0.001s

2023-08-13 20:43:23 (24.0 MB/s) - ‘collect_env.py.1’ saved [21653/21653]

Collecting environment information...
PyTorch version: 1.11.0+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.27.1
Libc version: glibc-2.35

Python version: 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.109+-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 525.105.17
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU @ 2.20GHz
CPU family: 6
Model: 79
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
Stepping: 0
BogoMIPS: 4399.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32 KiB (1 instance)
L1i cache: 32 KiB (1 instance)
L2 cache: 256 KiB (1 instance)
L3 cache: 55 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0,1
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Vulnerable; SMT Host state unknown
Vulnerability Meltdown: Vulnerable
Vulnerability Mmio stale data: Vulnerable
Vulnerability Retbleed: Vulnerable
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Vulnerable

Versions of relevant libraries:
[pip3] numpy==1.23.0
[pip3] pytorch-quantization==2.1.2
[pip3] torch==1.11.0+cu113
[pip3] torchaudio==0.11.0+cu113
[pip3] torchdata==0.6.1
[pip3] torchmetrics==0.8.0
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.15.2
[pip3] torchvision==0.12.0+cu113
[pip3] triton==2.0.0
[conda] Could not collect

bit-scientist commented 1 year ago

@KanikaAdik, try replacing DetectionCollateFN with CrowdDetectionCollateFN as pointed out in #1194 and let us know if something else comes up.
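
For anyone wondering why the collate function matters here, the following is a minimal sketch in plain Python that reproduces the unpacking failure from line 701 of the traceback without importing super-gradients. The batch contents, the function names `two_item_collate` and `three_item_collate`, and the idea that the third element holds crowd targets are illustrative assumptions, not the library's actual code.

```python
# Stand-in batch: each dataset item is a tuple of length 3, matching the
# "got a tuple of len=3" part of the error message. The third element is
# ASSUMED to be crowd targets; the strings are placeholders for tensors.
batch = [
    ("img0", "targets0", "crowd0"),
    ("img1", "targets1", "crowd1"),
]


def two_item_collate(data):
    """Mimics the unpacking shown on line 701 of the traceback."""
    images_batch, labels_batch = list(zip(*data))
    return images_batch, labels_batch


def three_item_collate(data):
    """Hypothetical collate that expects three items per sample."""
    images_batch, labels_batch, crowd_batch = list(zip(*data))
    return images_batch, labels_batch, crowd_batch


try:
    two_item_collate(batch)
    error_message = None
except ValueError as exc:
    # Unpacking three zipped columns into two names raises ValueError.
    error_message = str(exc)

images_batch, labels_batch, crowd_batch = three_item_collate(batch)
print(error_message)
print(crowd_batch)
```

The point of the sketch is only that the dataset and the collate function have to agree on how many items each sample contains; a collate function that expects three items accepts the same batch that the two-item one rejects.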

siddagra commented 12 months ago

This seems like a bug. Even if iscrowd is set to 0 (not crowded) for an annotation, it still requires CrowdDetectionCollateFN and throws an error otherwise?

If there is some variable that needs to be changed somewhere to remove crowd, please do let me know.

I also think it may be best to assume non-crowded labels by default, as most labels in COCO, and even the ones generated by users for custom datasets, usually have iscrowd 0.

Also note that DetectionCollateFN() is the default one used in the YoloNAS QAT training tutorial notebook, which throws the above error.
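
As a possible alternative to switching collate functions, here is a rough sketch of a wrapper that drops everything after the first two items of each sample, so that a two-item collate can unpack the batch. `FirstTwoItems` is a hypothetical name made up for this illustration and is not part of super-gradients; the sketch assumes the dataset items are ordered as (image, targets, extra) and that discarding the third element is acceptable for your data.

```python
class FirstTwoItems:
    """Hypothetical wrapper (not a super-gradients class).

    Keeps only the first two elements of each dataset item, assumed here
    to be (image, targets), and discards anything after them.
    """

    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        item = self.dataset[index]
        return item[0], item[1]


# Placeholder dataset returning 3-tuples, standing in for the real one.
raw_dataset = [
    ("img0", "targets0", "crowd0"),
    ("img1", "targets1", "crowd1"),
]

wrapped = FirstTwoItems(raw_dataset)
samples = [wrapped[i] for i in range(len(wrapped))]

# The two-name unpacking from the traceback now succeeds.
images_batch, labels_batch = list(zip(*samples))
print(images_batch, labels_batch)
```

A real dataset object may expose extra attributes that the trainer or data loader reads, and this wrapper does not forward them, so treat it as an illustration of the shape mismatch rather than a drop-in fix.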

Deci-AI / super-gradients

Getting issue `DetectionCollateFN` only supports Datasets that return a tuple on training yolo_nas_m with coco dataset #1370
