Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0
4.59k stars 509 forks source link

TypeError: expected string or bytes-like object #1819

Open mskv99 opened 9 months ago

mskv99 commented 9 months ago

💡 Your Question

When I tried to train model on a custom dataset with pretained weights I came across an error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
[<ipython-input-36-63d1d2b55e46>](https://8n6lgpl7uz-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab_20240205-060127_RC00_604283622#) in <cell line: 1>()
----> 1 trainer.train(model = model_small,training_params = train_params,train_loader = train_data,valid_loader = val_data)

6 frames
[/usr/lib/python3.10/re.py](https://8n6lgpl7uz-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab_20240205-060127_RC00_604283622#) in sub(pattern, repl, string, count, flags)
    207     a callable, it's passed the Match object and must return
    208     a replacement string to be used."""
--> 209     return _compile(pattern, flags).sub(repl, string, count)
    210 
    211 def subn(pattern, repl, string, count=0, flags=0):

TypeError: expected string or bytes-like object

In my previous runs with practically the same configuration of training parameters everything was ok. Couldn't find out where is the problem:(

Just in case I will provide a block of code from my Collab Notebook:

dataset_params = {
    'data_dir':ROOT_DIR,
    'train_images_dir':train_imgs_dir,
    'train_labels_dir':train_labels_dir,
    'val_images_dir':val_imgs_dir,
    'val_labels_dir':val_labels_dir,
    'test_images_dir':test_imgs_dir,
    'test_labels_dir':test_labels_dir,
    'classes':classes
}

train_data = coco_detection_yolo_format_train(
    dataset_params={
        'data_dir': dataset_params['data_dir'],
        'images_dir': dataset_params['train_images_dir'],
        'labels_dir': dataset_params['train_labels_dir'],
        'classes': dataset_params['classes']
    },
    dataloader_params={
        'batch_size':BATCH_SIZE,
        'num_workers':WORKERS
    }
)

val_data = coco_detection_yolo_format_val(
    dataset_params={
        'data_dir': dataset_params['data_dir'],
        'images_dir': dataset_params['val_images_dir'],
        'labels_dir': dataset_params['val_labels_dir'],
        'classes': dataset_params['classes']
    },
    dataloader_params={
        'batch_size':BATCH_SIZE,
        'num_workers':WORKERS
    }
)

test_data = coco_detection_yolo_format_val(
    dataset_params={
        'data_dir': dataset_params['data_dir'],
        'images_dir': dataset_params['test_images_dir'],
        'labels_dir': dataset_params['test_labels_dir'],
        'classes': dataset_params['classes']
    },
    dataloader_params={
        'batch_size': BATCH_SIZE,
        'num_workers': WORKERS
    }
)

train_params = {
    'silent_mode': False,
    "average_best_models":True,
    "warmup_mode": "linear_epoch_step",
    "warmup_initial_lr": 1e-6,
    "lr_warmup_epochs": 3,
    "initial_lr": 5e-4,
    "lr_mode": "cosine",
    "cosine_final_lr_ratio": 0.1,
    "optimizer": "SGD",
    "optimizer_params": {"weight_decay": 0.0005, "momentum": 0.937},
    "zero_weight_decay_on_bias_and_bn": True,
    "ema": False,
    "max_epochs": EPOCHS,
    "mixed_precision": True,
    "loss": PPYoloELoss(
        use_static_assigner=False,
        num_classes=len(dataset_params['classes']),
        reg_max=16,
    ),
    "seed": 42,
    "train_metrics_list": [
        DetectionMetrics_050(
            score_thres=0.1,
            top_k_predictions=300,
            num_cls=len(dataset_params['classes']),
            normalize_targets=True,
            post_prediction_callback=PPYoloEPostPredictionCallback(
                score_threshold=0.01,
                nms_top_k=1000,
                max_predictions=300,
                nms_threshold=0.7
            )
        ),
        DetectionMetrics_050_095(
            score_thres=0.1,
            top_k_predictions=300,
            num_cls=len(dataset_params['classes']),
            normalize_targets=True,
            post_prediction_callback=PPYoloEPostPredictionCallback(
                score_threshold=0.01,
                nms_top_k=1000,
                max_predictions=300,
                nms_threshold=0.7
            )
        )
    ],

    "valid_metrics_list": [
        DetectionMetrics_050(
            score_thres=0.1,
            top_k_predictions=300,
            num_cls=len(dataset_params['classes']),
            normalize_targets=True,
            post_prediction_callback=PPYoloEPostPredictionCallback(
                score_threshold=0.01,
                nms_top_k=1000,
                max_predictions=300,
                nms_threshold=0.7
            )
        ),
        DetectionMetrics_050_095(
            score_thres=0.1,
            top_k_predictions=300,
            num_cls=len(dataset_params['classes']),
            normalize_targets=True,
            post_prediction_callback=PPYoloEPostPredictionCallback(
                score_threshold=0.01,
                nms_top_k=1000,
                max_predictions=300,
                nms_threshold=0.7
            )
        )
    ],
    "metric_to_watch": ['mAP@0.50:0.95', 'mAP@0.50'],

    #define weights and biases logging
    "sg_logger": 'wandb_sg_logger',
    "sg_logger_params":
    {
        "project_name": "Yolo-NAS",
        "save_checkpoints_remote": True,
        "save_tensorboard_remote":True,
        'save_logs_remote':True,
        'entity':'ml_team_mskv'

    }
}

model_small = models.get('yolo_nas_s',
                   num_classes=len(dataset_params['classes']),
                   pretrained_weights="coco"
                   )

trainer.train(model = model_small,training_params = train_params,train_loader = train_data,valid_loader = val_data)

Versions

Collecting environment information... PyTorch version: 2.1.0+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64) GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 Clang version: 14.0.0-1ubuntu1.1 CMake version: version 3.27.9 Libc version: glibc-2.35

Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime) Python platform: Linux-6.1.58+-x86_64-with-glibc2.35 Is CUDA available: True CUDA runtime version: 12.2.140 CUDA_MODULE_LOADING set to: LAZY GPU models and configuration: GPU 0: Tesla T4 Nvidia driver version: 535.104.05 cuDNN version: Probably one of the following: /usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.6 /usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.6 /usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.6 /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.6 /usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.6 /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.6 /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.6 HIP runtime version: N/A MIOpen runtime version: N/A Is XNNPACK available: True

CPU: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 46 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 2 On-line CPU(s) list: 0,1 Vendor ID: GenuineIntel Model name: Intel(R) Xeon(R) CPU @ 2.00GHz CPU family: 6 Model: 85 Thread(s) per core: 2 Core(s) per socket: 1 Socket(s): 1 Stepping: 3 BogoMIPS: 4000.37 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat md_clear arch_capabilities Hypervisor vendor: KVM Virtualization type: full L1d cache: 32 KiB (1 instance) L1i cache: 32 KiB (1 instance) L2 cache: 1 MiB (1 instance) L3 cache: 38.5 MiB (1 instance) NUMA node(s): 1 NUMA node0 CPU(s): 0,1 Vulnerability Gather data sampling: Not affected Vulnerability Itlb multihit: Not affected Vulnerability L1tf: Mitigation; PTE Inversion Vulnerability Mds: Vulnerable; SMT Host state unknown Vulnerability Meltdown: Vulnerable Vulnerability Mmio stale data: Vulnerable Vulnerability Retbleed: Vulnerable Vulnerability Spec rstack overflow: Not affected Vulnerability Spec store bypass: Vulnerable Vulnerability Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers Vulnerability Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected Vulnerability Srbds: Not affected Vulnerability Tsx async abort: Vulnerable

Versions of relevant libraries: [pip3] numpy==1.23.0 [pip3] onnx==1.13.0 [pip3] onnx-simplifier==0.4.35 [pip3] onnxruntime==1.13.1 [pip3] torch==2.1.0+cu121 [pip3] torchaudio==2.1.0+cu121 [pip3] torchdata==0.7.0 [pip3] torchmetrics==0.8.0 [pip3] torchsummary==1.5.1 [pip3] torchtext==0.16.0 [pip3] torchvision==0.16.0+cu121 [pip3] triton==2.1.0

BloodAxe commented 9 months ago

Please provide a full stacktrace, it's impossible to understand what part of the input arguments causing this crash

mskv99 commented 9 months ago

@BloodAxe , probably it it easier to just provide a link to a notebook itself.