MasterBin-IIAU / Unicorn

[ECCV'22 Oral] Towards Grand Unification of Object Tracking

ValueError: Invalid num_classes #13

Closed · AhmedKhaled945 closed this issue 2 years ago

AhmedKhaled945 commented 2 years ago

Hello @MasterBin-IIAU, when I use the exp file unicorn_track_large.py with the weights unicorn_det_convnext_large_800x1280, the model loads well as long as the number of classes is 8 or 80. But when I try to adapt this to my dataset (num_classes = 11) so I can retrain on my custom data, the model raises the error Invalid num_classes. Is this the expected behaviour? I want to train a detector that can be used for QDTrack association (the track_omni.py script), trained on my dataset (11 classes). Is that possible with the current scripts? Thanks in advance.

MasterBin-IIAU commented 2 years ago

@AhmedKhaled945 Hi, this exception is there to remind users to handle the mapping between the class names of their own dataset and COCO. For example, BDD100K has 8 classes (pedestrian, rider, car, truck, bus, train, motorcycle, bicycle); their corresponding indices among the COCO classes are [0, 0, 2, 7, 5, 6, 3, 1] (as shown here).
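For illustration, a minimal sketch of such a name-to-index mapping in plain Python (the dictionary below just restates the BDD100K example; the names and structure are illustrative, not code from the repo):

```python
# Hypothetical illustration: map each custom class name to the index of
# the closest COCO category, following the BDD100K example above.
BDD_TO_COCO = {
    "pedestrian": 0,  # -> person
    "rider": 0,       # -> person
    "car": 2,
    "truck": 7,
    "bus": 5,
    "train": 6,
    "motorcycle": 3,
    "bicycle": 1,
}

bdd_classes = ["pedestrian", "rider", "car", "truck",
               "bus", "train", "motorcycle", "bicycle"]
print([BDD_TO_COCO[name] for name in bdd_classes])  # [0, 0, 2, 7, 5, 6, 3, 1]
```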

AhmedKhaled945 commented 2 years ago

Okay, got it. So I should map each of my classes to the nearest corresponding class in COCO. Thanks.

AhmedKhaled945 commented 2 years ago

2022-07-25 11:26:59 | INFO | unicorn.core.trainer:378 - ---> start train epoch1
/usr/local/lib/python3.7/dist-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2894.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
2022-07-25 11:27:02 | ERROR | unicorn.core.launch:98 - An error has been caught in function 'launch', process 'MainProcess' (2320), thread 'MainThread' (140427802044288):
Traceback (most recent call last):
  File "tools/train.py", line 132, in <module>
    args=(exp, args),
  File "/content/Unicorn/unicorn/core/launch.py", line 98, in launch
    main_func(*args)
  File "tools/train.py", line 110, in main
    trainer.train()
  File "/content/Unicorn/unicorn/core/trainer.py", line 94, in train
    self.train_in_epoch()
  File "/content/Unicorn/unicorn/core/trainer.py", line 149, in train_in_epoch
    self.train_in_iter()
  File "/content/Unicorn/unicorn/core/trainer.py", line 155, in train_in_iter
    self.train_one_iter()
  File "/content/Unicorn/unicorn/core/trainer.py", line 194, in train_one_iter
    outputs = self.model(inps, targets)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/Unicorn/unicorn/models/unicorn.py", line 139, in forward
    return self.head(fpn_outs, pred_lbs1_ms, mode="mot"), seq_dict
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/Unicorn/unicorn/models/unicorn_head.py", line 361, in forward
    return_ota=return_ota
  File "/content/Unicorn/unicorn/models/unicorn_head.py", line 503, in get_losses
    nlabel = (labels.sum(dim=2) > 0).sum(dim=1)  # number of objects  (loguru annotation: labels is None)
AttributeError: 'NoneType' object has no attribute 'sum'

I got everything ready in the same YOLOX fashion: data loaders and eval loaders, dataset in COCO format, exp file working perfectly. But once training starts, I get this error (the same procedure works fine on the YOLOX repo). Can you help me with this? @MasterBin-IIAU

MasterBin-IIAU commented 2 years ago

Hi, if your dataset consists of videos (like MOT17) rather than static images (like CrowdHuman, ETHZ, etc.), you may need to additionally run this script for Unicorn's data preparation. The script generates a new json file called train_omni.json, which is then used to build the dataset, as shown here: https://github.com/MasterBin-IIAU/Unicorn/blob/c60fceb2181e12ea487900d3b0991d46eae54d43/unicorn/exp/unicorn_track.py#L318

AhmedKhaled945 commented 2 years ago

My dataset is images in COCO format, with the following folders: Annotations, train2017, val2017.

AhmedKhaled945 commented 2 years ago

My goal is to train on the dataset I made for YOLOX, but instead of training a plain detector, I want the model to produce embeddings for QDTrack association. So is there any processing needed on the data beyond it being in COCO format? @MasterBin-IIAU

MasterBin-IIAU commented 2 years ago

@AhmedKhaled945 Hi, there should be no further processing needed beyond the COCO format. Could you please send me the complete train_log.txt? I will try to find the problem.

AhmedKhaled945 commented 2 years ago

2022-07-25 11:26:43 | INFO | unicorn.core.trainer:293 - args: Namespace(batch_size=4, cache=False, ckpt=None, devices=1, dist_backend='nccl', dist_url=None, exp_file='/content/Unicorn/exps/default/unicorn_track_large_mot_challenge.py', experiment_name='unicorn_track_large_mot_challenge', fp16=True, machine_rank=0, name=None, num_machines=1, occupy=True, opts=[], resume=False, start_epoch=None)
2022-07-25 11:26:43 | INFO | unicorn.core.trainer:294 - exp value:

  seed: None
  output_dir: './Unicorn_outputs'
  print_interval: 15
  eval_interval: 10
  task: 'det'
  exp_name: 'unicorn_track_large_mot_challenge'
  num_classes: 10
  depth: 1.0
  width: 1.0
  act: 'silu'
  use_gn: True
  backbone_name: 'convnext_large'
  in_channels: [384, 768, 1536]
  embed_dim: 128
  interact_mode: 'deform'
  use_attention: True
  n_layer_att: 3
  unshared_obj: True
  unshared_reg: True
  fuse_method: 'sum'
  learnable_fuse: True
  data_num_workers: 0
  input_size: (800, 1280)
  multiscale_range: 2
  data_dir: '/content/drive/MyDrive/yolox_putney_training/home/ec2-user/SageMaker/Ahmed_Yolox_Trials/coco_formated_putney_extended'
  train_ann: 'train.json'
  train_name: 'train2017'
  val_ann: 'validation.json'
  val_name: 'val2017'
  mosaic_prob: -1.0
  mixup_prob: 1.0
  hsv_prob: 1.0
  flip_prob: 0.5
  degrees: 10.0
  translate: 0.1
  mosaic_scale: (0.1, 2)
  mixup_scale: (0.5, 1.5)
  shear: 2.0
  perspective: 0.0
  enable_mixup: True
  normalize: False
  warmup_epochs: 1
  max_epoch: 15
  warmup_lr: 0
  basic_lr_per_img: 7.8125e-06
  scheduler: 'yoloxwarmcos'
  no_aug_epochs: 3
  min_lr_ratio: 0.1
  ema: True
  mhs: False
  weight_decay: 0.0005
  debug_only: False
  samples_per_epoch: 200000
  sync_bn: False
  always_l1: True
  use_grad_acc: True
  grad_acc_step: 2
  grid_sample: True
  bidirect: True
  train_mode: 'alter'
  alter_step: 1
  mot_weight: 3
  scale_all_mot: True
  pretrain_name: 'unicorn_det_convnext_large_800x1280'
  test_size: (800, 1280)
  test_conf: 0.01
  nmsthre: 0.65
  test_ann: 'test.json'
  test_name: 'test'
  test_data_dir: '/content/Unicorn/datasets/mot'
  sot_only: False
  mot_only: False
  mot_test_name: 'motchallenge'

Loading COCO pretrained weights from /content/Unicorn/datasets/../Unicorn_outputs/unicorn_det_convnext_large_800x1280/best_ckpt.pth
missing keys: ['head.beta_0', 'head.beta_1', 'head.beta_2', 'head.cls_preds_sot.0.weight', 'head.cls_preds_sot.0.bias', 'head.cls_preds_sot.1.weight', 'head.cls_preds_sot.1.bias', 'head.cls_preds_sot.2.weight', 'head.cls_preds_sot.2.bias', 'bottleneck.0.weight', 'bottleneck.0.bias', 'bottleneck.1.weight', 'bottleneck.1.bias', 'upsample_layer.1.weight', 'upsample_layer.1.bias', 'upsample_layer.3.weight', 'upsample_layer.3.bias', 'pos_emb.row_embed.weight', 'pos_emb.col_embed.weight', 'transformer.level_embed', 'transformer.encoder.layers.0.self_attn.sampling_offsets.weight', 'transformer.encoder.layers.0.self_attn.sampling_offsets.bias', 'transformer.encoder.layers.0.self_attn.attention_weights.weight', 'transformer.encoder.layers.0.self_attn.attention_weights.bias', 'transformer.encoder.layers.0.self_attn.value_proj.weight', 'transformer.encoder.layers.0.self_attn.value_proj.bias', 'transformer.encoder.layers.0.self_attn.output_proj.weight', 'transformer.encoder.layers.0.self_attn.output_proj.bias', 'transformer.encoder.layers.0.norm1.weight', 'transformer.encoder.layers.0.norm1.bias', 'transformer.encoder.layers.0.linear1.weight', 'transformer.encoder.layers.0.linear1.bias', 'transformer.encoder.layers.0.linear2.weight', 'transformer.encoder.layers.0.linear2.bias', 'transformer.encoder.layers.0.norm2.weight', 'transformer.encoder.layers.0.norm2.bias']
unexpected keys: []
2022-07-25 11:26:51 | INFO | unicorn.data.datasets.coco:46 - loading annotations into memory...
2022-07-25 11:26:51 | INFO | unicorn.data.datasets.coco:46 - Done (t=0.01s)
2022-07-25 11:26:51 | INFO | pycocotools.coco:88 - creating index...
2022-07-25 11:26:51 | INFO | pycocotools.coco:88 - index created!
2022-07-25 11:26:51 | INFO | unicorn.core.trainer:319 - init prefetcher...
2022-07-25 11:26:57 | INFO | unicorn.data.datasets.coco:46 - loading annotations into memory...
2022-07-25 11:26:57 | INFO | unicorn.data.datasets.coco:46 - Done (t=0.00s)
2022-07-25 11:26:57 | INFO | pycocotools.coco:88 - creating index...
2022-07-25 11:26:57 | INFO | pycocotools.coco:88 - index created!
2022-07-25 11:26:59 | INFO | unicorn.core.trainer:363 - Training start...
2022-07-25 11:26:59 | INFO | unicorn.core.trainer:364 - Unicorn(...)
[full model printout omitted: Unicorn with a YOLOPAFPNNEW neck over a ConvNeXt-Large backbone, a UnicornHead whose cls_preds output 10 channels, plus bottleneck, upsample_layer, PositionEmbeddingLearned, and a single-layer DeformableTransformer encoder]
2022-07-25 11:26:59 | INFO | unicorn.core.trainer:378 - ---> start train epoch1

The log then ends with the same torch.meshgrid UserWarning and the same AttributeError traceback posted above.

Here it is:

train_log.txt

@MasterBin-IIAU

MasterBin-IIAU commented 2 years ago

@AhmedKhaled945 Hi, I have checked the train_log.txt and found two problems. (1) self.task in your log is "det", but during training for object tracking, self.task should be "uni". unicorn_track_large_mot_challenge.py inherits from unicorn_track.py, where self.task is set to "uni": https://github.com/MasterBin-IIAU/Unicorn/blob/c60fceb2181e12ea487900d3b0991d46eae54d43/unicorn/exp/unicorn_track.py#L33 (2) If you only need to train a model for MOT, you should set self.mot_only=True and self.mot_weight=1. You can compare unicorn_track_tiny.py and unicorn_track_tiny_mot_only.py for a better understanding.
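For concreteness, a minimal custom exp file along these lines might look like the sketch below (the Exp base-class name and import path are assumptions inferred from the file paths above, not copied from the repo):

```python
# hypothetical exps/default/my_custom_mot_exp.py,
# modeled on unicorn_track_tiny_mot_only.py as suggested above
from unicorn.exp.unicorn_track import Exp as UnicornTrackExp  # assumed import path


class Exp(UnicornTrackExp):
    def __init__(self):
        super().__init__()
        # Keep the base class's self.task == "uni"; overriding it to "det"
        # appears to be what leaves the MOT labels as None and triggers the
        # "'NoneType' object has no attribute 'sum'" error above.
        self.mot_only = True   # train the MOT branch only
        self.mot_weight = 1    # per the suggestion above
        self.num_classes = 10  # classes of the custom dataset
```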

AhmedKhaled945 commented 2 years ago

Thanks a lot, that makes sense.