alanzty / MO3TR

An official implementation of Looking Beyond Two Frames: End-to-End Multi-Object Tracking Using Spatial and Temporal Transformers
MIT License

FileNotFoundError: MO3TRDataset: [Errno 2] No such file or directory: 'MOT17/annotations/half-train-SDP_cocoformat.json' #3

Open norah251 opened 1 year ago

norah251 commented 1 year ago

Hi. Thanks very much for the good work. I was trying to run python run/train_track_nf.py and encountered the error below:

fatal: not a git repository (or any parent up to mount point /content)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
2023-02-15 21:47:34,585 - mmtrack - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
CUDA available: True
GPU 0: Tesla T4
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.6.r11.6/compiler.31057947_0
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.7.1
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75
  - CuDNN 7.6.5
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.8.2
OpenCV: 4.6.0
MMCV: 1.4.2
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.2
MMTracking: 0.8.0+
------------------------------------------------------------

2023-02-15 21:47:34,585 - mmtrack - INFO - Distributed training: False
2023-02-15 21:47:35,280 - mmtrack - INFO - Config:
total_epochs = 20
load_from = ''
fp_rate = 0.5
dup_rate = 0
fpdb_rate = 0.5
grad = 'separate'
bs = 1
num_workers = 0
frame_range = 3
num_ref_imgs = 5
noise = 0
root_work = '/storage/alan/workspace/mmStorage/mot/'
work_dir = '/storage/alan/workspace/mmStorage/mot/mo3tr_temphs_fr5_randseq'
img_scale = (800, 1440)
optimizer = dict(
    type='AdamW',
    lr=2e-05,
    weight_decay=0.0001,
    paramwise_cfg=dict(
        custom_keys=dict(
            backbone=dict(lr_mult=0.1),
            sampling_offsets=dict(lr_mult=0.1),
            reference_points=dict(lr_mult=0.1))))
optimizer_config = dict(grad_clip=dict(max_norm=0.1, norm_type=2))
lr_config = dict(policy='step', step=[10])
runner = dict(type='EpochBasedRunner', max_epochs=20)
model = dict(
    detector=dict(
        type='YOLOX',
        input_size=(800, 1440),
        random_size_range=(18, 32),
        random_size_interval=10,
        backbone=dict(
            type='CSPDarknet',
            deepen_factor=1.33,
            widen_factor=1.25,
            frozen_stages=4),
        neck=dict(
            type='YOLOXPAFPN',
            in_channels=[320, 640, 1280],
            out_channels=320,
            num_csp_blocks=4,
            freeze=True),
        bbox_head=dict(
            type='Mo3trDetrHead',
            num_query=300,
            num_classes=1,
            in_channels=320,
            sync_cls_avg_factor=True,
            with_box_refine=True,
            as_two_stage=False,
            transformer=dict(
                type='MO3TRTransformer',
                sa=False,
                encoder=dict(
                    type='DetrTransformerEncoder',
                    num_layers=1,
                    transformerlayers=dict(
                        type='BaseTransformerLayer',
                        attn_cfgs=dict(
                            type='MultiScaleDeformableAttention',
                            embed_dims=320,
                            num_levels=3),
                        feedforward_channels=1024,
                        ffn_cfgs=dict(
                            type='FFN',
                            embed_dims=320,
                            feedforward_channels=1024,
                            num_fcs=2,
                            ffn_drop=0.0,
                            act_cfg=dict(type='ReLU', inplace=True)),
                        ffn_dropout=0.1,
                        operation_order=('self_attn', 'norm', 'ffn', 'norm'))),
                decoder=dict(
                    type='DeformableDetrTransformerDecoder',
                    num_layers=6,
                    return_intermediate=True,
                    transformerlayers=dict(
                        type='DetrTransformerDecoderLayer',
                        attn_cfgs=[
                            dict(
                                type='MultiheadAttention',
                                embed_dims=320,
                                num_heads=8,
                                dropout=0.1),
                            dict(
                                type='MultiScaleDeformableAttention',
                                embed_dims=320,
                                num_levels=3)
                        ],
                        feedforward_channels=1024,
                        ffn_cfgs=dict(
                            type='FFN',
                            embed_dims=320,
                            feedforward_channels=1024,
                            num_fcs=2,
                            ffn_drop=0.0,
                            act_cfg=dict(type='ReLU', inplace=True)),
                        ffn_dropout=0.1,
                        operation_order=('self_attn', 'norm', 'cross_attn',
                                         'norm', 'ffn', 'norm')))),
            positional_encoding=dict(
                type='SinePositionalEncoding',
                num_feats=160,
                normalize=True,
                offset=-0.5),
            loss_cls=dict(
                type='FocalLoss',
                use_sigmoid=True,
                gamma=2.0,
                alpha=0.25,
                loss_weight=2.0),
            loss_bbox=dict(type='L1Loss', loss_weight=5.0),
            loss_iou=dict(type='GIoULoss', loss_weight=2.0)),
        train_cfg=dict(
            assigner=dict(
                type='HungarianAssignerMO3TR',
                cls_cost=dict(type='FocalLossCost', weight=2.0),
                reg_cost=dict(
                    type='BBoxL1Cost', weight=5.0, box_format='xywh'),
                iou_cost=dict(type='IoUCost', iou_mode='giou', weight=2.0))),
        test_cfg=dict(max_per_img=100)),
    type='MO3TRnF',
    tracker=dict(
        type='Mo3trTracker',
        init_track_thr=0.5,
        prop_thr=0.5,
        num_frames_retain=1),
    fp_rate=0.5,
    dup_rate=0,
    noise=0,
    fpdb_rate=0.5,
    grad='separate')
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadMultiImagesFromFile', to_float32=True),
    dict(type='SeqLoadAnnotations', with_bbox=True, with_track=True),
    dict(
        type='SeqResize',
        img_scale=(800, 1440),
        share_params=True,
        keep_ratio=True,
        bbox_clip_border=False),
    dict(type='SeqRandomFlip', share_params=True, flip_ratio=0.5),
    dict(
        type='SeqPad',
        size_divisor=32,
        pad_val=dict(img=(114.0, 114.0, 114.0))),
    dict(type='MatchInstancesMO3TR', skip_nomatch=True),
    dict(
        type='VideoCollect',
        keys=[
            'img', 'gt_bboxes', 'gt_labels', 'gt_match_indices',
            'gt_instance_ids'
        ]),
    dict(type='SeqDefaultFormatBundleMO3TR')
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(800, 1440),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Pad',
                size_divisor=32,
                pad_val=dict(img=(114.0, 114.0, 114.0))),
            dict(type='ImageToFloatTensor', keys=['img']),
            dict(type='VideoCollect', keys=['img'])
        ])
]
data_root = 'MOT17/'
data = dict(
    samples_per_gpu=1,
    workers_per_gpu=0,
    persistent_workers=False,
    val=dict(
        type='MO3TRDataset',
        ann_file='MOT17/annotations/half-val-SDP_cocoformat.json',
        img_prefix='MOT17/train',
        ref_img_sampler=None,
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(800, 1440),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Pad',
                        size_divisor=32,
                        pad_val=dict(img=(114.0, 114.0, 114.0))),
                    dict(type='ImageToFloatTensor', keys=['img']),
                    dict(type='VideoCollect', keys=['img'])
                ])
        ],
        interpolate_tracks_cfg=dict(min_num_frames=5, max_num_frames=20)),
    test=dict(
        type='MO3TRDataset',
        ann_file='MOT17/annotations/half-val-SDP_cocoformat.json',
        img_prefix='MOT17/train',
        ref_img_sampler=None,
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(800, 1440),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Pad',
                        size_divisor=32,
                        pad_val=dict(img=(114.0, 114.0, 114.0))),
                    dict(type='ImageToFloatTensor', keys=['img']),
                    dict(type='VideoCollect', keys=['img'])
                ])
        ],
        interpolate_tracks_cfg=dict(min_num_frames=5, max_num_frames=20)),
    train=dict(
        type='MO3TRDataset',
        visibility_thr=-1,
        ann_file='MOT17/annotations/half-train-SDP_cocoformat.json',
        img_prefix='MOT17/train',
        ref_img_sampler=dict(
            num_ref_imgs=5,
            frame_range=3,
            filter_key_img=True,
            method='uniform'),
        pipeline=[
            dict(type='LoadMultiImagesFromFile', to_float32=True),
            dict(type='SeqLoadAnnotations', with_bbox=True, with_track=True),
            dict(
                type='SeqResize',
                img_scale=(800, 1440),
                share_params=True,
                keep_ratio=True,
                bbox_clip_border=False),
            dict(type='SeqRandomFlip', share_params=True, flip_ratio=0.5),
            dict(
                type='SeqPad',
                size_divisor=32,
                pad_val=dict(img=(114.0, 114.0, 114.0))),
            dict(type='MatchInstancesMO3TR', skip_nomatch=True),
            dict(
                type='VideoCollect',
                keys=[
                    'img', 'gt_bboxes', 'gt_labels', 'gt_match_indices',
                    'gt_instance_ids'
                ]),
            dict(type='SeqDefaultFormatBundleMO3TR')
        ]))
checkpoint_config = dict(interval=1)
log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [
    dict(type='SyncNormHook', num_last_epochs=15, interval=5, priority=48),
    dict(
        type='ExpMomentumEMAHook',
        resume_from=None,
        momentum=0.0001,
        priority=49)
]
dist_params = dict(backend='nccl')
log_level = 'INFO'
resume_from = ''
workflow = [('train', 1)]
evaluation = dict(metric=['bbox', 'track'], interval=1)
search_metrics = ['MOTA', 'IDF1', 'FN', 'FP', 'IDs', 'MT', 'ML']
gpu_ids = range(0, 1)

2023-02-15 21:47:35,372 - mmtrack - INFO - Set random seed to 2015075619, deterministic: False
/usr/local/lib/python3.8/dist-packages/mmcv/ops/multi_scale_deform_attn.py:209: UserWarning: You'd better set embed_dims in MultiScaleDeformAttention to make the dimension of each attention head a power of 2 which is more efficient in our CUDA implementation.
  warnings.warn(
2023-02-15 21:47:37,415 - mmtrack - INFO - initialize CSPDarknet with init_cfg {'type': 'Kaiming', 'layer': 'Conv2d', 'a': 2.23606797749979, 'distribution': 'uniform', 'mode': 'fan_in', 'nonlinearity': 'leaky_relu'}
2023-02-15 21:47:37,816 - mmtrack - INFO - initialize YOLOXPAFPN with init_cfg {'type': 'Kaiming', 'layer': 'Conv2d', 'a': 2.23606797749979, 'distribution': 'uniform', 'mode': 'fan_in', 'nonlinearity': 'leaky_relu'}
loading annotations into memory...
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
    return obj_cls(**args)
  File "/usr/local/lib/python3.8/dist-packages/mmtrack/datasets/mot_challenge_dataset.py", line 42, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mmtrack/datasets/coco_video_dataset.py", line 46, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mmdet/datasets/custom.py", line 92, in __init__
    self.data_infos = self.load_annotations(local_path)
  File "/usr/local/lib/python3.8/dist-packages/mmtrack/datasets/coco_video_dataset.py", line 61, in load_annotations
    data_infos = self.load_video_anns(ann_file)
  File "/usr/local/lib/python3.8/dist-packages/mmtrack/datasets/coco_video_dataset.py", line 73, in load_video_anns
    self.coco = CocoVID(ann_file)
  File "/usr/local/lib/python3.8/dist-packages/mmtrack/datasets/parsers/coco_video_parser.py", line 22, in __init__
    super(CocoVID, self).__init__(annotation_file=annotation_file)
  File "/usr/local/lib/python3.8/dist-packages/mmdet/datasets/api_wrappers/coco_api.py", line 23, in __init__
    super().__init__(annotation_file=annotation_file)
  File "/usr/local/lib/python3.8/dist-packages/pycocotools/coco.py", line 81, in __init__
    with open(annotation_file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'MOT17/annotations/half-train-SDP_cocoformat.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/drive/MyDrive/MO3TR-main/run/train_track_nf.py", line 187, in <module>
    main()
  File "/content/drive/MyDrive/MO3TR-main/run/train_track_nf.py", line 162, in main
    datasets = [build_dataset(cfg.data.train)]
  File "/usr/local/lib/python3.8/dist-packages/mmdet/datasets/builder.py", line 81, in build_dataset
    dataset = build_from_cfg(cfg, DATASETS, default_args)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
FileNotFoundError: MO3TRDataset: [Errno 2] No such file or directory: 'MOT17/annotations/half-train-SDP_cocoformat.json'
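
For reference, the config above resolves ann_file relative to the working directory, so the loader is looking for MOT17/annotations/half-train-SDP_cocoformat.json under wherever training is launched from. A minimal sketch (using only paths taken from the config above) to check which of the expected annotation files exist:

import os

data_root = 'MOT17/'  # relative to the directory training is launched from
for split in ('half-train-SDP', 'half-val-SDP'):
    path = os.path.join(data_root, 'annotations', split + '_cocoformat.json')
    # report whether each annotation file the config expects is present
    print(('found:   ' if os.path.isfile(path) else 'missing: ') + path)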
norah251 commented 1 year ago

@sieumap43

alanzty commented 1 year ago

Hi there, this work is built on top of mmtrack: https://github.com/open-mmlab/mmtracking

To run it, you need to provide the dataset and the COCO-format annotation JSON files. To understand how this preprocessing works, I recommend following mmtrack.
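
For MOT17 specifically, mmtrack ships a converter script that generates the COCO-format annotation files from the raw MOT17 download. A sketch of the usual invocation (run from the mmtracking repo root, assuming the raw data sits under ./data/MOT17/; check the dataset docs for your mmtrack version):

python ./tools/convert_datasets/mot/mot2coco.py -i ./data/MOT17/ -o ./data/MOT17/annotations --split-train --convert-det

Note that the stock script writes files such as half-train_cocoformat.json and half-val_cocoformat.json; the -SDP suffix that this repo's config expects is not produced by default, so you may need to rename the outputs or adapt the converter for the SDP split.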

norah251 commented 1 year ago

@alanzty Could you explain in more detail? Could you please tell me which file I should look at? Thanks!