CaptainEven / MCMOT

Real time one-stage multi-class & multi-object tracking based on anchor-free detection and ReID
MIT License
383 stars 82 forks

Running demo.py on VisDrone #43

Closed syx-THU closed 3 years ago

syx-THU commented 3 years ago

Hello, thank you very much for your work! I ran into some problems when running your demo.py on VisDrone:

  1. The VisDrone dataset only provides frames. I merged the frames into a 24 fps video in .avi format — can this be used with demo.py's video mode?
  2. I ran demo with the video from step 1 and your provided mcmot_last_track_resdcn_18_visdrone.pth model. The relevant opt.py is as follows:

    self.parser.add_argument('--task', default='mot', help='mot')
    self.parser.add_argument('--dataset', default='visdrone', help='jde')
    self.parser.add_argument('--exp_id', default='default')
    self.parser.add_argument('--test', action='store_true')
    self.parser.add_argument('--load_model',
                             default='/home/suyx/MCMOT/models/mcmot_last_track_resdcn_18_visdrone.pth',
                             help='path to pretrained model')
    self.parser.add_argument('--resume',
                             action='store_true',
                             help='resume an experiment. '
                                  'Reloaded the optimizer parameter and '
                                  'set load_model to model_last.pth '
                                  'in the exp dir if load_model is empty.')

    # system
    self.parser.add_argument('--gpus',
                             default='0',  # 0, 5, 6
                             help='-1 for CPU, use comma for multiple gpus')
    self.parser.add_argument('--num_workers',
                             type=int,
                             default=8,  # 8, 6, 4
                             help='dataloader threads. 0 for single-thread.')
    self.parser.add_argument('--not_cuda_benchmark', action='store_true',
                             help='disable when the input size is not fixed.')
    self.parser.add_argument('--seed', type=int, default=317,
                             help='random seed')  # from CornerNet
    self.parser.add_argument('--gen-scale',
                             type=bool,
                             default=True,
                             help='Whether to generate multi-scales')
    self.parser.add_argument('--is_debug',
                             type=bool,
                             default=False,  # whether to use multi-threaded data loading; default: False
                             help='whether in debug mode or not')  # debug mode only supports a single process
    
    # log
    self.parser.add_argument('--print_iter', type=int, default=0,
                             help='disable progress bar and print to screen.')
    self.parser.add_argument('--hide_data_time', action='store_true',
                             help='not display time during training.')
    self.parser.add_argument('--save_all', action='store_true',
                             help='save model to disk every 5 epochs.')
    self.parser.add_argument('--metric', default='loss',
                             help='main metric to save best model')
    self.parser.add_argument('--vis_thresh', type=float, default=0.5,
                             help='visualization threshold.')
    # model: backbone and so on...
    self.parser.add_argument('--arch',
                             default='resdcn_18',
                             help='model architecture. Currently tested: '
                                  'resdcn_18 |resdcn_34 | resdcn_50 | resfpndcn_34 |'
                                  'dla_34 | hrnet_32 | hrnet_18 | cspdarknet_53')
    self.parser.add_argument('--head_conv',
                             type=int,
                             default=-1,
                             help='conv layer channels for output head'
                                  '0 for no conv layer'
                                  '-1 for default setting: '
                                  '256 for resnets and 256 for dla.')
    self.parser.add_argument('--down_ratio',
                             type=int,
                             default=4,  # down-sampling ratio of the output feature map: H = H_image/4, W = W_image/4
                             help='output stride. Currently only supports 4.')
    
    # input
    self.parser.add_argument('--input_res',
                             type=int,
                             default=-1,
                             help='input height and width. -1 for default from '
                                  'dataset. Will be overridden by input_h | input_w')
    self.parser.add_argument('--input_h',
                             type=int,
                             default=-1,
                             help='input height. -1 for default from dataset.')
    self.parser.add_argument('--input_w',
                             type=int,
                             default=-1,
                             help='input width. -1 for default from dataset.')
    
    # train
    self.parser.add_argument('--lr',
                             type=float,
                             default=7e-5,  # 1e-4, 7e-5, 5e-5, 3e-5
                             help='learning rate for batch size 32.')
    self.parser.add_argument('--lr_step',
                             type=str,
                             default='10,20',  # 20,27
                             help='drop learning rate by 10.')
    self.parser.add_argument('--num_epochs',
                             type=int,
                             default=30,  # 30, 10, 3, 1
                             help='total training epochs.')
    self.parser.add_argument('--batch-size',
                             type=int,
                             default=10,  # 18, 16, 14, 12, 10, 8, 4
                             help='batch size')
    self.parser.add_argument('--master_batch_size', type=int, default=-1,
                             help='batch size on the master gpu.')
    self.parser.add_argument('--num_iters', type=int, default=-1,
                             help='default: #samples / batch_size.')
    self.parser.add_argument('--val_intervals', type=int, default=10,
                             help='number of epochs to run validation.')
    self.parser.add_argument('--trainval',
                             action='store_true',
                             help='include validation in training and '
                                  'test on test set')
    
    # test
    self.parser.add_argument('--K',
                             type=int,
                             default=200,  # 128
                             help='max number of output objects.')  # max number of detected objects per image
    self.parser.add_argument('--not_prefetch_test',
                             action='store_true',
                             help='not use parallel data pre-processing.')
    self.parser.add_argument('--fix_res',
                             action='store_true',
                             help='fix testing resolution or keep '
                                  'the original resolution')
    self.parser.add_argument('--keep_res',
                             action='store_true',
                             help='keep the original resolution'
                                  ' during validation.')
    # tracking
    self.parser.add_argument(
        '--test_mot16', default=False, help='test mot16')
    self.parser.add_argument(
        '--val_mot15', default=False, help='val mot15')
    self.parser.add_argument(
        '--test_mot15', default=False, help='test mot15')
    self.parser.add_argument(
        '--val_mot16', default=False, help='val mot16 or mot15')
    self.parser.add_argument(
        '--test_mot17', default=False, help='test mot17')
    self.parser.add_argument(
        '--val_mot17', default=False, help='val mot17')
    self.parser.add_argument(
        '--val_mot20', default=False, help='val mot20')
    self.parser.add_argument(
        '--test_mot20', default=False, help='test mot20')
    self.parser.add_argument(
        '--conf_thres',
        type=float,
        default=0.4,  # 0.6, 0.4
        help='confidence thresh for tracking')  # heat-map confidence threshold
    self.parser.add_argument('--det_thres',
                             type=float,
                             default=0.3,
                             help='confidence thresh for detection')
    self.parser.add_argument('--nms_thres',
                             type=float,
                             default=0.4,
                             help='iou thresh for nms')
    self.parser.add_argument('--track_buffer',
                             type=int,
                             default=30,  # 30
                             help='tracking buffer')
    self.parser.add_argument('--min-box-area',
                             type=float,
                             default=200,
                             help='filter out tiny boxes')
    
    # input data mode at test time: video or image dir
    self.parser.add_argument('--input-mode',
                             type=str,
                             default='video',  # video or image_dir or img_path_list_txt
                             help='input data type (video or image dir)')
    
    # path of the input video file
    self.parser.add_argument('--input-video',
                             type=str,
                             default='/home/suyx/MCMOT/dataset/val/VisDrone-val/videos/uav0000086_00000_v.avi',
                             help='path to the input video')
    
    # input image directory
    self.parser.add_argument('--input-img',
                             type=str,
                             default='/users/duanyou/c5/all_pretrain/test.txt',  # ../images/
                             help='path to the input image directory or image file list(.txt)')
    
    self.parser.add_argument('--output-format',
                             type=str,
                             default='video',
                             help='video or text')
    self.parser.add_argument('--output-root',
                             type=str,
                             default='../results',
                             help='expected output root path')
    # mot: choose the dataset config file
    self.parser.add_argument('--data_cfg', type=str,
                             default='../src/lib/cfg/visdrone.json',  # 'mot15.json', 'visdrone.json'
                             help='load data from cfg')
    # self.parser.add_argument('--data_cfg', type=str,
    #                          default='../src/lib/cfg/mcmot_det.json',  # mcmot.json, mcmot_det.json,
    #                          help='load data from cfg')
    self.parser.add_argument('--data_dir',
                             type=str,
                             default='/home/suyx/MCMOT/dataset')
    
    # loss
    self.parser.add_argument('--mse_loss',  # default: false
                             action='store_true',
                             help='use mse loss or focal loss to train '
                                  'keypoint heatmaps.')
    self.parser.add_argument('--reg_loss',
                             default='l1',
                             help='regression loss: sl1 | l1 | l2')  # sl1: smooth L1 loss
    self.parser.add_argument('--hm_weight',
                             type=float,
                             default=1,
                             help='loss weight for keypoint heatmaps.')
    self.parser.add_argument('--off_weight',
                             type=float,
                             default=1,
                             help='loss weight for keypoint local offsets.')
    self.parser.add_argument('--wh_weight',
                             type=float,
                             default=0.1,
                             help='loss weight for bounding box size.')
    self.parser.add_argument('--id_loss',
                             default='ce',
                             help='reid loss: ce | triplet')
    self.parser.add_argument('--id_weight',
                             type=float,
                             default=1,  # 0 for detection only and 1 for detection and re-id
                             help='loss weight for id')  # ReID feature extraction or not
    self.parser.add_argument('--reid_dim',
                             type=int,
                             default=128,  # 128, 256, 512
                             help='feature dim for reid')
    self.parser.add_argument('--input-wh',
                             type=tuple,
                             default=(1088, 608),  # (768, 448) or (1088, 608)
                             help='net input resolution')
    self.parser.add_argument('--multi-scale',
                             type=bool,
                             default=True,
                             help='Whether to use multi-scale training or not')
    # ----------------------1~10 object classes are what we need
    # pedestrian      (1),  --> 0
    # people          (2),  --> 1
    # bicycle         (3),  --> 2
    # car             (4),  --> 3
    # van             (5),  --> 4
    # truck           (6),  --> 5
    # tricycle        (7),  --> 6
    # awning-tricycle (8),  --> 7
    # bus             (9),  --> 8
    # motor           (10), --> 9
    # ----------------------
    
    # others          (11)
    self.parser.add_argument('--reid_cls_ids',
                             default='0,1,2,3,4,5,6,7,8,9',  # '0,1,2,3,4' or '0,1,2,3,4,5,6,7,8,9'
                             help='')  # object classes that need ReID
    
    self.parser.add_argument('--norm_wh', action='store_true',
                             help='L1(\hat(y) / y, 1) or L1(\hat(y), y)')
    self.parser.add_argument('--dense_wh', action='store_true',
                             help='apply weighted regression near center or '
                                  'just apply regression on center point.')
    self.parser.add_argument('--cat_spec_wh',
                             action='store_true',
                             help='category specific bounding box size.')
    self.parser.add_argument('--not_reg_offset',
                             action='store_true',
                             help='not regress local offset.')
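The detection and tracking thresholds above (`--det_thres`, `--min-box-area`) amount to a simple per-box filter. A minimal sketch (my own helper, not the repo's actual code):

```python
def keep_detection(score, w, h, det_thres=0.3, min_box_area=200.0):
    # Keep a box only if it clears the confidence threshold and its area
    # is not tiny, mirroring --det_thres and --min-box-area (sketch only).
    return score >= det_thres and w * h > min_box_area
```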

Judging from the runtime log, everything looks fine:

    Fix size testing.
    training chunk_sizes: [10]
    The output will be saved to /home/suyx/MCMOT/src/lib/../../exp/mot/default
    Net input image size: 1088×608
    heads: {'hm': 10, 'wh': 2, 'id': 128, 'reg': 2}
    2020-10-21 11:17:03 [INFO]: Starting tracking...
    Lenth of the video: 464 frames
    Creating model...
    loaded /home/suyx/MCMOT/models/mcmot_last_track_resdcn_18_visdrone.pth, epoch 29
    2020-10-21 11:17:11 [INFO]: Processing frame 0 (100000.00 fps)
    2020-10-21 11:17:14 [INFO]: Processing frame 20 (10.23 fps)
    2020-10-21 11:17:17 [INFO]: Processing frame 40 (13.34 fps)
    2020-10-21 11:17:20 [INFO]: Processing frame 60 (14.66 fps)
    2020-10-21 11:17:23 [INFO]: Processing frame 80 (15.30 fps)
    2020-10-21 11:17:25 [INFO]: Processing frame 100 (16.10 fps)
    2020-10-21 11:17:28 [INFO]: Processing frame 120 (16.51 fps)
    2020-10-21 11:17:31 [INFO]: Processing frame 140 (16.74 fps)
    2020-10-21 11:17:34 [INFO]: Processing frame 160 (16.91 fps)
    2020-10-21 11:17:36 [INFO]: Processing frame 180 (17.06 fps)
    2020-10-21 11:17:39 [INFO]: Processing frame 200 (17.27 fps)
    2020-10-21 11:17:42 [INFO]: Processing frame 220 (17.27 fps)
    2020-10-21 11:17:44 [INFO]: Processing frame 240 (17.35 fps)
    2020-10-21 11:17:47 [INFO]: Processing frame 260 (17.52 fps)
    2020-10-21 11:17:49 [INFO]: Processing frame 280 (17.59 fps)
    2020-10-21 11:17:51 [INFO]: Processing frame 300 (17.69 fps)
    2020-10-21 11:17:54 [INFO]: Processing frame 320 (17.81 fps)
    2020-10-21 11:17:56 [INFO]: Processing frame 340 (17.85 fps)
    2020-10-21 11:17:59 [INFO]: Processing frame 360 (17.92 fps)
    2020-10-21 11:18:03 [INFO]: Processing frame 380 (17.75 fps)
    2020-10-21 11:18:06 [INFO]: Processing frame 400 (17.79 fps)
    2020-10-21 11:18:09 [INFO]: Processing frame 420 (17.82 fps)
    2020-10-21 11:18:12 [INFO]: Processing frame 440 (17.83 fps)
    2020-10-21 11:18:14 [INFO]: Processing frame 460 (17.85 fps)
    2020-10-21 11:18:15 [INFO]: save results to ../results/results.txt
    ffmpeg version 4.3 Copyright (c) 2000-2020 the FFmpeg developers
    built with gcc 7.3.0 (crosstool-NG 1.23.0.449-a04d0)
    configuration: --prefix=/home/suyx/miniconda3/envs/FairMOT --cc=/opt/conda/conda-bld/ffmpeg_1597178665428/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --disable-openssl --enable-avresample --enable-gnutls --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-pic --enable-pthreads --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libmp3lame
    libavutil      56. 51.100 / 56. 51.100
    libavcodec     58. 91.100 / 58. 91.100
    libavformat    58. 45.100 / 58. 45.100
    libavdevice    58. 10.100 / 58. 10.100
    libavfilter     7. 85.100 /  7. 85.100
    libavresample   4.  0.  0 /  4.  0.  0
    libswscale      5.  7.100 /  5.  7.100
    libswresample   3.  7.100 /  3.  7.100
    Input #0, image2, from '../results/frame/%05d.jpg':
      Duration: 00:00:18.52, start: 0.000000, bitrate: N/A
      Stream #0:0: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/unknown), 1920x1080 [SAR 1:1 DAR 16:9], 25 fps, 25 tbr, 25 tbn, 25 tbc
    Please use -b:a or -b:v, -b is ambiguous
    Stream mapping:
      Stream #0:0 -> #0:0 (mjpeg (native) -> mpeg4 (native))
    Press [q] to stop, [?] for help
    [swscaler @ 0x55b903555480] deprecated pixel format used, make sure you did set range correctly
    Output #0, mp4, to '../results/uav0000086_00000_v_track.mp4':
      Metadata:
        encoder         : Lavf58.45.100
      Stream #0:0: Video: mpeg4 (mp4v / 0x7634706D), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 5000 kb/s, 25 fps, 12800 tbn, 25 tbc
      Metadata:
        encoder         : Lavc58.91.100 mpeg4
      Side data:
        cpb: bitrate max/min/avg: 0/0/5000000 buffer size: 0 vbv_delay: N/A
    frame=  463 fps= 38 q=9.5 Lsize=   11457kB time=00:00:18.48 bitrate=5078.8kbits/s speed= 1.5x
    video:11454kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.026063%
However, the resulting tracking video contains no detection boxes, and some frames show a few black regions. [screenshot: uav0000086_00000_v_track mp4_20201021_115020 538]

Could you describe how you ran tracking on VisDrone? Thanks a lot!
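For step 1 (merging the VisDrone frame sequence into a video), one way is to shell out to ffmpeg. A minimal sketch — the 7-digit frame-name pattern and the H.264 output settings are my assumptions, not something stated in this thread:

```python
import subprocess

def frames_to_video(frame_dir, out_path, fps=24):
    """Build the ffmpeg command that merges numbered frames
    (assumed 7-digit zero-padded .jpg files) into a video at `fps`."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),          # input frame rate
        "-i", f"{frame_dir}/%07d.jpg",   # frame-name pattern (assumption)
        "-c:v", "libx264",               # H.264; an .avi/mpeg4 target also works
        "-pix_fmt", "yuv420p",
        out_path,
    ]

# Example (hypothetical paths):
# subprocess.run(frames_to_video("sequences/uav0000086_00000_v", "uav0000086.mp4"), check=True)
```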

CaptainEven commented 3 years ago

[screenshot: uav0000086_track]

CaptainEven commented 3 years ago

@syx-THU Pretty much the same as your approach: compress the frames into an mp4 with ffmpeg, then run tracking/testing in video mode.

CaptainEven commented 3 years ago

My config file:

    self.parser.add_argument('--task', default='mot', help='mot')
    self.parser.add_argument('--dataset', default='jde', help='jde')
    self.parser.add_argument('--exp_id', default='default')
    self.parser.add_argument('--test', action='store_true')
    self.parser.add_argument('--load_model',
                             default='../exp/mot/default/mcmot_last_track_resdcn_18_visdrone.pth',
                             help='path to pretrained model')
    self.parser.add_argument('--resume',
                             action='store_true',
                             help='resume an experiment. '
                                  'Reloaded the optimizer parameter and '
                                  'set load_model to model_last.pth '
                                  'in the exp dir if load_model is empty.')

    # system
    self.parser.add_argument('--gpus',
                             default='6',  # 0, 5, 6
                             help='-1 for CPU, use comma for multiple gpus')
    self.parser.add_argument('--num_workers',
                             type=int,
                             default=4,  # 8, 6, 4
                             help='dataloader threads. 0 for single-thread.')
    self.parser.add_argument('--not_cuda_benchmark', action='store_true',
                             help='disable when the input size is not fixed.')
    self.parser.add_argument('--seed', type=int, default=317,
                             help='random seed')  # from CornerNet
    self.parser.add_argument('--gen-scale',
                             type=bool,
                             default=True,
                             help='Whether to generate multi-scales')
    self.parser.add_argument('--is_debug',
                             type=bool,
                             default=False,  # whether to use multi-threaded data loading; default: False
                             help='whether in debug mode or not')  # debug mode only supports a single process

    # log
    self.parser.add_argument('--print_iter', type=int, default=0,
                             help='disable progress bar and print to screen.')
    self.parser.add_argument('--hide_data_time', action='store_true',
                             help='not display time during training.')
    self.parser.add_argument('--save_all', action='store_true',
                             help='save model to disk every 5 epochs.')
    self.parser.add_argument('--metric', default='loss',
                             help='main metric to save best model')
    self.parser.add_argument('--vis_thresh', type=float, default=0.5,
                             help='visualization threshold.')

    # model: backbone and so on...
    self.parser.add_argument('--arch',
                             default='resdcn_18',
                             help='model architecture. Currently tested: '
                                  'resdcn_18 |resdcn_34 | resdcn_50 | resfpndcn_34 |'
                                  'dla_34 | hrnet_32 | hrnet_18 | cspdarknet_53')
    self.parser.add_argument('--head_conv',
                             type=int,
                             default=-1,
                             help='conv layer channels for output head'
                                  '0 for no conv layer'
                                  '-1 for default setting: '
                                  '256 for resnets and 256 for dla.')
    self.parser.add_argument('--down_ratio',
                             type=int,
                             default=4,  # down-sampling ratio of the output feature map: H = H_image/4, W = W_image/4
                             help='output stride. Currently only supports 4.')

    # input
    self.parser.add_argument('--input_res',
                             type=int,
                             default=-1,
                             help='input height and width. -1 for default from '
                                  'dataset. Will be overridden by input_h | input_w')
    self.parser.add_argument('--input_h',
                             type=int,
                             default=-1,
                             help='input height. -1 for default from dataset.')
    self.parser.add_argument('--input_w',
                             type=int,
                             default=-1,
                             help='input width. -1 for default from dataset.')

    # train
    self.parser.add_argument('--lr',
                             type=float,
                             default=7e-5,  # 1e-4, 7e-5, 5e-5, 3e-5
                             help='learning rate for batch size 32.')
    self.parser.add_argument('--lr_step',
                             type=str,
                             default='10,20',  # 20,27
                             help='drop learning rate by 10.')
    self.parser.add_argument('--num_epochs',
                             type=int,
                             default=30,  # 30, 10, 3, 1
                             help='total training epochs.')
    self.parser.add_argument('--batch-size',
                             type=int,
                             default=10,  # 18, 16, 14, 12, 10, 8, 4
                             help='batch size')
    self.parser.add_argument('--master_batch_size', type=int, default=-1,
                             help='batch size on the master gpu.')
    self.parser.add_argument('--num_iters', type=int, default=-1,
                             help='default: #samples / batch_size.')
    self.parser.add_argument('--val_intervals', type=int, default=10,
                             help='number of epochs to run validation.')
    self.parser.add_argument('--trainval',
                             action='store_true',
                             help='include validation in training and '
                                  'test on test set')

    # test
    self.parser.add_argument('--K',
                             type=int,
                             default=200,  # 128
                             help='max number of output objects.')  # max number of detected objects per image
    self.parser.add_argument('--not_prefetch_test',
                             action='store_true',
                             help='not use parallel data pre-processing.')
    self.parser.add_argument('--fix_res',
                             action='store_true',
                             help='fix testing resolution or keep '
                                  'the original resolution')
    self.parser.add_argument('--keep_res',
                             action='store_true',
                             help='keep the original resolution'
                                  ' during validation.')
    # tracking
    self.parser.add_argument(
        '--test_mot16', default=False, help='test mot16')
    self.parser.add_argument(
        '--val_mot15', default=False, help='val mot15')
    self.parser.add_argument(
        '--test_mot15', default=False, help='test mot15')
    self.parser.add_argument(
        '--val_mot16', default=False, help='val mot16 or mot15')
    self.parser.add_argument(
        '--test_mot17', default=False, help='test mot17')
    self.parser.add_argument(
        '--val_mot17', default=False, help='val mot17')
    self.parser.add_argument(
        '--val_mot20', default=False, help='val mot20')
    self.parser.add_argument(
        '--test_mot20', default=False, help='test mot20')
    self.parser.add_argument(
        '--conf_thres',
        type=float,
        default=0.4,  # 0.6, 0.4
        help='confidence thresh for tracking')  # heat-map confidence threshold
    self.parser.add_argument('--det_thres',
                             type=float,
                             default=0.3,
                             help='confidence thresh for detection')
    self.parser.add_argument('--nms_thres',
                             type=float,
                             default=0.4,
                             help='iou thresh for nms')
    self.parser.add_argument('--track_buffer',
                             type=int,
                             default=30,  # 30
                             help='tracking buffer')
    self.parser.add_argument('--min-box-area',
                             type=float,
                             default=200,
                             help='filter out tiny boxes')

    # input data mode at test time: video or image dir
    self.parser.add_argument('--input-mode',
                             type=str,
                             default='video',  # video or image_dir or img_path_list_txt
                             help='input data type (video or image dir)')

    # path of the input video file
    self.parser.add_argument('--input-video',
                             type=str,
                             default='../videos/uav0000086.mp4',
                             help='path to the input video')

    # input image directory
    self.parser.add_argument('--input-img',
                             type=str,
                             default='/users/duanyou/c5/all_pretrain/test.txt',  # ../images/
                             help='path to the input image directory or image file list(.txt)')

    self.parser.add_argument('--output-format',
                             type=str,
                             default='video',
                             help='video or text')
    self.parser.add_argument('--output-root',
                             type=str,
                             default='../results',
                             help='expected output root path')

    # mot: choose the dataset config file
    self.parser.add_argument('--data_cfg', type=str,
                             default='../src/lib/cfg/visdrone.json',  # 'mcmot_det.json', 'visdrone.json'
                             help='load data from cfg')
    # self.parser.add_argument('--data_cfg', type=str,
    #                          default='../src/lib/cfg/mcmot_det.json',  # mcmot.json, mcmot_det.json,
    #                          help='load data from cfg')
    self.parser.add_argument('--data_dir',
                             type=str,
                             default='/mnt/diskb/even/dataset')

    # loss
    self.parser.add_argument('--mse_loss',  # default: false
                             action='store_true',
                             help='use mse loss or focal loss to train '
                                  'keypoint heatmaps.')
    self.parser.add_argument('--reg_loss',
                             default='l1',
                             help='regression loss: sl1 | l1 | l2')  # sl1: smooth L1 loss
    self.parser.add_argument('--hm_weight',
                             type=float,
                             default=1,
                             help='loss weight for keypoint heatmaps.')
    self.parser.add_argument('--off_weight',
                             type=float,
                             default=1,
                             help='loss weight for keypoint local offsets.')
    self.parser.add_argument('--wh_weight',
                             type=float,
                             default=0.1,
                             help='loss weight for bounding box size.')
    self.parser.add_argument('--id_loss',
                             default='ce',
                             help='reid loss: ce | triplet')
    self.parser.add_argument('--id_weight',
                             type=float,
                             default=1,  # 0 for detection only and 1 for detection and re-id
                             help='loss weight for id')  # ReID feature extraction or not
    self.parser.add_argument('--reid_dim',
                             type=int,
                             default=128,  # 128, 256, 512
                             help='feature dim for reid')
    self.parser.add_argument('--input-wh',
                             type=tuple,
                             default=(1088, 608),  # (768, 448) or (1088, 608)
                             help='net input resolution')
    self.parser.add_argument('--multi-scale',
                             type=bool,
                             default=True,
                             help='Whether to use multi-scale training or not')

    # ----------------------1~10 object classes are what we need
    # pedestrian      (1),  --> 0
    # people          (2),  --> 1
    # bicycle         (3),  --> 2
    # car             (4),  --> 3
    # van             (5),  --> 4
    # truck           (6),  --> 5
    # tricycle        (7),  --> 6
    # awning-tricycle (8),  --> 7
    # bus             (9),  --> 8
    # motor           (10), --> 9
    # ----------------------

    # others          (11)
    self.parser.add_argument('--reid_cls_ids',
                             default='0,1,2,3,4,5,6,7,8,9',  # '0,1,2,3,4' or '0,1,2,3,4,5,6,7,8,9'
                             help='object classes on which to run ReID')

    self.parser.add_argument('--norm_wh', action='store_true',
                             help='L1(\hat(y) / y, 1) or L1(\hat(y), y)')
    self.parser.add_argument('--dense_wh', action='store_true',
                             help='apply weighted regression near center or '
                                  'just apply regression on center point.')
    self.parser.add_argument('--cat_spec_wh',
                             action='store_true',
                             help='category specific bounding box size.')
    self.parser.add_argument('--not_reg_offset',
                             action='store_true',
                             help='not regress local offset.')
CaptainEven commented 3 years ago

My log:

```
ssh://jaya@192.168.1.211:22/usr/bin/python3 -u /mnt/diskb/even/MCMOT/src/demo.py
Fix size testing.
training chunk_sizes: [10]
The output will be saved to /mnt/diskb/even/MCMOT/src/lib/../../exp/mot/default
Net input image size: 1088×608
heads: {'hm': 10, 'wh': 2, 'id': 128, 'reg': 2}
2020-10-21 12:06:29 [INFO]: Starting tracking...
Lenth of the video: 464 frames
Creating model...
loaded ../exp/mot/default/mcmot_last_track_resdcn_18_visdrone.pth, epoch 29
2020-10-21 12:06:33 [INFO]: Processing frame 0 (100000.00 fps)
/pytorch/aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
2020-10-21 12:06:35 [INFO]: Processing frame 20 (24.34 fps)
2020-10-21 12:06:37 [INFO]: Processing frame 40 (24.49 fps)
2020-10-21 12:06:39 [INFO]: Processing frame 60 (24.52 fps)
2020-10-21 12:06:41 [INFO]: Processing frame 80 (24.55 fps)
2020-10-21 12:06:43 [INFO]: Processing frame 100 (24.51 fps)
2020-10-21 12:06:44 [INFO]: Processing frame 120 (24.50 fps)
2020-10-21 12:06:46 [INFO]: Processing frame 140 (24.53 fps)
2020-10-21 12:06:48 [INFO]: Processing frame 160 (24.56 fps)
2020-10-21 12:06:50 [INFO]: Processing frame 180 (24.54 fps)
2020-10-21 12:06:52 [INFO]: Processing frame 200 (24.55 fps)
2020-10-21 12:06:54 [INFO]: Processing frame 220 (24.53 fps)
2020-10-21 12:06:56 [INFO]: Processing frame 240 (24.52 fps)
2020-10-21 12:06:58 [INFO]: Processing frame 260 (24.54 fps)
2020-10-21 12:07:00 [INFO]: Processing frame 280 (24.51 fps)
2020-10-21 12:07:02 [INFO]: Processing frame 300 (24.51 fps)
2020-10-21 12:07:04 [INFO]: Processing frame 320 (24.50 fps)
2020-10-21 12:07:06 [INFO]: Processing frame 340 (24.47 fps)
2020-10-21 12:07:08 [INFO]: Processing frame 360 (24.47 fps)
2020-10-21 12:07:10 [INFO]: Processing frame 380 (24.47 fps)
2020-10-21 12:07:12 [INFO]: Processing frame 400 (24.48 fps)
2020-10-21 12:07:14 [INFO]: Processing frame 420 (24.46 fps)
2020-10-21 12:07:16 [INFO]: Processing frame 440 (24.46 fps)
2020-10-21 12:07:18 [INFO]: Processing frame 460 (24.45 fps)
2020-10-21 12:07:18 [INFO]: save results to ../results/results.txt
ffmpeg version 3.4.6-0ubuntu0.18.04.1 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.3.0-16ubuntu3)
  configuration: --prefix=/usr --extra-version=0ubuntu0.18.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
  libavutil      55. 78.100 / 55. 78.100
  libavcodec     57.107.100 / 57.107.100
  libavformat    57. 83.100 / 57. 83.100
  libavdevice    57. 10.100 / 57. 10.100
  libavfilter     6.107.100 /  6.107.100
  libavresample   3.  7.  0 /  3.  7.  0
  libswscale      4.  8.100 /  4.  8.100
  libswresample   2.  9.100 /  2.  9.100
  libpostproc    54.  7.100 / 54.  7.100
Input #0, image2, from '../results/frame/%05d.jpg':
  Duration: 00:00:18.52, start: 0.000000, bitrate: N/A
    Stream #0:0: Video: mjpeg, yuvj420p(pc, bt470bg/unknown/unknown), 1920x1080 [SAR 1:1 DAR 16:9], 25 fps, 25 tbr, 25 tbn, 25 tbc
Please use -b:a or -b:v, -b is ambiguous
Stream mapping:
  Stream #0:0 -> #0:0 (mjpeg (native) -> mpeg4 (native))
Press [q] to stop, [?] for help
[swscaler @ 0x557e6a1062c0] deprecated pixel format used, make sure you did set range correctly
Output #0, mp4, to '../results/uav0000086_track.mp4':
  Metadata:
    encoder         : Lavf57.83.100
    Stream #0:0: Video: mpeg4 (mp4v / 0x7634706D), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 5000 kb/s, 25 fps, 12800 tbn, 25 tbc
    Metadata:
      encoder         : Lavc57.107.100 mpeg4
    Side data:
      cpb: bitrate max/min/avg: 0/0/5000000 buffer size: 0 vbv_delay: -1
frame=  463 fps= 52 q=13.4 Lsize=   11450kB time=00:00:18.48 bitrate=5075.6kbits/s speed=2.07x
video:11447kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.026089%

Process finished with exit code 0
```

CaptainEven commented 3 years ago

Note the corresponding code in multitracker.py (screenshot attached):

CaptainEven commented 3 years ago

Wish you good luck!

syx-THU commented 3 years ago

Thank you very much! I'll give it another try.

Ronales commented 2 years ago

> Wish you good luck!

Thanks for your great repo! Some tips:

  1. For 10-class tracking (VisDrone): `from gen_dataset_visdrone import cls2id, id2cls  # visdrone`

  2. For 5-class tracking: `from gen_labels_detrac_mcmot import cls2id, id2cls  # mcmot_c5`
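For completeness, here is a self-contained sketch of the 10-class VisDrone mapping that `gen_dataset_visdrone` exposes, reconstructed from the class list quoted in the question (the actual dicts in the repo may differ in detail):

```python
# Sketch of the VisDrone class-ID mapping (assumed to mirror the cls2id/id2cls
# dicts in gen_dataset_visdrone; IDs follow the list in the issue above,
# where VisDrone labels 1..10 map to tracker classes 0..9).
id2cls = {
    0: 'pedestrian',
    1: 'people',
    2: 'bicycle',
    3: 'car',
    4: 'van',
    5: 'truck',
    6: 'tricycle',
    7: 'awning-tricycle',
    8: 'bus',
    9: 'motor',
}

# Inverse mapping: class name -> tracker class ID.
cls2id = {name: idx for idx, name in id2cls.items()}
```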