facebookresearch / SlowFast

PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
Apache License 2.0

training loss is always 0.0 #327

Closed wwdok closed 3 years ago

wwdok commented 3 years ago

I printed out the stats in `def log_epoch_stats(self, cur_epoch):`; they are shown below:

(base) weidawang@weidawang-TUF-Gaming-FX506LU-FX506LU:~/Repo/slowfast$ python tools/run_net.py --cfg configs/Kinetics/C2D_8x8_R50.yaml NUM_GPUS 1 TRAIN.BATCH_SIZE 4 SOLVER.BASE_LR 0.0125 DATA.PATH_TO_DATA_DIR /media/weidawang/DATA/dataset/HMDB51/hmdb51_org/fall_floor DATA.PATH_LABEL_SEPARATOR ","
[11/16 14:11:46][INFO] train_net.py: 374: Train with config:
[11/16 14:11:46][INFO] train_net.py: 375: {'AVA': {'ANNOTATION_DIR': '/mnt/vol/gfsai-flash3-east/ai-group/users/haoqifan/ava/frame_list/',
         'BGR': False,
         'DETECTION_SCORE_THRESH': 0.9,
         'EXCLUSION_FILE': 'ava_val_excluded_timestamps_v2.2.csv',
         'FRAME_DIR': '/mnt/fair-flash3-east/ava_trainval_frames.img/',
         'FRAME_LIST_DIR': '/mnt/vol/gfsai-flash3-east/ai-group/users/haoqifan/ava/frame_list/',
         'FULL_TEST_ON_VAL': False,
         'GROUNDTRUTH_FILE': 'ava_val_v2.2.csv',
         'IMG_PROC_BACKEND': 'cv2',
         'LABEL_MAP_FILE': 'ava_action_list_v2.2_for_activitynet_2019.pbtxt',
         'TEST_FORCE_FLIP': False,
         'TEST_LISTS': ['val.csv'],
         'TEST_PREDICT_BOX_LISTS': ['ava_val_predicted_boxes.csv'],
         'TRAIN_GT_BOX_LISTS': ['ava_train_v2.2.csv'],
         'TRAIN_LISTS': ['train.csv'],
         'TRAIN_PCA_EIGVAL': [0.225, 0.224, 0.229],
         'TRAIN_PCA_EIGVEC': [[-0.5675, 0.7192, 0.4009],
                              [-0.5808, -0.0045, -0.814],
                              [-0.5836, -0.6948, 0.4203]],
         'TRAIN_PCA_JITTER_ONLY': True,
         'TRAIN_PREDICT_BOX_LISTS': [],
         'TRAIN_USE_COLOR_AUGMENTATION': False},
 'BENCHMARK': CfgNode({'NUM_EPOCHS': 5, 'LOG_PERIOD': 100, 'SHUFFLE': True}),
 'BN': {'NORM_TYPE': 'batchnorm',
        'NUM_BATCHES_PRECISE': 200,
        'NUM_SPLITS': 1,
        'NUM_SYNC_DEVICES': 1,
        'USE_PRECISE_STATS': True,
        'WEIGHT_DECAY': 0.0},
 'DATA': {'DECODING_BACKEND': 'pyav',
          'ENSEMBLE_METHOD': 'sum',
          'INPUT_CHANNEL_NUM': [3],
          'INV_UNIFORM_SAMPLE': False,
          'MEAN': [0.45, 0.45, 0.45],
          'MULTI_LABEL': False,
          'NUM_FRAMES': 8,
          'PATH_LABEL_SEPARATOR': ',',
          'PATH_PREFIX': '',
          'PATH_TO_DATA_DIR': '/media/weidawang/DATA/dataset/HMDB51/hmdb51_org/fall_floor',
          'RANDOM_FLIP': True,
          'REVERSE_INPUT_CHANNEL': False,
          'SAMPLING_RATE': 8,
          'STD': [0.225, 0.225, 0.225],
          'TARGET_FPS': 30,
          'TEST_CROP_SIZE': 256,
          'TRAIN_CROP_SIZE': 224,
          'TRAIN_JITTER_SCALES': [256, 320]},
 'DATA_LOADER': {'ENABLE_MULTI_THREAD_DECODE': False,
                 'NUM_WORKERS': 8,
                 'PIN_MEMORY': True},
 'DEMO': {'BUFFER_SIZE': 0,
          'CLIP_VIS_SIZE': 10,
          'COMMON_CLASS_NAMES': ['watch (a person)',
                                 'talk to (e.g., self, a person, a group)',
                                 'listen to (a person)',
                                 'touch (an object)',
                                 'carry/hold (an object)',
                                 'walk',
                                 'sit',
                                 'lie/sleep',
                                 'bend/bow (at the waist)'],
          'COMMON_CLASS_THRES': 0.7,
          'DETECTRON2_CFG': 'COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml',
          'DETECTRON2_THRESH': 0.9,
          'DETECTRON2_WEIGHTS': 'detectron2://COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl',
          'DISPLAY_HEIGHT': 0,
          'DISPLAY_WIDTH': 0,
          'ENABLE': False,
          'FPS': 30,
          'GT_BOXES': '',
          'INPUT_FORMAT': 'BGR',
          'INPUT_VIDEO': '',
          'LABEL_FILE_PATH': '',
          'NUM_CLIPS_SKIP': 0,
          'NUM_VIS_INSTANCES': 2,
          'OUTPUT_FILE': '',
          'OUTPUT_FPS': -1,
          'PREDS_BOXES': '',
          'SLOWMO': 1,
          'STARTING_SECOND': 900,
          'THREAD_ENABLE': False,
          'UNCOMMON_CLASS_THRES': 0.3,
          'VIS_MODE': 'thres',
          'WEBCAM': -1},
 'DETECTION': {'ALIGNED': True,
               'ENABLE': False,
               'ROI_XFORM_RESOLUTION': 7,
               'SPATIAL_SCALE_FACTOR': 16},
 'DIST_BACKEND': 'nccl',
 'LOG_MODEL_INFO': True,
 'LOG_PERIOD': 10,
 'MODEL': {'ARCH': 'c2d',
           'DROPCONNECT_RATE': 0.0,
           'DROPOUT_RATE': 0.5,
           'FC_INIT_STD': 0.01,
           'HEAD_ACT': 'softmax',
           'LOSS_FUNC': 'cross_entropy',
           'MODEL_NAME': 'ResNet',
           'MULTI_PATHWAY_ARCH': ['slowfast'],
           'NUM_CLASSES': 1,
           'SINGLE_PATHWAY_ARCH': ['c2d', 'i3d', 'slow', 'x3d']},
 'MULTIGRID': {'BN_BASE_SIZE': 8,
               'DEFAULT_B': 0,
               'DEFAULT_S': 0,
               'DEFAULT_T': 0,
               'EPOCH_FACTOR': 1.5,
               'EVAL_FREQ': 3,
               'LONG_CYCLE': False,
               'LONG_CYCLE_FACTORS': [(0.25, 0.7071067811865476),
                                      (0.5, 0.7071067811865476),
                                      (0.5, 1),
                                      (1, 1)],
               'LONG_CYCLE_SAMPLING_RATE': 0,
               'SHORT_CYCLE': False,
               'SHORT_CYCLE_FACTORS': [0.5, 0.7071067811865476]},
 'NONLOCAL': {'GROUP': [[1], [1], [1], [1]],
              'INSTANTIATION': 'softmax',
              'LOCATION': [[[]], [[]], [[]], [[]]],
              'POOL': [[[1, 2, 2], [1, 2, 2]],
                       [[1, 2, 2], [1, 2, 2]],
                       [[1, 2, 2], [1, 2, 2]],
                       [[1, 2, 2], [1, 2, 2]]]},
 'NUM_GPUS': 1,
 'NUM_SHARDS': 1,
 'OUTPUT_DIR': '.',
 'RESNET': {'DEPTH': 50,
            'INPLACE_RELU': True,
            'NUM_BLOCK_TEMP_KERNEL': [[3], [4], [6], [3]],
            'NUM_GROUPS': 1,
            'SPATIAL_DILATIONS': [[1], [1], [1], [1]],
            'SPATIAL_STRIDES': [[1], [2], [2], [2]],
            'STRIDE_1X1': False,
            'TRANS_FUNC': 'bottleneck_transform',
            'WIDTH_PER_GROUP': 64,
            'ZERO_INIT_FINAL_BN': True},
 'RNG_SEED': 0,
 'SHARD_ID': 0,
 'SLOWFAST': {'ALPHA': 8,
              'BETA_INV': 8,
              'FUSION_CONV_CHANNEL_RATIO': 2,
              'FUSION_KERNEL_SZ': 5},
 'SOLVER': {'BASE_LR': 0.0125,
            'BASE_LR_SCALE_NUM_SHARDS': False,
            'COSINE_END_LR': 0.0,
            'DAMPENING': 0.0,
            'GAMMA': 0.1,
            'LRS': [],
            'LR_POLICY': 'cosine',
            'MAX_EPOCH': 196,
            'MOMENTUM': 0.9,
            'NESTEROV': True,
            'OPTIMIZING_METHOD': 'sgd',
            'STEPS': [],
            'STEP_SIZE': 1,
            'WARMUP_EPOCHS': 34.0,
            'WARMUP_FACTOR': 0.1,
            'WARMUP_START_LR': 0.01,
            'WEIGHT_DECAY': 0.0001},
 'TENSORBOARD': {'CATEGORIES_PATH': '',
                 'CLASS_NAMES_PATH': '',
                 'CONFUSION_MATRIX': {'ENABLE': False,
                                      'FIGSIZE': [8, 8],
                                      'SUBSET_PATH': ''},
                 'ENABLE': False,
                 'HISTOGRAM': {'ENABLE': False,
                               'FIGSIZE': [8, 8],
                               'SUBSET_PATH': '',
                               'TOPK': 10},
                 'LOG_DIR': '',
                 'MODEL_VIS': {'ACTIVATIONS': False,
                               'COLORMAP': 'Pastel2',
                               'ENABLE': False,
                               'GRAD_CAM': {'COLORMAP': 'viridis',
                                            'ENABLE': True,
                                            'LAYER_LIST': [],
                                            'USE_TRUE_LABEL': False},
                               'INPUT_VIDEO': False,
                               'LAYER_LIST': [],
                               'MODEL_WEIGHTS': False,
                               'TOPK_PREDS': 1},
                 'PREDICTIONS_PATH': '',
                 'WRONG_PRED_VIS': {'ENABLE': False,
                                    'SUBSET_PATH': '',
                                    'TAG': 'Incorrectly classified videos.'}},
 'TEST': {'BATCH_SIZE': 64,
          'CHECKPOINT_FILE_PATH': '',
          'CHECKPOINT_TYPE': 'pytorch',
          'DATASET': 'mydata',
          'ENABLE': True,
          'NUM_ENSEMBLE_VIEWS': 10,
          'NUM_SPATIAL_CROPS': 3,
          'SAVE_RESULTS_PATH': ''},
 'TRAIN': {'AUTO_RESUME': True,
           'BATCH_SIZE': 4,
           'CHECKPOINT_CLEAR_NAME_PATTERN': (),
           'CHECKPOINT_EPOCH_RESET': False,
           'CHECKPOINT_FILE_PATH': '',
           'CHECKPOINT_INFLATE': False,
           'CHECKPOINT_PERIOD': 1,
           'CHECKPOINT_TYPE': 'pytorch',
           'DATASET': 'mydata',
           'ENABLE': True,
           'EVAL_PERIOD': 10},
 'X3D': {'BN_LIN5': False,
         'BOTTLENECK_FACTOR': 1.0,
         'CHANNELWISE_3x3x3': True,
         'DEPTH_FACTOR': 1.0,
         'DIM_C1': 12,
         'DIM_C5': 2048,
         'SCALE_RES2': False,
         'WIDTH_FACTOR': 1.0}}
[11/16 14:11:48][INFO] misc.py: 169: Model:

...

[11/16 14:11:49][INFO] mydata.py: 115: Constructing Mydata dataloader (size: 27) from /media/weidawang/DATA/dataset/HMDB51/hmdb51_org/fall_floor/train.csv
[11/16 14:11:49][INFO] train_net.py: 414: Start epoch: 21
{'_type': 'train_epoch', 'epoch': '21/196', 'dt': 0.035688046000359464, 'dt_data': 0.035687760000655544, 'dt_net': 0.3178404740010592, 'eta': '0:00:37', 'lr': 0.01097710923332317, 'gpu_mem': '2.35G', 'RAM': '9.15/15.48G', 'top1_err': 0.0, 'loss': 0.0}
[11/16 14:11:51][INFO] logging.py:  96: json_stats: {"RAM": "9.15/15.48G", "_type": "train_epoch", "dt": 0.03569, "dt_data": 0.03569, "dt_net": 0.31784, "epoch": "21/196", "eta": "0:00:37", "gpu_mem": "2.35G", "loss": 0.00000, "lr": 0.01098, "top1_err": 0.00000}
{'_type': 'train_epoch', 'epoch': '22/196', 'dt': 0.03450735799924587, 'dt_data': 0.034507232001487864, 'dt_net': 0.3217590990007011, 'eta': '0:00:36', 'lr': 0.011024010476522683, 'gpu_mem': '2.35G', 'RAM': '9.15/15.48G', 'top1_err': 0.0, 'loss': 0.0}
[11/16 14:11:56][INFO] logging.py:  96: json_stats: {"RAM": "9.15/15.48G", "_type": "train_epoch", "dt": 0.03451, "dt_data": 0.03451, "dt_net": 0.32176, "epoch": "22/196", "eta": "0:00:36", "gpu_mem": "2.35G", "loss": 0.00000, "lr": 0.01102, "top1_err": 0.00000}
{'_type': 'train_epoch', 'epoch': '23/196', 'dt': 0.04107699700034573, 'dt_data': 0.041077166999457404, 'dt_net': 0.3185689900001307, 'eta': '0:00:42', 'lr': 0.011070911719722194, 'gpu_mem': '2.35G', 'RAM': '9.16/15.48G', 'top1_err': 0.0, 'loss': 0.0}
[11/16 14:12:00][INFO] logging.py:  96: json_stats: {"RAM": "9.16/15.48G", "_type": "train_epoch", "dt": 0.04108, "dt_data": 0.04108, "dt_net": 0.31857, "epoch": "23/196", "eta": "0:00:42", "gpu_mem": "2.35G", "loss": 0.00000, "lr": 0.01107, "top1_err": 0.00000}
{'_type': 'train_epoch', 'epoch': '24/196', 'dt': 0.03771631099880324, 'dt_data': 0.037716199998612865, 'dt_net': 0.32401606399980665, 'eta': '0:00:38', 'lr': 0.011117812962921707, 'gpu_mem': '2.35G', 'RAM': '9.16/15.48G', 'top1_err': 0.0, 'loss': 0.0}

...

[11/16 14:26:11][INFO] logging.py:  96: json_stats: {"RAM": "8.08/15.48G", "_type": "train_epoch", "dt": 0.04218, "dt_data": 0.04218, "dt_net": 0.33933, "epoch": "195/196", "eta": "0:00:00", "gpu_mem": "2.35G", "loss": 0.00000, "lr": 0.00000, "top1_err": 0.00000}
{'_type': 'train_epoch', 'epoch': '196/196', 'dt': 0.04059268899982271, 'dt_data': 0.0405925009999919, 'dt_net': 0.333484340000723, 'eta': '0:00:00', 'lr': 2.230154059895684e-08, 'gpu_mem': '2.35G', 'RAM': '8.08/15.48G', 'top1_err': 0.0, 'loss': 0.0}
[11/16 14:26:16][INFO] logging.py:  96: json_stats: {"RAM": "8.08/15.48G", "_type": "train_epoch", "dt": 0.04059, "dt_data": 0.04059, "dt_net": 0.33348, "epoch": "196/196", "eta": "0:00:00", "gpu_mem": "2.35G", "loss": 0.00000, "lr": 0.00000, "top1_err": 0.00000}
[11/16 14:26:18][INFO] logging.py:  96: json_stats: {"RAM": "8.09/15.48G", "_type": "val_epoch", "epoch": "196/196", "gpu_mem": "2.35G", "min_top1_err": 0.00000, "time_diff": 0.03788, "top1_err": 0.00000}
[11/16 14:26:18][INFO] test_net.py: 156: Test with config:
[11/16 14:26:18][INFO] test_net.py: 157: AVA:
...
wwdok commented 3 years ago

My case:

I added four print statements

                print("==============!!!!!!!!!!!!!!!!!!!!!!~~~~~~~~~~~~~~~~~")
                print("preds is {}".format(preds.tolist()))
                print("labels is {}".format(labels.tolist()))
                # note: ks must be a tuple, i.e. (1,) — a bare (1) is just the int 1
                num_topks_correct = metrics.topks_correct(preds, labels, (1,))
                print("preds.size(0) is {}".format(preds.size(0)))
                print("num_topks_correct is {}".format(num_topks_correct))
                top1_err = [(1.0 - x / preds.size(0)) * 100.0 for x in num_topks_correct][0]

in `def train_epoch()` to check preds and labels. Part of its output is:

[11/17 15:35:50][INFO] train_net.py: 419: Start epoch: 2
==============!!!!!!!!!!!!!!!!!!!!!!~~~~~~~~~~~~~~~~~
preds is [[-0.03573627024888992], [-0.32597339153289795]]
labels is [0, 0]
preds.size(0) is 2
num_topks_correct is [tensor(2., device='cuda:0')]
==============!!!!!!!!!!!!!!!!!!!!!!~~~~~~~~~~~~~~~~~
preds is [[-0.30879950523376465], [-0.02247714065015316]]
labels is [0, 0]
preds.size(0) is 2
num_topks_correct is [tensor(2., device='cuda:0')]
==============!!!!!!!!!!!!!!!!!!!!!!~~~~~~~~~~~~~~~~~
preds is [[0.05403393507003784], [-0.18906450271606445]]
labels is [0, 0]
preds.size(0) is 2
num_topks_correct is [tensor(2., device='cuda:0')]
==============!!!!!!!!!!!!!!!!!!!!!!~~~~~~~~~~~~~~~~~
preds is [[-0.18617770075798035], [-0.16703137755393982]]
labels is [0, 0]
preds.size(0) is 2
num_topks_correct is [tensor(2., device='cuda:0')]
==============!!!!!!!!!!!!!!!!!!!!!!~~~~~~~~~~~~~~~~~
preds is [[0.10825307667255402], [-0.292312890291214]]
labels is [0, 0]
preds.size(0) is 2
num_topks_correct is [tensor(2., device='cuda:0')]
==============!!!!!!!!!!!!!!!!!!!!!!~~~~~~~~~~~~~~~~~
preds is [[-0.0778438001871109], [-0.07582096755504608]]
labels is [0, 0]
preds.size(0) is 2
num_topks_correct is [tensor(2., device='cuda:0')]
==============!!!!!!!!!!!!!!!!!!!!!!~~~~~~~~~~~~~~~~~
preds is [[-0.30321329832077026], [-0.11427342891693115]]
labels is [0, 0]
preds.size(0) is 2
num_topks_correct is [tensor(2., device='cuda:0')]
==============!!!!!!!!!!!!!!!!!!!!!!~~~~~~~~~~~~~~~~~
preds is [[-0.09245844930410385], [-0.25378167629241943]]
labels is [0, 0]
preds.size(0) is 2
num_topks_correct is [tensor(2., device='cuda:0')]
==============!!!!!!!!!!!!!!!!!!!!!!~~~~~~~~~~~~~~~~~
preds is [[-0.2726392149925232], [0.0589011125266552]]
labels is [0, 0]
preds.size(0) is 2
num_topks_correct is [tensor(2., device='cuda:0')]
==============!!!!!!!!!!!!!!!!!!!!!!~~~~~~~~~~~~~~~~~
preds is [[-0.07824113965034485], [-0.22474029660224915]]
labels is [0, 0]
preds.size(0) is 2
num_topks_correct is [tensor(2., device='cuda:0')]
[11/17 15:35:59][INFO] logging.py:  96: json_stats: {"_type": "train_iter", "dt": 0.75530, "dt_data": 0.00388, "dt_net": 0.75142, "epoch": "2/10", "eta": "0:01:20", "gpu_mem": "2.78G", "iter": "10/13", "loss": 0.00000, "lr": 0.00972, "top1_err": 0.00000}

The predictions are negative — is that normal? I didn't enable DETECTION in the yaml.
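(One observation on why `top1_err` is also stuck at 0 in this output: with `NUM_CLASSES` set to 1, argmax over the single logit is always class 0, and 0 is the only possible label, so every sample is counted as correct whatever value the logit takes. A minimal pure-Python sketch of that effect — not the actual `metrics.topks_correct` implementation:)

```python
def top1_correct(preds, labels):
    # argmax over the class dimension of each row,
    # then count how many match the ground-truth labels
    top1 = [row.index(max(row)) for row in preds]
    return sum(int(p == l) for p, l in zip(top1, labels))

# One logit per sample (NUM_CLASSES == 1): argmax is always index 0,
# and 0 is the only possible label, so every sample is "correct".
preds = [[-0.3088], [-0.0225]]
labels = [0, 0]
print(top1_correct(preds, labels))  # 2, hence top1_err == 0.0
```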

My yaml is:

TRAIN:
  ENABLE: True
  DATASET: kinetics
  BATCH_SIZE: 2  
  EVAL_PERIOD: 10  
  CHECKPOINT_FILE_PATH: "./demo/Kinetics/SLOWFAST_8x8_R50.pkl"  
  CHECKPOINT_TYPE: caffe2
  CHECKPOINT_PERIOD: 1
  AUTO_RESUME: True
DATA:
  NUM_FRAMES: 32
  SAMPLING_RATE: 2
  TRAIN_JITTER_SCALES: [256, 320]
  TRAIN_CROP_SIZE: 224
  TEST_CROP_SIZE: 256
  INPUT_CHANNEL_NUM: [3, 3]
  PATH_TO_DATA_DIR: "/media/weidawang/DATA/dataset/HMDB51/hmdb51_org/fall_floor"
  PATH_LABEL_SEPARATOR: ","
SLOWFAST:
  ALPHA: 4
  BETA_INV: 8
  FUSION_CONV_CHANNEL_RATIO: 2
  FUSION_KERNEL_SZ: 7
RESNET:
  ZERO_INIT_FINAL_BN: True
  WIDTH_PER_GROUP: 64
  NUM_GROUPS: 1
  DEPTH: 50
  TRANS_FUNC: bottleneck_transform
  STRIDE_1X1: False
  NUM_BLOCK_TEMP_KERNEL: [[3, 3], [4, 4], [6, 6], [3, 3]]
  SPATIAL_STRIDES: [[1, 1], [2, 2], [2, 2], [2, 2]]
  SPATIAL_DILATIONS: [[1, 1], [1, 1], [1, 1], [1, 1]]
NONLOCAL:
  LOCATION: [[[], []], [[], []], [[], []], [[], []]]
  GROUP: [[1, 1], [1, 1], [1, 1], [1, 1]]
  INSTANTIATION: dot_product
BN:
  USE_PRECISE_STATS: True
  NUM_BATCHES_PRECISE: 200
SOLVER:
  BASE_LR: 0.0125  
  LR_POLICY: cosine
  MAX_EPOCH: 10  
  MOMENTUM: 0.9
  WEIGHT_DECAY: 1e-4
  WARMUP_EPOCHS: 34.0
  WARMUP_START_LR: 0.01
  OPTIMIZING_METHOD: sgd
MODEL:
  NUM_CLASSES: 1 
  ARCH: slowfast
  MODEL_NAME: SlowFast
  LOSS_FUNC: cross_entropy
  DROPOUT_RATE: 0.5
TEST:
  ENABLE: True
  DATASET: kinetics
  BATCH_SIZE: 2  
DATA_LOADER:
  NUM_WORKERS: 8
  PIN_MEMORY: True
NUM_GPUS: 1  
NUM_SHARDS: 1
RNG_SEED: 0
OUTPUT_DIR: .
wwdok commented 3 years ago

As AlexanderMelde said, I need to increase the number of training action classes to more than one.
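(The always-zero loss follows directly from `NUM_CLASSES: 1`: cross-entropy over a single class is identically zero, because the softmax of one logit is always 1. A minimal sketch in plain Python — not PySlowFast code:)

```python
import math

def cross_entropy(logits, label):
    # softmax over the class dimension, then negative log-likelihood
    exps = [math.exp(z) for z in logits]
    probs = [e / sum(exps) for e in exps]
    return -math.log(probs[label])

# With NUM_CLASSES == 1 there is a single logit: softmax always
# yields probability 1.0, so the loss is 0 whatever the network outputs.
print(cross_entropy([-0.3257], 0))    # 0.0
# With two or more classes the loss becomes informative again:
print(cross_entropy([2.0, -1.0], 0))  # ~0.0486
```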