fredzzhang / pvic

[ICCV'23] Official PyTorch implementation for paper "Exploring Predicate Visual Context in Detecting Human-Object Interactions"
BSD 3-Clause "New" or "Revised" License

Random seeds in training #40

Closed hutuo1213 closed 10 months ago

hutuo1213 commented 10 months ago

Hi, we found that the random seed in the PViC code fixes the first evaluation result, but subsequent training produces varying results. Here is what happens when the same code is run twice.

Namespace(alpha=0.5, aux_loss=True, backbone='resnet101', batch_size=16, bbox_loss_coef=5, box_score_thresh=0.05, cache=False, clip_max_norm=0.1, data_root='./hicodet', dataset='hicodet', dec_layers=6, detector='base', device='cuda', dilation=False, dim_feedforward=2048, dropout=0.1, enc_layers=6, eos_coef=0.1, epochs=30, eval=False, gamma=0.1, giou_loss_coef=2, hidden_dim=256, kv_src='C5', lr_drop=20, lr_drop_factor=0.2, lr_head=0.0001, max_instances=15, min_instances=3, nheads=8, num_queries=100, num_workers=2, output_dir='outputs/dub', partitions=['train2015', 'test2015'], port='1234', position_embedding='sine', pre_norm=False, pretrained='checkpoints/detr-r101-hicodet.pth', print_interval=100, raw_lambda=2.8, repr_dim=384, resume='', sanity=False, seed=140, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, triplet_dec_layers=2, triplet_enc_layers=1, use_wandb=False, weight_decay=0.0001, world_size=2)
Rank 1: Load weights for the object detector from checkpoints/detr-r101-hicodet.pth
=> Rank 1: PViC randomly initialised.
Rank 0: Load weights for the object detector from checkpoints/detr-r101-hicodet.pth
=> Rank 0: PViC randomly initialised.
Epoch 0 =>  mAP: 0.1483, rare: 0.1011, none-rare: 0.1624.
Epoch [1/30], Iter. [0100/2352], Loss: 3.8620, Time[Data/Iter.]: [6.48s/200.77s]
Epoch [1/30], Iter. [0200/2352], Loss: 2.2175, Time[Data/Iter.]: [0.12s/198.64s]
Epoch [1/30], Iter. [0300/2352], Loss: 2.1058, Time[Data/Iter.]: [0.12s/197.65s]
Epoch [1/30], Iter. [0400/2352], Loss: 1.9482, Time[Data/Iter.]: [0.12s/200.41s]
Epoch [1/30], Iter. [0500/2352], Loss: 1.8276, Time[Data/Iter.]: [0.12s/199.10s]
Epoch [1/30], Iter. [0600/2352], Loss: 1.7830, Time[Data/Iter.]: [0.12s/195.86s]
Epoch [1/30], Iter. [0700/2352], Loss: 1.7758, Time[Data/Iter.]: [0.12s/193.67s]
Epoch [1/30], Iter. [0800/2352], Loss: 1.7299, Time[Data/Iter.]: [0.12s/197.73s]
Epoch [1/30], Iter. [0900/2352], Loss: 1.6942, Time[Data/Iter.]: [0.12s/199.21s]
Epoch [1/30], Iter. [1000/2352], Loss: 1.6837, Time[Data/Iter.]: [0.12s/195.18s]
Epoch [1/30], Iter. [1100/2352], Loss: 1.6410, Time[Data/Iter.]: [0.12s/198.79s]
Epoch [1/30], Iter. [1200/2352], Loss: 1.6846, Time[Data/Iter.]: [0.12s/193.91s]
Epoch [1/30], Iter. [1300/2352], Loss: 1.6586, Time[Data/Iter.]: [0.12s/197.35s]
Epoch [1/30], Iter. [1400/2352], Loss: 1.6119, Time[Data/Iter.]: [0.12s/199.18s]
Epoch [1/30], Iter. [1500/2352], Loss: 1.6100, Time[Data/Iter.]: [0.12s/195.75s]
Epoch [1/30], Iter. [1600/2352], Loss: 1.6113, Time[Data/Iter.]: [0.12s/197.37s]
Epoch [1/30], Iter. [1700/2352], Loss: 1.5859, Time[Data/Iter.]: [0.12s/194.13s]
Epoch [1/30], Iter. [1800/2352], Loss: 1.6008, Time[Data/Iter.]: [0.12s/198.77s]
Epoch [1/30], Iter. [1900/2352], Loss: 1.5268, Time[Data/Iter.]: [0.12s/195.72s]
Epoch [1/30], Iter. [2000/2352], Loss: 1.5740, Time[Data/Iter.]: [0.12s/196.66s]
Epoch [1/30], Iter. [2100/2352], Loss: 1.5322, Time[Data/Iter.]: [0.12s/199.11s]
Epoch [1/30], Iter. [2200/2352], Loss: 1.5152, Time[Data/Iter.]: [0.12s/199.33s]
Epoch [1/30], Iter. [2300/2352], Loss: 1.5533, Time[Data/Iter.]: [0.12s/196.02s]
Epoch 1 =>  mAP: 0.3168, rare: 0.3048, none-rare: 0.3204.
Namespace(alpha=0.5, aux_loss=True, backbone='resnet101', batch_size=16, bbox_loss_coef=5, box_score_thresh=0.05, cache=False, clip_max_norm=0.1, data_root='./hicodet', dataset='hicodet', dec_layers=6, detector='base', device='cuda', dilation=False, dim_feedforward=2048, dropout=0.1, enc_layers=6, eos_coef=0.1, epochs=30, eval=False, gamma=0.1, giou_loss_coef=2, hidden_dim=256, kv_src='C5', lr_drop=20, lr_drop_factor=0.2, lr_head=0.0001, max_instances=15, min_instances=3, nheads=8, num_queries=100, num_workers=2, output_dir='outputs/dub', partitions=['train2015', 'test2015'], port='1234', position_embedding='sine', pre_norm=False, pretrained='checkpoints/detr-r101-hicodet.pth', print_interval=100, raw_lambda=2.8, repr_dim=384, resume='', sanity=False, seed=140, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, triplet_dec_layers=2, triplet_enc_layers=1, use_wandb=False, weight_decay=0.0001, world_size=2)
Rank 1: Load weights for the object detector from checkpoints/detr-r101-hicodet.pth
=> Rank 1: PViC randomly initialised.
Rank 0: Load weights for the object detector from checkpoints/detr-r101-hicodet.pth
=> Rank 0: PViC randomly initialised.
Epoch 0 =>  mAP: 0.1483, rare: 0.1011, none-rare: 0.1624.
Epoch [1/30], Iter. [0100/2352], Loss: 3.8620, Time[Data/Iter.]: [6.28s/201.51s]
Epoch [1/30], Iter. [0200/2352], Loss: 2.2173, Time[Data/Iter.]: [0.12s/199.69s]
Epoch [1/30], Iter. [0300/2352], Loss: 2.1066, Time[Data/Iter.]: [0.12s/199.08s]
Epoch [1/30], Iter. [0400/2352], Loss: 1.9485, Time[Data/Iter.]: [0.12s/201.27s]
Epoch [1/30], Iter. [0500/2352], Loss: 1.8270, Time[Data/Iter.]: [0.13s/199.80s]
Epoch [1/30], Iter. [0600/2352], Loss: 1.7837, Time[Data/Iter.]: [0.12s/196.83s]
Epoch [1/30], Iter. [0700/2352], Loss: 1.7743, Time[Data/Iter.]: [0.13s/194.57s]
Epoch [1/30], Iter. [0800/2352], Loss: 1.7293, Time[Data/Iter.]: [0.12s/198.28s]
Epoch [1/30], Iter. [0900/2352], Loss: 1.6914, Time[Data/Iter.]: [0.12s/200.15s]
Epoch [1/30], Iter. [1000/2352], Loss: 1.6790, Time[Data/Iter.]: [0.12s/196.27s]
Epoch [1/30], Iter. [1100/2352], Loss: 1.6371, Time[Data/Iter.]: [0.12s/199.65s]
Epoch [1/30], Iter. [1200/2352], Loss: 1.6850, Time[Data/Iter.]: [0.12s/194.97s]
Epoch [1/30], Iter. [1300/2352], Loss: 1.6531, Time[Data/Iter.]: [0.12s/198.41s]
Epoch [1/30], Iter. [1400/2352], Loss: 1.6091, Time[Data/Iter.]: [0.12s/200.63s]
Epoch [1/30], Iter. [1500/2352], Loss: 1.6101, Time[Data/Iter.]: [0.12s/196.76s]
Epoch [1/30], Iter. [1600/2352], Loss: 1.6109, Time[Data/Iter.]: [0.12s/198.47s]
Epoch [1/30], Iter. [1700/2352], Loss: 1.5908, Time[Data/Iter.]: [0.12s/195.29s]
Epoch [1/30], Iter. [1800/2352], Loss: 1.6023, Time[Data/Iter.]: [0.12s/199.69s]
Epoch [1/30], Iter. [1900/2352], Loss: 1.5266, Time[Data/Iter.]: [0.12s/196.88s]
Epoch [1/30], Iter. [2000/2352], Loss: 1.5728, Time[Data/Iter.]: [0.13s/198.12s]
Epoch [1/30], Iter. [2100/2352], Loss: 1.5333, Time[Data/Iter.]: [0.12s/200.25s]
Epoch [1/30], Iter. [2200/2352], Loss: 1.5144, Time[Data/Iter.]: [0.12s/200.74s]
Epoch [1/30], Iter. [2300/2352], Loss: 1.5541, Time[Data/Iter.]: [0.12s/197.17s]
Epoch 1 =>  mAP: 0.3156, rare: 0.3004, none-rare: 0.3201.
fredzzhang commented 10 months ago

Hi @yaoyaosanqi,

There is some randomness in the optimiser as well.

The model at the start of training always yields the same performance because the randomly initialised parameters are determined by the seed. During training, however, other things can also introduce randomness. You can refer to this PyTorch page for more details.
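For reference, the standard knobs PyTorch exposes for reproducibility can be collected into one helper. This is a minimal sketch, not code from the PViC repo; `seed_everything` is a hypothetical name, and `seed=140` just mirrors the `Namespace` above. Note that `torch.use_deterministic_algorithms(True)` will raise an error if the model hits an op with no deterministic implementation, which is often exactly how the remaining source of randomness is found.

```python
import os
import random

import numpy as np
import torch


def seed_everything(seed: int = 140) -> None:
    """Seed all common RNGs and request deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)          # also seeds CUDA RNGs on current devices
    torch.cuda.manual_seed_all(seed)
    # Force deterministic cuDNN kernels; disables autotuning, may slow training.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Required for deterministic cuBLAS matmuls on CUDA >= 10.2.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Error out on any op that has no deterministic implementation.
    torch.use_deterministic_algorithms(True)
```

Even with all of this, bitwise-identical runs are only expected on the same hardware, driver, and library versions.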

Cheers, Fred.

hutuo1213 commented 10 months ago

Has PViC encountered this issue before? We're trying to understand the source of this randomness: is it due to PViC itself, or to the modifications we made to the model?
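One component worth ruling out is the data pipeline, since the run above uses `num_workers=2` and DDP with `world_size=2`. A seeded `torch.Generator` pins the shuffling order, and a `worker_init_fn` can seed each worker's augmentation RNGs. The sketch below is illustrative only: `make_loader` is a hypothetical helper and the toy `TensorDataset` stands in for HICO-DET.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_worker(worker_id: int) -> None:
    # Derive per-worker seeds from the loader's base seed so that
    # augmentations done with random/NumPy are also reproducible.
    worker_seed = torch.initial_seed() % 2**32
    random.seed(worker_seed)
    np.random.seed(worker_seed)


def make_loader(seed: int = 140) -> DataLoader:
    g = torch.Generator()
    g.manual_seed(seed)  # fixes the shuffle permutation across runs
    ds = TensorDataset(torch.arange(16).float())  # toy stand-in dataset
    return DataLoader(
        ds,
        batch_size=4,
        shuffle=True,
        num_workers=0,  # set >0 in practice; seed_worker then applies
        worker_init_fn=seed_worker,
        generator=g,
    )
```

If two loaders built this way yield identical batch orders but training still diverges, the remaining randomness is likely in CUDA kernels or the optimiser rather than the data side.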

fredzzhang commented 10 months ago

It's always been like this.

hutuo1213 commented 10 months ago

As I recall, UPT is completely reproducible, so it's odd that PViC isn't. Thankfully, its performance fluctuates very little. Thank you very much for your guidance.