mazatov opened this issue 2 years ago
Hi Mike, I see you trained for very few steps; the accuracy will start increasing much later in the training process. You could try setting the soccernetv3.training_subset config to a smaller value (0.01, for example) for a much shorter training run, just to see what happens.
Ah yeah, got it. Thanks. I was assuming it was starting from pretrained Market weights and was surprised to see 0. Since it's training from scratch, it makes sense that it would start at 0.
@VlSomers follow up question :)
On a small subset of 1% I got high accuracy. However, when I train on 50% of the dataset I still get pretty low accuracy while the mAP score is high. Do you know how this accuracy is calculated? It seems so strange that we'd get a decent mAP score with such low accuracy.
Below are the last few lines of the calculations.
epoch: [50/50][1276/1286] time 0.972 (0.986) data 0.000 (0.018) eta 0:00:09 loss_t 0.0531 (0.0428) loss_x 11.2660 (11.2688) acc 0.0000 (0.1678) lr 0.000003
epoch: [50/50][1277/1286] time 0.977 (0.986) data 0.000 (0.018) eta 0:00:08 loss_t 0.0805 (0.0428) loss_x 11.2628 (11.2688) acc 3.1250 (0.1701) lr 0.000003
epoch: [50/50][1278/1286] time 0.970 (0.986) data 0.000 (0.018) eta 0:00:07 loss_t 0.0501 (0.0428) loss_x 11.2664 (11.2688) acc 0.7812 (0.1706) lr 0.000003
epoch: [50/50][1279/1286] time 0.964 (0.986) data 0.006 (0.018) eta 0:00:06 loss_t 0.0356 (0.0428) loss_x 11.2624 (11.2688) acc 2.3438 (0.1723) lr 0.000003
epoch: [50/50][1280/1286] time 0.965 (0.986) data 0.000 (0.018) eta 0:00:05 loss_t 0.0842 (0.0429) loss_x 11.2630 (11.2688) acc 2.3438 (0.1740) lr 0.000003
epoch: [50/50][1281/1286] time 0.964 (0.986) data 0.000 (0.018) eta 0:00:04 loss_t 0.0539 (0.0429) loss_x 11.2661 (11.2688) acc 0.0000 (0.1738) lr 0.000003
epoch: [50/50][1282/1286] time 0.962 (0.986) data 0.000 (0.018) eta 0:00:03 loss_t 0.0410 (0.0429) loss_x 11.2639 (11.2688) acc 0.0000 (0.1737) lr 0.000003
epoch: [50/50][1283/1286] time 0.963 (0.986) data 0.000 (0.018) eta 0:00:02 loss_t 0.0328 (0.0429) loss_x 11.2647 (11.2688) acc 0.7812 (0.1742) lr 0.000003
epoch: [50/50][1284/1286] time 0.963 (0.986) data 0.000 (0.018) eta 0:00:01 loss_t 0.0755 (0.0429) loss_x 11.2599 (11.2688) acc 3.1250 (0.1765) lr 0.000003
epoch: [50/50][1285/1286] time 0.965 (0.986) data 0.000 (0.018) eta 0:00:00 loss_t 0.0721 (0.0429) loss_x 11.2589 (11.2688) acc 0.7812 (0.1769) lr 0.000003
epoch: [50/50][1286/1286] time 0.963 (0.986) data 0.000 (0.018) eta 0:00:00 loss_t 0.0410 (0.0429) loss_x 11.2601 (11.2688) acc 0.0000 (0.1768) lr 0.000003
=> Final test
##### Evaluating soccernetv3 (source) #####
Extracting features from query set ...
Done, obtained 11638-by-512 matrix
Extracting features from gallery set ...
Done, obtained 34355-by-512 matrix
Speed: 0.0166 sec/batch
Computing distance matrix with metric=euclidean ...
Exporting ranking results to 'log/ranking_results_soccernetv3_2022-05-19_14_22_54_648.json' for external evaluation...
Computing CMC and mAP ...
** Results **
mAP: 63.6%
CMC curve
Rank-1 : 52.4%
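For reference, the mAP and Rank-1 numbers above come from ranking the gallery set by distance to each query and checking where the true matches land. A minimal sketch of that evaluation (simplified: it skips the camera-id filtering a real re-id evaluator applies, and the function name is illustrative, not the repo's exact API):

```python
import numpy as np

def evaluate_rank(distmat, q_pids, g_pids):
    """CMC rank-1 and mAP from a query-by-gallery distance matrix.
    Simplified sketch: no camera-id filtering, single evaluation pass."""
    num_q = distmat.shape[0]
    indices = np.argsort(distmat, axis=1)  # closest gallery samples first
    matches = (g_pids[indices] == q_pids[:, None]).astype(np.int32)

    all_ap, rank1_hits = [], 0
    for i in range(num_q):
        m = matches[i]
        if not m.any():
            continue  # query identity absent from gallery
        rank1_hits += m[0]
        # average precision: precision at each true-match position
        cum_hits = m.cumsum()
        precision = cum_hits / (np.arange(len(m)) + 1)
        all_ap.append((precision * m).sum() / m.sum())

    return rank1_hits / num_q, float(np.mean(all_ap))

# toy example: 2 queries, 3 gallery samples
distmat = np.array([[0.1, 0.9, 0.5],   # query 0: nearest gallery item is correct
                    [0.2, 0.8, 0.3]])  # query 1: nearest gallery item is wrong
q_pids = np.array([0, 1])
g_pids = np.array([0, 1, 1])
rank1, mAP = evaluate_rank(distmat, q_pids, g_pids)  # → 0.5, ~0.792
```

Note that nothing here involves the identity classifier: only the embedding distances matter, which is why mAP can be high while classification accuracy is low.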
Hi Mazatov, that does seem strange; as a comparison, when using a soccernetv3.training_subset of 0.02, I start getting 100% accuracy after +/- 20 epochs. The only explanation I see is the "infeasibility" of the classification task when the training dataset becomes too big: if you train on 50% of the dataset, you end up with so many training identities (and multiple identities for the same player, as explained in the README and the video tutorial) that classifying a sample into its unique correct identity becomes infeasible. However, the network still learns something (the triplet loss plays an important role here), and you still get good final ranking performance.
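To illustrate why the triplet loss keeps working: it only compares embeddings within a batch (pull positives together, push negatives apart), so its signal does not degrade with the number of training identities the way the classification accuracy does. A minimal batch-hard triplet loss sketch (illustrative, not the repo's exact implementation):

```python
import numpy as np

def batch_hard_triplet_loss(embeddings, pids, margin=0.3):
    """Batch-hard triplet loss on L2 distances: for each anchor, take the
    hardest (farthest) positive and the hardest (closest) negative.
    Unlike the identity classifier, it never has to name an identity,
    so it provides signal however many training identities exist."""
    d = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=2)
    same = pids[:, None] == pids[None, :]
    n = len(pids)
    losses = []
    for i in range(n):
        hardest_pos = d[i][same[i] & (np.arange(n) != i)].max()
        hardest_neg = d[i][~same[i]].min()
        losses.append(max(0.0, margin + hardest_pos - hardest_neg))
    return float(np.mean(losses))

emb = np.array([[0.0, 0.0], [0.1, 0.0],   # identity 0: clustered
                [1.0, 1.0], [1.1, 1.0]])  # identity 1: clustered, far away
pids = np.array([0, 0, 1, 1])
loss = batch_hard_triplet_loss(emb, pids)  # well-separated clusters → 0.0
```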
Hi @VlSomers, yeah, I get similar results with 0.02. That's an interesting thought about the large number of identities. Overall the trained model still performs well on the test dataset, so the triplet loss is working. I was assuming the accuracy just measured whether it separates classes within the batch, or within the triplet, so that it never actually compares all the identities. Do you know how the accuracy is calculated here?
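As far as I can tell, in torchreid-style training loops the logged `acc` is top-1 classification accuracy over the full set of training identities (not a within-batch or within-triplet comparison), which would explain why it collapses as the identity count grows. A minimal sketch of that metric, with illustrative names:

```python
import numpy as np

def top1_accuracy(logits, targets):
    """Top-1 accuracy in percent over ALL identity classes, like the
    `acc` column in the training log: the classifier must pick the one
    correct identity out of every training identity, so with thousands
    of (partly duplicated) identities the number stays near zero."""
    preds = logits.argmax(axis=1)
    return 100.0 * float((preds == targets).mean())

# toy batch: 2 samples, 3 identities
logits = np.array([[0.1, 2.0, 0.3],   # predicts identity 1 (correct)
                   [0.2, 1.5, 0.1]])  # predicts identity 1 (target is 0)
targets = np.array([1, 0])
acc = top1_accuracy(logits, targets)  # → 50.0
```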
Hey @VlSomers, it might be too late, but I decided to try my hand at this challenge as well while there is still some time. I'm just testing out your baseline code, and in the output the accuracy is always 0, which doesn't seem right haha. Do you have any idea what I might be doing wrong with the benchmark code?