davidnvq / grit

GRIT: Faster and Better Image-captioning Transformer (ECCV 2022)

accuracy of training_caption with freezing detector #18

Closed: verigle closed this issue 2 years ago

verigle commented 2 years ago

Following the suggestion about accumulation_steps in #15, I modified the train_xe function in caption_engine.py:

    with tqdm(desc=f'Epoch {epoch} - train', unit='it', total=len(dataloaders['train'])) as pbar:
        for it, batch in enumerate(dataloaders['train']):
            out = model(batch['samples'], batch['captions'])

            captions_gt = batch['captions'][:, 1:].contiguous()
            out = out[:, :-1].contiguous()
            loss = loss_fn(out.view(-1, len(text_field.vocab)), captions_gt.view(-1))
            loss = loss / config.optimizer.accumulation_steps
            loss.backward()

            loss = gather_result(loss)
            running_loss += loss.item()

            pbar.set_postfix(loss=running_loss / (it + 1))
            pbar.update()

            if scheduler is not None:
                # accumulate gradients: step the optimizers only every accumulation_steps iterations
                if (it + 1) % config.optimizer.accumulation_steps == 0:
                    optimizers['model'].step()
                    optimizers['backbone'].step()

                    lr = scheduler.step()
                    assert optimizers['model'].param_groups[0]['lr'] == lr, "LR scheduler doesn't work properly."

                    optimizers['model'].zero_grad()
                    optimizers['backbone'].zero_grad()

However, the CIDEr score after the first epoch is very low:

Epoch 0 - train: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:18<00:00,  5.01it/s, loss=1.25][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 0 - train: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:21<00:00,  5.52it/s, loss=1.25]
Epoch 0 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.46it/s, loss=3.97][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 0 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.59it/s, loss=3.97]
Epoch 0 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22200s
Epoch 0 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20595s
Epoch 0 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.76it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 0 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.16it/s]
Epoch: 0 iters: 157
Total time per 1 batch: 0.20528s
Epoch 0: valid scores: {'BLEU': [0.4669783296661464, 0.24582240555971746, 0.13479587494094583, 0.07239531663650367], 'METEOR': 0.10954924428703441, 'ROUGE': 0.34872282632850365, 'CIDEr': 0.08790172488631927}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 0, valid, 8.79, 46.70, 7.24, 34.87, 10.95, 24.58, 13.48, 1.25, 0.00, 0.00, fr_xe, 3.97
Epoch 0 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21069s
Epoch 0 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20469s
Epoch 0 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 0 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 0 iters: 157
Total time per 1 batch: 0.20416s
Epoch 0: test scores: {'BLEU': [0.4664962002208445, 0.24325046136142325, 0.1341976162794273, 0.0742708785109443], 'METEOR': 0.10937207416996544, 'ROUGE': 0.34777669352106455, 'CIDEr': 0.08940798205091863}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 0, test , 8.94, 46.65, 7.43, 34.78, 10.94, 24.33, 13.42, 1.25, 0.00, 0.00, fr_xe, 3.97

So the CIDEr score is not in the expected range of [1.05 - 1.29] ( https://github.com/davidnvq/grit/issues/17#issuecomment-1234603645_ ).
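
For reference, here is a minimal pure-Python sketch of what gradient accumulation is meant to compute. It is not the repository's code: the 1-D model, `mse_grad`, and `accumulated_step` are hypothetical names used only for illustration. It shows that summing the gradients of `loss / k` over `k` equally sized micro-batches matches one step on the combined batch, and that the value to log is the undivided loss. In the loop posted above, `running_loss` appears to accumulate the loss after the division by `accumulation_steps`, so the displayed training loss would be roughly 1/4 of the per-batch loss (assuming `gather_result` only averages across processes). If the validation loop does not apply the same division, the training value (1.25) and the validation value (3.97) are not on the same scale.

```python
def mse_grad(w, xs, ys):
    """Return (mean squared error, d loss / d w) for the toy model y = w * x."""
    n = len(xs)
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
    return loss, grad


def accumulated_step(w, micro_batches, lr):
    """Take one SGD step from gradients accumulated over several micro-batches.

    Returns the updated weight and the mean *unscaled* loss for logging.
    """
    k = len(micro_batches)
    grad_sum = 0.0
    loss_sum = 0.0
    for xs, ys in micro_batches:
        loss, grad = mse_grad(w, xs, ys)
        grad_sum += grad / k  # equivalent to calling backward() on loss / k
        loss_sum += loss      # log the undivided loss, not loss / k
    return w - lr * grad_sum, loss_sum / k


# One full batch of 4 samples versus two micro-batches of 2 samples each.
full_loss, full_grad = mse_grad(0.0, [1, 2, 3, 4], [2, 4, 6, 8])
new_w, logged_loss = accumulated_step(
    0.0, [([1, 2], [2, 4]), ([3, 4], [6, 8])], lr=0.01
)
print(full_loss, logged_loss)        # the two losses agree
print(0.0 - 0.01 * full_grad, new_w)  # the two updated weights agree
```

In a PyTorch loop the same idea would mean keeping a copy of the loss before dividing it and adding that copy to `running_loss`; this is a suggestion under the assumptions above, not a confirmed cause of the low CIDEr score.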

verigle commented 2 years ago

The output config file:

exp:
  seed: 42
  name: caption_4ds_20220906
  rank: 0
  ngpus_per_node: 1
  world_size: 1
  checkpoint: ''
  eval: false
  resume: false
dataset:
  overfit: false
  ann_root: ${oc.env:DATA_ROOT}/annotations
  img_root: ${oc.env:DATA_ROOT}
  hdf5_path: ${oc.env:DATA_ROOT}/all_splits_4ds.h5
  vocab_path: ${oc.env:DATA_ROOT}/annotations/vocab.json
  use_gri_feat: ${model.use_gri_feat}
  use_reg_feat: ${model.use_reg_feat}
  transform_cfg:
    size:
    - 384
    - 640
    resize_name: maxwh
    randaug: true
model:
  use_gri_feat: true
  use_reg_feat: true
  grid_feat_dim: 1024
  frozen_stages: 2
  beam_size: 5
  beam_len: 20
  dropout: 0.2
  attn_dropout: 0.2
  vocab_size: 10201
  max_len: 54
  pad_idx: 1
  bos_idx: 2
  eos_idx: 3
  d_model: 512
  n_heads: 8
  grit_net:
    n_memories: 1
    n_layers: 3
  grid_stage: -1
  detector:
    checkpoint: modelzoo/detector_checkpoint_4ds.pth
    d_model: 512
    dim_feedforward: 1024
    num_heads: 8
    num_layers: 6
    num_levels: 4
    num_points: 4
    num_queries: 150
    num_classes: 1849
    dropout: 0.1
    activation: relu
    return_intermediate: true
    with_box_refine: true
    det_module:
      reduced_dim: 512
      dim_feedforward: 1024
      num_heads: 8
      num_layers: 6
      num_levels: 4
      num_points: 4
      return_intermediate: true
      num_queries: 150
      num_classes: 1849
      dropout: 0.1
      activation: relu
      with_box_refine: true
      aux_loss: true
  cap_generator:
    decoder_name: parallel
    n_layers: 3
    activation: sigmoid
optimizer:
  warmup_init_lr: 1.0e-05
  min_lr: 0.0001
  xe_lr: 0.0001
  sc_lr: 5.0e-06
  xe_backbone_lr: 1.0e-05
  sc_backbone_lr: 5.0e-06
  weight_decay: 0.01
  beta_1: 0.9
  beta_2: 0.99
  batch_size: 32
  accumulation_steps: 4
  num_workers: 2
  freezing_xe_epochs: 10
  freezing_sc_epochs: 10
  finetune_xe_epochs: 0
  finetune_sc_epochs: 0
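
A quick arithmetic sketch of what these optimizer settings imply, assuming the modified loop from the first comment (optimizers and scheduler step once every `accumulation_steps` iterations) and the 4425 training iterations per epoch shown in the log. The function name `accumulation_summary` is hypothetical and only serves this illustration.

```python
def accumulation_summary(batch_size, accumulation_steps, world_size, iters_per_epoch):
    """Summarize the effective batch size and optimizer steps per epoch."""
    effective_batch_size = batch_size * accumulation_steps * world_size
    # Iterations that do not complete a full accumulation window are left over.
    optimizer_steps, leftover_iters = divmod(iters_per_epoch, accumulation_steps)
    return {
        "effective_batch_size": effective_batch_size,
        "optimizer_steps_per_epoch": optimizer_steps,
        "leftover_iters": leftover_iters,
    }


# Values taken from the config above (batch_size, accumulation_steps, world_size)
# and from the training progress bar (4425 iterations per epoch).
summary = accumulation_summary(
    batch_size=32, accumulation_steps=4, world_size=1, iters_per_epoch=4425
)
print(summary)
```

With these numbers the effective batch size is 128, and the scheduler advances 1106 times per epoch instead of 4425. If the warmup and decay schedule is defined in scheduler steps tuned for one step per iteration, the learning rate would ramp up about four times more slowly in terms of data seen. The one leftover iteration also leaves gradients that are not applied within the epoch unless they are cleared elsewhere.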

verigle commented 2 years ago

The 10-epoch log:

Train: rank=0, epoch=0, phase=fr_xe
Epoch 0 - train: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:18<00:00,  5.01it/s, loss=1.25][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 0 - train: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:21<00:00,  5.52it/s, loss=1.25]
Epoch 0 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.46it/s, loss=3.97][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 0 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.59it/s, loss=3.97]
Epoch 0 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22200s
Epoch 0 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20595s
Epoch 0 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.76it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 0 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.16it/s]
Epoch: 0 iters: 157
Total time per 1 batch: 0.20528s
Epoch 0: valid scores: {'BLEU': [0.4669783296661464, 0.24582240555971746, 0.13479587494094583, 0.07239531663650367], 'METEOR': 0.10954924428703441, 'ROUGE': 0.34872282632850365, 'CIDEr': 0.08790172488631927}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 0, valid, 8.79, 46.70, 7.24, 34.87, 10.95, 24.58, 13.48, 1.25, 0.00, 0.00, fr_xe, 3.97
Epoch 0 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21069s
Epoch 0 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20469s
Epoch 0 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 0 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 0 iters: 157
Total time per 1 batch: 0.20416s
Epoch 0: test scores: {'BLEU': [0.4664962002208445, 0.24325046136142325, 0.1341976162794273, 0.0742708785109443], 'METEOR': 0.10937207416996544, 'ROUGE': 0.34777669352106455, 'CIDEr': 0.08940798205091863}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 0, test , 8.94, 46.65, 7.43, 34.78, 10.94, 24.33, 13.42, 1.25, 0.00, 0.00, fr_xe, 3.97
Train: rank=0, epoch=1, phase=fr_xe
Epoch 1 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:08<00:00,  6.05it/s, loss=0.921][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 1 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:11<00:00,  5.59it/s, loss=0.921]
Epoch 1 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.42it/s, loss=3.37][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 1 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.60it/s, loss=3.37]
Epoch 1 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22185s
Epoch 1 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20407s
Epoch 1 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 1 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 1 iters: 157
Total time per 1 batch: 0.20349s
Epoch 1: valid scores: {'BLEU': [0.4177599999999917, 0.22081040233144753, 0.13440728245371522, 0.086634145554716], 'METEOR': 0.10917767439359446, 'ROUGE': 0.3293001799815639, 'CIDEr': 0.09119403689594928}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 1, valid, 9.12, 41.78, 8.66, 32.93, 10.92, 22.08, 13.44, 0.92, 0.00, 0.00, fr_xe, 3.37
Epoch 1 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21100s
Epoch 1 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.84it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20332s
Epoch 1 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.85it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 1 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.20it/s]
Epoch: 1 iters: 157
Total time per 1 batch: 0.20278s
Epoch 1: test scores: {'BLEU': [0.41875999999999164, 0.22176797684867283, 0.13475045843026873, 0.08727707854272869], 'METEOR': 0.10943320624155432, 'ROUGE': 0.3286083021124142, 'CIDEr': 0.0934703590652789}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 1, test , 9.35, 41.88, 8.73, 32.86, 10.94, 22.18, 13.48, 0.92, 0.00, 0.00, fr_xe, 3.37
Train: rank=0, epoch=2, phase=fr_xe
Epoch 2 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:15<00:00,  5.11it/s, loss=0.815][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 2 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:17<00:00,  5.55it/s, loss=0.815]
Epoch 2 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.30it/s, loss=3.11][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 2 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.56it/s, loss=3.11]
Epoch 2 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22380s
Epoch 2 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20404s
Epoch 2 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.82it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 2 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.20it/s]
Epoch: 2 iters: 157
Total time per 1 batch: 0.20375s
Epoch 2: valid scores: {'BLEU': [0.3529791667300347, 0.1414591126544642, 0.06299488259423028, 0.03485908085047199], 'METEOR': 0.10059827334083978, 'ROUGE': 0.3006740338866547, 'CIDEr': 0.06895854570136649}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 2, valid, 6.90, 35.30, 3.49, 30.07, 10.06, 14.15, 6.30, 0.82, 0.00, 0.00, fr_xe, 3.11
Epoch 2 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21104s
Epoch 2 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20410s
Epoch 2 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.83it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 2 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 2 iters: 157
Total time per 1 batch: 0.20358s
Epoch 2: test scores: {'BLEU': [0.3523721988314085, 0.1408002536166315, 0.06367281953988665, 0.035704094395407136], 'METEOR': 0.10049904707011564, 'ROUGE': 0.2998991451173569, 'CIDEr': 0.07132428082250658}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 2, test , 7.13, 35.24, 3.57, 29.99, 10.05, 14.08, 6.37, 0.82, 0.00, 0.00, fr_xe, 3.11
Train: rank=0, epoch=3, phase=fr_xe
Epoch 3 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:11<00:00,  5.80it/s, loss=0.762][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 3 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:14<00:00,  5.57it/s, loss=0.762]
Epoch 3 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.60it/s, loss=2.97][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 3 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.70it/s, loss=2.97]
Epoch 3 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22389s
Epoch 3 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20444s
Epoch 3 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:36<00:00,  4.82it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 3 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.20it/s]
Epoch: 3 iters: 157
Total time per 1 batch: 0.20382s
Epoch 3: valid scores: {'BLEU': [0.3880270749907782, 0.1498102872688645, 0.08511781760022998, 0.050539934594252114], 'METEOR': 0.09991918939322139, 'ROUGE': 0.30912337673077656, 'CIDEr': 0.07096231761736432}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 3, valid, 7.10, 38.80, 5.05, 30.91, 9.99, 14.98, 8.51, 0.76, 0.00, 0.00, fr_xe, 2.97
Epoch 3 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.20925s
Epoch 3 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.83it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20409s
Epoch 3 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:36<00:00,  4.84it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 3 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.22it/s]
Epoch: 3 iters: 157
Total time per 1 batch: 0.20361s
Epoch 3: test scores: {'BLEU': [0.38801692079215583, 0.15115514556365187, 0.08703201785437005, 0.053794063909602625], 'METEOR': 0.1004634083610197, 'ROUGE': 0.3098636871323689, 'CIDEr': 0.07426506154284174}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 3, test , 7.43, 38.80, 5.38, 30.99, 10.05, 15.12, 8.70, 0.76, 0.00, 0.00, fr_xe, 2.97
Train: rank=0, epoch=4, phase=fr_xe
Epoch 4 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:18<00:00,  4.65it/s, loss=0.729][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 4 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:20<00:00,  5.53it/s, loss=0.729]
Epoch 4 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:26<00:00, 12.23it/s, loss=2.88][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 4 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:26<00:00,  7.42it/s, loss=2.88]
Epoch 4 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22306s
Epoch 4 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20433s
Epoch 4 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 4 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.19it/s]
Epoch: 4 iters: 157
Total time per 1 batch: 0.20391s
Epoch 4: valid scores: {'BLEU': [0.3956675189916203, 0.14959241499951006, 0.08455045357940592, 0.05072552705044864], 'METEOR': 0.10225475159056896, 'ROUGE': 0.3122258530606642, 'CIDEr': 0.07015325932525968}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 4, valid, 7.02, 39.57, 5.07, 31.22, 10.23, 14.96, 8.46, 0.73, 0.00, 0.00, fr_xe, 2.88
Epoch 4 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21257s
Epoch 4 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20408s
Epoch 4 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.83it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 4 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 4 iters: 157
Total time per 1 batch: 0.20355s
Epoch 4: test scores: {'BLEU': [0.39403577331772227, 0.14921009455837111, 0.083850153056571, 0.050482385367933987], 'METEOR': 0.10182105655673135, 'ROUGE': 0.31149491539952134, 'CIDEr': 0.06971117398735341}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 4, test , 6.97, 39.40, 5.05, 31.15, 10.18, 14.92, 8.39, 0.73, 0.00, 0.00, fr_xe, 2.88
Train: rank=0, epoch=5, phase=fr_xe
Epoch 5 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:12<00:00,  5.54it/s, loss=0.707][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 5 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:14<00:00,  5.57it/s, loss=0.707]
Epoch 5 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.46it/s, loss=2.83][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 5 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.63it/s, loss=2.83]
Epoch 5 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.23301s
Epoch 5 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20449s
Epoch 5 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 5 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 5 iters: 157
Total time per 1 batch: 0.20397s
Epoch 5: valid scores: {'BLEU': [0.3529791667300347, 0.1414591126544642, 0.06299488259423028, 0.03485908085047199], 'METEOR': 0.10059827334083978, 'ROUGE': 0.3006740338866547, 'CIDEr': 0.06895854570136649}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 5, valid, 6.90, 35.30, 3.49, 30.07, 10.06, 14.15, 6.30, 0.71, 0.00, 0.00, fr_xe, 2.83
Epoch 5 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.20863s
Epoch 5 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20429s
Epoch 5 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.83it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 5 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.20it/s]
Epoch: 5 iters: 157
Total time per 1 batch: 0.20373s
Epoch 5: test scores: {'BLEU': [0.3523721988314085, 0.1408002536166315, 0.06367281953988665, 0.035704094395407136], 'METEOR': 0.10049904707011564, 'ROUGE': 0.2998991451173569, 'CIDEr': 0.07132428082250658}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 5, test , 7.13, 35.24, 3.57, 29.99, 10.05, 14.08, 6.37, 0.71, 0.00, 0.00, fr_xe, 2.83
Train: rank=0, epoch=6, phase=fr_xe
Epoch 6 - train: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:14<00:00,  5.17it/s, loss=0.69][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 6 - train: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:16<00:00,  5.55it/s, loss=0.69]
Epoch 6 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.35it/s, loss=2.79][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 6 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.60it/s, loss=2.79]
Epoch 6 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22445s
Epoch 6 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20437s
Epoch 6 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.80it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 6 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.18it/s]
Epoch: 6 iters: 157
Total time per 1 batch: 0.20382s
Epoch 6: valid scores: {'BLEU': [0.3626799999999928, 0.18942318055964488, 0.07775216931829154, 0.04351562148082592], 'METEOR': 0.09470731840804052, 'ROUGE': 0.3105374796878281, 'CIDEr': 0.055744577283453285}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 6, valid, 5.57, 36.27, 4.35, 31.05, 9.47, 18.94, 7.78, 0.69, 0.00, 0.00, fr_xe, 2.79
Epoch 6 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21029s
Epoch 6 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20432s
Epoch 6 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.82it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 6 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 6 iters: 157
Total time per 1 batch: 0.20382s
Epoch 6: test scores: {'BLEU': [0.35903999999999286, 0.18394979025085054, 0.0703169688713343, 0.03666873619238479], 'METEOR': 0.09287212083601691, 'ROUGE': 0.3071016066175093, 'CIDEr': 0.049071662717992064}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 6, test , 4.91, 35.90, 3.67, 30.71, 9.29, 18.39, 7.03, 0.69, 0.00, 0.00, fr_xe, 2.79
Train: rank=0, epoch=7, phase=fr_xe
Epoch 7 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:21<00:00,  5.86it/s, loss=0.677][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 7 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:23<00:00,  5.51it/s, loss=0.677]
Epoch 7 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.29it/s, loss=2.77][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 7 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:26<00:00,  7.52it/s, loss=2.77]
Epoch 7 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22371s
Epoch 7 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20449s
Epoch 7 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 7 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.13it/s]
Epoch: 7 iters: 157
Total time per 1 batch: 0.20394s
Epoch 7: valid scores: {'BLEU': [0.38179999999999237, 0.20104699284826827, 0.08538310838454195, 0.04516644657768871], 'METEOR': 0.10478659426096619, 'ROUGE': 0.3169023526256761, 'CIDEr': 0.07155207847141871}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 7, valid, 7.16, 38.18, 4.52, 31.69, 10.48, 20.10, 8.54, 0.68, 0.00, 0.00, fr_xe, 2.77
Epoch 7 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21468s
Epoch 7 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20399s
Epoch 7 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.82it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 7 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.16it/s]
Epoch: 7 iters: 157
Total time per 1 batch: 0.20353s
Epoch 7: test scores: {'BLEU': [0.37899999999999245, 0.19519767758180867, 0.0771990834096997, 0.03888923579121348], 'METEOR': 0.10304347878594912, 'ROUGE': 0.3141675146161081, 'CIDEr': 0.06542015344438794}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 7, test , 6.54, 37.90, 3.89, 31.42, 10.30, 19.52, 7.72, 0.68, 0.00, 0.00, fr_xe, 2.77
Train: rank=0, epoch=8, phase=fr_xe
Epoch 8 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:10<00:00,  5.50it/s, loss=0.666][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 8 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:13<00:00,  5.58it/s, loss=0.666]
Epoch 8 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.32it/s, loss=2.75][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 8 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.69it/s, loss=2.75]
Epoch 8 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22344s
Epoch 8 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20453s
Epoch 8 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.80it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 8 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 8 iters: 157
Total time per 1 batch: 0.20401s
Epoch 8: valid scores: {'BLEU': [0.38179999999999237, 0.20104699284826827, 0.08538310838454195, 0.04516644657768871], 'METEOR': 0.10478659426096619, 'ROUGE': 0.3169023526256761, 'CIDEr': 0.07155207847141871}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 8, valid, 7.16, 38.18, 4.52, 31.69, 10.48, 20.10, 8.54, 0.67, 0.00, 0.00, fr_xe, 2.75
Epoch 8 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21202s
Epoch 8 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20418s
Epoch 8 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 8 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.19it/s]
Epoch: 8 iters: 157
Total time per 1 batch: 0.20374s
Epoch 8: test scores: {'BLEU': [0.37899999999999245, 0.19519767758180867, 0.0771990834096997, 0.03888923579121348], 'METEOR': 0.10304347878594912, 'ROUGE': 0.3141675146161081, 'CIDEr': 0.06542015344438794}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 8, test , 6.54, 37.90, 3.89, 31.42, 10.30, 19.52, 7.72, 0.67, 0.00, 0.00, fr_xe, 2.75
Train: rank=0, epoch=9, phase=fr_xe
Epoch 9 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:11<00:00,  5.36it/s, loss=0.656][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 9 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:13<00:00,  5.57it/s, loss=0.656]
Epoch 9 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.35it/s, loss=2.74][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 9 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:26<00:00,  7.50it/s, loss=2.74]
Epoch 9 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22454s
Epoch 9 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.79it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20455s
Epoch 9 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 9 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.19it/s]
Epoch: 9 iters: 157
Total time per 1 batch: 0.20402s
Epoch 9: valid scores: {'BLEU': [0.3529791667300347, 0.1414591126544642, 0.06299488259423028, 0.03485908085047199], 'METEOR': 0.10059827334083978, 'ROUGE': 0.3006740338866547, 'CIDEr': 0.06895854570136649}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 9, valid, 6.90, 35.30, 3.49, 30.07, 10.06, 14.15, 6.30, 0.66, 0.00, 0.00, fr_xe, 2.74
Epoch 9 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.20851s
Epoch 9 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20436s
Epoch 9 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.82it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 9 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.20it/s]
Epoch: 9 iters: 157
Total time per 1 batch: 0.20386s
Epoch 9: test scores: {'BLEU': [0.3523721988314085, 0.1408002536166315, 0.06367281953988665, 0.035704094395407136], 'METEOR': 0.10049904707011564, 'ROUGE': 0.2998991451173569, 'CIDEr': 0.07132428082250658}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 9, test , 7.13, 35.24, 3.57, 29.99, 10.05, 14.08, 6.37, 0.66, 0.00, 0.00, fr_xe, 2.74
davidnvq commented 2 years ago

@verigle I'm sorry that I didn't tell you about a bug in loading the pretrained object detector. The previous code discarded all the pretrained weights, leaving every weight randomly initialized.

Check https://github.com/davidnvq/grit/issues/17 for more detail.

davidnvq commented 2 years ago

It is expected that your results are even worse than #17 because you freeze all weights of the detector. Please refer to #17 to fix the bug. Thanks!

verigle commented 2 years ago

I have already applied the fix from https://github.com/davidnvq/grit/issues/17, and the code of Transformer has been changed to:

class Transformer(BaseCaptioner):

    def __init__(self,
                 grid_net,
                 cap_generator,
                 bos_idx=2,
                 detector=None,
                 use_gri_feat=True,
                 use_reg_feat=False,
                 cached_features=False,
                 config=None):
        super(Transformer, self).__init__()
        self.bos_idx = bos_idx
        self.grid_net = grid_net
        self.cap_generator = cap_generator
        self.use_reg_feat = use_reg_feat
        self.use_gri_feat = use_gri_feat
        self.cached_features = cached_features
        self.config = config

        if self.use_gri_feat:
            self.register_state('gri_feat', None)
            self.register_state('gri_mask', None)

        if self.use_reg_feat:
            self.register_state('reg_feat', None)
            self.register_state('reg_mask', None)

        self.init_weights()
        self.detector = detector

However, there is another problem: if I set model.detector.checkpoint to a relative path, the weights cannot be loaded:

    if os.path.exists(config.model.detector.checkpoint):
        checkpoint = torch.load(config.model.detector.checkpoint, map_location='cpu')
        missing, unexpected = detector.load_state_dict(checkpoint['model'], strict=False)
        print("det missing:", len(missing))
        print("det unexpected:", len(unexpected))
    else:
        print("could not load detector weights : detector checkpoint file not found !")

While debugging, I found that os.path.exists(config.model.detector.checkpoint) is False. os.path.abspath(config.model.detector.checkpoint) shows that the absolute path is the concatenation of 'hydra.run.dir' and config.model.detector.checkpoint, so I think hydra changes the working directory to the run directory, which makes os.path.exists check the wrong absolute path.

hydra:
  run:
    dir: outputs/checkpoints/grit/coco/${exp.name}

When I change model.detector.checkpoint to an absolute path, os.path.exists(config.model.detector.checkpoint) is True, and the code that loads the detector weights is executed.
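
A minimal sketch of one way to make a relative checkpoint path independent of the working directory, assuming the path is meant to be relative to the directory the script was launched from. `resolve_against` and `launch_dir` are hypothetical names, not part of the GRIT code. Hydra itself provides `hydra.utils.get_original_cwd()` and `hydra.utils.to_absolute_path()` for the same purpose.

```python
import os


def resolve_against(launch_dir: str, path: str) -> str:
    """Return a normalized absolute path.

    Absolute inputs are only normalized; relative inputs are joined
    onto `launch_dir` (the directory the script was started from).
    """
    if os.path.isabs(path):
        return os.path.normpath(path)
    return os.path.normpath(os.path.join(launch_dir, path))


# Hypothetical usage inside a Hydra app (not executed here):
#   from hydra.utils import get_original_cwd
#   ckpt = resolve_against(get_original_cwd(), config.model.detector.checkpoint)
#   if os.path.exists(ckpt): ...
```

This keeps the `os.path.exists` check meaningful whether or not the working directory has been switched to the run directory.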

davidnvq commented 2 years ago

@verigle Have you resolved the issue? I found that hydra version 1.2 has a critical problem with paths. I recommend downgrading hydra to a lower version, as in https://github.com/davidnvq/grit/issues/16.

verigle commented 2 years ago

Thank you for your reply. hydra-core has been downgraded to 1.1.0:

$ pip list   | grep hydra
hydra-core                    1.1.0

I will retrain the caption model later to confirm whether the issue is solved.

verigle commented 2 years ago

    torch.distributed.barrier()
    if rank == 0:
        num_gpus = dist.get_world_size()
        with h5py.File(config.dataset.hdf5_path, 'w') as agg_file:
            L = len(dataloader) * BATCH_SIZE * num_gpus
            agg_file.create_dataset('image_ids', data=dataset.img_ids)
            gri_features = agg_file.create_dataset('gri_feat', (L, fh * fw, C), dtype='float32')
            gri_masks = agg_file.create_dataset('gri_mask', (L, 1, 1, fh * fw), dtype='bool')
            if config.model.use_reg_feat:
                Q = config.model.detector.num_queries
                D = config.model.detector.d_model
                reg_features = agg_file.create_dataset('reg_feat', (L, Q, D), dtype='float32')
                reg_masks = agg_file.create_dataset('reg_mask', (L, 1, 1, Q), dtype='bool')

            for r in range(num_gpus):
                filename = f"{r}_" + os.path.basename(config.dataset.hdf5_path)
                dir_path = os.path.dirname(config.dataset.hdf5_path)
                path = os.path.join(dir_path, filename)

                with h5py.File(path, 'r') as f:
                    tmp_ids_list = f['tmp_ids_list'][:len(f['tmp_ids_list'])]

                    for tmp_idx, tmp_id in tqdm(enumerate(tmp_ids_list), total=len(tmp_ids_list)):
                        img_idx = dataset.img_id2idx[tmp_id]
                        # Add grid features
                        gri_features[img_idx] = f['gri_feat'][tmp_idx]
                        gri_masks[img_idx] = f['gri_mask'][tmp_idx]

                        # Add det features
                        if config.model.use_reg_feat:
                            reg_features[img_idx] = f['reg_feat'][tmp_idx]
                            reg_masks[img_idx] = f['reg_mask'][tmp_idx]

                os.remove(path)
                print(f"rank: {rank} - Delete {path}")
        print(f"Saving all to HDF5 file: {config.dataset.hdf5_path}.")

The loop for tmp_idx, tmp_id in tqdm(enumerate(tmp_ids_list), total=len(tmp_ids_list)): runs very slowly. How can I optimize this code? Can I simply rename 0_all_splits.h5 to all_splits.h5 when only one GPU is used?

davidnvq commented 2 years ago

True. In your case, you can comment out this "merging code" and change the path in the config to 0_all_splits.h5. For reference, I extracted the features using 8 GPUs and it ran quite fast.
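
When the merge step is still needed, one possible speed-up is to copy contiguous blocks instead of one row per iteration. The sketch below only computes the blocks; `contiguous_runs` is a hypothetical helper, and whether it helps depends on how often consecutive temporary ids map to consecutive target indices, which is an assumption here.

```python
from typing import Iterable, Iterator, Tuple


def contiguous_runs(dst_indices: Iterable[int]) -> Iterator[Tuple[int, int, int]]:
    """Yield (src_start, dst_start, length) for maximal contiguous runs.

    Source row i maps to destination row dst_indices[i]; a run is a
    stretch where the destination index grows by exactly 1 per row.
    """
    run_src = run_dst = length = 0
    for src, dst in enumerate(dst_indices):
        if length and dst == run_dst + length:
            length += 1  # extend the current run
            continue
        if length:
            yield run_src, run_dst, length  # close the previous run
        run_src, run_dst, length = src, dst, 1
    if length:
        yield run_src, run_dst, length


# Hypothetical use inside the merge loop (not executed here):
#   dst = [dataset.img_id2idx[i] for i in tmp_ids_list]
#   for s, d, n in contiguous_runs(dst):
#       gri_features[d:d + n] = f['gri_feat'][s:s + n]
#       gri_masks[d:d + n] = f['gri_mask'][s:s + n]
```

Each slice assignment then moves `n` rows in a single HDF5 read and write rather than `n` separate ones.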

verigle commented 2 years ago

Could you release the HDF5 files (4ds and vg) for training the captioner with the frozen detector? I want to check whether the HDF5 file I generated has a problem.

davidnvq commented 2 years ago

These files are quite large, and I'm unable to find a suitable place to share hundreds of GB. However, I can extract the features of a few images into HDF5 files and share them here. Sorry, I'll do it tomorrow, as those files are on the server.

verigle commented 2 years ago

If transferring the large files fails, could you send me the md5sum hash of the HDF5 file and the training log with the frozen detector?

Would you like to try Lark to send the large files (hundreds of GB)? I have sent you a contact request on Lark: https://www.larksuite.com/invitation/page/add_contact/?token=672jcffb-14c2-47a5-a9ab-2ffcdf4b0b4g&unique_id=gOWzGw6_gVmH84zowNfclQ==

davidnvq commented 2 years ago

The training log of the experiment with a frozen detector (pretrained on 4DS)

training log of freezing detector?

Training log - Result.txt

```
cat result.txt
Epoch 0: valid scores: {'BLEU': [0.7538250289537521, 0.596562414985615, 0.4540185893450545, 0.33969480502142646], 'METEOR': 0.26079146802120723, 'ROUGE': 0.5553119459813104, 'CIDEr': 1.0461147458266575}
Epoch 0: test scores: {'BLEU': [0.7538096834276076, 0.5958968344182032, 0.4553681588957574, 0.3426630347328637], 'METEOR': 0.2609191614973612, 'ROUGE': 0.5564575152162115, 'CIDEr': 1.0609378106382028}
Epoch 1: test scores: {'BLEU': [0.7785420525194918, 0.6227267712579907, 0.48318177558956765, 0.37110900936399915], 'METEOR': 0.2788398012006459, 'ROUGE': 0.5748468163868693, 'CIDEr': 1.1839931965034516}
Epoch 1: valid scores: {'BLEU': [0.778855323496365, 0.6251732213529925, 0.4862454816645014, 0.374640451374451], 'METEOR': 0.27977644697342474, 'ROUGE': 0.5777731655865079, 'CIDEr': 1.1764461871293133}
Epoch 2: test scores: {'BLEU': [0.7823447997166053, 0.6290398800710232, 0.4922573666405114, 0.38225896666687104], 'METEOR': 0.2852446835394714, 'ROUGE': 0.5817635371239943, 'CIDEr': 1.2261100724095657}
Epoch 2: valid scores: {'BLEU': [0.7846525925685631, 0.634155649067113, 0.49714663098357315, 0.3867603929942251], 'METEOR': 0.28548504139308056, 'ROUGE': 0.5835172379457306, 'CIDEr': 1.2196434807827121}
Epoch 3: valid scores: {'BLEU': [0.7917327942542083, 0.6424203422561312, 0.5054563752697528, 0.39523228933069654], 'METEOR': 0.2916085270448981, 'ROUGE': 0.5917410971395807, 'CIDEr': 1.2479601987952522}
Epoch 3: test scores: {'BLEU': [0.7906897739999386, 0.6369097230969365, 0.49835775716208586, 0.3880412383761517], 'METEOR': 0.2888664429923958, 'ROUGE': 0.5883285170501267, 'CIDEr': 1.2502389474614086}
Epoch 4: valid scores: {'BLEU': [0.7957569998017531, 0.6471634818163418, 0.5116644846864994, 0.4017259182875278], 'METEOR': 0.2932035123933825, 'ROUGE': 0.5939159193613331, 'CIDEr': 1.2679474067353362}
Epoch 4: test scores: {'BLEU': [0.7942980159845641, 0.6441850389898548, 0.5097528189164301, 0.4000709255882639], 'METEOR': 0.2934497645559173, 'ROUGE': 0.5920811961336366, 'CIDEr': 1.2785493054299257}
Epoch 5: valid scores: {'BLEU': [0.7970202352961274, 0.6460475096181677, 0.5092481290741753, 0.3988404132262942], 'METEOR': 0.293771562418025, 'ROUGE': 0.5944456649831105, 'CIDEr': 1.2684498594234204}
Epoch 5: test scores: {'BLEU': [0.7946479572981971, 0.6448607696501109, 0.5093860475788792, 0.3993787811324676], 'METEOR': 0.29416671763842106, 'ROUGE': 0.594694350219191, 'CIDEr': 1.279104734906149}
Epoch 6: test scores: {'BLEU': [0.7898823296042636, 0.6368843225102946, 0.5033368888704767, 0.39599037131009784], 'METEOR': 0.2957331793867509, 'ROUGE': 0.5896175922512735, 'CIDEr': 1.2786846207140317}
Epoch 6: valid scores: {'BLEU': [0.7921119265267416, 0.6394890174963228, 0.505395350735759, 0.39649127354727426], 'METEOR': 0.2951546584765399, 'ROUGE': 0.5912797830527387, 'CIDEr': 1.26094086083513}
Epoch 7: valid scores: {'BLEU': [0.7953589267075801, 0.6448840953790032, 0.5091179225233083, 0.3988465934645186], 'METEOR': 0.2944972576853284, 'ROUGE': 0.5922647371187016, 'CIDEr': 1.267485150782694}
Epoch 7: test scores: {'BLEU': [0.797923092592688, 0.6469203098860058, 0.5112677284646816, 0.40192986485317983], 'METEOR': 0.2971981305523139, 'ROUGE': 0.5959802899904283, 'CIDEr': 1.2952236361482858}
Epoch 8: valid scores: {'BLEU': [0.7981481278271069, 0.6489911081729335, 0.5137796149165833, 0.40375749598580507], 'METEOR': 0.296309514711055, 'ROUGE': 0.5939607014800888, 'CIDEr': 1.2804176361573352}
Epoch 8: test scores: {'BLEU': [0.7981913127533568, 0.647616139988432, 0.5127104752676751, 0.40299519682268475], 'METEOR': 0.2968949157793757, 'ROUGE': 0.5945100409642673, 'CIDEr': 1.2950229705521017}
Epoch 9: test scores: {'BLEU': [0.7936862177479342, 0.6431453662200723, 0.5086368535541335, 0.39977487361802977], 'METEOR': 0.29702346000971386, 'ROUGE': 0.5939995293929079, 'CIDEr': 1.2911802443483553}
Epoch 9: valid scores: {'BLEU': [0.7957532903418028, 0.6448849552573164, 0.509262538898219, 0.39983078089529245], 'METEOR': 0.29571605978619037, 'ROUGE': 0.5928173411856615, 'CIDEr': 1.2767403521851346}
Epoch 10: valid scores: {'BLEU': [0.8392216942224467, 0.6930766554577301, 0.5484057647462168, 0.4242517692623965], 'METEOR': 0.30373537113375393, 'ROUGE': 0.6095928462919656, 'CIDEr': 1.3916730354459894}
Epoch 10: test scores: {'BLEU': [0.8371512250161982, 0.6912662532549091, 0.5494338242292505, 0.42842961194165047], 'METEOR': 0.3034593171069, 'ROUGE': 0.6074838195075987, 'CIDEr': 1.3929439731290956}
Epoch 11: valid scores: {'BLEU': [0.8338745045639491, 0.6909412333836084, 0.5478022366005478, 0.4248563841323607], 'METEOR': 0.29982710000784657, 'ROUGE': 0.6090442845413084, 'CIDEr': 1.380871963690942}
Epoch 11: test scores: {'BLEU': [0.8320881645925845, 0.6897865566785281, 0.5497962995998547, 0.4296512120498058], 'METEOR': 0.3001551836167212, 'ROUGE': 0.6095729905385621, 'CIDEr': 1.395327814840547}
Epoch 12: valid scores: {'BLEU': [0.8335816089449272, 0.691955562290691, 0.5497808338419828, 0.42712092062154966], 'METEOR': 0.2995854172105309, 'ROUGE': 0.6092121180620336, 'CIDEr': 1.3841083245164836}
Epoch 12: test scores: {'BLEU': [0.8358434910271985, 0.6927899360982215, 0.5515334705278486, 0.42963927851520434], 'METEOR': 0.30060352023996734, 'ROUGE': 0.6085176857192084, 'CIDEr': 1.3970138559169973}
Epoch 13: test scores: {'BLEU': [0.8368407870282764, 0.6927053589523655, 0.5503410971040265, 0.42778248808076114], 'METEOR': 0.30329596254764474, 'ROUGE': 0.6091348965367512, 'CIDEr': 1.3911346327574417}
Epoch 13: valid scores: {'BLEU': [0.8375819786797934, 0.6928144827887621, 0.5485425470135012, 0.42494904660063526], 'METEOR': 0.3015641520541005, 'ROUGE': 0.6094370438760679, 'CIDEr': 1.3870016917338366}
Epoch 14: valid scores: {'BLEU': [0.8397301508998447, 0.6951166862174084, 0.5501077595020255, 0.425305595268546], 'METEOR': 0.301307044864702, 'ROUGE': 0.6092071185407049, 'CIDEr': 1.3888033401841}
Epoch 14: test scores: {'BLEU': [0.8387347059300678, 0.6938256746185609, 0.5502301568894592, 0.42739577539223783], 'METEOR': 0.3022847066751842, 'ROUGE': 0.6099642379639814, 'CIDEr': 1.3938938737463365}
Epoch 15: valid scores: {'BLEU': [0.8383349351062339, 0.6939514222346885, 0.5496104450256393, 0.4256538746739609], 'METEOR': 0.30113020113868766, 'ROUGE': 0.608930887381079, 'CIDEr': 1.3881121994480983}
Epoch 15: test scores: {'BLEU': [0.8377434526189247, 0.6938947082894922, 0.5520702255956362, 0.42985101765724304], 'METEOR': 0.30209385339950307, 'ROUGE': 0.6087747503180939, 'CIDEr': 1.3930224785420147}
Epoch 16: valid scores: {'BLEU': [0.8404747079306697, 0.6945144474530267, 0.5491480576845459, 0.42432169621450344], 'METEOR': 0.3011667231732114, 'ROUGE': 0.6087723165936284, 'CIDEr': 1.3913707334426995}
Epoch 16: test scores: {'BLEU': [0.8403868539331363, 0.6947791828047277, 0.5526706479845518, 0.43114776979811276], 'METEOR': 0.3035610412601643, 'ROUGE': 0.6104259661927642, 'CIDEr': 1.4017157368840588}
Epoch 17: valid scores: {'BLEU': [0.8370814506152257, 0.6941824464543063, 0.5506641646309467, 0.426903866381421], 'METEOR': 0.3009911711830235, 'ROUGE': 0.6099568995822994, 'CIDEr': 1.3880645221570864}
Epoch 17: test scores: {'BLEU': [0.8345497554622181, 0.6922240764550124, 0.5512615294054465, 0.42992375836248503], 'METEOR': 0.3029906087443316, 'ROUGE': 0.6108695148820223, 'CIDEr': 1.395373498283653}
Epoch 18: valid scores: {'BLEU': [0.8342504349698247, 0.6909111557999621, 0.5480946427393756, 0.42480807614822885], 'METEOR': 0.298903291438397, 'ROUGE': 0.6079175955401886, 'CIDEr': 1.3779367477191458}
Epoch 18: test scores: {'BLEU': [0.8322245235372027, 0.6891459224250824, 0.5479176668734069, 0.427081860511104], 'METEOR': 0.29924427125185954, 'ROUGE': 0.608153175395876, 'CIDEr': 1.3867699582117679}
Epoch 19: valid scores: {'BLEU': [0.8375084420728919, 0.6947617266115735, 0.5507303660393632, 0.42665732219177516], 'METEOR': 0.300775230770062, 'ROUGE': 0.6091563180552525, 'CIDEr': 1.3913689397315665}
Epoch 19: test scores: {'BLEU': [0.8339521842831992, 0.6910585392547325, 0.5497944114216822, 0.4271993533383748], 'METEOR': 0.30110834748349546, 'ROUGE': 0.6090249755582172, 'CIDEr': 1.3956418029412469}
```
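
To compare two runs, the score lines in a `result.txt` can be scanned for the best epoch on a chosen metric. A minimal sketch, assuming the `Epoch N: <split> scores: {...}` format shown in this log; `best_epoch` is a hypothetical helper and not part of the GRIT code.

```python
import ast
import re
from typing import Optional, Tuple

# One entry looks like: Epoch 3: valid scores: {'BLEU': [...], 'CIDEr': 1.24}
# The dict holds lists but no nested braces, so [^}]* is enough.
ENTRY_RE = re.compile(r"Epoch (\d+): (valid|test) scores: (\{[^}]*\})")


def best_epoch(text: str, split: str = "valid",
               metric: str = "CIDEr") -> Optional[Tuple[int, float]]:
    """Return (epoch, score) with the highest `metric` for `split`, or None."""
    best = None
    for match in ENTRY_RE.finditer(text):
        epoch, found_split, payload = match.groups()
        if found_split != split:
            continue
        score = ast.literal_eval(payload)[metric]
        if best is None or score > best[1]:
            best = (int(epoch), score)
    return best


# Hypothetical usage (not executed here):
#   with open("result.txt") as f:
#       print(best_epoch(f.read(), split="test"))
```

Because the pattern is applied with `finditer` over the whole text, it works whether each entry is on its own line or several entries share a line.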
davidnvq commented 2 years ago

Extract the HDF5 file and Training with 1 GPU (24GB)

if fail to transfer large file, could you send me HDF5 file hash value generate by md5sum

Debug with the validation split

b84c2fa1e5b1827fea8739f8e7d4f3d9 all_splits.h5
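
For a file of this size, the checksum can also be computed from Python without loading the whole file into memory. A minimal sketch using the standard library; `md5_of_file` is a hypothetical helper name, and the result should match what `md5sum` prints for the same file.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage (not executed here):
#   print(md5_of_file("all_splits.h5"))
```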


To reproduce the `all_splits.h5`, please do the following:
* download the [Karpathy valid_ids.json](https://drive.google.com/file/d/1ifNLwJmGp6JSARIHYpvXq4ezKFuBUGND/view?usp=sharing)
* edit the `__init__` method of `ExtractDataset` in `tools/extract_features.py` as follows:

```python
class ExtractDataset(Dataset):

    def __init__(self, root, transform=None):
        self.root = root
        self.transform = transform

        self.img_paths = glob(os.path.join(self.root, "train2014/*"))
        self.img_paths += glob(os.path.join(self.root, "val2014/*"))  # Karpathy val/test in val2014 dir

        # todo: Debug
        with open('/home/quang/workspace/grit-release/grit/valid_ids.json', 'r') as f:
            import json
            self.img_ids = sorted(list(set(json.load(f))))
        self.img_paths = [path for path in self.img_paths if int(path.split('/')[-1].split('.')[0].split('_')[-1]) in self.img_ids]
        # todo: end Debug
        # self.img_ids = sorted([int(p.split('/')[-1].split('.')[0].split('_')[-1]) for p in self.img_paths])

        self.img_id2idx = {img_id: img_idx for img_idx, img_id in enumerate(self.img_ids)}
```

Training script & Config file (1 GPU only)

Script to Extract and Training (1 GPU only)

```bash
export DATA_ROOT=/home/quang/datasets/coco_caption
python train_caption.py \
    exp.name=freeze \
    exp.ngpus_per_node=1 \
    exp.world_size=1 \
    optimizer.batch_size=8 \
    optimizer.num_workers=1 \
    optimizer.freezing_xe_epochs=10 \
    optimizer.freezing_sc_epochs=10 \
    optimizer.finetune_xe_epochs=0 \
    optimizer.finetune_sc_epochs=0 \
    model.detector.checkpoint=/home/quang/checkpoints/ecaptioner/detector_checkpoint_4ds.pth
```

Config file (1 GPU only)

```yaml
exp:
  seed: 42
  name: freeze
  rank: 0
  ngpus_per_node: 1
  world_size: 1
  checkpoint: ''
  eval: false
  resume: false
dataset:
  overfit: false
  ann_root: ${oc.env:DATA_ROOT}/annotations
  img_root: ${oc.env:DATA_ROOT}
  hdf5_path: ${oc.env:DATA_ROOT}/all_splits.h5
  vocab_path: ${oc.env:DATA_ROOT}/annotations/vocab.json
  use_gri_feat: ${model.use_gri_feat}
  use_reg_feat: ${model.use_reg_feat}
  transform_cfg:
    size:
    - 384
    - 640
    resize_name: maxwh
    randaug: true
model:
  use_gri_feat: true
  use_reg_feat: true
  grid_feat_dim: 1024
  frozen_stages: 2
  beam_size: 5
  beam_len: 20
  dropout: 0.2
  attn_dropout: 0.2
  vocab_size: 10201
  max_len: 54
  pad_idx: 1
  bos_idx: 2
  eos_idx: 3
  d_model: 512
  n_heads: 8
  grit_net:
    n_memories: 1
    n_layers: 3
  detector:
    checkpoint: /home/quang/checkpoints/ecaptioner/detector_checkpoint_4ds.pth
    d_model: 512
    dim_feedforward: 1024
    num_heads: 8
    num_layers: 6
    num_levels: 4
    num_points: 4
    num_queries: 150
    num_classes: 1849
    dropout: 0.1
    activation: relu
    return_intermediate: true
    with_box_refine: true
  cap_generator:
    decoder_name: parallel
    n_layers: 3
    activation: sigmoid
optimizer:
  warmup_init_lr: 1.0e-05
  min_lr: 0.0001
  xe_lr: 0.0001
  sc_lr: 5.0e-06
  xe_backbone_lr: 1.0e-05
  sc_backbone_lr: 5.0e-06
  weight_decay: 0.01
  beta_1: 0.9
  beta_2: 0.99
  batch_size: 8
  num_workers: 1
  freezing_xe_epochs: 10
  freezing_sc_epochs: 10
  finetune_xe_epochs: 0
  finetune_sc_epochs: 0
```

Extracting/ Training time

Training log (1 GPU only)

Note that I didn't modify the accumulation steps; I used the original code on GitHub.

cat result.txt
Epoch 0: valid scores: {'BLEU': [0.5657088356039734, 0.3765164600749338, 0.23763109601938864, 0.14058035783115716], 'METEOR': 0.1500147030874878, 'ROUGE': 0.43125568340534154, 'CIDEr': 0.31340782943194895}
Epoch 1: valid scores: {'BLEU': [0.6878828623627243, 0.507316006697793, 0.3537663973826863, 0.2391762423466745], 'METEOR': 0.206685595887185, 'ROUGE': 0.49484805064312143, 'CIDEr': 0.6757494233996244}
Epoch 2: valid scores: {'BLEU': [0.7116153894570173, 0.5471644619852094, 0.4009791335323821, 0.2848343013717071], 'METEOR': 0.2305761158011154, 'ROUGE': 0.5255454390585639, 'CIDEr': 0.8224964949445229}
Epoch 3: valid scores: {'BLEU': [0.7291438793084305, 0.5736018163449463, 0.4326126143053252, 0.31790383996538485], 'METEOR': 0.24522387934338238, 'ROUGE': 0.5372571717758927, 'CIDEr': 0.9512040868143715}
Epoch 4: valid scores: {'BLEU': [0.7455173215842724, 0.5968684444062177, 0.45936638549841136, 0.3467431181064699], 'METEOR': 0.2581015120868733, 'ROUGE': 0.5540655829406045, 'CIDEr': 1.023898960504308}
Epoch 5: valid scores: {'BLEU': [0.7740523856047625, 0.6268571425552633, 0.486731428915857, 0.370990697219715], 'METEOR': 0.2694790703844578, 'ROUGE': 0.574600817208342, 'CIDEr': 1.1328910043967173}
Epoch 6: valid scores: {'BLEU': [0.7854875573119067, 0.6463303116953236, 0.5112994988008267, 0.39755753465740196], 'METEOR': 0.2783247030117692, 'ROUGE': 0.5866351406112981, 'CIDEr': 1.1825715989969285}
Epoch 7: valid scores: {'BLEU': [0.7967639161320746, 0.660413746055181, 0.5267351201653497, 0.4123261557944266], 'METEOR': 0.2871017519203519, 'ROUGE': 0.5956808751472822, 'CIDEr': 1.2401164621648924}
Epoch 8: valid scores: {'BLEU': [0.8188760411261381, 0.6848406618995978, 0.5536121828146608, 0.4405513061884243], 'METEOR': 0.29947558009668784, 'ROUGE': 0.6140204703823021, 'CIDEr': 1.3457340020795625}
Epoch 9: valid scores: {'BLEU': [0.8229051857849183, 0.698569281616585, 0.5741022287173227, 0.464374563524836], 'METEOR': 0.3051841693153665, 'ROUGE': 0.6225127305027678, 'CIDEr': 1.3906497837842993}
Epoch 10: valid scores: {'BLEU': [0.8485463629214782, 0.7254831885821973, 0.59986320580788, 0.487085113233545], 'METEOR': 0.3134723799414034, 'ROUGE': 0.6388800264709864, 'CIDEr': 1.4849262857151433}
Epoch 11: valid scores: {'BLEU': [0.8590772182499388, 0.7388936199470936, 0.6129033386973999, 0.49993311076285024], 'METEOR': 0.31902620000800086, 'ROUGE': 0.6459180512724564, 'CIDEr': 1.5328290754094833}
Epoch 12: valid scores: {'BLEU': [0.8581934967437549, 0.7402937380796337, 0.6158436886567139, 0.5033578525900554], 'METEOR': 0.3197130362024978, 'ROUGE': 0.6470378530459482, 'CIDEr': 1.5412842549345602}
Epoch 13: valid scores: {'BLEU': [0.8572692489580925, 0.7438561315765374, 0.6204644953014118, 0.5076103842470351], 'METEOR': 0.31826740807714515, 'ROUGE': 0.6511414614798079, 'CIDEr': 1.5397557123000631}
Epoch 14: valid scores: {'BLEU': [0.8585203822951307, 0.7437264684342925, 0.6198837148657005, 0.507023894683179], 'METEOR': 0.31845697977260934, 'ROUGE': 0.6503986025693129, 'CIDEr': 1.5427823322382386}
Epoch 15: valid scores: {'BLEU': [0.8671184807479998, 0.7514373121962713, 0.6265955653527062, 0.51250561750678], 'METEOR': 0.3222642006196188, 'ROUGE': 0.6524141117034383, 'CIDEr': 1.5682332594756219}
Epoch 16: valid scores: {'BLEU': [0.8574281971626537, 0.7442066656978363, 0.621136048254748, 0.5095942829441872], 'METEOR': 0.31900815933209364, 'ROUGE': 0.6511932762520403, 'CIDEr': 1.5569648704835397}
Epoch 17: valid scores: {'BLEU': [0.8652412554541837, 0.7506375086442764, 0.6265413835210994, 0.5136194234609738], 'METEOR': 0.3209289611849057, 'ROUGE': 0.6535225115756486, 'CIDEr': 1.5711424920966477}
Epoch 18: valid scores: {'BLEU': [0.869637776382788, 0.756320418695215, 0.6327974567774621, 0.5193546276450925], 'METEOR': 0.32367947931713, 'ROUGE': 0.6555057536848133, 'CIDEr': 1.5895846425122862}
Epoch 19: valid scores: {'BLEU': [0.8711753249445947, 0.7585302412516803, 0.6358090271624591, 0.5231909801506858], 'METEOR': 0.3254346323345664, 'ROUGE': 0.6578219566767588, 'CIDEr': 1.603275355954415}
csv result.csv
exp      backbone   imsize    resize   raug   epoch   split   cider    B1      B4      R       M       B2      B3      t-loss   t-reward   b-reward   which   v-loss
freeze   B-VG       384_640   maxwh    True   0       valid   31.34    56.57   14.06   43.13   15.00   37.65   23.76   5.31     0.00       0.00       fr_xe   0.00
freeze   B-VG       384_640   maxwh    True   1       valid   67.57    68.79   23.92   49.48   20.67   50.73   35.38   3.68     0.00       0.00       fr_xe   0.00
freeze   B-VG       384_640   maxwh    True   2       valid   82.25    71.16   28.48   52.55   23.06   54.72   40.10   3.22     0.00       0.00       fr_xe   0.00
freeze   B-VG       384_640   maxwh    True   3       valid   95.12    72.91   31.79   53.73   24.52   57.36   43.26   2.96     0.00       0.00       fr_xe   0.00
freeze   B-VG       384_640   maxwh    True   4       valid   102.39   74.55   34.67   55.41   25.81   59.69   45.94   2.77     0.00       0.00       fr_xe   0.00
freeze   B-VG       384_640   maxwh    True   5       valid   113.29   77.41   37.10   57.46   26.95   62.69   48.67   2.62     0.00       0.00       fr_xe   0.00
freeze   B-VG       384_640   maxwh    True   6       valid   118.26   78.55   39.76   58.66   27.83   64.63   51.13   2.49     0.00       0.00       fr_xe   0.00
freeze   B-VG       384_640   maxwh    True   7       valid   124.01   79.68   41.23   59.57   28.71   66.04   52.67   2.37     0.00       0.00       fr_xe   0.00
freeze   B-VG       384_640   maxwh    True   8       valid   134.57   81.89   44.06   61.40   29.95   68.48   55.36   2.26     0.00       0.00       fr_xe   0.00
freeze   B-VG       384_640   maxwh    True   9       valid   139.06   82.29   46.44   62.25   30.52   69.86   57.41   2.16     0.00       0.00       fr_xe   0.00
freeze   B-VG       384_640   maxwh    True   10      valid   148.49   84.85   48.71   63.89   31.35   72.55   59.99   -0.00    1.23       1.23       fr_sc   1.84
freeze   B-VG       384_640   maxwh    True   11      valid   153.28   85.91   49.99   64.59   31.90   73.89   61.29   -0.00    1.27       1.27       fr_sc   1.87
freeze   B-VG       384_640   maxwh    True   12      valid   154.13   85.82   50.34   64.70   31.97   74.03   61.58   -0.00    1.30       1.30       fr_sc   2.05
freeze   B-VG       384_640   maxwh    True   13      valid   153.98   85.73   50.76   65.11   31.83   74.39   62.05   -0.04    1.25       1.25       fr_sc   2.68
freeze   B-VG       384_640   maxwh    True   14      valid   154.28   85.85   50.70   65.04   31.85   74.37   61.99   -0.03    1.27       1.27       fr_sc   2.51
freeze   B-VG       384_640   maxwh    True   15      valid   156.82   86.71   51.25   65.24   32.23   75.14   62.66   -0.02    1.29       1.29       fr_sc   2.42
freeze   B-VG       384_640   maxwh    True   16      valid   155.70   85.74   50.96   65.12   31.90   74.42   62.11   -0.02    1.31       1.31       fr_sc   3.16
freeze   B-VG       384_640   maxwh    True   17      valid   157.11   86.52   51.36   65.35   32.09   75.06   62.65   -0.03    1.30       1.30       fr_sc   2.75
freeze   B-VG       384_640   maxwh    True   18      valid   158.96   86.96   51.94   65.55   32.37   75.63   63.28   -0.02    1.32       1.32       fr_sc   2.64
freeze   B-VG       384_640   maxwh    True   19      valid   160.33   87.12   52.32   65.78   32.54   75.85   63.58   -0.02    1.34       1.34       fr_sc   2.76

It seems to me that there is no problem with the current source code, even on a single GPU: CIDEr reaches 160.33 (both training and validation use the Karpathy validation split). I would appreciate it if you could provide your training log after following the same steps; it takes less than 1 hour on a single GPU. If you change the source code significantly, it is hard for me to help you debug.
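
The scores in result.txt are on a 0 to 1 scale (CIDEr can exceed 1), while result.csv reports the same values multiplied by 100. A minimal sketch of that conversion, assuming the log line format shown above; `parse_scores` is a hypothetical helper, not a function from the repository:

```python
import ast


def parse_scores(line):
    """Split 'Epoch N: <split> scores: {...}' into (epoch, split, metrics in percent)."""
    head, _, payload = line.partition(" scores: ")
    epoch_part, _, split = head.partition(": ")
    epoch = int(epoch_part.split()[1])
    raw = ast.literal_eval(payload)  # the payload is a plain Python dict literal
    metrics = {
        "cider": round(raw["CIDEr"] * 100, 2),
        "B1": round(raw["BLEU"][0] * 100, 2),
        "B4": round(raw["BLEU"][3] * 100, 2),
        "R": round(raw["ROUGE"] * 100, 2),
        "M": round(raw["METEOR"] * 100, 2),
    }
    return epoch, split, metrics


# Last line of the result.txt log above.
line = (
    "Epoch 19: valid scores: {'BLEU': [0.8711753249445947, 0.7585302412516803, "
    "0.6358090271624591, 0.5231909801506858], 'METEOR': 0.3254346323345664, "
    "'ROUGE': 0.6578219566767588, 'CIDEr': 1.603275355954415}"
)
epoch, split, metrics = parse_scores(line)
print(epoch, split, metrics)
```

For epoch 19 this gives the cider, B1, B4, R and M values of the last row of the table above.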

verigle commented 2 years ago

If I train on the valid split, I get a similar result:

Epoch 0: valid scores: {'BLEU': [0.5687789326060172, 0.3796406079675871, 0.23717092707212717, 0.13733905909161176], 'METEOR': 0.1500073983083997, 'ROUGE': 0.43221476419522675, 'CIDEr': 0.3128949075262345}
Epoch 0: test scores: {'BLEU': [0.5687789326060172, 0.3796406079675871, 0.23717092707212717, 0.13733905909161176], 'METEOR': 0.1500073983083997, 'ROUGE': 0.43221476419522675, 'CIDEr': 0.3128949075262345}
Epoch 1: valid scores: {'BLEU': [0.6889958805211169, 0.5121146849155161, 0.36027058140443585, 0.24508810545311874], 'METEOR': 0.21027341923151305, 'ROUGE': 0.5002532994737392, 'CIDEr': 0.6866524236568107}
Epoch 1: test scores: {'BLEU': [0.6889958805211169, 0.5121146849155161, 0.36027058140443585, 0.24508810545311874], 'METEOR': 0.21027341923151305, 'ROUGE': 0.5002532994737392, 'CIDEr': 0.6866524236568107}
Epoch 2: valid scores: {'BLEU': [0.7183330378092305, 0.5524295346825346, 0.40531161926311826, 0.2873981549858266], 'METEOR': 0.2314796558551012, 'ROUGE': 0.5283776641294129, 'CIDEr': 0.8358401360436805}
Epoch 2: test scores: {'BLEU': [0.7183330378092305, 0.5524295346825346, 0.40531161926311826, 0.2873981549858266], 'METEOR': 0.2314796558551012, 'ROUGE': 0.5283776641294129, 'CIDEr': 0.8358401360436805}
Epoch 3: valid scores: {'BLEU': [0.730484167598345, 0.5737784051449117, 0.4327363727472917, 0.3182772550693022], 'METEOR': 0.2445401779177077, 'ROUGE': 0.5376507685684517, 'CIDEr': 0.9463425913086506}
Epoch 3: test scores: {'BLEU': [0.730484167598345, 0.5737784051449117, 0.4327363727472917, 0.3182772550693022], 'METEOR': 0.2445401779177077, 'ROUGE': 0.5376507685684517, 'CIDEr': 0.9463425913086506}
Epoch 4: valid scores: {'BLEU': [0.7505353823954549, 0.6036523484214542, 0.4656956280826836, 0.3521943545377037], 'METEOR': 0.26165353239056044, 'ROUGE': 0.5582484641689062, 'CIDEr': 1.0533139044414448}
Epoch 4: test scores: {'BLEU': [0.7505353823954549, 0.6036523484214542, 0.4656956280826836, 0.3521943545377037], 'METEOR': 0.26165353239056044, 'ROUGE': 0.5582484641689062, 'CIDEr': 1.0533139044414448}
Epoch 5: valid scores: {'BLEU': [0.7733759867367139, 0.6275267821691692, 0.48885982589267835, 0.37313309282620377], 'METEOR': 0.2688311206007216, 'ROUGE': 0.5730128008692121, 'CIDEr': 1.1316999222804602}
Epoch 5: test scores: {'BLEU': [0.7733759867367139, 0.6275267821691692, 0.48885982589267835, 0.37313309282620377], 'METEOR': 0.2688311206007216, 'ROUGE': 0.5730128008692121, 'CIDEr': 1.1316999222804602}
Epoch 6: valid scores: {'BLEU': [0.7807381482633197, 0.6425044929000873, 0.5070748840318772, 0.39347385241494864], 'METEOR': 0.2768787330578721, 'ROUGE': 0.5825520557180476, 'CIDEr': 1.1806787334307807}
Epoch 6: test scores: {'BLEU': [0.7807381482633197, 0.6425044929000873, 0.5070748840318772, 0.39347385241494864], 'METEOR': 0.2768787330578721, 'ROUGE': 0.5825520557180476, 'CIDEr': 1.1806787334307807}
Epoch 7: valid scores: {'BLEU': [0.802093149525771, 0.6665120803978738, 0.5337337096878841, 0.4184087706250956], 'METEOR': 0.2905181570765421, 'ROUGE': 0.6008916406572897, 'CIDEr': 1.2661520097509487}
Epoch 7: test scores: {'BLEU': [0.802093149525771, 0.6665120803978738, 0.5337337096878841, 0.4184087706250956], 'METEOR': 0.2905181570765421, 'ROUGE': 0.6008916406572897, 'CIDEr': 1.2661520097509487}
Epoch 8: valid scores: {'BLEU': [0.8218094052582456, 0.6908660330277934, 0.5596073651322367, 0.4461353967172844], 'METEOR': 0.3011541917810352, 'ROUGE': 0.61724770258316, 'CIDEr': 1.355053666243867}
Epoch 8: test scores: {'BLEU': [0.8218094052582456, 0.6908660330277934, 0.5596073651322367, 0.4461353967172844], 'METEOR': 0.3011541917810352, 'ROUGE': 0.61724770258316, 'CIDEr': 1.355053666243867}
Epoch 9: valid scores: {'BLEU': [0.8247992902408987, 0.7003553754884476, 0.5756500780396048, 0.4655344690229966], 'METEOR': 0.3065194333705672, 'ROUGE': 0.6239367525962805, 'CIDEr': 1.3936491532134685}
Epoch 9: test scores: {'BLEU': [0.8247992902408987, 0.7003553754884476, 0.5756500780396048, 0.4655344690229966], 'METEOR': 0.3065194333705672, 'ROUGE': 0.6239367525962805, 'CIDEr': 1.3936491532134685}
Epoch 10: valid scores: {'BLEU': [0.8469671946788128, 0.7256979491702737, 0.5984019360011142, 0.4862304052421898], 'METEOR': 0.31512598647893414, 'ROUGE': 0.6404851326296678, 'CIDEr': 1.4857964915667212}
Epoch 10: test scores: {'BLEU': [0.8469671946788128, 0.7256979491702737, 0.5984019360011142, 0.4862304052421898], 'METEOR': 0.31512598647893414, 'ROUGE': 0.6404851326296678, 'CIDEr': 1.4857964915667212}
Epoch 11: valid scores: {'BLEU': [0.8561725751178904, 0.7370545906234052, 0.6105766843904681, 0.4974683251696527], 'METEOR': 0.3195808154717882, 'ROUGE': 0.6456769941902586, 'CIDEr': 1.5284118465079657}
Epoch 11: test scores: {'BLEU': [0.8561725751178904, 0.7370545906234052, 0.6105766843904681, 0.4974683251696527], 'METEOR': 0.3195808154717882, 'ROUGE': 0.6456769941902586, 'CIDEr': 1.5284118465079657}
Epoch 12: valid scores: {'BLEU': [0.8610795210777592, 0.742200877214234, 0.6158159462213719, 0.5023615419220756], 'METEOR': 0.3217058793205501, 'ROUGE': 0.6497790029211162, 'CIDEr': 1.547431207697141}
Epoch 12: test scores: {'BLEU': [0.8610795210777592, 0.742200877214234, 0.6158159462213719, 0.5023615419220756], 'METEOR': 0.3217058793205501, 'ROUGE': 0.6497790029211162, 'CIDEr': 1.547431207697141}
Epoch 13: valid scores: {'BLEU': [0.851860357852486, 0.7389788243838772, 0.6162398427360142, 0.5047311726990933], 'METEOR': 0.31851234520870625, 'ROUGE': 0.6509809187507277, 'CIDEr': 1.5394116821611177}
Epoch 13: test scores: {'BLEU': [0.851860357852486, 0.7389788243838772, 0.6162398427360142, 0.5047311726990933], 'METEOR': 0.31851234520870625, 'ROUGE': 0.6509809187507277, 'CIDEr': 1.5394116821611177}
Epoch 14: valid scores: {'BLEU': [0.8231126010024336, 0.7071827519392028, 0.5837683430954949, 0.4741675267555221], 'METEOR': 0.3086034024927641, 'ROUGE': 0.6356974942997277, 'CIDEr': 1.456967811730921}
Epoch 14: test scores: {'BLEU': [0.8231126010024336, 0.7071827519392028, 0.5837683430954949, 0.4741675267555221], 'METEOR': 0.3086034024927641, 'ROUGE': 0.6356974942997277, 'CIDEr': 1.456967811730921}
Epoch 15: valid scores: {'BLEU': [0.8272863990896444, 0.7116000059968411, 0.5887282555237652, 0.47874669094252137], 'METEOR': 0.31066910083282245, 'ROUGE': 0.6377276600047493, 'CIDEr': 1.4698738800741153}
Epoch 15: test scores: {'BLEU': [0.8272863990896444, 0.7116000059968411, 0.5887282555237652, 0.47874669094252137], 'METEOR': 0.31066910083282245, 'ROUGE': 0.6377276600047493, 'CIDEr': 1.4698738800741153}
Epoch 16: valid scores: {'BLEU': [0.8080431203809378, 0.6956999316191792, 0.5757002446392038, 0.46745054002440145], 'METEOR': 0.3044121401844127, 'ROUGE': 0.6321279343440936, 'CIDEr': 1.4290889956961352}
Epoch 16: test scores: {'BLEU': [0.8080431203809378, 0.6956999316191792, 0.5757002446392038, 0.46745054002440145], 'METEOR': 0.3044121401844127, 'ROUGE': 0.6321279343440936, 'CIDEr': 1.4290889956961352}
Epoch 17: valid scores: {'BLEU': [0.847436188223204, 0.7343033438484987, 0.6110966062707567, 0.4988794022981793], 'METEOR': 0.31640488939910294, 'ROUGE': 0.6490145207253141, 'CIDEr': 1.5249956031886516}
Epoch 17: test scores: {'BLEU': [0.847436188223204, 0.7343033438484987, 0.6110966062707567, 0.4988794022981793], 'METEOR': 0.31640488939910294, 'ROUGE': 0.6490145207253141, 'CIDEr': 1.5249956031886516}
Epoch 18: valid scores: {'BLEU': [0.8475661053521713, 0.7343827320295427, 0.6104493278300098, 0.497526960951448], 'METEOR': 0.31708557803815257, 'ROUGE': 0.6477755350375113, 'CIDEr': 1.5248696342478407}
Epoch 18: test scores: {'BLEU': [0.8475661053521713, 0.7343827320295427, 0.6104493278300098, 0.497526960951448], 'METEOR': 0.31708557803815257, 'ROUGE': 0.6477755350375113, 'CIDEr': 1.5248696342478407}
Epoch 19: valid scores: {'BLEU': [0.8194458571162563, 0.7070805782264575, 0.5862704437247412, 0.4771818742660139], 'METEOR': 0.3079265166105341, 'ROUGE': 0.6380529879644203, 'CIDEr': 1.4643754640989104}
Epoch 19: test scores: {'BLEU': [0.8194458571162563, 0.7070805782264575, 0.5862704437247412, 0.4771818742660139], 'METEOR': 0.3079265166105341, 'ROUGE': 0.6380529879644203, 'CIDEr': 1.4643754640989104}
exp, backbone, imsize, resize, raug, epoch, split, cider, B1, B4, R, M, B2, B3, t-loss, t-reward, b-reward, which, v-loss
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 0, valid, 31.29, 56.88, 13.73, 43.22, 15.00, 37.96, 23.72, 5.31, 0.00, 0.00, fr_xe, 3.98
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 0, test , 31.29, 56.88, 13.73, 43.22, 15.00, 37.96, 23.72, 5.31, 0.00, 0.00, fr_xe, 3.98
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 1, valid, 68.67, 68.90, 24.51, 50.03, 21.03, 51.21, 36.03, 3.68, 0.00, 0.00, fr_xe, 3.26
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 1, test , 68.67, 68.90, 24.51, 50.03, 21.03, 51.21, 36.03, 3.68, 0.00, 0.00, fr_xe, 3.26
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 2, valid, 83.58, 71.83, 28.74, 52.84, 23.15, 55.24, 40.53, 3.22, 0.00, 0.00, fr_xe, 2.93
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 2, test , 83.58, 71.83, 28.74, 52.84, 23.15, 55.24, 40.53, 3.22, 0.00, 0.00, fr_xe, 2.93
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 3, valid, 94.63, 73.05, 31.83, 53.77, 24.45, 57.38, 43.27, 2.95, 0.00, 0.00, fr_xe, 2.70
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 3, test , 94.63, 73.05, 31.83, 53.77, 24.45, 57.38, 43.27, 2.95, 0.00, 0.00, fr_xe, 2.70
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 4, valid, 105.33, 75.05, 35.22, 55.82, 26.17, 60.37, 46.57, 2.76, 0.00, 0.00, fr_xe, 2.51
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 4, test , 105.33, 75.05, 35.22, 55.82, 26.17, 60.37, 46.57, 2.76, 0.00, 0.00, fr_xe, 2.51
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 5, valid, 113.17, 77.34, 37.31, 57.30, 26.88, 62.75, 48.89, 2.61, 0.00, 0.00, fr_xe, 2.35
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 5, test , 113.17, 77.34, 37.31, 57.30, 26.88, 62.75, 48.89, 2.61, 0.00, 0.00, fr_xe, 2.35
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 6, valid, 118.07, 78.07, 39.35, 58.26, 27.69, 64.25, 50.71, 2.48, 0.00, 0.00, fr_xe, 2.22
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 6, test , 118.07, 78.07, 39.35, 58.26, 27.69, 64.25, 50.71, 2.48, 0.00, 0.00, fr_xe, 2.22
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 7, valid, 126.62, 80.21, 41.84, 60.09, 29.05, 66.65, 53.37, 2.37, 0.00, 0.00, fr_xe, 2.08
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 7, test , 126.62, 80.21, 41.84, 60.09, 29.05, 66.65, 53.37, 2.37, 0.00, 0.00, fr_xe, 2.08
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 8, valid, 135.51, 82.18, 44.61, 61.72, 30.12, 69.09, 55.96, 2.26, 0.00, 0.00, fr_xe, 1.95
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 8, test , 135.51, 82.18, 44.61, 61.72, 30.12, 69.09, 55.96, 2.26, 0.00, 0.00, fr_xe, 1.95
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 9, valid, 139.36, 82.48, 46.55, 62.39, 30.65, 70.04, 57.57, 2.16, 0.00, 0.00, fr_xe, 1.84
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 9, test , 139.36, 82.48, 46.55, 62.39, 30.65, 70.04, 57.57, 2.16, 0.00, 0.00, fr_xe, 1.84
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 10, valid, 148.58, 84.70, 48.62, 64.05, 31.51, 72.57, 59.84, -0.00, 1.24, 1.24, fr_sc, 1.84
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 10, test , 148.58, 84.70, 48.62, 64.05, 31.51, 72.57, 59.84, -0.00, 1.24, 1.24, fr_sc, 1.84
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 11, valid, 152.84, 85.62, 49.75, 64.57, 31.96, 73.71, 61.06, -0.00, 1.28, 1.28, fr_sc, 1.86
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 11, test , 152.84, 85.62, 49.75, 64.57, 31.96, 73.71, 61.06, -0.00, 1.28, 1.28, fr_sc, 1.86
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 12, valid, 154.74, 86.11, 50.24, 64.98, 32.17, 74.22, 61.58, -0.00, 1.30, 1.30, fr_sc, 1.91
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 12, test , 154.74, 86.11, 50.24, 64.98, 32.17, 74.22, 61.58, -0.00, 1.30, 1.30, fr_sc, 1.91
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 13, valid, 153.94, 85.19, 50.47, 65.10, 31.85, 73.90, 61.62, -0.02, 1.29, 1.29, fr_sc, 2.39
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 13, test , 153.94, 85.19, 50.47, 65.10, 31.85, 73.90, 61.62, -0.02, 1.29, 1.29, fr_sc, 2.39
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 14, valid, 145.70, 82.31, 47.42, 63.57, 30.86, 70.72, 58.38, -0.02, 1.30, 1.30, fr_sc, 3.31
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 14, test , 145.70, 82.31, 47.42, 63.57, 30.86, 70.72, 58.38, -0.02, 1.30, 1.30, fr_sc, 3.31
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 15, valid, 146.99, 82.73, 47.87, 63.77, 31.07, 71.16, 58.87, -0.02, 1.30, 1.30, fr_sc, 3.22
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 15, test , 146.99, 82.73, 47.87, 63.77, 31.07, 71.16, 58.87, -0.02, 1.30, 1.30, fr_sc, 3.22
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 16, valid, 142.91, 80.80, 46.75, 63.21, 30.44, 69.57, 57.57, -0.02, 1.30, 1.30, fr_sc, 3.51
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 16, test , 142.91, 80.80, 46.75, 63.21, 30.44, 69.57, 57.57, -0.02, 1.30, 1.30, fr_sc, 3.51
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 17, valid, 152.50, 84.74, 49.89, 64.90, 31.64, 73.43, 61.11, -0.04, 1.28, 1.28, fr_sc, 2.51
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 17, test , 152.50, 84.74, 49.89, 64.90, 31.64, 73.43, 61.11, -0.04, 1.28, 1.28, fr_sc, 2.51
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 18, valid, 152.49, 84.76, 49.75, 64.78, 31.71, 73.44, 61.04, -0.04, 1.26, 1.26, fr_sc, 2.57
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 18, test , 152.49, 84.76, 49.75, 64.78, 31.71, 73.44, 61.04, -0.04, 1.26, 1.26, fr_sc, 2.57
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 19, valid, 146.44, 81.94, 47.72, 63.81, 30.79, 70.71, 58.63, -0.03, 1.29, 1.29, fr_sc, 3.11
caption_valid_freezing_20220911, B-VG, 384_640, maxwh, True, 19, test , 146.44, 81.94, 47.72, 63.81, 30.79, 70.71, 58.63, -0.03, 1.29, 1.29, fr_sc, 3.11

However, if I train on the default split, which uses both train and valid, the CIDEr result is very low:

Epoch 0: valid scores: {'BLEU': [0.4343235618643381, 0.21255867555316624, 0.11829569428026677, 0.06668439250445665], 'METEOR': 0.10458336291061558, 'ROUGE': 0.32905634887109153, 'CIDEr': 0.07562134563149167}
Epoch 0: test scores: {'BLEU': [0.43323060256491536, 0.2097915331670189, 0.11784917344111052, 0.06749544625241395], 'METEOR': 0.10408173833063068, 'ROUGE': 0.32766278281987077, 'CIDEr': 0.07557542307268746}

I uploaded my modified code to GitHub: the dev branch for the default split (https://github.com/verigle/grit/tree/dev) and the valid branch for the valid split (https://github.com/verigle/grit/tree/valid).

I also want to know where the code that generates the .npy files in the data dir is. How are the .npy files generated? Why are the id lists loaded from the xxx_ids.npy files?

verigle commented 2 years ago

True. In your case, you can uncomment this "merging code" and change the path in the config to 0_all_splits.h5. Indeed, I extracted using 8 GPUs and it runs quite fast.

If I rename 0_all_splits.h5 to all_splits.h5, the accuracy drops:

Epoch 0: valid scores: {'BLEU': [0.5110826412723652, 0.30517149428016305, 0.1396939899271543, 0.059002949270605676], 'METEOR': 0.10518435380325059, 'ROUGE': 0.3638304227983913, 'CIDEr': 0.08609427342507327}
Epoch 0: test scores: {'BLEU': [0.5110826412723652, 0.30517149428016305, 0.1396939899271543, 0.059002949270605676], 'METEOR': 0.10518435380325059, 'ROUGE': 0.3638304227983913, 'CIDEr': 0.08609427342507327}
    torch.distributed.barrier()
    if rank == 0:
        num_gpus = dist.get_world_size()
        with h5py.File(config.dataset.hdf5_path, 'w') as agg_file:
            L = len(dataloader) * BATCH_SIZE * num_gpus
            agg_file.create_dataset('image_ids', data=dataset.img_ids)
            gri_features = agg_file.create_dataset('gri_feat', (L, fh * fw, C), dtype='float32')
            gri_masks = agg_file.create_dataset('gri_mask', (L, 1, 1, fh * fw), dtype='bool')
            if config.model.use_reg_feat:
                Q = config.model.detector.num_queries
                D = config.model.detector.d_model
                reg_features = agg_file.create_dataset('reg_feat', (L, Q, D), dtype='float32')
                reg_masks = agg_file.create_dataset('reg_mask', (L, 1, 1, Q), dtype='bool')

            for r in range(num_gpus):
                filename = f"{r}_" + os.path.basename(config.dataset.hdf5_path)
                dir_path = os.path.dirname(config.dataset.hdf5_path)
                path = os.path.join(dir_path, filename)

                with h5py.File(path, 'r') as f:
                    tmp_ids_list = f['tmp_ids_list'][:len(f['tmp_ids_list'])]

                    for tmp_idx, tmp_id in tqdm(enumerate(tmp_ids_list), total=len(tmp_ids_list)):
                        img_idx = dataset.img_id2idx[tmp_id]
                        # Add grid features
                        gri_features[img_idx] = f['gri_feat'][tmp_idx]
                        gri_masks[img_idx] = f['gri_mask'][tmp_idx]

                        # Add det features
                        if config.model.use_reg_feat:
                            reg_features[img_idx] = f['reg_feat'][tmp_idx]
                            reg_masks[img_idx] = f['reg_mask'][tmp_idx]

                os.remove(path)
                print(f"rank: {rank} - Delete {path}")
        print(f"Saving all to HDF5 file: {config.dataset.hdf5_path}.")

The loop for tmp_idx, tmp_id in tqdm(enumerate(tmp_ids_list), total=len(tmp_ids_list)): runs very slowly. How can this code be optimized?

Do you have any advice for speeding up the HDF5 merge when this code runs on both the train and valid datasets?

davidnvq commented 2 years ago

However, if I train on the default split, which uses both train and valid, the CIDEr result is very low:

Can you elaborate in more detail? How many samples are in the training split? Better still, give me the config or the training script.

BTW, I can now release the training log with 1 GPU for the entire dataset using this code (up to 12 epochs). It runs properly; performance seems acceptable and keeps increasing. However, I think the hyperparameters are not optimal for a small batch size. It may need the trick of accumulating gradients. The result below was obtained with a frozen detector and a batch size of only 8. (I will copy this to your previous issue):

exp          backbone   imsize    resize   raug   epoch   split   cider    B1      B4      R       M       B2      B3      t-loss   t-reward   b-reward   which   v-loss
freeze_all   B-VG       384_640   maxwh    True   0       valid   103.42   73.92   33.00   54.45   25.92   57.83   43.80   3.22     0.00       0.00       fr_xe   2.55
freeze_all   B-VG       384_640   maxwh    True   0       test    103.65   73.65   32.66   54.32   25.80   57.44   43.44   3.22     0.00       0.00       fr_xe   2.55
freeze_all   B-VG       384_640   maxwh    True   1       valid   110.98   75.69   34.97   56.14   27.33   59.54   45.66   2.48     0.00       0.00       fr_xe   2.34
freeze_all   B-VG       384_640   maxwh    True   1       test    111.42   75.46   34.36   55.73   27.05   59.05   45.07   2.48     0.00       0.00       fr_xe   2.34
freeze_all   B-VG       384_640   maxwh    True   2       valid   114.31   76.68   35.89   56.95   27.70   60.84   46.82   2.33     0.00       0.00       fr_xe   2.28
freeze_all   B-VG       384_640   maxwh    True   2       test    115.03   76.69   35.86   56.70   27.54   60.63   46.72   2.33     0.00       0.00       fr_xe   2.28
freeze_all   B-VG       384_640   maxwh    True   3       valid   114.20   76.26   35.60   56.56   27.60   60.18   46.29   2.26     0.00       0.00       fr_xe   2.24
freeze_all   B-VG       384_640   maxwh    True   3       test    114.73   76.25   35.21   56.28   27.54   60.02   45.99   2.26     0.00       0.00       fr_xe   2.24
freeze_all   B-VG       384_640   maxwh    True   4       valid   117.32   77.59   36.79   57.31   27.99   61.81   47.80   2.20     0.00       0.00       fr_xe   2.23
freeze_all   B-VG       384_640   maxwh    True   4       test    117.73   77.41   36.76   57.01   27.82   61.64   47.78   2.20     0.00       0.00       fr_xe   2.23
freeze_all   B-VG       384_640   maxwh    True   5       valid   117.43   77.49   36.77   57.24   28.20   61.64   47.71   2.16     0.00       0.00       fr_xe   2.21
freeze_all   B-VG       384_640   maxwh    True   5       test    118.11   77.09   36.63   57.14   27.95   61.27   47.50   2.16     0.00       0.00       fr_xe   2.21
freeze_all   B-VG       384_640   maxwh    True   6       valid   115.45   76.81   35.97   56.86   27.98   60.42   46.56   2.13     0.00       0.00       fr_xe   2.19
freeze_all   B-VG       384_640   maxwh    True   6       test    117.25   76.84   36.31   56.96   28.04   60.72   46.97   2.13     0.00       0.00       fr_xe   2.19
freeze_all   B-VG       384_640   maxwh    True   7       valid   118.88   77.54   37.07   57.39   28.18   61.76   47.91   2.11     0.00       0.00       fr_xe   2.21
freeze_all   B-VG       384_640   maxwh    True   7       test    119.77   77.48   37.07   57.38   28.19   61.69   47.84   2.11     0.00       0.00       fr_xe   2.21
freeze_all   B-VG       384_640   maxwh    True   8       valid   117.11   77.07   36.95   57.09   28.17   61.34   47.62   2.09     0.00       0.00       fr_xe   2.20
freeze_all   B-VG       384_640   maxwh    True   8       test    117.30   76.64   36.31   56.86   27.98   60.83   47.05   2.09     0.00       0.00       fr_xe   2.20
freeze_all   B-VG       384_640   maxwh    True   9       valid   118.32   77.47   37.18   57.35   28.17   61.74   47.92   2.06     0.00       0.00       fr_xe   2.20
freeze_all   B-VG       384_640   maxwh    True   9       test    120.29   77.79   37.27   57.52   28.32   61.95   48.06   2.06     0.00       0.00       fr_xe   2.20
freeze_all   B-VG       384_640   maxwh    True   10      valid   129.65   81.61   40.15   59.21   28.90   66.77   52.31   -0.01    1.24       1.24       fr_sc   2.72
freeze_all   B-VG       384_640   maxwh    True   10      test    130.61   81.68   40.26   59.17   28.95   66.71   52.34   -0.01    1.24       1.24       fr_sc   2.72
freeze_all   B-VG       384_640   maxwh    True   11      valid   130.18   81.85   40.29   59.18   29.04   66.87   52.44   -0.02    1.27       1.27       fr_sc   3.55
freeze_all   B-VG       384_640   maxwh    True   11      test    132.26   82.19   40.34   59.26   29.05   67.03   52.55   -0.02    1.27       1.27       fr_sc   3.55
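
On the gradient-accumulation trick: the idea is to divide each micro-batch loss by the number of accumulation steps and call the optimizer only after that many backward passes, so the summed gradient matches one larger batch. The toy example below is only a sketch of that arithmetic on a scalar linear model, not the GRIT training loop; `grad_mse` and `accumulated_grad` are hypothetical names, and the data are made up.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, micro_batch, accumulation_steps):
    """Sum the gradients of (micro-batch loss / accumulation_steps).

    This mirrors calling backward() on the scaled loss several times
    before a single optimizer step.
    """
    total = 0.0
    for step in range(accumulation_steps):
        lo = step * micro_batch
        hi = lo + micro_batch
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / accumulation_steps
    return total


# Made-up data: 8 samples, processed either at once or as 4 micro-batches of 2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]
w = 0.5

full = grad_mse(w, xs, ys)
accum = accumulated_grad(w, xs, ys, micro_batch=2, accumulation_steps=4)
print(abs(full - accum) < 1e-9)
```

With equally sized micro-batches the two gradients agree up to floating-point error, so a batch size of 8 with 4 accumulation steps behaves like an effective batch size of 32 for the gradient. The learning-rate schedule would then presumably need to advance once per optimizer step rather than once per iteration.
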
davidnvq commented 2 years ago

If I train on the valid split, I get a similar result

That's great. There seems to be no problem with the code logic. If you can obtain this accuracy in this debugging trial, you should similarly be able to obtain the above result for the entire dataset.

If I rename 0_all_splits.h5 to all_splits.h5, the accuracy drops

Sorry that I didn't say it clearly. Simply renaming the file may cause wrong indexing of the training examples: you may get the caption of one image paired with the features of another image. Rather than renaming the file, please modify the code in https://github.com/davidnvq/grit/blob/main/tools/extract_features.py so that you don't need the merging step. Please check your modification carefully so that the image ids are paired correctly. That's my advice for speeding up! If you are struggling with the implementation, the good news is that I will be happy to share all_splits.h5 with you via your cloud service. Please email me at quang@vision.is.tohoku.ac.jp for this special help ;)

I now suspect that your modification may lead to a wrong mapping :-?

The reason why merging 0_all_splits.h5 into all_splits.h5 becomes super slow with 1 GPU is already explained in #15 (single-index access to a large file).
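
One way to check the pairing before training is to compare the order in which the ids were written (tmp_ids_list in the per-rank file) with the order the dataset expects (dataset.img_ids). The helper below is a hypothetical sketch, not part of the repository, and the ids in the example are made up:

```python
def find_misaligned_rows(stored_ids, expected_ids):
    """Return (row, stored_id, expected_id) for every row whose id differs."""
    if len(stored_ids) != len(expected_ids):
        raise ValueError(
            f"length mismatch: {len(stored_ids)} stored vs {len(expected_ids)} expected"
        )
    return [
        (row, stored, expected)
        for row, (stored, expected) in enumerate(zip(stored_ids, expected_ids))
        if stored != expected
    ]


# Made-up ids: the order features were written vs. the order the dataset indexes them.
written_order = [42, 7, 19, 3]   # e.g. tmp_ids_list from the per-rank file
dataset_order = [3, 7, 19, 42]   # e.g. dataset.img_ids
bad = find_misaligned_rows(written_order, dataset_order)
print(f"{len(bad)} of {len(dataset_order)} rows are misaligned")
```

If the returned list is not empty, a renamed per-rank file would feed the captioner features that belong to a different image than the caption it is trained on.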
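
If the merging step is kept, one possible way to reduce the per-row cost is to read the source file in contiguous chunks and write each chunk with a single indexed assignment. The sketch below only plans those chunks in plain Python; `plan_chunked_copy` is a hypothetical helper and the ids are made up. It sorts the destination rows within each chunk on the assumption that h5py's list-based selection needs increasing indices; whether this is actually faster should be measured on your data.

```python
def plan_chunked_copy(tmp_ids, img_id2idx, chunk_size):
    """Yield (src_start, src_stop, src_order, dst_rows) for each chunk.

    dst_rows is sorted in increasing order; src_order tells how to reorder
    the rows read from source[src_start:src_stop] so they match dst_rows.
    """
    for start in range(0, len(tmp_ids), chunk_size):
        stop = min(start + chunk_size, len(tmp_ids))
        dst = [img_id2idx[image_id] for image_id in tmp_ids[start:stop]]
        order = sorted(range(len(dst)), key=dst.__getitem__)
        yield start, stop, order, [dst[k] for k in order]


# Made-up example: five images written in extraction order by one rank.
tmp_ids = [30, 10, 20, 50, 40]
img_id2idx = {10: 0, 20: 1, 30: 2, 40: 3, 50: 4}
plans = list(plan_chunked_copy(tmp_ids, img_id2idx, chunk_size=2))
for start, stop, order, dst_rows in plans:
    # With h5py and NumPy this would correspond to something like:
    #   block = f['gri_feat'][start:stop]      # one contiguous read
    #   gri_features[dst_rows] = block[order]  # one indexed write
    print(start, stop, order, dst_rows)
```

The chunk size trades memory for fewer HDF5 calls. Skipping the merge entirely, as suggested above, avoids the problem altogether.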

I also want to know where the code that generates the .npy files in the data dir is. How are the .npy files generated?

(a) Do you mean coco_dev_ids.npy, etc.? Or (b) do you want to know how to extract features to .npy? For (a), please see the answer to the question below. For (b), as discussed in #15, I may implement this method when I am free, but I strongly recommend that you do it yourself. This code base is primarily for reproducing the training results of the same configurations and for inference. Currently, it even works with 1 GPU, as in #15!

Why are the id lists loaded from the xxx_ids.npy files?

I just copied the labels for the Karpathy splits from https://github.com/aimagelab/meshed-memory-transformer. Maybe it is their preference; please ask them ;)

HHHH17 commented 1 year ago

True. In your case, you can uncomment this "merging code" and change the path in the config to 0_all_splits.h5. Indeed, I extracted using 8 GPUs and it runs quite fast.

If I rename 0_all_splits.h5 to all_splits.h5, the accuracy drops:

Epoch 0: valid scores: {'BLEU': [0.5110826412723652, 0.30517149428016305, 0.1396939899271543, 0.059002949270605676], 'METEOR': 0.10518435380325059, 'ROUGE': 0.3638304227983913, 'CIDEr': 0.08609427342507327}
Epoch 0: test scores: {'BLEU': [0.5110826412723652, 0.30517149428016305, 0.1396939899271543, 0.059002949270605676], 'METEOR': 0.10518435380325059, 'ROUGE': 0.3638304227983913, 'CIDEr': 0.08609427342507327}
    torch.distributed.barrier()
    if rank == 0:
        num_gpus = dist.get_world_size()
        with h5py.File(config.dataset.hdf5_path, 'w') as agg_file:
            L = len(dataloader) * BATCH_SIZE * num_gpus
            agg_file.create_dataset('image_ids', data=dataset.img_ids)
            gri_features = agg_file.create_dataset('gri_feat', (L, fh * fw, C), dtype='float32')
            gri_masks = agg_file.create_dataset('gri_mask', (L, 1, 1, fh * fw), dtype='bool')
            if config.model.use_reg_feat:
                Q = config.model.detector.num_queries
                D = config.model.detector.d_model
                reg_features = agg_file.create_dataset('reg_feat', (L, Q, D), dtype='float32')
                reg_masks = agg_file.create_dataset('reg_mask', (L, 1, 1, Q), dtype='bool')

            for r in range(num_gpus):
                filename = f"{r}_" + os.path.basename(config.dataset.hdf5_path)
                dir_path = os.path.dirname(config.dataset.hdf5_path)
                path = os.path.join(dir_path, filename)

                with h5py.File(path, 'r') as f:
                    tmp_ids_list = f['tmp_ids_list'][:len(f['tmp_ids_list'])]

                    for tmp_idx, tmp_id in tqdm(enumerate(tmp_ids_list), total=len(tmp_ids_list)):
                        img_idx = dataset.img_id2idx[tmp_id]
                        # Add grid features
                        gri_features[img_idx] = f['gri_feat'][tmp_idx]
                        gri_masks[img_idx] = f['gri_mask'][tmp_idx]

                        # Add det features
                        if config.model.use_reg_feat:
                            reg_features[img_idx] = f['reg_feat'][tmp_idx]
                            reg_masks[img_idx] = f['reg_mask'][tmp_idx]

                os.remove(path)
                print(f"rank: {rank} - Delete {path}")
        print(f"Saving all to HDF5 file: {config.dataset.hdf5_path}.")

The loop for tmp_idx, tmp_id in tqdm(enumerate(tmp_ids_list), total=len(tmp_ids_list)): runs very slowly. How can I optimize this code?

Do you have any advice for speeding up the HDF5 merge when running this code on both the train and valid datasets?

Hi, I met the same problem as you. After 1 epoch of training with the frozen detector, CIDEr only reaches 0.17. I wonder if you have solved this problem?

JingyuLi-code commented 1 year ago

10-epoch log:

Train: rank=0, epoch=0, phase=fr_xe
Epoch 0 - train: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:18<00:00,  5.01it/s, loss=1.25][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 0 - train: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:21<00:00,  5.52it/s, loss=1.25]
Epoch 0 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.46it/s, loss=3.97][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 0 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.59it/s, loss=3.97]
Epoch 0 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22200s
Epoch 0 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20595s
Epoch 0 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.76it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 0 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.16it/s]
Epoch: 0 iters: 157
Total time per 1 batch: 0.20528s
Epoch 0: valid scores: {'BLEU': [0.4669783296661464, 0.24582240555971746, 0.13479587494094583, 0.07239531663650367], 'METEOR': 0.10954924428703441, 'ROUGE': 0.34872282632850365, 'CIDEr': 0.08790172488631927}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 0, valid, 8.79, 46.70, 7.24, 34.87, 10.95, 24.58, 13.48, 1.25, 0.00, 0.00, fr_xe, 3.97
Epoch 0 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21069s
Epoch 0 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20469s
Epoch 0 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 0 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 0 iters: 157
Total time per 1 batch: 0.20416s
Epoch 0: test scores: {'BLEU': [0.4664962002208445, 0.24325046136142325, 0.1341976162794273, 0.0742708785109443], 'METEOR': 0.10937207416996544, 'ROUGE': 0.34777669352106455, 'CIDEr': 0.08940798205091863}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 0, test , 8.94, 46.65, 7.43, 34.78, 10.94, 24.33, 13.42, 1.25, 0.00, 0.00, fr_xe, 3.97
Train: rank=0, epoch=1, phase=fr_xe
Epoch 1 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:08<00:00,  6.05it/s, loss=0.921][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 1 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:11<00:00,  5.59it/s, loss=0.921]
Epoch 1 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.42it/s, loss=3.37][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 1 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.60it/s, loss=3.37]
Epoch 1 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22185s
Epoch 1 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20407s
Epoch 1 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 1 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 1 iters: 157
Total time per 1 batch: 0.20349s
Epoch 1: valid scores: {'BLEU': [0.4177599999999917, 0.22081040233144753, 0.13440728245371522, 0.086634145554716], 'METEOR': 0.10917767439359446, 'ROUGE': 0.3293001799815639, 'CIDEr': 0.09119403689594928}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 1, valid, 9.12, 41.78, 8.66, 32.93, 10.92, 22.08, 13.44, 0.92, 0.00, 0.00, fr_xe, 3.37
Epoch 1 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21100s
Epoch 1 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.84it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20332s
Epoch 1 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.85it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 1 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.20it/s]
Epoch: 1 iters: 157
Total time per 1 batch: 0.20278s
Epoch 1: test scores: {'BLEU': [0.41875999999999164, 0.22176797684867283, 0.13475045843026873, 0.08727707854272869], 'METEOR': 0.10943320624155432, 'ROUGE': 0.3286083021124142, 'CIDEr': 0.0934703590652789}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 1, test , 9.35, 41.88, 8.73, 32.86, 10.94, 22.18, 13.48, 0.92, 0.00, 0.00, fr_xe, 3.37
Train: rank=0, epoch=2, phase=fr_xe
Epoch 2 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:15<00:00,  5.11it/s, loss=0.815][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 2 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:17<00:00,  5.55it/s, loss=0.815]
Epoch 2 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.30it/s, loss=3.11][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 2 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.56it/s, loss=3.11]
Epoch 2 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22380s
Epoch 2 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20404s
Epoch 2 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.82it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 2 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.20it/s]
Epoch: 2 iters: 157
Total time per 1 batch: 0.20375s
Epoch 2: valid scores: {'BLEU': [0.3529791667300347, 0.1414591126544642, 0.06299488259423028, 0.03485908085047199], 'METEOR': 0.10059827334083978, 'ROUGE': 0.3006740338866547, 'CIDEr': 0.06895854570136649}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 2, valid, 6.90, 35.30, 3.49, 30.07, 10.06, 14.15, 6.30, 0.82, 0.00, 0.00, fr_xe, 3.11
Epoch 2 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21104s
Epoch 2 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20410s
Epoch 2 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.83it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 2 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 2 iters: 157
Total time per 1 batch: 0.20358s
Epoch 2: test scores: {'BLEU': [0.3523721988314085, 0.1408002536166315, 0.06367281953988665, 0.035704094395407136], 'METEOR': 0.10049904707011564, 'ROUGE': 0.2998991451173569, 'CIDEr': 0.07132428082250658}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 2, test , 7.13, 35.24, 3.57, 29.99, 10.05, 14.08, 6.37, 0.82, 0.00, 0.00, fr_xe, 3.11
Train: rank=0, epoch=3, phase=fr_xe
Epoch 3 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:11<00:00,  5.80it/s, loss=0.762][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 3 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:14<00:00,  5.57it/s, loss=0.762]
Epoch 3 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.60it/s, loss=2.97][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 3 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.70it/s, loss=2.97]
Epoch 3 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22389s
Epoch 3 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20444s
Epoch 3 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:36<00:00,  4.82it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 3 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.20it/s]
Epoch: 3 iters: 157
Total time per 1 batch: 0.20382s
Epoch 3: valid scores: {'BLEU': [0.3880270749907782, 0.1498102872688645, 0.08511781760022998, 0.050539934594252114], 'METEOR': 0.09991918939322139, 'ROUGE': 0.30912337673077656, 'CIDEr': 0.07096231761736432}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 3, valid, 7.10, 38.80, 5.05, 30.91, 9.99, 14.98, 8.51, 0.76, 0.00, 0.00, fr_xe, 2.97
Epoch 3 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.20925s
Epoch 3 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.83it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20409s
Epoch 3 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:36<00:00,  4.84it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 3 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.22it/s]
Epoch: 3 iters: 157
Total time per 1 batch: 0.20361s
Epoch 3: test scores: {'BLEU': [0.38801692079215583, 0.15115514556365187, 0.08703201785437005, 0.053794063909602625], 'METEOR': 0.1004634083610197, 'ROUGE': 0.3098636871323689, 'CIDEr': 0.07426506154284174}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 3, test , 7.43, 38.80, 5.38, 30.99, 10.05, 15.12, 8.70, 0.76, 0.00, 0.00, fr_xe, 2.97
Train: rank=0, epoch=4, phase=fr_xe
Epoch 4 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:18<00:00,  4.65it/s, loss=0.729][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 4 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:20<00:00,  5.53it/s, loss=0.729]
Epoch 4 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:26<00:00, 12.23it/s, loss=2.88][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 4 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:26<00:00,  7.42it/s, loss=2.88]
Epoch 4 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22306s
Epoch 4 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20433s
Epoch 4 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 4 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.19it/s]
Epoch: 4 iters: 157
Total time per 1 batch: 0.20391s
Epoch 4: valid scores: {'BLEU': [0.3956675189916203, 0.14959241499951006, 0.08455045357940592, 0.05072552705044864], 'METEOR': 0.10225475159056896, 'ROUGE': 0.3122258530606642, 'CIDEr': 0.07015325932525968}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 4, valid, 7.02, 39.57, 5.07, 31.22, 10.23, 14.96, 8.46, 0.73, 0.00, 0.00, fr_xe, 2.88
Epoch 4 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21257s
Epoch 4 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20408s
Epoch 4 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.83it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 4 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 4 iters: 157
Total time per 1 batch: 0.20355s
Epoch 4: test scores: {'BLEU': [0.39403577331772227, 0.14921009455837111, 0.083850153056571, 0.050482385367933987], 'METEOR': 0.10182105655673135, 'ROUGE': 0.31149491539952134, 'CIDEr': 0.06971117398735341}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 4, test , 6.97, 39.40, 5.05, 31.15, 10.18, 14.92, 8.39, 0.73, 0.00, 0.00, fr_xe, 2.88
Train: rank=0, epoch=5, phase=fr_xe
Epoch 5 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:12<00:00,  5.54it/s, loss=0.707][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 5 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:14<00:00,  5.57it/s, loss=0.707]
Epoch 5 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.46it/s, loss=2.83][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 5 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.63it/s, loss=2.83]
Epoch 5 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.23301s
Epoch 5 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20449s
Epoch 5 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 5 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 5 iters: 157
Total time per 1 batch: 0.20397s
Epoch 5: valid scores: {'BLEU': [0.3529791667300347, 0.1414591126544642, 0.06299488259423028, 0.03485908085047199], 'METEOR': 0.10059827334083978, 'ROUGE': 0.3006740338866547, 'CIDEr': 0.06895854570136649}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 5, valid, 6.90, 35.30, 3.49, 30.07, 10.06, 14.15, 6.30, 0.71, 0.00, 0.00, fr_xe, 2.83
Epoch 5 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.20863s
Epoch 5 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20429s
Epoch 5 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.83it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 5 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.20it/s]
Epoch: 5 iters: 157
Total time per 1 batch: 0.20373s
Epoch 5: test scores: {'BLEU': [0.3523721988314085, 0.1408002536166315, 0.06367281953988665, 0.035704094395407136], 'METEOR': 0.10049904707011564, 'ROUGE': 0.2998991451173569, 'CIDEr': 0.07132428082250658}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 5, test , 7.13, 35.24, 3.57, 29.99, 10.05, 14.08, 6.37, 0.71, 0.00, 0.00, fr_xe, 2.83
Train: rank=0, epoch=6, phase=fr_xe
Epoch 6 - train: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:14<00:00,  5.17it/s, loss=0.69][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 6 - train: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:16<00:00,  5.55it/s, loss=0.69]
Epoch 6 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.35it/s, loss=2.79][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 6 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.60it/s, loss=2.79]
Epoch 6 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22445s
Epoch 6 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20437s
Epoch 6 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.80it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 6 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.18it/s]
Epoch: 6 iters: 157
Total time per 1 batch: 0.20382s
Epoch 6: valid scores: {'BLEU': [0.3626799999999928, 0.18942318055964488, 0.07775216931829154, 0.04351562148082592], 'METEOR': 0.09470731840804052, 'ROUGE': 0.3105374796878281, 'CIDEr': 0.055744577283453285}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 6, valid, 5.57, 36.27, 4.35, 31.05, 9.47, 18.94, 7.78, 0.69, 0.00, 0.00, fr_xe, 2.79
Epoch 6 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21029s
Epoch 6 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20432s
Epoch 6 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.82it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 6 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 6 iters: 157
Total time per 1 batch: 0.20382s
Epoch 6: test scores: {'BLEU': [0.35903999999999286, 0.18394979025085054, 0.0703169688713343, 0.03666873619238479], 'METEOR': 0.09287212083601691, 'ROUGE': 0.3071016066175093, 'CIDEr': 0.049071662717992064}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 6, test , 4.91, 35.90, 3.67, 30.71, 9.29, 18.39, 7.03, 0.69, 0.00, 0.00, fr_xe, 2.79
Train: rank=0, epoch=7, phase=fr_xe
Epoch 7 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:21<00:00,  5.86it/s, loss=0.677][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 7 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:23<00:00,  5.51it/s, loss=0.677]
Epoch 7 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.29it/s, loss=2.77][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 7 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:26<00:00,  7.52it/s, loss=2.77]
Epoch 7 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22371s
Epoch 7 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20449s
Epoch 7 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 7 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.13it/s]
Epoch: 7 iters: 157
Total time per 1 batch: 0.20394s
Epoch 7: valid scores: {'BLEU': [0.38179999999999237, 0.20104699284826827, 0.08538310838454195, 0.04516644657768871], 'METEOR': 0.10478659426096619, 'ROUGE': 0.3169023526256761, 'CIDEr': 0.07155207847141871}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 7, valid, 7.16, 38.18, 4.52, 31.69, 10.48, 20.10, 8.54, 0.68, 0.00, 0.00, fr_xe, 2.77
Epoch 7 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21468s
Epoch 7 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20399s
Epoch 7 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.82it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 7 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.16it/s]
Epoch: 7 iters: 157
Total time per 1 batch: 0.20353s
Epoch 7: test scores: {'BLEU': [0.37899999999999245, 0.19519767758180867, 0.0771990834096997, 0.03888923579121348], 'METEOR': 0.10304347878594912, 'ROUGE': 0.3141675146161081, 'CIDEr': 0.06542015344438794}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 7, test , 6.54, 37.90, 3.89, 31.42, 10.30, 19.52, 7.72, 0.68, 0.00, 0.00, fr_xe, 2.77
Train: rank=0, epoch=8, phase=fr_xe
Epoch 8 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:10<00:00,  5.50it/s, loss=0.666][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 8 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:13<00:00,  5.58it/s, loss=0.666]
Epoch 8 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.32it/s, loss=2.75][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 8 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.69it/s, loss=2.75]
Epoch 8 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22344s
Epoch 8 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20453s
Epoch 8 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.80it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 8 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 8 iters: 157
Total time per 1 batch: 0.20401s
Epoch 8: valid scores: {'BLEU': [0.38179999999999237, 0.20104699284826827, 0.08538310838454195, 0.04516644657768871], 'METEOR': 0.10478659426096619, 'ROUGE': 0.3169023526256761, 'CIDEr': 0.07155207847141871}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 8, valid, 7.16, 38.18, 4.52, 31.69, 10.48, 20.10, 8.54, 0.67, 0.00, 0.00, fr_xe, 2.75
Epoch 8 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21202s
Epoch 8 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20418s
Epoch 8 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 8 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.19it/s]
Epoch: 8 iters: 157
Total time per 1 batch: 0.20374s
Epoch 8: test scores: {'BLEU': [0.37899999999999245, 0.19519767758180867, 0.0771990834096997, 0.03888923579121348], 'METEOR': 0.10304347878594912, 'ROUGE': 0.3141675146161081, 'CIDEr': 0.06542015344438794}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 8, test , 6.54, 37.90, 3.89, 31.42, 10.30, 19.52, 7.72, 0.67, 0.00, 0.00, fr_xe, 2.75
Train: rank=0, epoch=9, phase=fr_xe
Epoch 9 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:11<00:00,  5.36it/s, loss=0.656][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 9 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:13<00:00,  5.57it/s, loss=0.656]
Epoch 9 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.35it/s, loss=2.74][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 9 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:26<00:00,  7.50it/s, loss=2.74]
Epoch 9 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22454s
Epoch 9 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.79it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20455s
Epoch 9 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 9 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.19it/s]
Epoch: 9 iters: 157
Total time per 1 batch: 0.20402s
Epoch 9: valid scores: {'BLEU': [0.3529791667300347, 0.1414591126544642, 0.06299488259423028, 0.03485908085047199], 'METEOR': 0.10059827334083978, 'ROUGE': 0.3006740338866547, 'CIDEr': 0.06895854570136649}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 9, valid, 6.90, 35.30, 3.49, 30.07, 10.06, 14.15, 6.30, 0.66, 0.00, 0.00, fr_xe, 2.74
Epoch 9 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.20851s
Epoch 9 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20436s
Epoch 9 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.82it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 9 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.20it/s]
Epoch: 9 iters: 157
Total time per 1 batch: 0.20386s
Epoch 9: test scores: {'BLEU': [0.3523721988314085, 0.1408002536166315, 0.06367281953988665, 0.035704094395407136], 'METEOR': 0.10049904707011564, 'ROUGE': 0.2998991451173569, 'CIDEr': 0.07132428082250658}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 9, test , 7.13, 35.24, 3.57, 29.99, 10.05, 14.08, 6.37, 0.66, 0.00, 0.00, fr_xe, 2.74

Why are you using a GPU batch size of 32, and why does an epoch take only about 15 minutes with the frozen detector? I use 2×3090 GPUs with batch size 8×2, and the frozen-detector stage takes around four hours. How did you reduce your training time?

verigle commented 1 year ago

I modified the code that generates the H5 file, because generating it was very slow. However, I cannot reach the same accuracy, because the order of the dataset is not the same as in the Karpathy splits from https://github.com/aimagelab/meshed-memory-transformer. If you are interested in my modified code, please refer to grit at branch opt_ids.
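
One way to check the ordering problem described above is to compare the image ids of a regenerated split against the ids of the reference split. The snippet below is only a minimal sketch: `compare_id_order` is a hypothetical helper, not part of grit or meshed-memory-transformer, and the ids are made-up placeholders. It assumes both splits can be loaded into plain Python lists of ids.

```python
from typing import Hashable, Optional, Sequence


def compare_id_order(reference_ids: Sequence[Hashable],
                     candidate_ids: Sequence[Hashable]) -> dict:
    """Report whether two id sequences contain the same ids in the same order."""
    # Same membership, ignoring order and duplicates.
    same_set = set(reference_ids) == set(candidate_ids)

    # Index of the first position where the two sequences disagree.
    first_mismatch: Optional[int] = None
    for index, (ref, cand) in enumerate(zip(reference_ids, candidate_ids)):
        if ref != cand:
            first_mismatch = index
            break

    # If one sequence is a strict prefix of the other, the mismatch
    # starts where the shorter sequence ends.
    if first_mismatch is None and len(reference_ids) != len(candidate_ids):
        first_mismatch = min(len(reference_ids), len(candidate_ids))

    return {
        "same_set": same_set,
        "same_order": first_mismatch is None,
        "first_mismatch": first_mismatch,
    }


# Hypothetical ids, for illustration only.
reference = [391895, 522418, 184613, 318219]
candidate = [391895, 184613, 522418, 318219]
report = compare_id_order(reference, candidate)
print(report)
```

If `same_set` is true but `same_order` is false, the regenerated file holds the same images in a different order, so any code that indexes features by position rather than by id would pair images with the wrong entries.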

verigle commented 1 year ago

I also put the H5 file on an SSD; it may be faster than reading from a mechanical hard disk.
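
To get a rough idea of whether the disk is the bottleneck, sequential read throughput can be measured with the standard library alone. This is a sketch under assumptions: `read_throughput_mb_s` is a hypothetical helper, the demo reads a small temporary file rather than a real H5 file, and a data loader's random-access pattern may behave differently from a sequential read.

```python
import os
import tempfile
import time


def read_throughput_mb_s(path: str, chunk_bytes: int = 4 * 1024 * 1024) -> float:
    """Read the whole file sequentially and return throughput in MB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_bytes)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small files.
    return (total / 1e6) / max(elapsed, 1e-9)


# Demo on a small temporary file; point `demo_path` at the real file
# on each disk to compare them.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(8 * 1024 * 1024))
    demo_path = tmp.name

speed = read_throughput_mb_s(demo_path)
os.remove(demo_path)
print(f"sequential read: {speed:.1f} MB/s")
```

Note that a file read recently is usually served from the operating system's page cache, so a second run on the same file can report a much higher number than the disk itself can deliver.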

verigle commented 1 year ago

Log for 10 epochs:

Train: rank=0, epoch=0, phase=fr_xe
Epoch 0 - train: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:18<00:00,  5.01it/s, loss=1.25][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 0 - train: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:21<00:00,  5.52it/s, loss=1.25]
Epoch 0 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.46it/s, loss=3.97][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 0 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.59it/s, loss=3.97]
Epoch 0 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22200s
Epoch 0 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20595s
Epoch 0 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.76it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 0 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.16it/s]
Epoch: 0 iters: 157
Total time per 1 batch: 0.20528s
Epoch 0: valid scores: {'BLEU': [0.4669783296661464, 0.24582240555971746, 0.13479587494094583, 0.07239531663650367], 'METEOR': 0.10954924428703441, 'ROUGE': 0.34872282632850365, 'CIDEr': 0.08790172488631927}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 0, valid, 8.79, 46.70, 7.24, 34.87, 10.95, 24.58, 13.48, 1.25, 0.00, 0.00, fr_xe, 3.97
Epoch 0 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21069s
Epoch 0 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20469s
Epoch 0 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 0 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 0 iters: 157
Total time per 1 batch: 0.20416s
Epoch 0: test scores: {'BLEU': [0.4664962002208445, 0.24325046136142325, 0.1341976162794273, 0.0742708785109443], 'METEOR': 0.10937207416996544, 'ROUGE': 0.34777669352106455, 'CIDEr': 0.08940798205091863}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 0, test , 8.94, 46.65, 7.43, 34.78, 10.94, 24.33, 13.42, 1.25, 0.00, 0.00, fr_xe, 3.97
Train: rank=0, epoch=1, phase=fr_xe
Epoch 1 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:08<00:00,  6.05it/s, loss=0.921][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 1 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:11<00:00,  5.59it/s, loss=0.921]
Epoch 1 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.42it/s, loss=3.37][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 1 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.60it/s, loss=3.37]
Epoch 1 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22185s
Epoch 1 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20407s
Epoch 1 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 1 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 1 iters: 157
Total time per 1 batch: 0.20349s
Epoch 1: valid scores: {'BLEU': [0.4177599999999917, 0.22081040233144753, 0.13440728245371522, 0.086634145554716], 'METEOR': 0.10917767439359446, 'ROUGE': 0.3293001799815639, 'CIDEr': 0.09119403689594928}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 1, valid, 9.12, 41.78, 8.66, 32.93, 10.92, 22.08, 13.44, 0.92, 0.00, 0.00, fr_xe, 3.37
Epoch 1 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21100s
Epoch 1 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.84it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20332s
Epoch 1 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.85it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 1 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.20it/s]
Epoch: 1 iters: 157
Total time per 1 batch: 0.20278s
Epoch 1: test scores: {'BLEU': [0.41875999999999164, 0.22176797684867283, 0.13475045843026873, 0.08727707854272869], 'METEOR': 0.10943320624155432, 'ROUGE': 0.3286083021124142, 'CIDEr': 0.0934703590652789}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 1, test , 9.35, 41.88, 8.73, 32.86, 10.94, 22.18, 13.48, 0.92, 0.00, 0.00, fr_xe, 3.37
Train: rank=0, epoch=2, phase=fr_xe
Epoch 2 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:15<00:00,  5.11it/s, loss=0.815][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 2 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:17<00:00,  5.55it/s, loss=0.815]
Epoch 2 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.30it/s, loss=3.11][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 2 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.56it/s, loss=3.11]
Epoch 2 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22380s
Epoch 2 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20404s
Epoch 2 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.82it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 2 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.20it/s]
Epoch: 2 iters: 157
Total time per 1 batch: 0.20375s
Epoch 2: valid scores: {'BLEU': [0.3529791667300347, 0.1414591126544642, 0.06299488259423028, 0.03485908085047199], 'METEOR': 0.10059827334083978, 'ROUGE': 0.3006740338866547, 'CIDEr': 0.06895854570136649}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 2, valid, 6.90, 35.30, 3.49, 30.07, 10.06, 14.15, 6.30, 0.82, 0.00, 0.00, fr_xe, 3.11
Epoch 2 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21104s
Epoch 2 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20410s
Epoch 2 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.83it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 2 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 2 iters: 157
Total time per 1 batch: 0.20358s
Epoch 2: test scores: {'BLEU': [0.3523721988314085, 0.1408002536166315, 0.06367281953988665, 0.035704094395407136], 'METEOR': 0.10049904707011564, 'ROUGE': 0.2998991451173569, 'CIDEr': 0.07132428082250658}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 2, test , 7.13, 35.24, 3.57, 29.99, 10.05, 14.08, 6.37, 0.82, 0.00, 0.00, fr_xe, 3.11
Train: rank=0, epoch=3, phase=fr_xe
Epoch 3 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:11<00:00,  5.80it/s, loss=0.762][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 3 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:14<00:00,  5.57it/s, loss=0.762]
Epoch 3 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.60it/s, loss=2.97][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 3 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.70it/s, loss=2.97]
Epoch 3 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22389s
Epoch 3 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20444s
Epoch 3 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:36<00:00,  4.82it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 3 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.20it/s]
Epoch: 3 iters: 157
Total time per 1 batch: 0.20382s
Epoch 3: valid scores: {'BLEU': [0.3880270749907782, 0.1498102872688645, 0.08511781760022998, 0.050539934594252114], 'METEOR': 0.09991918939322139, 'ROUGE': 0.30912337673077656, 'CIDEr': 0.07096231761736432}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 3, valid, 7.10, 38.80, 5.05, 30.91, 9.99, 14.98, 8.51, 0.76, 0.00, 0.00, fr_xe, 2.97
Epoch 3 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.20925s
Epoch 3 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.83it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20409s
Epoch 3 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:36<00:00,  4.84it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 3 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.22it/s]
Epoch: 3 iters: 157
Total time per 1 batch: 0.20361s
Epoch 3: test scores: {'BLEU': [0.38801692079215583, 0.15115514556365187, 0.08703201785437005, 0.053794063909602625], 'METEOR': 0.1004634083610197, 'ROUGE': 0.3098636871323689, 'CIDEr': 0.07426506154284174}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 3, test , 7.43, 38.80, 5.38, 30.99, 10.05, 15.12, 8.70, 0.76, 0.00, 0.00, fr_xe, 2.97
Train: rank=0, epoch=4, phase=fr_xe
Epoch 4 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:18<00:00,  4.65it/s, loss=0.729][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 4 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:20<00:00,  5.53it/s, loss=0.729]
Epoch 4 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:26<00:00, 12.23it/s, loss=2.88][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 4 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:26<00:00,  7.42it/s, loss=2.88]
Epoch 4 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22306s
Epoch 4 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20433s
Epoch 4 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 4 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.19it/s]
Epoch: 4 iters: 157
Total time per 1 batch: 0.20391s
Epoch 4: valid scores: {'BLEU': [0.3956675189916203, 0.14959241499951006, 0.08455045357940592, 0.05072552705044864], 'METEOR': 0.10225475159056896, 'ROUGE': 0.3122258530606642, 'CIDEr': 0.07015325932525968}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 4, valid, 7.02, 39.57, 5.07, 31.22, 10.23, 14.96, 8.46, 0.73, 0.00, 0.00, fr_xe, 2.88
Epoch 4 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21257s
Epoch 4 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20408s
Epoch 4 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.83it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 4 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 4 iters: 157
Total time per 1 batch: 0.20355s
Epoch 4: test scores: {'BLEU': [0.39403577331772227, 0.14921009455837111, 0.083850153056571, 0.050482385367933987], 'METEOR': 0.10182105655673135, 'ROUGE': 0.31149491539952134, 'CIDEr': 0.06971117398735341}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 4, test , 6.97, 39.40, 5.05, 31.15, 10.18, 14.92, 8.39, 0.73, 0.00, 0.00, fr_xe, 2.88
Train: rank=0, epoch=5, phase=fr_xe
Epoch 5 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:12<00:00,  5.54it/s, loss=0.707][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 5 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:14<00:00,  5.57it/s, loss=0.707]
Epoch 5 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.46it/s, loss=2.83][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 5 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.63it/s, loss=2.83]
Epoch 5 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.23301s
Epoch 5 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20449s
Epoch 5 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 5 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 5 iters: 157
Total time per 1 batch: 0.20397s
Epoch 5: valid scores: {'BLEU': [0.3529791667300347, 0.1414591126544642, 0.06299488259423028, 0.03485908085047199], 'METEOR': 0.10059827334083978, 'ROUGE': 0.3006740338866547, 'CIDEr': 0.06895854570136649}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 5, valid, 6.90, 35.30, 3.49, 30.07, 10.06, 14.15, 6.30, 0.71, 0.00, 0.00, fr_xe, 2.83
Epoch 5 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.20863s
Epoch 5 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20429s
Epoch 5 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.83it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 5 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.20it/s]
Epoch: 5 iters: 157
Total time per 1 batch: 0.20373s
Epoch 5: test scores: {'BLEU': [0.3523721988314085, 0.1408002536166315, 0.06367281953988665, 0.035704094395407136], 'METEOR': 0.10049904707011564, 'ROUGE': 0.2998991451173569, 'CIDEr': 0.07132428082250658}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 5, test , 7.13, 35.24, 3.57, 29.99, 10.05, 14.08, 6.37, 0.71, 0.00, 0.00, fr_xe, 2.83
Train: rank=0, epoch=6, phase=fr_xe
Epoch 6 - train: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:14<00:00,  5.17it/s, loss=0.69][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 6 - train: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:16<00:00,  5.55it/s, loss=0.69]
Epoch 6 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.35it/s, loss=2.79][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 6 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.60it/s, loss=2.79]
Epoch 6 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22445s
Epoch 6 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20437s
Epoch 6 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.80it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 6 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.18it/s]
Epoch: 6 iters: 157
Total time per 1 batch: 0.20382s
Epoch 6: valid scores: {'BLEU': [0.3626799999999928, 0.18942318055964488, 0.07775216931829154, 0.04351562148082592], 'METEOR': 0.09470731840804052, 'ROUGE': 0.3105374796878281, 'CIDEr': 0.055744577283453285}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 6, valid, 5.57, 36.27, 4.35, 31.05, 9.47, 18.94, 7.78, 0.69, 0.00, 0.00, fr_xe, 2.79
Epoch 6 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21029s
Epoch 6 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20432s
Epoch 6 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.82it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 6 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 6 iters: 157
Total time per 1 batch: 0.20382s
Epoch 6: test scores: {'BLEU': [0.35903999999999286, 0.18394979025085054, 0.0703169688713343, 0.03666873619238479], 'METEOR': 0.09287212083601691, 'ROUGE': 0.3071016066175093, 'CIDEr': 0.049071662717992064}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 6, test , 4.91, 35.90, 3.67, 30.71, 9.29, 18.39, 7.03, 0.69, 0.00, 0.00, fr_xe, 2.79
Train: rank=0, epoch=7, phase=fr_xe
Epoch 7 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:21<00:00,  5.86it/s, loss=0.677][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 7 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:23<00:00,  5.51it/s, loss=0.677]
Epoch 7 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.29it/s, loss=2.77][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 7 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:26<00:00,  7.52it/s, loss=2.77]
Epoch 7 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22371s
Epoch 7 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20449s
Epoch 7 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 7 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.13it/s]
Epoch: 7 iters: 157
Total time per 1 batch: 0.20394s
Epoch 7: valid scores: {'BLEU': [0.38179999999999237, 0.20104699284826827, 0.08538310838454195, 0.04516644657768871], 'METEOR': 0.10478659426096619, 'ROUGE': 0.3169023526256761, 'CIDEr': 0.07155207847141871}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 7, valid, 7.16, 38.18, 4.52, 31.69, 10.48, 20.10, 8.54, 0.68, 0.00, 0.00, fr_xe, 2.77
Epoch 7 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21468s
Epoch 7 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20399s
Epoch 7 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.82it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 7 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.16it/s]
Epoch: 7 iters: 157
Total time per 1 batch: 0.20353s
Epoch 7: test scores: {'BLEU': [0.37899999999999245, 0.19519767758180867, 0.0771990834096997, 0.03888923579121348], 'METEOR': 0.10304347878594912, 'ROUGE': 0.3141675146161081, 'CIDEr': 0.06542015344438794}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 7, test , 6.54, 37.90, 3.89, 31.42, 10.30, 19.52, 7.72, 0.68, 0.00, 0.00, fr_xe, 2.77
Train: rank=0, epoch=8, phase=fr_xe
Epoch 8 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:10<00:00,  5.50it/s, loss=0.666][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 8 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:13<00:00,  5.58it/s, loss=0.666]
Epoch 8 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.32it/s, loss=2.75][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 8 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.69it/s, loss=2.75]
Epoch 8 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22344s
Epoch 8 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20453s
Epoch 8 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.80it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 8 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 8 iters: 157
Total time per 1 batch: 0.20401s
Epoch 8: valid scores: {'BLEU': [0.38179999999999237, 0.20104699284826827, 0.08538310838454195, 0.04516644657768871], 'METEOR': 0.10478659426096619, 'ROUGE': 0.3169023526256761, 'CIDEr': 0.07155207847141871}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 8, valid, 7.16, 38.18, 4.52, 31.69, 10.48, 20.10, 8.54, 0.67, 0.00, 0.00, fr_xe, 2.75
Epoch 8 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21202s
Epoch 8 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20418s
Epoch 8 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 8 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.19it/s]
Epoch: 8 iters: 157
Total time per 1 batch: 0.20374s
Epoch 8: test scores: {'BLEU': [0.37899999999999245, 0.19519767758180867, 0.0771990834096997, 0.03888923579121348], 'METEOR': 0.10304347878594912, 'ROUGE': 0.3141675146161081, 'CIDEr': 0.06542015344438794}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 8, test , 6.54, 37.90, 3.89, 31.42, 10.30, 19.52, 7.72, 0.67, 0.00, 0.00, fr_xe, 2.75
Train: rank=0, epoch=9, phase=fr_xe
Epoch 9 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:11<00:00,  5.36it/s, loss=0.656][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 9 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:13<00:00,  5.57it/s, loss=0.656]
Epoch 9 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.35it/s, loss=2.74][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 9 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:26<00:00,  7.50it/s, loss=2.74]
Epoch 9 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22454s
Epoch 9 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.79it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20455s
Epoch 9 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 9 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.19it/s]
Epoch: 9 iters: 157
Total time per 1 batch: 0.20402s
Epoch 9: valid scores: {'BLEU': [0.3529791667300347, 0.1414591126544642, 0.06299488259423028, 0.03485908085047199], 'METEOR': 0.10059827334083978, 'ROUGE': 0.3006740338866547, 'CIDEr': 0.06895854570136649}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 9, valid, 6.90, 35.30, 3.49, 30.07, 10.06, 14.15, 6.30, 0.66, 0.00, 0.00, fr_xe, 2.74
Epoch 9 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.20851s
Epoch 9 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20436s
Epoch 9 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.82it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 9 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.20it/s]
Epoch: 9 iters: 157
Total time per 1 batch: 0.20386s
Epoch 9: test scores: {'BLEU': [0.3523721988314085, 0.1408002536166315, 0.06367281953988665, 0.035704094395407136], 'METEOR': 0.10049904707011564, 'ROUGE': 0.2998991451173569, 'CIDEr': 0.07132428082250658}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 9, test , 7.13, 35.24, 3.57, 29.99, 10.05, 14.08, 6.37, 0.66, 0.00, 0.00, fr_xe, 2.74
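
To compare epochs without scrolling through the progress bars, the `Epoch N: valid/test scores: {...}` lines can be pulled out of the log with a short script. This is only a minimal sketch, not part of the GRIT code: the names `parse_scores` and `SCORE_LINE` are hypothetical, and it assumes the score lines keep exactly the format shown in the log above.

```python
import ast
import re

# Matches lines such as "Epoch 9: test scores: {'BLEU': [...], 'CIDEr': 0.07}".
# tqdm progress lines ("Epoch 9 - evaluation on test: ...") do not match.
SCORE_LINE = re.compile(r"^Epoch (\d+): (valid|test) scores: (\{.*\})\s*$")


def parse_scores(log_text):
    """Return {(epoch, split): scores_dict} for every score line in the log."""
    results = {}
    for line in log_text.splitlines():
        match = SCORE_LINE.match(line.strip())
        if match:
            epoch, split, payload = match.groups()
            # The payload is a Python dict literal, so literal_eval parses it safely.
            results[(int(epoch), split)] = ast.literal_eval(payload)
    return results


# Two score lines copied from the epoch 9 log above.
sample = """
Epoch: 9 iters: 157
Epoch 9: valid scores: {'BLEU': [0.3529791667300347, 0.1414591126544642, 0.06299488259423028, 0.03485908085047199], 'METEOR': 0.10059827334083978, 'ROUGE': 0.3006740338866547, 'CIDEr': 0.06895854570136649}
Epoch 9: test scores: {'BLEU': [0.3523721988314085, 0.1408002536166315, 0.06367281953988665, 0.035704094395407136], 'METEOR': 0.10049904707011564, 'ROUGE': 0.2998991451173569, 'CIDEr': 0.07132428082250658}
"""

scores = parse_scores(sample)
for (epoch, split), metrics in sorted(scores.items()):
    # The summary rows in the log report the metrics multiplied by 100.
    print(epoch, split, round(metrics["CIDEr"] * 100, 2))
```

Running it over the full log gives one CIDEr value per epoch and split, which makes it easier to see that the score stays below 10 for all ten epochs.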

How are you training with a GPU batch size of 32 and taking only about 15 minutes per epoch with the detector frozen? I use 2×3090 GPUs with a batch size of 8×2, and it takes around four hours with the frozen detector. How did you reduce your training time?

Can you send me the h5 file via Baidu Netdisk or Lark?

JingyuLi-code commented 1 year ago

10-epoch log:

Train: rank=0, epoch=0, phase=fr_xe
Epoch 0 - train: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:18<00:00,  5.01it/s, loss=1.25][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 0 - train: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:21<00:00,  5.52it/s, loss=1.25]
Epoch 0 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.46it/s, loss=3.97][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 0 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.59it/s, loss=3.97]
Epoch 0 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22200s
Epoch 0 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20595s
Epoch 0 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.76it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 0 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.16it/s]
Epoch: 0 iters: 157
Total time per 1 batch: 0.20528s
Epoch 0: valid scores: {'BLEU': [0.4669783296661464, 0.24582240555971746, 0.13479587494094583, 0.07239531663650367], 'METEOR': 0.10954924428703441, 'ROUGE': 0.34872282632850365, 'CIDEr': 0.08790172488631927}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 0, valid, 8.79, 46.70, 7.24, 34.87, 10.95, 24.58, 13.48, 1.25, 0.00, 0.00, fr_xe, 3.97
Epoch 0 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21069s
Epoch 0 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20469s
Epoch 0 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 0 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 0 iters: 157
Total time per 1 batch: 0.20416s
Epoch 0: test scores: {'BLEU': [0.4664962002208445, 0.24325046136142325, 0.1341976162794273, 0.0742708785109443], 'METEOR': 0.10937207416996544, 'ROUGE': 0.34777669352106455, 'CIDEr': 0.08940798205091863}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 0, test , 8.94, 46.65, 7.43, 34.78, 10.94, 24.33, 13.42, 1.25, 0.00, 0.00, fr_xe, 3.97
Train: rank=0, epoch=1, phase=fr_xe
Epoch 1 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:08<00:00,  6.05it/s, loss=0.921][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 1 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:11<00:00,  5.59it/s, loss=0.921]
Epoch 1 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.42it/s, loss=3.37][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 1 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.60it/s, loss=3.37]
Epoch 1 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22185s
Epoch 1 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20407s
Epoch 1 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 1 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 1 iters: 157
Total time per 1 batch: 0.20349s
Epoch 1: valid scores: {'BLEU': [0.4177599999999917, 0.22081040233144753, 0.13440728245371522, 0.086634145554716], 'METEOR': 0.10917767439359446, 'ROUGE': 0.3293001799815639, 'CIDEr': 0.09119403689594928}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 1, valid, 9.12, 41.78, 8.66, 32.93, 10.92, 22.08, 13.44, 0.92, 0.00, 0.00, fr_xe, 3.37
Epoch 1 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21100s
Epoch 1 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.84it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20332s
Epoch 1 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.85it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 1 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.20it/s]
Epoch: 1 iters: 157
Total time per 1 batch: 0.20278s
Epoch 1: test scores: {'BLEU': [0.41875999999999164, 0.22176797684867283, 0.13475045843026873, 0.08727707854272869], 'METEOR': 0.10943320624155432, 'ROUGE': 0.3286083021124142, 'CIDEr': 0.0934703590652789}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 1, test , 9.35, 41.88, 8.73, 32.86, 10.94, 22.18, 13.48, 0.92, 0.00, 0.00, fr_xe, 3.37
Train: rank=0, epoch=2, phase=fr_xe
Epoch 2 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:15<00:00,  5.11it/s, loss=0.815][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 2 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:17<00:00,  5.55it/s, loss=0.815]
Epoch 2 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.30it/s, loss=3.11][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 2 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.56it/s, loss=3.11]
Epoch 2 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22380s
Epoch 2 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20404s
Epoch 2 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.82it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 2 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.20it/s]
Epoch: 2 iters: 157
Total time per 1 batch: 0.20375s
Epoch 2: valid scores: {'BLEU': [0.3529791667300347, 0.1414591126544642, 0.06299488259423028, 0.03485908085047199], 'METEOR': 0.10059827334083978, 'ROUGE': 0.3006740338866547, 'CIDEr': 0.06895854570136649}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 2, valid, 6.90, 35.30, 3.49, 30.07, 10.06, 14.15, 6.30, 0.82, 0.00, 0.00, fr_xe, 3.11
Epoch 2 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21104s
Epoch 2 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20410s
Epoch 2 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.83it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 2 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 2 iters: 157
Total time per 1 batch: 0.20358s
Epoch 2: test scores: {'BLEU': [0.3523721988314085, 0.1408002536166315, 0.06367281953988665, 0.035704094395407136], 'METEOR': 0.10049904707011564, 'ROUGE': 0.2998991451173569, 'CIDEr': 0.07132428082250658}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 2, test , 7.13, 35.24, 3.57, 29.99, 10.05, 14.08, 6.37, 0.82, 0.00, 0.00, fr_xe, 3.11
Train: rank=0, epoch=3, phase=fr_xe
Epoch 3 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:11<00:00,  5.80it/s, loss=0.762][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 3 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:14<00:00,  5.57it/s, loss=0.762]
Epoch 3 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.60it/s, loss=2.97][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 3 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.70it/s, loss=2.97]
Epoch 3 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22389s
Epoch 3 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20444s
Epoch 3 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:36<00:00,  4.82it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 3 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.20it/s]
Epoch: 3 iters: 157
Total time per 1 batch: 0.20382s
Epoch 3: valid scores: {'BLEU': [0.3880270749907782, 0.1498102872688645, 0.08511781760022998, 0.050539934594252114], 'METEOR': 0.09991918939322139, 'ROUGE': 0.30912337673077656, 'CIDEr': 0.07096231761736432}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 3, valid, 7.10, 38.80, 5.05, 30.91, 9.99, 14.98, 8.51, 0.76, 0.00, 0.00, fr_xe, 2.97
Epoch 3 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.20925s
Epoch 3 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.83it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20409s
Epoch 3 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:36<00:00,  4.84it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 3 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.22it/s]
Epoch: 3 iters: 157
Total time per 1 batch: 0.20361s
Epoch 3: test scores: {'BLEU': [0.38801692079215583, 0.15115514556365187, 0.08703201785437005, 0.053794063909602625], 'METEOR': 0.1004634083610197, 'ROUGE': 0.3098636871323689, 'CIDEr': 0.07426506154284174}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 3, test , 7.43, 38.80, 5.38, 30.99, 10.05, 15.12, 8.70, 0.76, 0.00, 0.00, fr_xe, 2.97
Train: rank=0, epoch=4, phase=fr_xe
Epoch 4 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:18<00:00,  4.65it/s, loss=0.729][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 4 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:20<00:00,  5.53it/s, loss=0.729]
Epoch 4 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:26<00:00, 12.23it/s, loss=2.88][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 4 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:26<00:00,  7.42it/s, loss=2.88]
Epoch 4 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22306s
Epoch 4 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20433s
Epoch 4 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 4 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.19it/s]
Epoch: 4 iters: 157
Total time per 1 batch: 0.20391s
Epoch 4: valid scores: {'BLEU': [0.3956675189916203, 0.14959241499951006, 0.08455045357940592, 0.05072552705044864], 'METEOR': 0.10225475159056896, 'ROUGE': 0.3122258530606642, 'CIDEr': 0.07015325932525968}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 4, valid, 7.02, 39.57, 5.07, 31.22, 10.23, 14.96, 8.46, 0.73, 0.00, 0.00, fr_xe, 2.88
Epoch 4 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21257s
Epoch 4 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20408s
Epoch 4 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.83it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 4 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 4 iters: 157
Total time per 1 batch: 0.20355s
Epoch 4: test scores: {'BLEU': [0.39403577331772227, 0.14921009455837111, 0.083850153056571, 0.050482385367933987], 'METEOR': 0.10182105655673135, 'ROUGE': 0.31149491539952134, 'CIDEr': 0.06971117398735341}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 4, test , 6.97, 39.40, 5.05, 31.15, 10.18, 14.92, 8.39, 0.73, 0.00, 0.00, fr_xe, 2.88
Train: rank=0, epoch=5, phase=fr_xe
Epoch 5 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:12<00:00,  5.54it/s, loss=0.707][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 5 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:14<00:00,  5.57it/s, loss=0.707]
Epoch 5 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.46it/s, loss=2.83][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 5 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.63it/s, loss=2.83]
Epoch 5 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.23301s
Epoch 5 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20449s
Epoch 5 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 5 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 5 iters: 157
Total time per 1 batch: 0.20397s
Epoch 5: valid scores: {'BLEU': [0.3529791667300347, 0.1414591126544642, 0.06299488259423028, 0.03485908085047199], 'METEOR': 0.10059827334083978, 'ROUGE': 0.3006740338866547, 'CIDEr': 0.06895854570136649}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 5, valid, 6.90, 35.30, 3.49, 30.07, 10.06, 14.15, 6.30, 0.71, 0.00, 0.00, fr_xe, 2.83
Epoch 5 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.20863s
Epoch 5 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20429s
Epoch 5 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.83it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 5 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.20it/s]
Epoch: 5 iters: 157
Total time per 1 batch: 0.20373s
Epoch 5: test scores: {'BLEU': [0.3523721988314085, 0.1408002536166315, 0.06367281953988665, 0.035704094395407136], 'METEOR': 0.10049904707011564, 'ROUGE': 0.2998991451173569, 'CIDEr': 0.07132428082250658}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 5, test , 7.13, 35.24, 3.57, 29.99, 10.05, 14.08, 6.37, 0.71, 0.00, 0.00, fr_xe, 2.83
Train: rank=0, epoch=6, phase=fr_xe
Epoch 6 - train: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:14<00:00,  5.17it/s, loss=0.69][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 6 - train: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:16<00:00,  5.55it/s, loss=0.69]
Epoch 6 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.35it/s, loss=2.79][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 6 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.60it/s, loss=2.79]
Epoch 6 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22445s
Epoch 6 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20437s
Epoch 6 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.80it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 6 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.18it/s]
Epoch: 6 iters: 157
Total time per 1 batch: 0.20382s
Epoch 6: valid scores: {'BLEU': [0.3626799999999928, 0.18942318055964488, 0.07775216931829154, 0.04351562148082592], 'METEOR': 0.09470731840804052, 'ROUGE': 0.3105374796878281, 'CIDEr': 0.055744577283453285}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 6, valid, 5.57, 36.27, 4.35, 31.05, 9.47, 18.94, 7.78, 0.69, 0.00, 0.00, fr_xe, 2.79
Epoch 6 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21029s
Epoch 6 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20432s
Epoch 6 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.82it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 6 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 6 iters: 157
Total time per 1 batch: 0.20382s
Epoch 6: test scores: {'BLEU': [0.35903999999999286, 0.18394979025085054, 0.0703169688713343, 0.03666873619238479], 'METEOR': 0.09287212083601691, 'ROUGE': 0.3071016066175093, 'CIDEr': 0.049071662717992064}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 6, test , 4.91, 35.90, 3.67, 30.71, 9.29, 18.39, 7.03, 0.69, 0.00, 0.00, fr_xe, 2.79
Train: rank=0, epoch=7, phase=fr_xe
Epoch 7 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:21<00:00,  5.86it/s, loss=0.677][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 7 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:23<00:00,  5.51it/s, loss=0.677]
Epoch 7 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.29it/s, loss=2.77][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 7 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:26<00:00,  7.52it/s, loss=2.77]
Epoch 7 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22371s
Epoch 7 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20449s
Epoch 7 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 7 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.13it/s]
Epoch: 7 iters: 157
Total time per 1 batch: 0.20394s
Epoch 7: valid scores: {'BLEU': [0.38179999999999237, 0.20104699284826827, 0.08538310838454195, 0.04516644657768871], 'METEOR': 0.10478659426096619, 'ROUGE': 0.3169023526256761, 'CIDEr': 0.07155207847141871}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 7, valid, 7.16, 38.18, 4.52, 31.69, 10.48, 20.10, 8.54, 0.68, 0.00, 0.00, fr_xe, 2.77
Epoch 7 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21468s
Epoch 7 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20399s
Epoch 7 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.82it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 7 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.16it/s]
Epoch: 7 iters: 157
Total time per 1 batch: 0.20353s
Epoch 7: test scores: {'BLEU': [0.37899999999999245, 0.19519767758180867, 0.0771990834096997, 0.03888923579121348], 'METEOR': 0.10304347878594912, 'ROUGE': 0.3141675146161081, 'CIDEr': 0.06542015344438794}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 7, test , 6.54, 37.90, 3.89, 31.42, 10.30, 19.52, 7.72, 0.68, 0.00, 0.00, fr_xe, 2.77
Train: rank=0, epoch=8, phase=fr_xe
Epoch 8 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:10<00:00,  5.50it/s, loss=0.666][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 8 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:13<00:00,  5.58it/s, loss=0.666]
Epoch 8 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.32it/s, loss=2.75][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 8 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:25<00:00,  7.69it/s, loss=2.75]
Epoch 8 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22344s
Epoch 8 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.81it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20453s
Epoch 8 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.80it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 8 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.17it/s]
Epoch: 8 iters: 157
Total time per 1 batch: 0.20401s
Epoch 8: valid scores: {'BLEU': [0.38179999999999237, 0.20104699284826827, 0.08538310838454195, 0.04516644657768871], 'METEOR': 0.10478659426096619, 'ROUGE': 0.3169023526256761, 'CIDEr': 0.07155207847141871}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 8, valid, 7.16, 38.18, 4.52, 31.69, 10.48, 20.10, 8.54, 0.67, 0.00, 0.00, fr_xe, 2.75
Epoch 8 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.21202s
Epoch 8 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20418s
Epoch 8 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 8 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.19it/s]
Epoch: 8 iters: 157
Total time per 1 batch: 0.20374s
Epoch 8: test scores: {'BLEU': [0.37899999999999245, 0.19519767758180867, 0.0771990834096997, 0.03888923579121348], 'METEOR': 0.10304347878594912, 'ROUGE': 0.3141675146161081, 'CIDEr': 0.06542015344438794}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 8, test , 6.54, 37.90, 3.89, 31.42, 10.30, 19.52, 7.72, 0.67, 0.00, 0.00, fr_xe, 2.75
Train: rank=0, epoch=9, phase=fr_xe
Epoch 9 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:11<00:00,  5.36it/s, loss=0.656][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 9 - train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4425/4425 [13:13<00:00,  5.57it/s, loss=0.656]
Epoch 9 - validation:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▌| 195/196 [00:25<00:00, 12.35it/s, loss=2.74][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 9 - validation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:26<00:00,  7.50it/s, loss=2.74]
Epoch 9 - evaluation on valid:   0%|                                                                                                             | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.22454s
Epoch 9 - evaluation on valid:  64%|███████████████████████████████████████████████████████████████                                    | 100/157 [00:25<00:11,  4.79it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20455s
Epoch 9 - evaluation on valid:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.81it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 9 - evaluation on valid: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.19it/s]
Epoch: 9 iters: 157
Total time per 1 batch: 0.20402s
Epoch 9: valid scores: {'BLEU': [0.3529791667300347, 0.1414591126544642, 0.06299488259423028, 0.03485908085047199], 'METEOR': 0.10059827334083978, 'ROUGE': 0.3006740338866547, 'CIDEr': 0.06895854570136649}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 9, valid, 6.90, 35.30, 3.49, 30.07, 10.06, 14.15, 6.30, 0.66, 0.00, 0.00, fr_xe, 2.74
Epoch 9 - evaluation on test:   0%|                                                                                                              | 0/157 [00:00<?, ?it/s]Number of iterations: 1, batch_size=32, Total time per 1 batch: 0.20851s
Epoch 9 - evaluation on test:  64%|███████████████████████████████████████████████████████████████▋                                    | 100/157 [00:25<00:11,  4.82it/s]Number of iterations: 101, batch_size=32, Total time per 1 batch: 0.20436s
Epoch 9 - evaluation on test:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████▎| 156/157 [00:37<00:00,  4.82it/s][W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Epoch 9 - evaluation on test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 157/157 [00:37<00:00,  4.20it/s]
Epoch: 9 iters: 157
Total time per 1 batch: 0.20386s
Epoch 9: test scores: {'BLEU': [0.3523721988314085, 0.1408002536166315, 0.06367281953988665, 0.035704094395407136], 'METEOR': 0.10049904707011564, 'ROUGE': 0.2998991451173569, 'CIDEr': 0.07132428082250658}

caption_4ds_20220906, B-IM, 384_640, maxwh, True, 9, test , 7.13, 35.24, 3.57, 29.99, 10.05, 14.08, 6.37, 0.66, 0.00, 0.00, fr_xe, 2.74

How are you using a per-GPU batch size of 32 and taking only about 15 minutes per epoch with the frozen detector? I use 2×3090 GPUs with a batch size of 8×2, and it takes around four hours with the frozen detector. How did you reduce your training time?

Can you send me the h5 file via Baidu Netdisk or Lark?

Sure, I'm uploading the h5 file to Baidu Cloud. You can contact me through QQ: 1765575056.