Cuberick-Orion / CIRPLANT

Official implementation of the Composed Image Retrieval using Pretrained LANguage Transformers (CIRPLANT) | ICCV 2021 - Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models

Question about a difference between the author's recall values and my recall. #4

Open GaEunKim-study opened 2 years ago

GaEunKim-study commented 2 years ago

I have a question because there is a difference between the author's top-k recall values and mine. When I validate with the checkpoint posted by the author, I get the results shown in the picture below. These differ from the values published by the author (see https://github.com/Cuberick-Orion/CIRPLANT/blob/main/DOWNLOAD.md), and the loss value is 0.3. What could be the reason?

[screenshot: validation results]
Cuberick-Orion commented 2 years ago

Hi,

Thank you for your interest in our work.

Without detailed error messages or debug info I cannot say for sure, but the validation results you posted look like a random guess, suggesting that the checkpoint may not have been loaded properly.

We have verified that we are able to reproduce the numbers from our paper using this checkpoint.

Please comment with more details if you are unable to debug on your end.
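
For reference, one quick self-check is to open the checkpoint manually and confirm the weights load without silent key mismatches. Below is a minimal sketch assuming a standard PyTorch Lightning checkpoint layout; the `model` in the commented lines is a placeholder for the LightningModule constructed in trainval_oscar.py, not a name from the repository.

```python
# Minimal sanity check for a Lightning checkpoint (sketch, not from CIRPLANT).
import torch

ckpt = torch.load("saved_models/epoch_277_step_147061.ckpt", map_location="cpu")
print(ckpt.keys())  # Lightning nests model weights under the 'state_dict' key
state_dict = ckpt["state_dict"]
print(len(state_dict), "tensors; first keys:", list(state_dict)[:3])

# Then load into the model exactly as it is constructed in trainval_oscar.py
# (`model` is a placeholder for that LightningModule instance):
#   missing, unexpected = model.load_state_dict(state_dict, strict=False)
# Both lists should come back empty; non-empty lists mean parts of the model
# kept their random initialization, which would yield chance-level recall.
```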

GaEunKim-study commented 2 years ago

If I proceed with the validation, the output looks like the following, and there seem to be no errors. What could the problem be? [screenshot attached]

aashish2000 commented 2 years ago

Hi @GaEunKim-study, I seem to be running into this issue as well, were you able to resolve the same?

Cuberick-Orion commented 2 years ago

Hi,

Following previous replies, could you upload a more detailed log of what happened?

Once again, the results resemble a random guess, suggesting that the checkpoint may not have been properly loaded. As evidence, the screenshot posted by @GaEunKim-study states Could not register sharded tensor state dict hooks, which may be the reason.
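
For context, it is easy to estimate what "a random guess" looks like on CIRR: a retriever that ranks candidates uniformly at random finds the target in the top k with probability about k divided by the number of candidates. The sketch below assumes the global metric ranks the full validation gallery (2297 images, per the log later in this thread) and that the subset metric ranks 5 candidates (the six-image subset minus the reference); both counts are assumptions based on the CIRR setup, not values read from the repository code.

```python
# Chance-level recall@k for a uniformly random ranking (illustrative sketch).
def chance_recall_at_k(k: int, num_candidates: int) -> float:
    # P(target lands in the top k of a random permutation) = k / num_candidates
    return min(k / num_candidates, 1.0)

# Global retrieval over the CIRR val gallery (2297 images, per the log below):
for k in (1, 5, 10, 50, 100):
    print(f"R@{k} ~ {chance_recall_at_k(k, 2297):.4f}")

# Subset metric, assuming 5 candidates once the reference image is excluded:
for k in (1, 2, 3):
    print(f"R_subset@{k} ~ {chance_recall_at_k(k, 5):.1f}")  # 0.2, 0.4, 0.6
```

These chance levels (R@1 around 0.0004, subset recall around 0.2/0.4/0.6) are of the same order as the numbers reported later in this thread, which is why the outputs read as random guessing.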


aashish2000 commented 2 years ago

This is a more detailed log of my results; let me know if any other log files are required to diagnose this issue.

Global seed set to 88
===========
PLACEHOLDER (Insert manual comments here)
=====Args commenting======
Validation_test
===========
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
[18:22:20] INFO     Called with command (copy this for reproduction):           lightning_logger.py:114
trainval_oscar.py --dataset cirr --usefeat nlvr-resnet152_w_empty --max_epochs 300 --model CIRPLANT-img
--model_type bert --model_name_or_path data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/
--task_name cirr --gpus 1 --img_feature_dim 2054 --max_img_seq_length 1 --model_type bert --do_lower_case
--max_seq_length 40 --learning_rate 1e-05 --loss_type xe --seed 88 --drop_out 0.3 --weight_decay 0.05
--warmup_steps 0 --loss st --batch_size 32 --num_batches 529 --pin_memory --num_workers_per_gpu 0
--comment Validation_test --output saved_models/cirr_rc2_iccv_release_test
--log_by recall_inset_top1_correct_composition --validateonly
--load_from_checkpoint ./saved_models/epoch_277_step_147061.ckpt

sorted args (complete list):
[18:22:21] INFO     random_seed::88                                             lightning_logger.py:114
           INFO     Better speed can be achieved with apex installed from       modeling_bert.py:226
                    https://www.github.com/nvidia/apex .
           INFO     Better speed can be achieved with apex installed from       modeling_xlnet.py:339
                    https://www.github.com/nvidia/apex .
Could not register sharded tensor state dict hooks
           INFO     loading configuration file                                  modeling_utils.py:160
                    data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/config.json
           INFO     Model config {                                              modeling_utils.py:177
                      "attention_probs_dropout_prob": 0.1,
                      "finetuning_task": "cirr",
                      "hidden_act": "gelu",
                      "hidden_dropout_prob": 0.1,
                      "hidden_size": 768,
                      "img_feature_dim": 2054,
                      "img_feature_type": "faster_r-cnn",
                      "initializer_range": 0.02,
                      "intermediate_size": 3072,
                      "layer_norm_eps": 1e-12,
                      "max_position_embeddings": 512,
                      "num_attention_heads": 12,
                      "num_hidden_layers": 12,
                      "num_labels": 2,
                      "output_attentions": false,
                      "output_hidden_states": false,
                      "torchscript": false,
                      "type_vocab_size": 2,
                      "vocab_size": 30522
                    }

           INFO     Model name                                                  tokenization_utils.py:170
                    'data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/' not
                    found in model shortcut name list (bert-base-uncased,
                    bert-large-uncased, bert-base-cased, bert-large-cased,
                    bert-base-multilingual-uncased, bert-base-multilingual-cased,
                    bert-base-chinese, bert-base-german-cased,
                    bert-large-uncased-whole-word-masking,
                    bert-large-cased-whole-word-masking,
                    bert-large-uncased-whole-word-masking-finetuned-squad,
                    bert-large-cased-whole-word-masking-finetuned-squad,
                    bert-base-cased-finetuned-mrpc). Assuming
                    'data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/' is a
                    path or url to a directory containing tokenizer files.
           INFO     loading file data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/added_tokens.json        tokenization_utils.py:214
           INFO     loading file data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/special_tokens_map.json  tokenization_utils.py:214
           INFO     loading file data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/vocab.txt                tokenization_utils.py:214
           INFO     loading weights file data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/pytorch_model.bin modeling_utils.py:444
[18:22:22] INFO     BertImgModel Image Dimension: 2054                          modeling_bert.py:158
[18:22:23] INFO     Weights from pretrained model not used in ImageBertForImageFeature:  modeling_utils.py:505
                    ['cls.predictions.bias', 'cls.predictions.transform.dense.weight',
                    'cls.predictions.transform.dense.bias',
                    'cls.predictions.transform.LayerNorm.weight',
                    'cls.predictions.transform.LayerNorm.bias',
                    'cls.predictions.decoder.weight', 'cls.seq_relationship.weight',
                    'cls.seq_relationship.bias']

Start init BaseDataset class...
           INFO     adding json split.rc2.val.json                              lightning_logger.py:114
           INFO     adding json cap.rc2.val.json                                lightning_logger.py:114
           INFO     adding json cap.ext.rc2.val.json                            lightning_logger.py:114
init CIRR_rc2 -> val -> None, usefeat::['nlvr-resnet152_w_empty'] ...
         total number of imgs:: 2297
         total number of pairs:: 4181

Start init BaseDataset class...
[18:22:24] INFO     adding json split.rc2.val.json                              lightning_logger.py:114
           INFO     adding json cap.rc2.val.json                                lightning_logger.py:114
           INFO     adding json cap.ext.rc2.val.json                            lightning_logger.py:114
init CIRR_rc2 -> val -> None, usefeat::['nlvr-resnet152_w_empty'] ...
         total number of imgs:: 2297
         total number of pairs:: 4181
           INFO     In testonly:: False                                         lightning_logger.py:114

Init dataloader (split->val_loader):: val -> img+txt

Num_worker: 0, pin_memory: True
Init dataloader (split->val_loader):: val -> img

Num_worker: 0, pin_memory: True
           INFO     No. batch in train: 131                                     lightning_logger.py:114
           INFO     Optim::AdamW                                                lightning_logger.py:114
===
Finished loading train/val datasets, entering train/val function

Could not register sharded tensor state dict hooks
           INFO     loading configuration file                                  modeling_utils.py:160
                    data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/config.json
           INFO     Model config {                                              modeling_utils.py:177
                      "attention_probs_dropout_prob": 0.1,
                      "finetuning_task": "cirr",
                      "hidden_act": "gelu",
                      "hidden_dropout_prob": 0.1,
                      "hidden_size": 768,
                      "img_feature_dim": 2054,
                      "img_feature_type": "faster_r-cnn",
                      "initializer_range": 0.02,
                      "intermediate_size": 3072,
                      "layer_norm_eps": 1e-12,
                      "max_position_embeddings": 512,
                      "num_attention_heads": 12,
                      "num_hidden_layers": 12,
                      "num_labels": 2,
                      "output_attentions": false,
                      "output_hidden_states": false,
                      "torchscript": false,
                      "type_vocab_size": 2,
                      "vocab_size": 30522
                    }

           INFO     Model name                                                  tokenization_utils.py:170
                    'data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087' not found
                    in model shortcut name list (bert-base-uncased, bert-large-uncased,
                    bert-base-cased, bert-large-cased, bert-base-multilingual-uncased,
                    bert-base-multilingual-cased, bert-base-chinese,
                    bert-base-german-cased, bert-large-uncased-whole-word-masking,
                    bert-large-cased-whole-word-masking,
                    bert-large-uncased-whole-word-masking-finetuned-squad,
                    bert-large-cased-whole-word-masking-finetuned-squad,
                    bert-base-cased-finetuned-mrpc). Assuming
                    'data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087' is a path
                    or url to a directory containing tokenizer files.
           INFO     loading file data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/added_tokens.json        tokenization_utils.py:214
           INFO     loading file data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/special_tokens_map.json  tokenization_utils.py:214
           INFO     loading file data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/vocab.txt                tokenization_utils.py:214
           INFO     loading weights file data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/pytorch_model.bin modeling_utils.py:444
[18:22:26] INFO     BertImgModel Image Dimension: 2054                          modeling_bert.py:158
[18:22:27] INFO     Weights from pretrained model not used in ImageBertForImageFeature:  modeling_utils.py:505
                    ['cls.predictions.bias', 'cls.predictions.transform.dense.weight',
                    'cls.predictions.transform.dense.bias',
                    'cls.predictions.transform.LayerNorm.weight',
                    'cls.predictions.transform.LayerNorm.bias',
                    'cls.predictions.decoder.weight', 'cls.seq_relationship.weight',
                    'cls.seq_relationship.bias']

loaded checkpoint from ./saved_models/epoch_277_step_147061.ckpt

/home/aza6352/anaconda3/envs/cirr/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py:817: LightningDeprecationWarning: `trainer.validate(val_dataloaders)` is deprecated in v1.4 and will be removed in v1.6. Use `trainer.validate(dataloaders)` instead.
  "`trainer.validate(val_dataloaders)` is deprecated in v1.4 and will be removed in v1.6."
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
Validating: 0it [00:00, ?it/s]

[('recall_top1_correct_composition', 0.0009567089213106913),
 ('recall_top2_correct_composition', 0.001195886151638364),
 ('recall_top5_correct_composition', 0.0028701267639320736),
 ('recall_top10_correct_composition', 0.004305190145898111),
 ('recall_top50_correct_composition', 0.01961253288686917),
 ('recall_top100_correct_composition', 0.03982300884955752),
 ('recall_inset_top1_correct_composition', 0.21107390576417126),
 ('recall_inset_top2_correct_composition', 0.39416407558000477),
 ('recall_inset_top3_correct_composition', 0.5862233915331261)]

--------------------------------------------------------------------------------
DATALOADER:0 VALIDATE RESULTS
{}
--------------------------------------------------------------------------------
DATALOADER:1 VALIDATE RESULTS
{}
--------------------------------------------------------------------------------
Validating: 0it [00:05, ?it/s]
aashish2000 commented 2 years ago

I think I have identified the issue: I am able to reproduce the recall scores when using the ResNet152 image features. However, I wish to use FRCNN image features with this model. Which arguments do I need to change in order to use the FRCNN features in your script?

Cuberick-Orion commented 2 years ago

Hi,

Glad to hear that you managed to solve the issue. Indeed, our published result is based on ResNet152 features -- the global visual feature.

To use the FRCNN feature, you might need to modify the codebase a bit so that it accepts sequential image tokens.

This, of course, came after we published the ICCV paper, but we did confirm that using sequential features brings some improvement in performance.
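
For readers who want to attempt this: OSCAR-style models consume image features as a token sequence, so the main change is to pass per-image FRCNN region features rather than a single global vector, padded to a fixed length with a matching attention mask, and to raise --max_img_seq_length accordingly. Below is a minimal sketch of the padding step only; the function name and the exact wiring into the dataloader are hypothetical, not part of the released code.

```python
# Sketch: pad/truncate FRCNN region features to a fixed sequence length.
import torch

def pad_region_features(feats: torch.Tensor, max_img_seq_length: int):
    """feats: (num_regions, 2054) region features; OSCAR's 2054 dims are
    assumed to be the 2048-d RoI feature plus 6-d box geometry."""
    num_regions, dim = feats.shape
    out = torch.zeros(max_img_seq_length, dim)
    mask = torch.zeros(max_img_seq_length, dtype=torch.long)
    n = min(num_regions, max_img_seq_length)
    out[:n] = feats[:n]
    mask[:n] = 1  # attend to real regions; zero-padded rows stay masked out
    return out, mask

# e.g. 36 regions per image: run with --max_img_seq_length 36 instead of 1,
# and concatenate `mask` onto the text attention mask inside the model.
feats, mask = pad_region_features(torch.randn(36, 2054), max_img_seq_length=36)
```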