Open GaEunKim-study opened 2 years ago
Hi,
Thank you for your interest in our work.
Without detailed error messages or debug info I cannot say for sure, but the validation results you posted look like random guessing, which suggests the checkpoint may not have been loaded properly.
We have verified that we are able to reproduce the numbers from our papers using the checkpoint.
Please comment with more details if you are unable to debug on your end.
When I run the validation, the output looks like the following, but there seem to be no errors. What's the problem?
Hi @GaEunKim-study, I seem to be running into this issue as well. Were you able to resolve it?
Hi,
Following previous replies, could you upload a more detailed log of what happened?
Once again, the results resemble a random guess, suggesting that the checkpoint may not have been properly loaded. As evidence, the screenshot posted by @GaEunKim-study shows the message "Could not register sharded tensor state dict hooks", which may be the reason.
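One way to test the hypothesis above is to compare the parameter names stored in the checkpoint with the names the model expects. A minimal sketch follows, assuming a PyTorch Lightning style checkpoint that keeps its weights under a `state_dict` key; the helper name `diff_state_dict_keys` and the toy key names are hypothetical and not part of the repository.

```python
def diff_state_dict_keys(model_keys, ckpt_keys):
    """Return parameter names that are missing from, or unexpected in, a checkpoint."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    missing = sorted(model_keys - ckpt_keys)      # expected by the model, absent in the checkpoint
    unexpected = sorted(ckpt_keys - model_keys)   # present in the checkpoint, unknown to the model
    return missing, unexpected


# Toy example with made-up names: a prefix mismatch makes every key fail to match.
model_keys = ["bert.embeddings.weight", "classifier.weight"]
ckpt_keys = ["model.bert.embeddings.weight", "model.classifier.weight"]
missing, unexpected = diff_state_dict_keys(model_keys, ckpt_keys)
print("missing:", missing)
print("unexpected:", unexpected)

# With PyTorch installed, the real key lists could be obtained along these lines:
#   import torch
#   ckpt = torch.load("./saved_models/epoch_277_step_147061.ckpt", map_location="cpu")
#   ckpt_keys = ckpt["state_dict"].keys()   # assumes a Lightning-style checkpoint
#   model_keys = model.state_dict().keys()  # `model` is the already constructed module
```

If both lists come back empty, the parameter names line up and the cause is more likely elsewhere, for example in the features or the data.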
This is a more detailed log of my results. Let me know if any other log files are required to diagnose this issue.
Global seed set to 88
===========
PLACEHOLDER (Insert manual comments here)
=====Args commenting======
Validation_test
===========
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
[18:22:20] INFO Called with command (copy this for reproduction): lightning_logger.py:114
trainval_oscar.py --dataset cirr --usefeat nlvr-resnet152_w_empty --max_epochs 300 --model CIRPLANT-img --model_type bert --model_name_or_path data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/ --task_name cirr --gpus 1 --img_feature_dim 2054 --max_img_seq_length 1 --model_type bert --do_lower_case --max_seq_length 40 --learning_rate 1e-05 --loss_type xe --seed 88 --drop_out 0.3 --weight_decay 0.05 --warmup_steps 0 --loss st --batch_size 32 --num_batches 529 --pin_memory --num_workers_per_gpu 0 --comment Validation_test --output saved_models/cirr_rc2_iccv_release_test --log_by recall_inset_top1_correct_composition --validateonly --load_from_checkpoint ./saved_models/epoch_277_step_147061.ckpt
sorted args (complete list):
[18:22:21] INFO random_seed::88 lightning_logger.py:114
INFO Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex . modeling_bert.py:226
INFO Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex . modeling_xlnet.py:339
Could not register sharded tensor state dict hooks
INFO loading configuration file data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/config.json modeling_utils.py:160
INFO Model config { modeling_utils.py:177
"attention_probs_dropout_prob": 0.1,
"finetuning_task": "cirr",
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"img_feature_dim": 2054,
"img_feature_type": "faster_r-cnn",
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"num_labels": 2,
"output_attentions": false,
"output_hidden_states": false,
"torchscript": false,
"type_vocab_size": 2,
"vocab_size": 30522
}
INFO Model name 'data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/' not found in model shortcut name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc). Assuming 'data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/' is a path or url to a directory containing tokenizer files. tokenization_utils.py:170
INFO loading file data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/added_tokens.json tokenization_utils.py:214
INFO loading file data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/special_tokens_map.json tokenization_utils.py:214
INFO loading file data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/vocab.txt tokenization_utils.py:214
INFO loading weights file data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/pytorch_model.bin modeling_utils.py:444
[18:22:22] INFO BertImgModel Image Dimension: 2054 modeling_bert.py:158
[18:22:23] INFO Weights from pretrained model not used in ImageBertForImageFeature: modeling_utils.py:505
['cls.predictions.bias', 'cls.predictions.transform.dense.weight',
'cls.predictions.transform.dense.bias',
'cls.predictions.transform.LayerNorm.weight',
'cls.predictions.transform.LayerNorm.bias',
'cls.predictions.decoder.weight', 'cls.seq_relationship.weight',
'cls.seq_relationship.bias']
Start init BaseDataset class...
INFO adding json split.rc2.val.json lightning_logger.py:114
INFO adding json cap.rc2.val.json lightning_logger.py:114
INFO adding json cap.ext.rc2.val.json lightning_logger.py:114
init CIRR_rc2 -> val -> None, usefeat::['nlvr-resnet152_w_empty'] ...
total number of imgs:: 2297
total number of pairs:: 4181
Start init BaseDataset class...
[18:22:24] INFO adding json split.rc2.val.json lightning_logger.py:114
INFO adding json cap.rc2.val.json lightning_logger.py:114
INFO adding json cap.ext.rc2.val.json lightning_logger.py:114
init CIRR_rc2 -> val -> None, usefeat::['nlvr-resnet152_w_empty'] ...
total number of imgs:: 2297
total number of pairs:: 4181
INFO lightning_logger.py:114 In testonly:: False
Init dataloader (split->val_loader):: val -> img+txt
Num_worker: 0, pin_memory: True
Init dataloader (split->val_loader):: val -> img
Num_worker: 0, pin_memory: True
INFO No. batch in train: 131 lightning_logger.py:114
INFO Optim::AdamW lightning_logger.py:114
===
Finished loading train/val datasets, entering train/val function
Could not register sharded tensor state dict hooks
INFO loading configuration file data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/config.json modeling_utils.py:160
INFO Model config { modeling_utils.py:177
"attention_probs_dropout_prob": 0.1,
"finetuning_task": "cirr",
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"img_feature_dim": 2054,
"img_feature_type": "faster_r-cnn",
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"num_labels": 2,
"output_attentions": false,
"output_hidden_states": false,
"torchscript": false,
"type_vocab_size": 2,
"vocab_size": 30522
}
INFO Model name 'data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087' not found in model shortcut name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc). Assuming 'data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087' is a path or url to a directory containing tokenizer files. tokenization_utils.py:170
INFO loading file data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/added_tokens.json tokenization_utils.py:214
INFO loading file data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/special_tokens_map.json tokenization_utils.py:214
INFO loading file data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/vocab.txt tokenization_utils.py:214
INFO loading weights file data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087/pytorch_model.bin modeling_utils.py:444
[18:22:26] INFO BertImgModel Image Dimension: 2054 modeling_bert.py:158
[18:22:27] INFO Weights from pretrained model not used in ImageBertForImageFeature: modeling_utils.py:505
['cls.predictions.bias', 'cls.predictions.transform.dense.weight',
'cls.predictions.transform.dense.bias',
'cls.predictions.transform.LayerNorm.weight',
'cls.predictions.transform.LayerNorm.bias',
'cls.predictions.decoder.weight', 'cls.seq_relationship.weight',
'cls.seq_relationship.bias']
loaded checkpoint from ./saved_models/epoch_277_step_147061.ckpt
/home/aza6352/anaconda3/envs/cirr/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py:817: LightningDeprecationWarning: `trainer.validate(val_dataloaders)` is deprecated in v1.4 and will be removed in v1.6. Use `trainer.validate(dataloaders)` instead.
"`trainer.validate(val_dataloaders)` is deprecated in v1.4 and will be removed in v1.6."
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
Validating: 0it [00:00, ?it/s]
[('recall_top1_correct_composition', 0.0009567089213106913),
('recall_top2_correct_composition', 0.001195886151638364),
('recall_top5_correct_composition', 0.0028701267639320736),
('recall_top10_correct_composition', 0.004305190145898111),
('recall_top50_correct_composition', 0.01961253288686917),
('recall_top100_correct_composition', 0.03982300884955752),
('recall_inset_top1_correct_composition', 0.21107390576417126),
('recall_inset_top2_correct_composition', 0.39416407558000477),
('recall_inset_top3_correct_composition', 0.5862233915331261)]
--------------------------------------------------------------------------------
DATALOADER:0 VALIDATE RESULTS
{}
--------------------------------------------------------------------------------
DATALOADER:1 VALIDATE RESULTS
{}
--------------------------------------------------------------------------------
Validating: 0it [00:05, ?it/s]
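As a rough sanity check, the recall values in this log can be compared with what uniform random ranking would give, where Recall@K is approximately K divided by the number of candidates. The sketch below assumes that the global recall ranks each query against the other 2296 validation images (2297 minus the reference) and that each CIRR subset contains 5 candidates besides the reference. Both are assumptions, not values printed in the log, and the function name `chance_recall` is made up for illustration.

```python
def chance_recall(k, num_candidates):
    """Expected Recall@K when candidates are ranked uniformly at random."""
    return min(k, num_candidates) / num_candidates


NUM_VAL_IMAGES = 2297                    # "total number of imgs" from the log
GLOBAL_CANDIDATES = NUM_VAL_IMAGES - 1   # assumption: the reference image is excluded
SUBSET_CANDIDATES = 5                    # assumption: subset of 6 images minus the reference

# Values copied from the validation log, rounded to four decimals.
logged_global = {1: 0.0010, 10: 0.0043, 100: 0.0398}
logged_subset = {1: 0.2111, 2: 0.3942, 3: 0.5862}

for k, observed in logged_global.items():
    print(f"R@{k}: observed {observed:.4f}, chance {chance_recall(k, GLOBAL_CANDIDATES):.4f}")
for k, observed in logged_subset.items():
    print(f"R_subset@{k}: observed {observed:.4f}, chance {chance_recall(k, SUBSET_CANDIDATES):.4f}")
```

Under these assumptions the subset recalls sit close to the chance values of 0.2, 0.4, and 0.6, which is consistent with the model ranking candidates essentially at random.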
I think I have identified the issue: I am able to reproduce the recall scores when using the ResNet152 image features. However, I wish to use the FRCNN image features with this model. Which arguments do I need to change in order to use the FRCNN features in your script?
Hi,
Glad to hear that you managed to solve the issue. Indeed, our published result is based on ResNet152 features -- the global visual feature.
To use the FRCNN feature, you might need to modify the codebase a bit so that it accepts sequential image tokens.
Note that this came after we published the ICCV paper, but we did confirm that using sequential features brings some improvements to the performance.
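To illustrate what "sequential image tokens" could mean in practice: the command in this thread passes `--max_img_seq_length 1` with `--img_feature_dim 2054`, i.e. one global feature vector per image, whereas region-based FRCNN features form a variable-length sequence of vectors per image. A minimal sketch of truncating or padding such a sequence to a fixed length follows. The function `pad_region_features` and the toy dimensions are hypothetical, and this is not the repository's implementation.

```python
def pad_region_features(regions, max_img_seq_length, feature_dim):
    """Truncate or zero-pad per-region feature vectors to a fixed sequence length.

    Returns the padded sequence and a mask (1 = real region, 0 = padding).
    """
    for vec in regions:
        if len(vec) != feature_dim:
            raise ValueError(f"expected vectors of length {feature_dim}, got {len(vec)}")
    kept = [list(vec) for vec in regions[:max_img_seq_length]]  # drop extra regions
    mask = [1] * len(kept)
    num_pad = max_img_seq_length - len(kept)
    kept.extend([0.0] * feature_dim for _ in range(num_pad))    # append zero vectors
    mask.extend([0] * num_pad)
    return kept, mask


# Toy example: 2 detected regions with 4-dimensional features, padded to length 3.
features, mask = pad_region_features([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]], 3, 4)
print(len(features), mask)
```

The mask is what would let an attention-based model ignore the padded positions. How the codebase would consume it is left open here.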
I have a question about a difference between the author's top-k recall values and mine. When I validate with the checkpoint posted by the author, I get the results shown in the following picture. These differ from the values reported by the author (see https://github.com/Cuberick-Orion/CIRPLANT/blob/main/DOWNLOAD.md), and the loss value is 0.3. What is the reason?