Closed Mano2610 closed 3 years ago
Hello, can you verify that the dataset downloaded properly into ~/.cache/torch/mmf/data/datasets/textvqa/defaults/features/?
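For example, a quick sanity check might look like this (a sketch, assuming the default MMF cache location; adjust the path if you have overridden the data directory):

```shell
# Default MMF cache location for the TextVQA features (assumption: no custom data_dir).
FEATURES_DIR="$HOME/.cache/torch/mmf/data/datasets/textvqa/defaults/features"

# List the feature files with sizes; zero-byte or missing files
# usually indicate an interrupted download.
ls -lh "$FEATURES_DIR"

# Total size on disk, to compare against the expected download size.
du -sh "$FEATURES_DIR"
```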
Yes, I have checked and it's downloaded completely. I have also tried 3 or 4 times, but I get the same error.
Hi, this is probably happening due to a different file system on Windows. If possible, can you try the Ubuntu VM available on Windows 10 and higher?
I have only 300 GB on my machine; will I be able to run a VM in Windows 10 along with the data for mmf? I am a little confused about allocating the space, which is why I am using Windows.
I am specifically talking about https://ubuntu.com/tutorials/ubuntu-on-windows#1-overview
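On recent Windows 10 builds this can typically be set up with a single command (a sketch; it requires an elevated PowerShell prompt, the exact command availability depends on your Windows build, and a reboot is usually needed):

```shell
# From an elevated PowerShell prompt on Windows 10 (build 2004 or later):
wsl --install -d Ubuntu
# After rebooting, launch Ubuntu from the Start menu and install MMF inside it.
```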
Hi @apsdehal ,
I have tried it in Ubuntu and allocated 4 GPUs for training, but I am getting the error below. Kindly help me resolve this issue.
**** GRID ENGINE GPU ASSIGNMENT: your job has been assigned GPU device(s): 0,1,2,3
2021-03-19T02:19:53 | mmf.utils.configuration: Overriding option config to projects/m4c/configs/textvqa/defaults.yaml
2021-03-19T02:19:53 | mmf.utils.configuration: Overriding option datasets to textvqa
2021-03-19T02:19:53 | mmf.utils.configuration: Overriding option model to m4c
2021-03-19T02:19:53 | mmf.utils.configuration: Overriding option run_type to train
2021-03-19T02:19:59 | mmf.utils.distributed: XLA Mode:None
2021-03-19T02:19:59 | mmf.utils.distributed: XLA Mode:None
2021-03-19T02:19:59 | mmf.utils.distributed: XLA Mode:None
2021-03-19T02:19:59 | mmf.utils.distributed: Distributed Init (Rank 2): tcp://localhost:11433
2021-03-19T02:19:59 | mmf.utils.distributed: XLA Mode:None
2021-03-19T02:19:59 | mmf.utils.distributed: Distributed Init (Rank 3): tcp://localhost:11433
2021-03-19T02:19:59 | mmf.utils.distributed: Distributed Init (Rank 1): tcp://localhost:11433
2021-03-19T02:19:59 | mmf.utils.distributed: Distributed Init (Rank 0): tcp://localhost:11433
2021-03-19T02:19:59 | mmf.utils.distributed: Initialized Host gpu-1-7.local as Rank 1
2021-03-19T02:19:59 | mmf.utils.distributed: Initialized Host gpu-1-7.local as Rank 3
2021-03-19T02:20:00 | mmf.utils.distributed: Initialized Host gpu-1-7.local as Rank 2
2021-03-19T02:20:00 | mmf.utils.distributed: Initialized Host gpu-1-7.local as Rank 0
2021-03-19T02:20:03 | mmf: Logging to: ./save/train.log
2021-03-19T02:20:03 | mmf_cli.run: Namespace(config_override=None, local_rank=None, opts=['config=projects/m4c/configs/textvqa/defaults.yaml', 'datasets=textvqa', 'model=m4c', 'run_type=train'])
2021-03-19T02:20:03 | mmf_cli.run: Torch version: 1.6.0
2021-03-19T02:20:03 | mmf.utils.general: CUDA Device 0 is: GeForce GTX 1080 Ti
2021-03-19T02:20:03 | mmf_cli.run: Using seed 3893195
2021-03-19T02:20:03 | mmf.trainers.mmf_trainer: Loading datasets
2021-03-19T02:20:15 | mmf.trainers.mmf_trainer: Loading model
Some weights of the model checkpoint at bert-base-uncased were not used when initializing TextBert: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.3.attention.self.key.weight', 'bert.encoder.layer.3.attention.self.key.bias', 'bert.encoder.layer.3.attention.self.value.weight', 'bert.encoder.layer.3.attention.self.value.bias', 'bert.encoder.layer.3.attention.output.dense.weight', 'bert.encoder.layer.3.attention.output.dense.bias', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.layer.3.output.dense.weight', 'bert.encoder.layer.3.output.dense.bias', 'bert.encoder.layer.4.attention.self.query.weight', 'bert.encoder.layer.4.attention.self.query.bias', 'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.4.attention.self.value.weight', 'bert.encoder.layer.4.attention.self.value.bias', 'bert.encoder.layer.4.attention.output.dense.weight', 'bert.encoder.layer.4.attention.output.dense.bias', 'bert.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.layer.4.output.dense.weight', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.5.attention.self.query.bias', 'bert.encoder.layer.5.attention.self.key.weight', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.5.attention.self.value.bias', 'bert.encoder.layer.5.attention.output.dense.weight', 'bert.encoder.layer.5.attention.output.dense.bias', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.5.intermediate.dense.bias',
'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.5.output.dense.bias', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.8.attention.output.dense.weight', 'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.9.attention.self.query.weight', 'bert.encoder.layer.9.attention.self.query.bias', 
'bert.encoder.layer.9.attention.self.key.weight', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.11.attention.output.dense.weight', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.3.attention.output.LayerNorm.weight', 'bert.encoder.layer.3.attention.output.LayerNorm.bias', 'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.4.attention.output.LayerNorm.weight', 
'bert.encoder.layer.4.attention.output.LayerNorm.bias', 'bert.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.layer.5.attention.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.bias', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.11.output.LayerNorm.bias']
This IS NOT expected if you are initializing TextBert from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2021-03-19T02:20:21 | mmf.trainers.mmf_trainer: Loading optimizer
2021-03-19T02:20:21 | mmf.trainers.mmf_trainer: Loading metrics
WARNING 2021-03-19T02:20:21 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: cfg.pretty() is deprecated and will be removed in a future version. Use OmegaConf.to_yaml(cfg)
builtin_warn(*args, **kwargs)
WARNING 2021-03-19T02:20:21 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: cfg.pretty() is deprecated and will be removed in a future version. Use OmegaConf.to_yaml(cfg)
builtin_warn(*args, **kwargs)
WARNING 2021-03-19T02:20:21 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: No type for scheduler specified even though lr_scheduler is True, setting default to 'Pythia'
builtin_warn(*args, **kwargs)
WARNING 2021-03-19T02:20:21 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: No type for scheduler specified even though lr_scheduler is True, setting default to 'Pythia'
builtin_warn(*args, **kwargs)
WARNING 2021-03-19T02:20:21 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: scheduler attributes has no params defined, defaulting to {}.
builtin_warn(*args, **kwargs)
WARNING 2021-03-19T02:20:21 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: scheduler attributes has no params defined, defaulting to {}.
builtin_warn(*args, **kwargs)
2021-03-19T02:20:21 | mmf.trainers.core.device: Using PyTorch DistributedDataParallel
WARNING 2021-03-19T02:20:21 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: You can enable ZeRO and Sharded DDP, by installing fairscale and setting optimizer.enable_state_sharding=True.
builtin_warn(*args, **kwargs)
WARNING 2021-03-19T02:20:21 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: You can enable ZeRO and Sharded DDP, by installing fairscale and setting optimizer.enable_state_sharding=True.
builtin_warn(*args, **kwargs)
2021-03-19T02:20:22 | mmf.trainers.mmf_trainer: ===== Model =====
2021-03-19T02:20:22 | mmf.trainers.mmf_trainer: DistributedDataParallel(
(module): M4C(
(text_bert): TextBert(
(embeddings): BertEmbeddings(
(word_embeddings): Embedding(30522, 768, padding_idx=0)
(position_embeddings): Embedding(512, 768)
(token_type_embeddings): Embedding(2, 768)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(encoder): BertEncoder(
(layer): ModuleList(
(0): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(1): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(2): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
)
(text_bert_out_linear): Identity()
(obj_faster_rcnn_fc7): FinetuneFasterRcnnFpnFc7(
(lc): Linear(in_features=2048, out_features=2048, bias=True)
)
(linear_obj_feat_to_mmt_in): Linear(in_features=2048, out_features=768, bias=True)
(linear_obj_bbox_to_mmt_in): Linear(in_features=4, out_features=768, bias=True)
(obj_feat_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(obj_bbox_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(obj_drop): Dropout(p=0.1, inplace=False)
(ocr_faster_rcnn_fc7): FinetuneFasterRcnnFpnFc7(
(lc): Linear(in_features=2048, out_features=2048, bias=True)
)
(linear_ocr_feat_to_mmt_in): Linear(in_features=3002, out_features=768, bias=True)
(linear_ocr_bbox_to_mmt_in): Linear(in_features=4, out_features=768, bias=True)
(ocr_feat_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(ocr_bbox_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(ocr_drop): Dropout(p=0.1, inplace=False)
(mmt): MMT(
(prev_pred_embeddings): PrevPredEmbeddings(
(position_embeddings): Embedding(100, 768)
(token_type_embeddings): Embedding(5, 768)
(ans_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(ocr_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(emb_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(emb_dropout): Dropout(p=0.1, inplace=False)
)
(encoder): BertEncoder(
(layer): ModuleList(
(0): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(1): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(2): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(3): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
)
(ocr_ptr_net): OcrPtrNet(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
)
(classifier): ClassifierLayer(
(module): Linear(in_features=768, out_features=5000, bias=True)
)
(losses): Losses(
(losses): ModuleList(
(0): MMFLoss(
(loss_criterion): M4CDecodingBCEWithMaskLoss()
)
)
)
)
)
2021-03-19T02:20:22 | mmf.utils.general: Total Parameters: 90850184. Trained Parameters: 90850184
2021-03-19T02:20:22 | mmf.trainers.core.training_loop: Starting training...
Traceback (most recent call last):
File "/home/mk20376/anaconda3/envs/mmf/bin/mmf_run", line 33, in
-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 779, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/multiprocessing/queues.py", line 104, in get
    if not self._poll(timeout):
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/multiprocessing/connection.py", line 414, in _poll
    r = wait([self], timeout)
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/multiprocessing/connection.py", line 921, in wait
    ready = selector.select(timeout)
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 406897) is killed by signal: Killed.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/mk20376/Proj/mmf/mmf_cli/run.py", line 66, in distributed_main
    main(configuration, init_distributed=True, predict=predict)
  File "/home/mk20376/Proj/mmf/mmf_cli/run.py", line 56, in main
    trainer.train()
  File "/home/mk20376/Proj/mmf/mmf/trainers/mmf_trainer.py", line 146, in train
    self.training_loop()
  File "/home/mk20376/Proj/mmf/mmf/trainers/core/training_loop.py", line 31, in training_loop
    self.run_training_epoch()
  File "/home/mk20376/Proj/mmf/mmf/trainers/core/training_loop.py", line 74, in run_training_epoch
    for idx, batch in enumerate(self.train_loader):
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 974, in _next_data
    idx, data = self._get_data()
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 941, in _get_data
    success, data = self._try_get_data()
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 792, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 406897) exited unexpectedly
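For context: a DataLoader worker "killed by signal: Killed" usually means the kernel's OOM killer terminated it for running out of memory. One way to confirm this on Linux (a sketch; `dmesg` may require root, and the pid shown is the one from this log):

```shell
# Look for OOM-killer activity around the time the worker (pid 406897) died.
dmesg | grep -iE "out of memory|killed process" | tail -n 20

# DataLoader workers pass batches through shared memory, so also check
# available RAM and /dev/shm capacity.
free -h
df -h /dev/shm
```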
Hi @Mano2610, thanks for using mmf. Can you try again after adding training.num_workers=0 to the command line?
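For example, combining the options shown in your log with the extra override (a sketch of the full command):

```shell
# training.num_workers=0 disables DataLoader worker subprocesses,
# so batches are loaded in the main process instead.
mmf_run config=projects/m4c/configs/textvqa/defaults.yaml \
    datasets=textvqa \
    model=m4c \
    run_type=train \
    training.num_workers=0
```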
Hi @ytsheng - I am able to train and evaluate using the m4c model. What final accuracy should we be getting here?
Also, when I try to run the test, I am getting the error below:
**** GRID ENGINE GPU ASSIGNMENT: your job has been assigned GPU device(s): 0,1,2,3
2021-03-23T15:30:46 | mmf.utils.configuration: Overriding option config to projects/m4c/configs/textvqa/defaults.yaml
2021-03-23T15:30:46 | mmf.utils.configuration: Overriding option datasets to textvqa
2021-03-23T15:30:46 | mmf.utils.configuration: Overriding option model to m4c
2021-03-23T15:30:46 | mmf.utils.configuration: Overriding option run_type to test
2021-03-23T15:30:46 | mmf.utils.configuration: Overriding option training.num_workers to 0
2021-03-23T15:30:46 | mmf.utils.configuration: Overriding option checkpoint.resume_file to /home/mk20376/Proj/mmf/save/models/model_24000.ckpt
2021-03-23T15:30:52 | mmf.utils.distributed: XLA Mode:None
2021-03-23T15:30:52 | mmf.utils.distributed: XLA Mode:None
2021-03-23T15:30:52 | mmf.utils.distributed: XLA Mode:None
2021-03-23T15:30:52 | mmf.utils.distributed: XLA Mode:None
2021-03-23T15:30:52 | mmf.utils.distributed: Distributed Init (Rank 1): tcp://localhost:10657
2021-03-23T15:30:52 | mmf.utils.distributed: Distributed Init (Rank 0): tcp://localhost:10657
2021-03-23T15:30:52 | mmf.utils.distributed: Distributed Init (Rank 3): tcp://localhost:10657
2021-03-23T15:30:52 | mmf.utils.distributed: Distributed Init (Rank 2): tcp://localhost:10657
2021-03-23T15:30:52 | mmf.utils.distributed: Initialized Host gpu-1-7.local as Rank 1
2021-03-23T15:30:52 | mmf.utils.distributed: Initialized Host gpu-1-7.local as Rank 0
2021-03-23T15:30:52 | mmf.utils.distributed: Initialized Host gpu-1-7.local as Rank 2
2021-03-23T15:30:52 | mmf.utils.distributed: Initialized Host gpu-1-7.local as Rank 3
2021-03-23T15:30:55 | mmf: Logging to: ./save/train.log
2021-03-23T15:30:55 | mmf_cli.run: Namespace(config_override=None, local_rank=None, opts=['config=projects/m4c/configs/textvqa/defaults.yaml', 'datasets=textvqa', 'model=m4c', 'run_type=test', 'training.num_workers=0', 'checkpoint.resume_file=/home/mk20376/Proj/mmf/save/models/model_24000.ckpt'])
2021-03-23T15:30:55 | mmf_cli.run: Torch version: 1.6.0
2021-03-23T15:30:55 | mmf.utils.general: CUDA Device 0 is: GeForce GTX 1080 Ti
2021-03-23T15:30:55 | mmf_cli.run: Using seed 56154769
2021-03-23T15:30:55 | mmf.trainers.mmf_trainer: Loading datasets
2021-03-23T15:31:08 | mmf.trainers.mmf_trainer: Loading model
Some weights of the model checkpoint at bert-base-uncased were not used when initializing TextBert: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.3.attention.self.key.weight', 'bert.encoder.layer.3.attention.self.key.bias', 'bert.encoder.layer.3.attention.self.value.weight', 'bert.encoder.layer.3.attention.self.value.bias', 'bert.encoder.layer.3.attention.output.dense.weight', 'bert.encoder.layer.3.attention.output.dense.bias', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.layer.3.output.dense.weight', 'bert.encoder.layer.3.output.dense.bias', 'bert.encoder.layer.4.attention.self.query.weight', 'bert.encoder.layer.4.attention.self.query.bias', 'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.4.attention.self.value.weight', 'bert.encoder.layer.4.attention.self.value.bias', 'bert.encoder.layer.4.attention.output.dense.weight', 'bert.encoder.layer.4.attention.output.dense.bias', 'bert.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.layer.4.output.dense.weight', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.5.attention.self.query.bias', 'bert.encoder.layer.5.attention.self.key.weight',
'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.5.attention.self.value.bias', 'bert.encoder.layer.5.attention.output.dense.weight', 'bert.encoder.layer.5.attention.output.dense.bias', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.5.intermediate.dense.bias', 'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.5.output.dense.bias', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.8.attention.output.dense.weight', 
'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.9.attention.self.query.weight', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.9.attention.self.key.weight', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.11.attention.output.dense.weight', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.11.output.dense.weight', 
'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.3.attention.output.LayerNorm.weight', 'bert.encoder.layer.3.attention.output.LayerNorm.bias', 'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.4.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.output.LayerNorm.bias', 'bert.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.layer.5.attention.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.bias', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.11.output.LayerNorm.bias']
This IS NOT expected if you are initializing TextBert from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2021-03-23T15:31:13 | mmf.trainers.mmf_trainer: Loading optimizer
2021-03-23T15:31:13 | mmf.trainers.mmf_trainer: Loading metrics
WARNING 2021-03-23T15:31:13 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: cfg.pretty() is deprecated and will be removed in a future version. Use OmegaConf.to_yaml(cfg)
builtin_warn(*args, **kwargs)
WARNING 2021-03-23T15:31:13 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: No type for scheduler specified even though lr_scheduler is True, setting default to 'Pythia'
WARNING 2021-03-23T15:31:13 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: scheduler attributes has no params defined, defaulting to {}.
2021-03-23T15:31:13 | mmf.utils.checkpoint: Loading checkpoint
WARNING 2021-03-23T15:31:18 | mmf: Key data_parallel is not present in registry, returning default value of None
WARNING 2021-03-23T15:31:18 | mmf: Key distributed is not present in registry, returning default value of None
WARNING 2021-03-23T15:31:18 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
2021-03-23T15:31:18 | mmf.utils.checkpoint: Checkpoint loaded.
2021-03-23T15:31:18 | mmf.utils.checkpoint: Current num updates: 24000
2021-03-23T15:31:18 | mmf.utils.checkpoint: Current iteration: 24000
2021-03-23T15:31:18 | mmf.utils.checkpoint: Current epoch: 89
2021-03-23T15:31:18 | mmf.trainers.core.device: Using PyTorch DistributedDataParallel
WARNING 2021-03-23T15:31:18 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: You can enable ZeRO and Sharded DDP, by installing fairscale and setting optimizer.enable_state_sharding=True.
2021-03-23T15:31:18 | mmf.trainers.mmf_trainer: ===== Model =====
2021-03-23T15:31:18 | mmf.trainers.mmf_trainer: DistributedDataParallel( (module): M4C( (text_bert): TextBert( (embeddings): BertEmbeddings( (word_embeddings): Embedding(30522, 768, padding_idx=0) (position_embeddings): Embedding(512, 768) (token_type_embeddings): Embedding(2, 768) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (encoder): BertEncoder( (layer): ModuleList( (0): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (1): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (2): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) ) ) (text_bert_out_linear): Identity() (obj_faster_rcnn_fc7): FinetuneFasterRcnnFpnFc7( (lc): Linear(in_features=2048, out_features=2048, bias=True) ) (linear_obj_feat_to_mmt_in): Linear(in_features=2048, out_features=768, bias=True) (linear_obj_bbox_to_mmt_in): Linear(in_features=4, out_features=768, bias=True) (obj_feat_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (obj_bbox_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (obj_drop): Dropout(p=0.1, inplace=False) (ocr_faster_rcnn_fc7): FinetuneFasterRcnnFpnFc7( (lc): Linear(in_features=2048, out_features=2048, bias=True) ) (linear_ocr_feat_to_mmt_in): Linear(in_features=3002, out_features=768, bias=True) (linear_ocr_bbox_to_mmt_in): Linear(in_features=4, out_features=768, bias=True) (ocr_feat_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (ocr_bbox_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (ocr_drop): Dropout(p=0.1, inplace=False) (mmt): MMT( (prev_pred_embeddings): PrevPredEmbeddings( (position_embeddings): Embedding(100, 768) 
(token_type_embeddings): Embedding(5, 768) (ans_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (ocr_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (emb_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (emb_dropout): Dropout(p=0.1, inplace=False) ) (encoder): BertEncoder( (layer): ModuleList( (0): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (1): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (2): BertLayer( (attention): BertAttention( (self): 
BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (3): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) ) ) (ocr_ptr_net): OcrPtrNet( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) ) (classifier): ClassifierLayer( (module): Linear(in_features=768, out_features=5000, bias=True) ) (losses): Losses( (losses): ModuleList( (0): MMFLoss( (loss_criterion): M4CDecodingBCEWithMaskLoss() ) ) ) ) )
2021-03-23T15:31:18 | mmf.utils.general: Total Parameters: 90850184. Trained Parameters: 90850184
2021-03-23T15:31:18 | mmf.trainers.mmf_trainer: Starting inference on test set
2021-03-23T15:31:18 | mmf.common.test_reporter: Predicting for textvqa
0%| | 0/45 [00:00<?, ?it/s]
2021-03-23T15:31:18 | mmf.datasets.processors.processors: Loading fasttext model now from /home/mk20376/.cache/torch/mmf/wiki.en.bin
2021-03-23T15:31:39 | mmf.datasets.processors.processors: Finished loading fasttext model
WARNING 2021-03-23T15:31:44 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: Sample list has not field 'targets', are you sure that your ImDB has labels? you may have wanted to run with evaluation.predict=true
builtin_warn(*args, **kwargs)
2%|▏ | 1/45 [00:26<19:26, 26.50s/it]
100%|██████████| 45/45 [02:32<00:00, 3.39s/it]
Traceback (most recent call last):
  File "/home/mk20376/anaconda3/envs/mmf/bin/mmf_run", line 33, in <module>
-- Process 3 terminated with the following error:
Traceback (most recent call last):
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/mk20376/Proj/mmf/mmf_cli/run.py", line 66, in distributed_main
    main(configuration, init_distributed=True, predict=predict)
  File "/home/mk20376/Proj/mmf/mmf_cli/run.py", line 56, in main
    trainer.train()
  File "/home/mk20376/Proj/mmf/mmf/trainers/mmf_trainer.py", line 142, in train
    self.inference()
  File "/home/mk20376/Proj/mmf/mmf/trainers/mmf_trainer.py", line 166, in inference
    report, meter = self.evaluation_loop(dataset, use_tqdm=True)
  File "/home/mk20376/Proj/mmf/mmf/trainers/core/evaluation_loop.py", line 69, in evaluation_loop
    combined_report.metrics = self.metrics(combined_report, combined_report)
  File "/home/mk20376/Proj/mmf/mmf/modules/metrics.py", line 156, in __call__
    sample_list, model_output, *args, **kwargs
  File "/home/mk20376/Proj/mmf/mmf/modules/metrics.py", line 221, in _calculate_with_checks
    value = self.calculate(*args, **kwargs)
  File "/home/mk20376/Proj/mmf/mmf/modules/metrics.py", line 691, in calculate
    accuracy = self.evaluator.eval_pred_list(predictions)
  File "/home/mk20376/Proj/mmf/mmf/utils/m4c_evaluators.py", line 250, in eval_pred_list
    unique_answer_scores = self._compute_answer_scores(entry["gt_answers"])
  File "/home/mk20376/Proj/mmf/mmf/utils/m4c_evaluators.py", line 228, in _compute_answer_scores
    assert len(answers) == 10
AssertionError
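For context, the assertion fires because the TextVQA accuracy metric expects exactly 10 human-annotated answers per question. A simplified sketch of the soft-accuracy idea (a hypothetical helper, not the actual `m4c_evaluators` code, which averages over answer subsets) looks like this:

```python
def soft_accuracy(pred, gt_answers):
    """Simplified VQA-style soft accuracy (sketch only): a prediction
    matching 3 or more of the 10 human answers scores 1.0."""
    # This mirrors the check that fails in _compute_answer_scores:
    assert len(gt_answers) == 10, "each question needs 10 ground-truth answers"
    return min(gt_answers.count(pred) / 3.0, 1.0)
```

Since the test split ships without ground-truth answers, `gt_answers` is empty there and the length check fails, which is why local evaluation on test cannot work.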
Kindly help me to resolve this issue.
Kindly let me know if there is any update on the above error.
@ronghanghu Can you check why this assertion error is happening?
Hi @Mano2610, the assertion error happens because the test set has no ground-truth labels. In the TextVQA task, you cannot evaluate locally on the test set. Instead, you should generate a prediction file and use the EvalAI server to evaluate it.
Please follow Point 3 in https://mmf.sh/docs/projects/m4c#training-and-evaluation for prediction file generation. The predictions should be submitted to https://eval.ai/web/challenges/challenge-page/874/overview. Thanks!
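Following Point 3 of those docs, the prediction run looks roughly like the command below (a sketch: the exact flags are in the docs, and a `checkpoint.resume_file`/`checkpoint.resume_zoo` option pointing at your trained model is omitted here). The `evaluation.predict=true` flag, also suggested by the warning in the log above, writes an EvalAI-submittable JSON instead of computing metrics locally:

```shell
# Sketch of a prediction run; see the M4C project docs for exact flags.
mmf_run config=projects/m4c/configs/textvqa/defaults.yaml \
    datasets=textvqa \
    model=m4c \
    run_type=test \
    evaluation.predict=true
```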
Hi, if I use Windows, is there any way to solve this problem?
Command:
mmf_run config=projects/m4c/configs/textvqa/defaults.yaml datasets=textvqa model=m4c run_type=train_val checkpoint.resume_zoo=m4c.textvqa.with_stvqa env.data_dir=D:/.cache env.save_dir=D:/mmf/save/m4c
2021-04-09T21:21:52 | mmf.utils.general: Total Parameters: 90850184. Trained Parameters: 90850184
2021-04-09T21:21:52 | mmf.trainers.core.training_loop: Starting training...
Traceback (most recent call last):
  File "C:\Users\AYY\.conda\envs\mmf\Scripts\mmf_run-script.py", line 33, in <module>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Users\AYY\.conda\envs\mmf\lib\site-packages\torch\utils\data\_utils\worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\AYY\.conda\envs\mmf\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\AYY\.conda\envs\mmf\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
Hi @2017210384,
We will have to look into fixing this on Windows. If possible, please use Ubuntu on Windows for now.
@2017210384 for a quick fix, you can add an extra line as follows after https://github.com/facebookresearch/mmf/blob/2848ba37151fcb05bd85091966c446b08e67b289/mmf/datasets/databases/readers/feature_readers.py#L228
split = os.path.relpath(image_file_path, self.db_path).split(".npy")[0]
split = split.replace("\\", "/")
The cause is that Windows uses a different path separator `\` (instead of `/` on Linux, where the feature dbs are generated). There might be more cases like this in the codebase.
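The effect of the quick fix can be illustrated in isolation. The standalone sketch below (a hypothetical helper, not the actual `feature_readers.py` code) mimics the two lines: compute the relative feature key, then normalize the separator so a Windows-derived key like `train\7f14...` matches the `/`-separated keys stored in the Linux-generated db:

```python
import os

def normalize_feature_key(image_file_path, db_path):
    """Relative feature key with '.npy' stripped and '\\' normalized to '/'."""
    split = os.path.relpath(image_file_path, db_path).split(".npy")[0]
    return split.replace("\\", "/")
```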
@ronghanghu Thanks! It works!
Hi @apsdehal ,
I have generated the .json file following Point 3 in https://mmf.sh/docs/projects/m4c and submitted it to EvalAI, but I am getting a failed result, and I can see the following in the stderr file:
Traceback (most recent call last):
  File "/code/scripts/workers/submission_worker.py", line 452, in run_submission
    submission_metadata=submission_serializer.data,
  File "/tmp/tmp6c2rmxzb/compute/challenge_data/challenge_874/main.py", line 199, in evaluate
    prepare_objects(annFile, resFile, phase_codename)
  File "/tmp/tmp6c2rmxzb/compute/challenge_data/challenge_874/main.py", line 105, in prepare_objects
    vqaRes = vqa.loadRes(res, resFile)
  File "/tmp/tmp6c2rmxzb/compute/challenge_data/challenge_874/vqa.py", line 160, in loadRes
AssertionError: Results do not correspond to current TextVQA set. Either the results do not have predictions for all question ids in annotation file or there is atleast one question id that does not belong to the question ids in the annotation file. Please note that this year, you need to upload predictions on ALL test questions for test-dev evaluation unlike previous years when you needed to upload predictions on test-dev questions only.
Here is the submitted .json file https://evalai.s3.amazonaws.com/media/submission_files/submission_149838/e3177541-d3aa-4ecc-8806-6c388daf99f6.json Please help me to resolve this issue.
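The EvalAI error means the set of question ids in the submitted JSON must exactly equal the ids in the annotation file. A local pre-check along these lines (a sketch with assumed field names, based on the standard `[{"question_id": ..., "answer": ...}, ...]` prediction format) can catch the mismatch before uploading:

```python
import json

def check_submission(pred_file, expected_question_ids):
    """Return (missing, extra) question-id sets for a prediction JSON."""
    with open(pred_file) as f:
        preds = json.load(f)
    pred_ids = {p["question_id"] for p in preds}
    missing = expected_question_ids - pred_ids  # ids with no prediction
    extra = pred_ids - expected_question_ids    # ids not in the annotations
    return missing, extra
```

Both sets must be empty for the submission to pass; a non-empty `missing` set usually means the predictions were generated on the wrong split (e.g. val instead of test).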
Hi @Mano2610 Are you submitting to "Test-Standard Phase" in https://eval.ai/web/challenges/challenge-page/874/submission? ("Test-Standard Phase" is the phase for submission, not "Validation Phase").
I just tried submitting my previous M4C predictions m4c_textvqa_test.json.zip to https://eval.ai/web/challenges/challenge-page/874/submission, and the evaluation worked for me.
Hi @ronghanghu,
Thank you, it worked. I had generated the test file but mistakenly submitted it to the Validation phase.
Can you please help me with the datasets for 2020 and 2019?
When I try to get the TextVQA dataset for 2020 and 2019, it shows that the site can't be reached. Kindly help me to resolve this, please.
@Mano2610 could you let me know the specific errors you encountered when getting TextVQA dataset and paste the error message here?
@ronghanghu , when I try to open the 2020 and 2019 challenge, I'm getting the below error,
This site can’t be reached
Check if there is a typo in challenge.
DNS_PROBE_FINISHED_NXDOMAIN
Are there any datasets other than 2021? I am looking for two different datasets.
Kindly help me with a different dataset for TextVQA.
@Mano2610 Hi, you can use https://textvqa.org/challenge/2020 and https://textvqa.org/challenge/2019 to access TextVQA challenge in these two years.
@apsdehal I think there's an error in the top navigation bar "challenge" button in https://textvqa.org/. The challenge URLs show up as https://challenge/, https://challenge/2020, and https://challenge/2019
@ronghanghu , thank you so much.
But I am unable to get the datasets for 2020 and 2019; I am only able to view the challenges for those years. Are there any other datasets for TextVQA, other than 2021?
@Mano2610 the TextVQA dataset is the same for each year. You can get the dataset from https://textvqa.org/dataset. The MMF library also allows automatic dataset downloading of TextVQA.
@ronghanghu, Thank you.
Are there any different datasets, other than TextVQA? I am looking for 2 different datasets for my project.
@Mano2610 You might want to try out the ST-VQA dataset (https://rrc.cvc.uab.es/?ch=11&com=downloads). It can also be downloaded automatically in MMF.
@ronghanghu I have submitted the prediction file (Test-Standard phase) and got a score of 39.84. Can you please confirm whether this is the score for the m4c model?
@Mano2610 This is the correct score for m4c.textvqa.defaults (trained on TextVQA only). If you run m4c.textvqa.with_stvqa (trained on TextVQA + ST-VQA), you can get a slightly higher score, above 40.
Closing this issue now. Please re-open if you have further technical questions or errors.
❓ Questions and Help
Hi,
While running the training code with the m4c model, I am getting the following error:
2021-03-11T03:34:15 | mmf.utils.general: Total Parameters: 90850184. Trained Parameters: 90850184
2021-03-11T03:34:15 | mmf.trainers.core.training_loop: Starting training...
Traceback (most recent call last):
  File "C:\Users\kvman\anaconda3\envs\mmf\Scripts\mmf_run-script.py", line 33, in <module>
sys.exit(load_entry_point('mmf', 'console_scripts', 'mmf_run')())
File "d:\project\new folder\mmf\mmf_cli\run.py", line 133, in run
main(configuration, predict=predict)
File "d:\project\new folder\mmf\mmf_cli\run.py", line 56, in main
trainer.train()
File "d:\project\new folder\mmf\mmf\trainers\mmf_trainer.py", line 132, in train
self.training_loop()
File "d:\project\new folder\mmf\mmf\trainers\core\training_loop.py", line 31, in training_loop
self.run_training_epoch()
File "d:\project\new folder\mmf\mmf\trainers\core\training_loop.py", line 74, in run_training_epoch
for idx, batch in enumerate(self.train_loader):
File "C:\Users\kvman\anaconda3\envs\mmf\lib\site-packages\torch\utils\data\dataloader.py", line 363, in __next__
data = self._next_data()
File "C:\Users\kvman\anaconda3\envs\mmf\lib\site-packages\torch\utils\data\dataloader.py", line 989, in _next_data
return self._process_data(data)
File "C:\Users\kvman\anaconda3\envs\mmf\lib\site-packages\torch\utils\data\dataloader.py", line 1014, in _process_data
data.reraise()
File "C:\Users\kvman\anaconda3\envs\mmf\lib\site-packages\torch\_utils.py", line 395, in reraise
raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "d:\project\new folder\mmf\mmf\datasets\databases\readers\feature_readers.py", line 231, in _load
image_id = int(split.split("_")[-1])
ValueError: invalid literal for int() with base 10: 'train\\7f14a505b6edcbc5'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Users\kvman\anaconda3\envs\mmf\lib\site-packages\torch\utils\data\_utils\worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\kvman\anaconda3\envs\mmf\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\kvman\anaconda3\envs\mmf\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "C:\Users\kvman\anaconda3\envs\mmf\lib\site-packages\torch\utils\data\dataset.py", line 207, in __getitem__
return self.datasets[dataset_idx][sample_idx]
File "d:\project\new folder\mmf\mmf\datasets\builders\textvqa\dataset.py", line 100, in __getitem__
features = self.features_db[idx]
File "d:\project\new folder\mmf\mmf\datasets\databases\features_database.py", line 91, in __getitem__
return self.get(image_info)
File "d:\project\new folder\mmf\mmf\datasets\databases\features_database.py", line 99, in get
return self.from_path(feature_path)
File "d:\project\new folder\mmf\mmf\datasets\databases\features_database.py", line 107, in from_path
features, infos = self._get_image_features_and_info(path)
File "d:\project\new folder\mmf\mmf\datasets\databases\features_database.py", line 80, in _get_image_features_and_info
image_feats, infos = self._read_features_and_info(feat_file)
File "d:\project\new folder\mmf\mmf\datasets\databases\features_database.py", line 65, in _read_features_and_info
feature, info = feature_reader.read(feat_file)
File "d:\project\new folder\mmf\mmf\datasets\databases\readers\feature_readers.py", line 95, in read
return self.feat_reader.read(image_feat_path)
File "d:\project\new folder\mmf\mmf\datasets\databases\readers\feature_readers.py", line 158, in read
image_info = self._load(image_feat_path)
File "d:\project\new folder\mmf\mmf\datasets\databases\readers\feature_readers.py", line 238, in _load
img_id_idx = self.image_id_indices[image_id]
KeyError: b'train\\7f14a505b6edcbc5'
When I tried with model = "Lorra", I got the error below:
2021-03-11T03:27:37 | mmf.utils.general: Total Parameters: 192497485. Trained Parameters: 192497485
2021-03-11T03:27:37 | mmf.trainers.core.training_loop: Starting training...
Traceback (most recent call last):
  File "C:\Users\kvman\anaconda3\envs\mmf\Scripts\mmf_run-script.py", line 33, in <module>
sys.exit(load_entry_point('mmf', 'console_scripts', 'mmf_run')())
File "d:\project\new folder\mmf\mmf_cli\run.py", line 133, in run
main(configuration, predict=predict)
File "d:\project\new folder\mmf\mmf_cli\run.py", line 56, in main
trainer.train()
File "d:\project\new folder\mmf\mmf\trainers\mmf_trainer.py", line 132, in train
self.training_loop()
File "d:\project\new folder\mmf\mmf\trainers\core\training_loop.py", line 31, in training_loop
self.run_training_epoch()
File "d:\project\new folder\mmf\mmf\trainers\core\training_loop.py", line 74, in run_training_epoch
for idx, batch in enumerate(self.train_loader):
File "d:\project\new folder\mmf\mmf\datasets\multi_dataset_loader.py", line 213, in __iter__
return iter(self.loaders[0])
File "C:\Users\kvman\anaconda3\envs\mmf\lib\site-packages\torch\utils\data\dataloader.py", line 291, in __iter__
return _MultiProcessingDataLoaderIter(self)
File "C:\Users\kvman\anaconda3\envs\mmf\lib\site-packages\torch\utils\data\dataloader.py", line 737, in __init__
w.start()
File "C:\Users\kvman\anaconda3\envs\mmf\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\Users\kvman\anaconda3\envs\mmf\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\kvman\anaconda3\envs\mmf\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\kvman\anaconda3\envs\mmf\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\kvman\anaconda3\envs\mmf\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
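This BrokenPipeError comes from Windows spawning DataLoader worker processes via pickling (note `popen_spawn_win32` and `ForkingPickler` in the traceback). A common generic workaround on Windows, not verified against this MMF version, is to keep data loading in the main process by setting the worker count to zero; assuming MMF exposes the usual `training.num_workers` override (the key name is an assumption), the run would look roughly like:

```shell
# Hypothetical override -- training.num_workers key and config path are assumptions.
# num_workers=0 avoids spawning (and pickling for) worker processes on Windows.
mmf_run config=projects/lorra/configs/textvqa/defaults.yaml \
    datasets=textvqa \
    model=lorra \
    run_type=train \
    training.num_workers=0
```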
Kindly help me to resolve this issue.