facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

Error during training Textvqa #810

Closed Mano2610 closed 3 years ago

Mano2610 commented 3 years ago

❓ Questions and Help

Hi,

While trying to train the m4c model, I am getting the following error:

```
2021-03-11T03:34:15 | mmf.utils.general: Total Parameters: 90850184. Trained Parameters: 90850184
2021-03-11T03:34:15 | mmf.trainers.core.training_loop: Starting training...
Traceback (most recent call last):
  File "C:\Users\kvman\anaconda3\envs\mmf\Scripts\mmf_run-script.py", line 33, in <module>
    sys.exit(load_entry_point('mmf', 'console_scripts', 'mmf_run')())
  File "d:\project\new folder\mmf\mmf_cli\run.py", line 133, in run
    main(configuration, predict=predict)
  File "d:\project\new folder\mmf\mmf_cli\run.py", line 56, in main
    trainer.train()
  File "d:\project\new folder\mmf\mmf\trainers\mmf_trainer.py", line 132, in train
    self.training_loop()
  File "d:\project\new folder\mmf\mmf\trainers\core\training_loop.py", line 31, in training_loop
    self.run_training_epoch()
  File "d:\project\new folder\mmf\mmf\trainers\core\training_loop.py", line 74, in run_training_epoch
    for idx, batch in enumerate(self.train_loader):
  File "C:\Users\kvman\anaconda3\envs\mmf\lib\site-packages\torch\utils\data\dataloader.py", line 363, in __next__
    data = self._next_data()
  File "C:\Users\kvman\anaconda3\envs\mmf\lib\site-packages\torch\utils\data\dataloader.py", line 989, in _next_data
    return self._process_data(data)
  File "C:\Users\kvman\anaconda3\envs\mmf\lib\site-packages\torch\utils\data\dataloader.py", line 1014, in _process_data
    data.reraise()
  File "C:\Users\kvman\anaconda3\envs\mmf\lib\site-packages\torch\_utils.py", line 395, in reraise
    raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "d:\project\new folder\mmf\mmf\datasets\databases\readers\feature_readers.py", line 231, in _load
    image_id = int(split.split("_")[-1])
ValueError: invalid literal for int() with base 10: 'train\7f14a505b6edcbc5'
```

During handling of the above exception, another exception occurred:

```
Traceback (most recent call last):
  File "C:\Users\kvman\anaconda3\envs\mmf\lib\site-packages\torch\utils\data\_utils\worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\kvman\anaconda3\envs\mmf\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\kvman\anaconda3\envs\mmf\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\kvman\anaconda3\envs\mmf\lib\site-packages\torch\utils\data\dataset.py", line 207, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "d:\project\new folder\mmf\mmf\datasets\builders\textvqa\dataset.py", line 100, in __getitem__
    features = self.features_db[idx]
  File "d:\project\new folder\mmf\mmf\datasets\databases\features_database.py", line 91, in __getitem__
    return self.get(image_info)
  File "d:\project\new folder\mmf\mmf\datasets\databases\features_database.py", line 99, in get
    return self.from_path(feature_path)
  File "d:\project\new folder\mmf\mmf\datasets\databases\features_database.py", line 107, in from_path
    features, infos = self._get_image_features_and_info(path)
  File "d:\project\new folder\mmf\mmf\datasets\databases\features_database.py", line 80, in _get_image_features_and_info
    image_feats, infos = self._read_features_and_info(feat_file)
  File "d:\project\new folder\mmf\mmf\datasets\databases\features_database.py", line 65, in _read_features_and_info
    feature, info = feature_reader.read(feat_file)
  File "d:\project\new folder\mmf\mmf\datasets\databases\readers\feature_readers.py", line 95, in read
    return self.feat_reader.read(image_feat_path)
  File "d:\project\new folder\mmf\mmf\datasets\databases\readers\feature_readers.py", line 158, in read
    image_info = self._load(image_feat_path)
  File "d:\project\new folder\mmf\mmf\datasets\databases\readers\feature_readers.py", line 238, in _load
    img_id_idx = self.image_id_indices[image_id]
KeyError: b'train\7f14a505b6edcbc5'
```
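For context, here is a minimal sketch of the likely failure mode. This is my own simplified reconstruction, not the actual `feature_readers.py` code: when the id is not an integer, the raw relative path appears to be used as the lookup key, so a Windows backslash path can never match an index built with POSIX-style separators.

```python
# Hypothetical simplification of the feature lookup (illustration only):
# non-numeric ids fall back to the raw relative path as the index key,
# so the separator style must match the one used when the index was built.

def feature_key(relative_path: str) -> bytes:
    last = relative_path.split("_")[-1]
    try:
        int(last)                      # numeric (COCO-style) image ids
        return last.encode()
    except ValueError:
        return relative_path.encode()  # fall back to the raw path string

# Index built on Linux uses forward slashes in its keys.
index = {b"train/7f14a505b6edcbc5": 0}

print(feature_key("train/7f14a505b6edcbc5") in index)   # True  (POSIX path)
print(feature_key("train\\7f14a505b6edcbc5") in index)  # False (Windows path)
```

Under this reading, the `ValueError` in the first traceback is the expected fallback branch, and the real problem is the `\` separator in the key that reaches `image_id_indices`.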

When I tried with `model=lorra` (the LoRRA model), I got the error below:

```
2021-03-11T03:27:37 | mmf.utils.general: Total Parameters: 192497485. Trained Parameters: 192497485
2021-03-11T03:27:37 | mmf.trainers.core.training_loop: Starting training...
Traceback (most recent call last):
  File "C:\Users\kvman\anaconda3\envs\mmf\Scripts\mmf_run-script.py", line 33, in <module>
    sys.exit(load_entry_point('mmf', 'console_scripts', 'mmf_run')())
  File "d:\project\new folder\mmf\mmf_cli\run.py", line 133, in run
    main(configuration, predict=predict)
  File "d:\project\new folder\mmf\mmf_cli\run.py", line 56, in main
    trainer.train()
  File "d:\project\new folder\mmf\mmf\trainers\mmf_trainer.py", line 132, in train
    self.training_loop()
  File "d:\project\new folder\mmf\mmf\trainers\core\training_loop.py", line 31, in training_loop
    self.run_training_epoch()
  File "d:\project\new folder\mmf\mmf\trainers\core\training_loop.py", line 74, in run_training_epoch
    for idx, batch in enumerate(self.train_loader):
  File "d:\project\new folder\mmf\mmf\datasets\multi_dataset_loader.py", line 213, in __iter__
    return iter(self.loaders[0])
  File "C:\Users\kvman\anaconda3\envs\mmf\lib\site-packages\torch\utils\data\dataloader.py", line 291, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\kvman\anaconda3\envs\mmf\lib\site-packages\torch\utils\data\dataloader.py", line 737, in __init__
    w.start()
  File "C:\Users\kvman\anaconda3\envs\mmf\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\kvman\anaconda3\envs\mmf\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\kvman\anaconda3\envs\mmf\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\kvman\anaconda3\envs\mmf\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\kvman\anaconda3\envs\mmf\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
```
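Background on this second traceback: on Windows, DataLoader worker processes are started with the `spawn` method, so the dataset object is pickled and sent to each worker (the `reduction.dump(...)` frame above); a failure during that hand-off surfaces in the parent as a `BrokenPipeError`. A minimal sketch of the constraint, as general CPython/Windows behavior rather than anything mmf-specific:

```python
# Sketch: with the "spawn" start method, everything handed to a worker
# process must survive pickling. An object that cannot be pickled dies
# in ForkingPickler.dump(...) and the parent sees a broken pipe.
import pickle

class UnpicklableHandle:
    """Stand-in for an open file handle or similar non-picklable state."""
    def __reduce__(self):
        raise TypeError("cannot pickle this handle")

def can_spawn_worker_with(obj) -> bool:
    """Return True if obj would survive the spawn-time pickling step."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False

print(can_spawn_worker_with([1, 2, 3]))            # True
print(can_spawn_worker_with(UnpicklableHandle()))  # False
```

Setting `num_workers=0` (suggested later in this thread) avoids worker processes entirely, so no pickling hand-off happens.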

Kindly help me to resolve this issue.

vedanuj commented 3 years ago

Hello, can you verify that the dataset is downloaded properly in `~/.cache/torch/mmf/data/datasets/textvqa/defaults/features/`?

Mano2610 commented 3 years ago

Yes, I have checked and it is downloaded completely. I have also tried 3 or 4 times, but I get the same error.

apsdehal commented 3 years ago

Hi, this is probably happening due to the different file system on Windows. If possible, can you try the Ubuntu VM available on Windows 10 and higher?

Mano2610 commented 3 years ago

I have only 300 GB on my machine; will I be able to run a VM on Windows 10 along with the data for MMF? I am a little confused about allocating the space, which is why I am using Windows.

apsdehal commented 3 years ago

I am specifically talking about https://ubuntu.com/tutorials/ubuntu-on-windows#1-overview
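For readers following along: on recent builds of Windows 10 (2004 and later) and on Windows 11, WSL with a default Ubuntu distribution can be installed from an elevated prompt (exact flags vary by Windows build; see the linked tutorial for older versions):

```shell
# Install WSL with the default Ubuntu distribution (run as Administrator)
wsl --install

# After the required reboot, list the installed distributions
wsl --list --verbose
```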

Mano2610 commented 3 years ago

Hi @apsdehal ,

I have tried on Ubuntu and allocated 4 GPUs for training, but I am getting the error below. Kindly help me resolve this issue:

**** GRID ENGINE GPU ASSIGNMENT: your job has been assigned GPU device(s): 0,1,2,3 2021-03-19T02:19:53 | mmf.utils.configuration: Overriding option config to projects/m4c/configs/textvqa/defaults.yaml 2021-03-19T02:19:53 | mmf.utils.configuration: Overriding option datasets to textvqa 2021-03-19T02:19:53 | mmf.utils.configuration: Overriding option model to m4c 2021-03-19T02:19:53 | mmf.utils.configuration: Overriding option run_type to train 2021-03-19T02:19:59 | mmf.utils.distributed: XLA Mode:None 2021-03-19T02:19:59 | mmf.utils.distributed: XLA Mode:None 2021-03-19T02:19:59 | mmf.utils.distributed: XLA Mode:None 2021-03-19T02:19:59 | mmf.utils.distributed: Distributed Init (Rank 2): tcp://localhost:11433 2021-03-19T02:19:59 | mmf.utils.distributed: XLA Mode:None 2021-03-19T02:19:59 | mmf.utils.distributed: Distributed Init (Rank 3): tcp://localhost:11433 2021-03-19T02:19:59 | mmf.utils.distributed: Distributed Init (Rank 1): tcp://localhost:11433 2021-03-19T02:19:59 | mmf.utils.distributed: Distributed Init (Rank 0): tcp://localhost:11433 2021-03-19T02:19:59 | mmf.utils.distributed: Initialized Host gpu-1-7.local as Rank 1 2021-03-19T02:19:59 | mmf.utils.distributed: Initialized Host gpu-1-7.local as Rank 3 2021-03-19T02:20:00 | mmf.utils.distributed: Initialized Host gpu-1-7.local as Rank 2 2021-03-19T02:20:00 | mmf.utils.distributed: Initialized Host gpu-1-7.local as Rank 0 2021-03-19T02:20:03 | mmf: Logging to: ./save/train.log 2021-03-19T02:20:03 | mmf_cli.run: Namespace(config_override=None, local_rank=None, opts=['config=projects/m4c/configs/textvqa/defaults.yaml', 'datasets=textvqa', 'model=m4c', 'run_type=train']) 2021-03-19T02:20:03 | mmf_cli.run: Torch version: 1.6.0 2021-03-19T02:20:03 | mmf.utils.general: CUDA Device 0 is: GeForce GTX 1080 Ti 2021-03-19T02:20:03 | mmf_cli.run: Using seed 3893195 2021-03-19T02:20:03 | mmf.trainers.mmf_trainer: Loading datasets 2021-03-19T02:20:15 | mmf.trainers.mmf_trainer: Loading model Some weights of the model 
checkpoint at bert-base-uncased were not used when initializing TextBert: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.3.attention.self.key.weight', 'bert.encoder.layer.3.attention.self.key.bias', 'bert.encoder.layer.3.attention.self.value.weight', 'bert.encoder.layer.3.attention.self.value.bias', 'bert.encoder.layer.3.attention.output.dense.weight', 'bert.encoder.layer.3.attention.output.dense.bias', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.layer.3.output.dense.weight', 'bert.encoder.layer.3.output.dense.bias', 'bert.encoder.layer.4.attention.self.query.weight', 'bert.encoder.layer.4.attention.self.query.bias', 'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.4.attention.self.value.weight', 'bert.encoder.layer.4.attention.self.value.bias', 'bert.encoder.layer.4.attention.output.dense.weight', 'bert.encoder.layer.4.attention.output.dense.bias', 'bert.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.layer.4.output.dense.weight', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.5.attention.self.query.bias', 'bert.encoder.layer.5.attention.self.key.weight', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.5.attention.self.value.bias', 'bert.encoder.layer.5.attention.output.dense.weight', 'bert.encoder.layer.5.attention.output.dense.bias', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.5.intermediate.dense.bias', 'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.5.output.dense.bias', 'bert.encoder.layer.6.attention.self.query.weight', 
'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.8.attention.output.dense.weight', 'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.9.attention.self.query.weight', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.9.attention.self.key.weight', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.9.attention.self.value.weight', 
'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.11.attention.output.dense.weight', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.3.attention.output.LayerNorm.weight', 'bert.encoder.layer.3.attention.output.LayerNorm.bias', 'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.4.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.output.LayerNorm.bias', 'bert.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.layer.4.output.LayerNorm.bias', 
'bert.encoder.layer.5.attention.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.bias', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.11.output.LayerNorm.bias']

WARNING 2021-03-19T02:20:21 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: cfg.pretty() is deprecated and will be removed in a future version. Use OmegaConf.to_yaml(cfg)

builtin_warn(*args, **kwargs)

WARNING 2021-03-19T02:20:21 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: No type for scheduler specified even though lr_scheduler is True, setting default to 'Pythia' builtin_warn(*args, **kwargs)

WARNING 2021-03-19T02:20:21 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: No type for scheduler specified even though lr_scheduler is True, setting default to 'Pythia' builtin_warn(*args, **kwargs)

WARNING 2021-03-19T02:20:21 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: scheduler attributes has no params defined, defaulting to {}. builtin_warn(*args, **kwargs)

WARNING 2021-03-19T02:20:21 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: scheduler attributes has no params defined, defaulting to {}. builtin_warn(*args, **kwargs)

2021-03-19T02:20:21 | mmf.trainers.core.device: Using PyTorch DistributedDataParallel WARNING 2021-03-19T02:20:21 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: You can enable ZeRO and Sharded DDP, by installing fairscale and setting optimizer.enable_state_sharding=True. builtin_warn(*args, **kwargs)

WARNING 2021-03-19T02:20:21 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: You can enable ZeRO and Sharded DDP, by installing fairscale and setting optimizer.enable_state_sharding=True. builtin_warn(*args, **kwargs)

2021-03-19T02:20:22 | mmf.trainers.mmf_trainer: ===== Model ===== 2021-03-19T02:20:22 | mmf.trainers.mmf_trainer: DistributedDataParallel( (module): M4C( (text_bert): TextBert( (embeddings): BertEmbeddings( (word_embeddings): Embedding(30522, 768, padding_idx=0) (position_embeddings): Embedding(512, 768) (token_type_embeddings): Embedding(2, 768) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (encoder): BertEncoder( (layer): ModuleList( (0): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (1): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): 
LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (2): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) ) ) (text_bert_out_linear): Identity() (obj_faster_rcnn_fc7): FinetuneFasterRcnnFpnFc7( (lc): Linear(in_features=2048, out_features=2048, bias=True) ) (linear_obj_feat_to_mmt_in): Linear(in_features=2048, out_features=768, bias=True) (linear_obj_bbox_to_mmt_in): Linear(in_features=4, out_features=768, bias=True) (obj_feat_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (obj_bbox_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (obj_drop): Dropout(p=0.1, inplace=False) (ocr_faster_rcnn_fc7): FinetuneFasterRcnnFpnFc7( (lc): Linear(in_features=2048, out_features=2048, bias=True) ) (linear_ocr_feat_to_mmt_in): Linear(in_features=3002, out_features=768, bias=True) (linear_ocr_bbox_to_mmt_in): Linear(in_features=4, out_features=768, bias=True) (ocr_feat_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (ocr_bbox_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (ocr_drop): Dropout(p=0.1, inplace=False) (mmt): MMT( (prev_pred_embeddings): PrevPredEmbeddings( (position_embeddings): Embedding(100, 768) (token_type_embeddings): 
Embedding(5, 768) (ans_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (ocr_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (emb_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (emb_dropout): Dropout(p=0.1, inplace=False) ) (encoder): BertEncoder( (layer): ModuleList( (0): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (1): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (2): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): 
Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (3): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) ) ) (ocr_ptr_net): OcrPtrNet( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) ) (classifier): ClassifierLayer( (module): Linear(in_features=768, out_features=5000, bias=True) ) (losses): Losses( (losses): ModuleList( (0): MMFLoss( (loss_criterion): M4CDecodingBCEWithMaskLoss() ) ) ) ) ) 2021-03-19T02:20:22 | mmf.utils.general: Total Parameters: 90850184. 
Trained Parameters: 90850184
2021-03-19T02:20:22 | mmf.trainers.core.training_loop: Starting training...

```
Traceback (most recent call last):
  File "/home/mk20376/anaconda3/envs/mmf/bin/mmf_run", line 33, in <module>
    sys.exit(load_entry_point('mmf', 'console_scripts', 'mmf_run')())
  File "/home/mk20376/Proj/mmf/mmf_cli/run.py", line 129, in run
    nprocs=config.distributed.world_size,
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception:
```

```
-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 779, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/multiprocessing/queues.py", line 104, in get
    if not self._poll(timeout):
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/multiprocessing/connection.py", line 414, in _poll
    r = wait([self], timeout)
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/multiprocessing/connection.py", line 921, in wait
    ready = selector.select(timeout)
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 406897) is killed by signal: Killed.
```

During handling of the above exception, another exception occurred:

```
Traceback (most recent call last):
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/mk20376/Proj/mmf/mmf_cli/run.py", line 66, in distributed_main
    main(configuration, init_distributed=True, predict=predict)
  File "/home/mk20376/Proj/mmf/mmf_cli/run.py", line 56, in main
    trainer.train()
  File "/home/mk20376/Proj/mmf/mmf/trainers/mmf_trainer.py", line 146, in train
    self.training_loop()
  File "/home/mk20376/Proj/mmf/mmf/trainers/core/training_loop.py", line 31, in training_loop
    self.run_training_epoch()
  File "/home/mk20376/Proj/mmf/mmf/trainers/core/training_loop.py", line 74, in run_training_epoch
    for idx, batch in enumerate(self.train_loader):
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 974, in _next_data
    idx, data = self._get_data()
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 941, in _get_data
    success, data = self._try_get_data()
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 792, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 406897) exited unexpectedly
```

hackgoofer commented 3 years ago

Hi @Mano2610, thanks for using mmf:

Can you try again with `training.num_workers=0` added to the command line?
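For anyone landing here later, the full invocation with that override would look like the sketch below, reconstructed from the options visible in the logs earlier in this thread (adjust config and paths to your setup):

```shell
mmf_run config=projects/m4c/configs/textvqa/defaults.yaml \
    datasets=textvqa \
    model=m4c \
    run_type=train \
    training.num_workers=0
```

`training.num_workers=0` loads data in the main process instead of worker subprocesses, which sidesteps both the Windows spawn/pickling failure and workers being killed (e.g. by the OOM killer) on the cluster.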

Mano2610 commented 3 years ago

Hi @ytsheng, I am able to train and evaluate using the m4c model. What final accuracy should we expect here?

Also, when I try to run the test, I get the error below:

**** GRID ENGINE GPU ASSIGNMENT: your job has been assigned GPU device(s): 0,1,2,3 2021-03-23T15:30:46 | mmf.utils.configuration: Overriding option config to projects/m4c/configs/textvqa/defaults.yaml 2021-03-23T15:30:46 | mmf.utils.configuration: Overriding option datasets to textvqa 2021-03-23T15:30:46 | mmf.utils.configuration: Overriding option model to m4c 2021-03-23T15:30:46 | mmf.utils.configuration: Overriding option run_type to test 2021-03-23T15:30:46 | mmf.utils.configuration: Overriding option training.num_workers to 0 2021-03-23T15:30:46 | mmf.utils.configuration: Overriding option checkpoint.resume_file to /home/mk20376/Proj/mmf/save/models/model_24000.ckpt 2021-03-23T15:30:52 | mmf.utils.distributed: XLA Mode:None 2021-03-23T15:30:52 | mmf.utils.distributed: XLA Mode:None 2021-03-23T15:30:52 | mmf.utils.distributed: XLA Mode:None 2021-03-23T15:30:52 | mmf.utils.distributed: XLA Mode:None 2021-03-23T15:30:52 | mmf.utils.distributed: Distributed Init (Rank 1): tcp://localhost:10657 2021-03-23T15:30:52 | mmf.utils.distributed: Distributed Init (Rank 0): tcp://localhost:10657 2021-03-23T15:30:52 | mmf.utils.distributed: Distributed Init (Rank 3): tcp://localhost:10657 2021-03-23T15:30:52 | mmf.utils.distributed: Distributed Init (Rank 2): tcp://localhost:10657 2021-03-23T15:30:52 | mmf.utils.distributed: Initialized Host gpu-1-7.local as Rank 1 2021-03-23T15:30:52 | mmf.utils.distributed: Initialized Host gpu-1-7.local as Rank 0 2021-03-23T15:30:52 | mmf.utils.distributed: Initialized Host gpu-1-7.local as Rank 2 2021-03-23T15:30:52 | mmf.utils.distributed: Initialized Host gpu-1-7.local as Rank 3 2021-03-23T15:30:55 | mmf: Logging to: ./save/train.log 2021-03-23T15:30:55 | mmf_cli.run: Namespace(config_override=None, local_rank=None, opts=['config=projects/m4c/configs/textvqa/defaults.yaml', 'datasets=textvqa', 'model=m4c', 'run_type=test', 'training.num_workers=0', 'checkpoint.resume_file=/home/mk20376/Proj/mmf/save/models/model_24000.ckpt']) 
2021-03-23T15:30:55 | mmf_cli.run: Torch version: 1.6.0 2021-03-23T15:30:55 | mmf.utils.general: CUDA Device 0 is: GeForce GTX 1080 Ti 2021-03-23T15:30:55 | mmf_cli.run: Using seed 56154769 2021-03-23T15:30:55 | mmf.trainers.mmf_trainer: Loading datasets 2021-03-23T15:31:08 | mmf.trainers.mmf_trainer: Loading model Some weights of the model checkpoint at bert-base-uncased were not used when initializing TextBert: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.3.attention.self.key.weight', 'bert.encoder.layer.3.attention.self.key.bias', 'bert.encoder.layer.3.attention.self.value.weight', 'bert.encoder.layer.3.attention.self.value.bias', 'bert.encoder.layer.3.attention.output.dense.weight', 'bert.encoder.layer.3.attention.output.dense.bias', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.layer.3.output.dense.weight', 'bert.encoder.layer.3.output.dense.bias', 'bert.encoder.layer.4.attention.self.query.weight', 'bert.encoder.layer.4.attention.self.query.bias', 'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.4.attention.self.value.weight', 'bert.encoder.layer.4.attention.self.value.bias', 'bert.encoder.layer.4.attention.output.dense.weight', 'bert.encoder.layer.4.attention.output.dense.bias', 'bert.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.layer.4.output.dense.weight', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.5.attention.self.query.bias', 'bert.encoder.layer.5.attention.self.key.weight', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.5.attention.self.value.bias', 
'bert.encoder.layer.5.attention.output.dense.weight', 'bert.encoder.layer.5.attention.output.dense.bias', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.5.intermediate.dense.bias', 'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.5.output.dense.bias', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.8.attention.output.dense.weight', 'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.8.intermediate.dense.bias', 
'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.9.attention.self.query.weight', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.9.attention.self.key.weight', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.11.attention.output.dense.weight', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.3.attention.output.LayerNorm.weight', 'bert.encoder.layer.3.attention.output.LayerNorm.bias', 
'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.4.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.output.LayerNorm.bias', 'bert.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.layer.5.attention.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.bias', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.11.output.LayerNorm.bias']

WARNING 2021-03-23T15:31:13 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: cfg.pretty() is deprecated and will be removed in a future version. Use OmegaConf.to_yaml(cfg)

builtin_warn(*args, **kwargs)

WARNING 2021-03-23T15:31:13 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: No type for scheduler specified even though lr_scheduler is True, setting default to 'Pythia' builtin_warn(*args, **kwargs)

WARNING 2021-03-23T15:31:13 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: No type for scheduler specified even though lr_scheduler is True, setting default to 'Pythia' builtin_warn(*args, **kwargs)

WARNING 2021-03-23T15:31:13 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: scheduler attributes has no params defined, defaulting to {}. builtin_warn(*args, **kwargs)

WARNING 2021-03-23T15:31:13 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: scheduler attributes has no params defined, defaulting to {}. builtin_warn(*args, **kwargs)

2021-03-23T15:31:13 | mmf.utils.checkpoint: Loading checkpoint WARNING 2021-03-23T15:31:18 | mmf: Key data_parallel is not present in registry, returning default value of None WARNING 2021-03-23T15:31:18 | mmf: Key distributed is not present in registry, returning default value of None WARNING 2021-03-23T15:31:18 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler. builtin_warn(*args, **kwargs)

WARNING 2021-03-23T15:31:18 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler. builtin_warn(*args, **kwargs)

2021-03-23T15:31:18 | mmf.utils.checkpoint: Checkpoint loaded. 2021-03-23T15:31:18 | mmf.utils.checkpoint: Current num updates: 24000 2021-03-23T15:31:18 | mmf.utils.checkpoint: Current iteration: 24000 2021-03-23T15:31:18 | mmf.utils.checkpoint: Current epoch: 89 2021-03-23T15:31:18 | mmf.trainers.core.device: Using PyTorch DistributedDataParallel WARNING 2021-03-23T15:31:18 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: You can enable ZeRO and Sharded DDP, by installing fairscale and setting optimizer.enable_state_sharding=True. builtin_warn(*args, **kwargs)

WARNING 2021-03-23T15:31:18 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: You can enable ZeRO and Sharded DDP, by installing fairscale and setting optimizer.enable_state_sharding=True. builtin_warn(*args, **kwargs)

2021-03-23T15:31:18 | mmf.trainers.mmf_trainer: ===== Model ===== 2021-03-23T15:31:18 | mmf.trainers.mmf_trainer: DistributedDataParallel( (module): M4C( (text_bert): TextBert( (embeddings): BertEmbeddings( (word_embeddings): Embedding(30522, 768, padding_idx=0) (position_embeddings): Embedding(512, 768) (token_type_embeddings): Embedding(2, 768) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (encoder): BertEncoder( (layer): ModuleList( (0): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (1): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): 
LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (2): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) ) ) (text_bert_out_linear): Identity() (obj_faster_rcnn_fc7): FinetuneFasterRcnnFpnFc7( (lc): Linear(in_features=2048, out_features=2048, bias=True) ) (linear_obj_feat_to_mmt_in): Linear(in_features=2048, out_features=768, bias=True) (linear_obj_bbox_to_mmt_in): Linear(in_features=4, out_features=768, bias=True) (obj_feat_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (obj_bbox_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (obj_drop): Dropout(p=0.1, inplace=False) (ocr_faster_rcnn_fc7): FinetuneFasterRcnnFpnFc7( (lc): Linear(in_features=2048, out_features=2048, bias=True) ) (linear_ocr_feat_to_mmt_in): Linear(in_features=3002, out_features=768, bias=True) (linear_ocr_bbox_to_mmt_in): Linear(in_features=4, out_features=768, bias=True) (ocr_feat_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (ocr_bbox_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (ocr_drop): Dropout(p=0.1, inplace=False) (mmt): MMT( (prev_pred_embeddings): PrevPredEmbeddings( (position_embeddings): Embedding(100, 768) (token_type_embeddings): 
Embedding(5, 768) (ans_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (ocr_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (emb_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (emb_dropout): Dropout(p=0.1, inplace=False) ) (encoder): BertEncoder( (layer): ModuleList( (0): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (1): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (2): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): 
Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (3): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) ) ) (ocr_ptr_net): OcrPtrNet( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) ) (classifier): ClassifierLayer( (module): Linear(in_features=768, out_features=5000, bias=True) ) (losses): Losses( (losses): ModuleList( (0): MMFLoss( (loss_criterion): M4CDecodingBCEWithMaskLoss() ) ) ) ) ) 2021-03-23T15:31:18 | mmf.utils.general: Total Parameters: 90850184. 
Trained Parameters: 90850184 2021-03-23T15:31:18 | mmf.trainers.mmf_trainer: Starting inference on test set 2021-03-23T15:31:18 | mmf.common.test_reporter: Predicting for textvqa

0%| | 0/45 [00:00<?, ?it/s]2021-03-23T15:31:18 | mmf.datasets.processors.processors: Loading fasttext model now from /home/mk20376/.cache/torch/mmf/wiki.en.bin 2021-03-23T15:31:39 | mmf.datasets.processors.processors: Finished loading fasttext model WARNING 2021-03-23T15:31:44 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: Sample list has not field 'targets', are you sure that your ImDB has labels? you may have wanted to run with evaluation.predict=true builtin_warn(*args, **kwargs)

WARNING 2021-03-23T15:31:44 | py.warnings: /home/mk20376/Proj/mmf/mmf/utils/distributed.py:327: UserWarning: Sample list has not field 'targets', are you sure that your ImDB has labels? you may have wanted to run with evaluation.predict=true builtin_warn(*args, **kwargs)

2%|▏ | 1/45 [00:26<19:26, 26.50s/it] 4%|▍ | 2/45 [00:29<13:52, 19.35s/it] 7%|▋ | 3/45 [00:31<10:00, 14.30s/it] 9%|▉ | 4/45 [00:34<07:25, 10.87s/it] 11%|█ | 5/45 [00:37<05:36, 8.42s/it] 13%|█▎ | 6/45 [00:39<04:17, 6.61s/it] 16%|█▌ | 7/45 [00:43<03:38, 5.75s/it] 18%|█▊ | 8/45 [00:45<02:48, 4.56s/it] 20%|██ | 9/45 [00:49<02:37, 4.38s/it] 22%|██▏ | 10/45 [00:51<02:12, 3.79s/it] 24%|██▍ | 11/45 [00:54<02:01, 3.59s/it] 27%|██▋ | 12/45 [00:56<01:46, 3.21s/it] 29%|██▉ | 13/45 [00:59<01:40, 3.14s/it] 31%|███ | 14/45 [01:02<01:34, 3.04s/it] 33%|███▎ | 15/45 [01:05<01:28, 2.96s/it] 36%|███▌ | 16/45 [01:08<01:24, 2.93s/it] 38%|███▊ | 17/45 [01:11<01:20, 2.88s/it] 40%|████ | 18/45 [01:15<01:31, 3.38s/it] 42%|████▏ | 19/45 [01:17<01:17, 2.99s/it] 44%|████▍ | 20/45 [01:20<01:12, 2.90s/it] 47%|████▋ | 21/45 [01:22<01:04, 2.70s/it] 49%|████▉ | 22/45 [01:25<01:03, 2.77s/it] 51%|█████ | 23/45 [01:28<01:00, 2.76s/it] 53%|█████▎ | 24/45 [01:31<00:59, 2.83s/it] 56%|█████▌ | 25/45 [01:34<00:55, 2.79s/it] 58%|█████▊ | 26/45 [01:36<00:52, 2.78s/it] 60%|██████ | 27/45 [01:39<00:49, 2.76s/it] 62%|██████▏ | 28/45 [01:44<00:56, 3.31s/it] 64%|██████▍ | 29/45 [01:47<00:53, 3.34s/it] 67%|██████▋ | 30/45 [01:52<00:56, 3.74s/it] 69%|██████▉ | 31/45 [01:54<00:48, 3.45s/it] 71%|███████ | 32/45 [01:57<00:41, 3.21s/it] 73%|███████▎ | 33/45 [02:00<00:36, 3.05s/it] 76%|███████▌ | 34/45 [02:03<00:32, 2.95s/it] 78%|███████▊ | 35/45 [02:06<00:30, 3.02s/it] 80%|████████ | 36/45 [02:08<00:26, 2.94s/it] 82%|████████▏ | 37/45 [02:11<00:23, 2.93s/it] 84%|████████▍ | 38/45 [02:14<00:19, 2.85s/it] 87%|████████▋ | 39/45 [02:17<00:16, 2.80s/it] 89%|████████▉ | 40/45 [02:19<00:13, 2.77s/it] 91%|█████████ | 41/45 [02:22<00:10, 2.74s/it] 93%|█████████▎| 42/45 [02:25<00:08, 2.71s/it] 96%|█████████▌| 43/45 [02:27<00:05, 2.63s/it] 98%|█████████▊| 44/45 [02:30<00:02, 2.68s/it] 100%|██████████| 45/45 [02:32<00:00, 2.51s/it] 100%|██████████| 45/45 [02:32<00:00, 3.39s/it] Traceback (most recent call last): File 
"/home/mk20376/anaconda3/envs/mmf/bin/mmf_run", line 33, in <module>
    sys.exit(load_entry_point('mmf', 'console_scripts', 'mmf_run')())
  File "/home/mk20376/Proj/mmf/mmf_cli/run.py", line 129, in run
    nprocs=config.distributed.world_size,
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception:

-- Process 3 terminated with the following error:
Traceback (most recent call last):
  File "/home/mk20376/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/mk20376/Proj/mmf/mmf_cli/run.py", line 66, in distributed_main
    main(configuration, init_distributed=True, predict=predict)
  File "/home/mk20376/Proj/mmf/mmf_cli/run.py", line 56, in main
    trainer.train()
  File "/home/mk20376/Proj/mmf/mmf/trainers/mmf_trainer.py", line 142, in train
    self.inference()
  File "/home/mk20376/Proj/mmf/mmf/trainers/mmf_trainer.py", line 166, in inference
    report, meter = self.evaluation_loop(dataset, use_tqdm=True)
  File "/home/mk20376/Proj/mmf/mmf/trainers/core/evaluation_loop.py", line 69, in evaluation_loop
    combined_report.metrics = self.metrics(combined_report, combined_report)
  File "/home/mk20376/Proj/mmf/mmf/modules/metrics.py", line 156, in __call__
    sample_list, model_output, *args, **kwargs
  File "/home/mk20376/Proj/mmf/mmf/modules/metrics.py", line 221, in _calculate_with_checks
    value = self.calculate(*args, **kwargs)
  File "/home/mk20376/Proj/mmf/mmf/modules/metrics.py", line 691, in calculate
    accuracy = self.evaluator.eval_pred_list(predictions)
  File "/home/mk20376/Proj/mmf/mmf/utils/m4c_evaluators.py", line 250, in eval_pred_list
    unique_answer_scores = self._compute_answer_scores(entry["gt_answers"])
  File "/home/mk20376/Proj/mmf/mmf/utils/m4c_evaluators.py", line 228, in _compute_answer_scores
    assert len(answers) == 10
AssertionError

Kindly help me to resolve this issue.

Mano2610 commented 3 years ago

Kindly Let me know if there is any update on the above error?

apsdehal commented 3 years ago

@ronghanghu Can you check why this assertion error is happening?

ronghanghu commented 3 years ago

Hi @Mano2610, the assertion error happens because the test set has no ground-truth labels. In the TextVQA task, you cannot evaluate locally on the test set. Instead, you should generate a prediction file and use the EvalAI server to evaluate it.

Please follow Point 3 in https://mmf.sh/docs/projects/m4c#training-and-evaluation for prediction file generation. The predictions should be submitted to https://eval.ai/web/challenges/challenge-page/874/overview. Thanks!
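
For reference, a prediction run following those docs generally looks like the command below. This is a sketch based on the commands earlier in this thread; the checkpoint path is a placeholder, not a verified value:

```shell
# Generate an EvalAI-submittable prediction JSON instead of computing
# metrics locally; evaluation.predict=true writes the file under env.save_dir.
mmf_run config=projects/m4c/configs/textvqa/defaults.yaml \
    datasets=textvqa \
    model=m4c \
    run_type=test \
    checkpoint.resume_file=/path/to/model_24000.ckpt \
    evaluation.predict=true
```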

2017210384 commented 3 years ago

Hi, if I use Windows, is there any way to solve this problem?

Command:
mmf_run config=projects/m4c/configs/textvqa/defaults.yaml datasets=textvqa model=m4c run_type=train_val checkpoint.resume_zoo=m4c.textvqa.with_stvqa env.data_dir=D:/.cache env.save_dir=D:/mmf/save/m4c

2021-04-09T21:21:52 | mmf.utils.general: Total Parameters: 90850184. Trained Parameters: 90850184
2021-04-09T21:21:52 | mmf.trainers.core.training_loop: Starting training...
Traceback (most recent call last):
  File "C:\Users\AYY\.conda\envs\mmf\Scripts\mmf_run-script.py", line 33, in <module>
    sys.exit(load_entry_point('mmf', 'console_scripts', 'mmf_run')())
  File "c:\users\ayy\mmf\mmf_cli\run.py", line 133, in run
    main(configuration, predict=predict)
  File "c:\users\ayy\mmf\mmf_cli\run.py", line 56, in main
    trainer.train()
  File "c:\users\ayy\mmf\mmf\trainers\mmf_trainer.py", line 138, in train
    self.training_loop()
  File "c:\users\ayy\mmf\mmf\trainers\core\training_loop.py", line 33, in training_loop
    self.run_training_epoch()
  File "c:\users\ayy\mmf\mmf\trainers\core\training_loop.py", line 77, in run_training_epoch
    for idx, batch in enumerate(self.train_loader):
  File "c:\users\ayy\mmf\mmf\datasets\multi_dataset_loader.py", line 199, in __next__
    next_batch = next(self.current_iterator)
  File "C:\Users\AYY\.conda\envs\mmf\lib\site-packages\torch\utils\data\dataloader.py", line 363, in __next__
    data = self._next_data()
  File "C:\Users\AYY\.conda\envs\mmf\lib\site-packages\torch\utils\data\dataloader.py", line 989, in _next_data
    return self._process_data(data)
  File "C:\Users\AYY\.conda\envs\mmf\lib\site-packages\torch\utils\data\dataloader.py", line 1014, in _process_data
    data.reraise()
  File "C:\Users\AYY\.conda\envs\mmf\lib\site-packages\torch\_utils.py", line 395, in reraise
    raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "c:\users\ayy\mmf\mmf\datasets\databases\readers\feature_readers.py", line 231, in _load
    image_id = int(split.split("_")[-1])
ValueError: invalid literal for int() with base 10: 'train\77a2030310d04cc0'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\AYY\.conda\envs\mmf\lib\site-packages\torch\utils\data\_utils\worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\AYY\.conda\envs\mmf\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\AYY\.conda\envs\mmf\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\AYY\.conda\envs\mmf\lib\site-packages\torch\utils\data\dataset.py", line 207, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "c:\users\ayy\mmf\mmf\datasets\builders\textvqa\dataset.py", line 100, in __getitem__
    features = self.features_db[idx]
  File "c:\users\ayy\mmf\mmf\datasets\databases\features_database.py", line 91, in __getitem__
    return self.get(image_info)
  File "c:\users\ayy\mmf\mmf\datasets\databases\features_database.py", line 99, in get
    return self.from_path(feature_path)
  File "c:\users\ayy\mmf\mmf\datasets\databases\features_database.py", line 107, in from_path
    features, infos = self._get_image_features_and_info(path)
  File "c:\users\ayy\mmf\mmf\datasets\databases\features_database.py", line 80, in _get_image_features_and_info
    image_feats, infos = self._read_features_and_info(feat_file)
  File "c:\users\ayy\mmf\mmf\datasets\databases\features_database.py", line 65, in _read_features_and_info
    feature, info = feature_reader.read(feat_file)
  File "c:\users\ayy\mmf\mmf\datasets\databases\readers\feature_readers.py", line 95, in read
    return self.feat_reader.read(image_feat_path)
  File "c:\users\ayy\mmf\mmf\datasets\databases\readers\feature_readers.py", line 158, in read
    image_info = self._load(image_feat_path)
  File "c:\users\ayy\mmf\mmf\datasets\databases\readers\feature_readers.py", line 238, in _load
    img_id_idx = self.image_id_indices[image_id]
KeyError: b'train\77a2030310d04cc0'

apsdehal commented 3 years ago

Hi @2017210384,

We will have to look into fixing this in Windows. If possible, please use ubuntu in windows for now.

ronghanghu commented 3 years ago

@2017210384 for a quick fix, you can add an extra line (the second line below) after https://github.com/facebookresearch/mmf/blob/2848ba37151fcb05bd85091966c446b08e67b289/mmf/datasets/databases/readers/feature_readers.py#L228

split = os.path.relpath(image_file_path, self.db_path).split(".npy")[0]
split = split.replace("\\", "/")

The cause is that Windows uses a different path separator, \ (instead of / on Linux, where the feature dbs are generated). There might be more cases like this in the codebase.
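
The separator fix can be illustrated in isolation. The helper below is hypothetical (not an actual mmf function); it shows why the replacement makes a Windows-derived key match the Linux-generated feature-db index:

```python
def normalize_feature_key(split: str) -> str:
    # Feature dbs are generated on Linux, so image_id_indices keys use "/".
    # os.path.relpath on Windows produces "\" separators, so the raw key
    # "train\77a..." never matches the stored key "train/77a...".
    return split.replace("\\", "/")

print(normalize_feature_key("train\\77a2030310d04cc0"))  # train/77a2030310d04cc0
```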

2017210384 commented 3 years ago

@ronghanghu Thanks! It works!

Mano2610 commented 3 years ago

Hi @apsdehal ,

I have generated the .json file following Point 3 in https://mmf.sh/docs/projects/m4c and submitted it to EvalAI, but the submission failed, and the Stderr file shows the following:

Traceback (most recent call last):
  File "/code/scripts/workers/submission_worker.py", line 452, in run_submission
    submission_metadata=submission_serializer.data,
  File "/tmp/tmp6c2rmxzb/compute/challenge_data/challenge_874/main.py", line 199, in evaluate
    prepare_objects(annFile, resFile, phase_codename)
  File "/tmp/tmp6c2rmxzb/compute/challenge_data/challenge_874/main.py", line 105, in prepare_objects
    vqaRes = vqa.loadRes(res, resFile)
  File "/tmp/tmp6c2rmxzb/compute/challenge_data/challenge_874/vqa.py", line 160, in loadRes
    'Results do not correspond to current TextVQA set. ...'
AssertionError: Results do not correspond to current TextVQA set. Either the results do not have predictions for all question ids in annotation file or there is atleast one question id that does not belong to the question ids in the annotation file. Please note that this year, you need to upload predictions on ALL test questions for test-dev evaluation unlike previous years when you needed to upload predictions on test-dev questions only.

Here is the submitted .json file: https://evalai.s3.amazonaws.com/media/submission_files/submission_149838/e3177541-d3aa-4ecc-8806-6c388daf99f6.json Please help me resolve this issue.

ronghanghu commented 3 years ago

Hi @Mano2610 Are you submitting to "Test-Standard Phase" in https://eval.ai/web/challenges/challenge-page/874/submission? ("Test-Standard Phase" is the phase for submission, not "Validation Phase").

I just tried submitting my previous M4C predictions m4c_textvqa_test.json.zip to https://eval.ai/web/challenges/challenge-page/874/submission, and the evaluation worked for me.

Mano2610 commented 3 years ago

Hi @ronghanghu,

Thank you, it worked. I had generated the test file but mistakenly submitted it to the Validation Phase.

Can you please help me with the 2020 and 2019 datasets?

When I try to get the TextVQA dataset for 2020 and 2019, it shows that the site can't be reached. Kindly help me resolve this, please.

ronghanghu commented 3 years ago

When I try to get the dataset in TextVQA for 2020 and 2019 it's showing that site can't be reached. Kindly help me to resolve this, please.

@Mano2610 could you let me know the specific errors you encountered when getting TextVQA dataset and paste the error message here?

Mano2610 commented 3 years ago

@ronghanghu, when I try to open the 2020 and 2019 challenge pages, I get the error below:

This site can’t be reached. Check if there is a typo in challenge. DNS_PROBE_FINISHED_NXDOMAIN

Are there any datasets other than 2021's? I am looking for two different datasets.

Kindly help me with a different dataset for TextVQA.

ronghanghu commented 3 years ago

@Mano2610 Hi, you can use https://textvqa.org/challenge/2020 and https://textvqa.org/challenge/2019 to access TextVQA challenge in these two years.

@apsdehal I think there's an error in the top navigation bar "challenge" button in https://textvqa.org/. The challenge URLs show up as https://challenge/, https://challenge/2020, and https://challenge/2019
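For what it's worth, that symptom is consistent with the navigation buttons using a scheme-relative (network-path) href such as `//challenge/2020` instead of an absolute path `/challenge/2020` — this is a guess at the cause, not a reading of the site's source. Python's `urllib.parse.urljoin` reproduces how a browser resolves the two forms:

```python
from urllib.parse import urljoin

base = "https://textvqa.org/"

# A scheme-relative href replaces the host, so "challenge" becomes the domain
# (hypothetical broken form):
broken = urljoin(base, "//challenge/2020")  # -> "https://challenge/2020"

# An absolute-path href stays on the same host:
fixed = urljoin(base, "/challenge/2020")    # -> "https://textvqa.org/challenge/2020"

print(broken)
print(fixed)
```

The first result is exactly the `https://challenge/2020` URL reported above, which is why the browser fails with DNS_PROBE_FINISHED_NXDOMAIN for the nonexistent host "challenge".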

Mano2610 commented 3 years ago

@ronghanghu , thank you so much.

But I am still unable to get the dataset for 2020 and 2019; I can only view the challenge pages for those years. Are there any other datasets for TextVQA, other than 2021?

ronghanghu commented 3 years ago

@Mano2610 the TextVQA dataset is the same for each year. You can get the dataset from https://textvqa.org/dataset. The MMF library also allows automatic dataset downloading of TextVQA.

Mano2610 commented 3 years ago

@ronghanghu, Thank you.

Are there any different datasets, other than TextVQA? I am looking for 2 different datasets for my project.

ronghanghu commented 3 years ago

@Mano2610 You might want to try out the ST-VQA dataset (https://rrc.cvc.uab.es/?ch=11&com=downloads). It can also be downloaded automatically in MMF.

Mano2610 commented 3 years ago

@ronghanghu I have submitted the prediction file (Test-Standard Phase) and got a score of 39.84. Can you please confirm whether this is the expected score for the M4C model?

ronghanghu commented 3 years ago

@Mano2610 This is the correct score for m4c.textvqa.defaults (trained on TextVQA only). If you run m4c.textvqa.with_stvqa (trained on TextVQA + ST-VQA), you can get a slightly higher score, above 40.

ronghanghu commented 3 years ago

Closing this issue now. Please re-open if you have further technical questions or errors.