facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

Exception: process 0 terminated with signal SIGKILL #1262

Open Huangzhw0221 opened 2 years ago

Huangzhw0221 commented 2 years ago

Hi,

While running the training code with the m4c_captioner model, I am getting the error below.
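
For reproducibility, the command line (reconstructed from the overrides recorded in the log, so the option spellings are presumably as shown) was:

```sh
mmf_run config=projects/m4c_captioner/configs/m4c_captioner/textcaps/defaults.yaml \
    datasets=textcaps \
    model=m4c_captioner \
    run_type=train_val \
    env.save_dir=./save/m4c_captioner/defaults
```

The full log output: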

/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_USER_DIR, some elements are missing: please replace them with empty quoted strings. See https://github.com/omry/omegaconf/issues/572 for details.
  category=UserWarning,
[the same UserWarning is also printed for MMF_LOG_DIR, MMF_REPORT_DIR, MMF_TENSORBOARD_LOGDIR, and MMF_WANDB_LOGDIR, and repeats for MMF_USER_DIR on every rank]
2022-09-08T19:36:20 | mmf.utils.configuration: Overriding option datasets to textcaps
2022-09-08T19:36:20 | mmf.utils.configuration: Overriding option model to m4c_captioner
2022-09-08T19:36:20 | mmf.utils.configuration: Overriding option config to projects/m4c_captioner/configs/m4c_captioner/textcaps/defaults.yaml
2022-09-08T19:36:20 | mmf.utils.configuration: Overriding option env.save_dir to ./save/m4c_captioner/defaults
2022-09-08T19:36:20 | mmf.utils.configuration: Overriding option run_type to train_val
2022-09-08T19:36:25 | mmf.utils.distributed: XLA Mode:False
2022-09-08T19:36:25 | mmf.utils.distributed: Distributed Init (Rank 0): tcp://localhost:14337
2022-09-08T19:36:25 | mmf.utils.distributed: Distributed Init (Rank 3): tcp://localhost:14337
2022-09-08T19:36:25 | mmf.utils.distributed: Initialized Host root1-Super-Server as Rank 3
2022-09-08T19:36:26 | mmf.utils.distributed: XLA Mode:False
2022-09-08T19:36:26 | mmf.utils.distributed: Distributed Init (Rank 1): tcp://localhost:14337
2022-09-08T19:36:26 | mmf.utils.distributed: Distributed Init (Rank 2): tcp://localhost:14337
2022-09-08T19:36:26 | mmf.utils.distributed: Initialized Host root1-Super-Server as Rank 1
2022-09-08T19:36:26 | mmf.utils.distributed: Initialized Host root1-Super-Server as Rank 2
2022-09-08T19:36:26 | mmf.utils.distributed: Initialized Host root1-Super-Server as Rank 0
2022-09-08T19:36:30 | mmf: Logging to: ./save/m4c_captioner/defaults/train.log
2022-09-08T19:36:30 | mmf_cli.run: Namespace(config_override=None, local_rank=None, opts=['datasets=textcaps', 'model=m4c_captioner', 'config=projects/m4c_captioner/configs/m4c_captioner/textcaps/defaults.yaml', 'env.save_dir=./save/m4c_captioner/defaults', 'run_type=train_val'])
2022-09-08T19:36:30 | mmf_cli.run: Torch version: 1.6.0
2022-09-08T19:36:30 | mmf.utils.general: CUDA Device 0 is: NVIDIA GeForce GTX 1080 Ti
2022-09-08T19:36:30 | mmf_cli.run: Using seed 30613796
2022-09-08T19:36:30 | mmf.trainers.mmf_trainer: Loading datasets
loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /home/root1/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
  "architectures": ["BertForMaskedLM"],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.10.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /home/root1/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99
loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /home/root1/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4
loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None
loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /home/root1/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79
[the "loading configuration file", "Model config BertConfig", and "loading file" messages above are each printed once per rank; the duplicates are omitted here]
2022-09-08T19:36:52 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2022-09-08T19:36:52 | mmf.trainers.mmf_trainer: Loading model
loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /home/root1/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f
Some weights of the model checkpoint at bert-base-uncased were not used when initializing TextBert: [all attention, intermediate, and output weights and biases for bert.encoder layers 3 through 11, bert.pooler.dense.*, and the cls.predictions.* / cls.seq_relationship.* heads; the full list is omitted here]

WARNING 2022-09-08T19:36:58 | py.warnings: /media/root1/2f4dfbfa-d286-46c0-bdd8-c0caec6858d9/hzw/mmf/mmf/utils/distributed.py:412: UserWarning: No type for scheduler specified even though lr_scheduler is True, setting default to 'Pythia'
  builtin_warn(*args, **kwargs)

WARNING 2022-09-08T19:36:58 | py.warnings: /media/root1/2f4dfbfa-d286-46c0-bdd8-c0caec6858d9/hzw/mmf/mmf/utils/distributed.py:412: UserWarning: scheduler attributes has no params defined, defaulting to {}.
  builtin_warn(*args, **kwargs)

loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /home/root1/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f
[the same "loading weights file" and "Some weights of the model checkpoint at bert-base-uncased were not used when initializing TextBert" messages are repeated for the remaining ranks]

Traceback (most recent call last):
  File "/home/root1/anaconda3/envs/mmf/bin/mmf_run", line 8, in <module>
    sys.exit(run())
  File "/media/root1/2f4dfbfa-d286-46c0-bdd8-c0caec6858d9/hzw/mmf/mmf_cli/run.py", line 129, in run
    nprocs=config.distributed.world_size,
  File "/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 108, in join
    (error_index, name)
Exception: process 0 terminated with signal SIGKILL
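
From what I have read, a training process that dies with SIGKILL (rather than raising a Python exception) is usually being terminated by the Linux out-of-memory (OOM) killer once system RAM is exhausted. A minimal check, assuming a Linux host where the kernel log is readable:

```sh
# Look for OOM-killer entries around the time the training process died
dmesg | grep -i -E "out of memory|killed process"
```

If the OOM killer shows up there, the problem is the memory footprint of the run rather than a bug in the training loop itself.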

I followed some tips to address this: I increased the system memory to 64 GB, set num_workers to 0, and reduced the batch_size to 64, but the process is still killed. Kindly help me resolve this issue.
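
Concretely, the mitigations were command-line overrides along these lines (the training.batch_size and training.num_workers keys are my reading of MMF's standard training config section; adjust if your config nests them differently):

```sh
# Same run as above, with memory-reducing overrides appended
mmf_run config=projects/m4c_captioner/configs/m4c_captioner/textcaps/defaults.yaml \
    datasets=textcaps \
    model=m4c_captioner \
    run_type=train_val \
    env.save_dir=./save/m4c_captioner/defaults \
    training.batch_size=64 \
    training.num_workers=0
```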

Zhang-Henry commented 1 year ago

I have the same problem. Have you solved it?