google-research / albert

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Apache License 2.0

torch.nn.modules.module.ModuleAttributeError: 'AlbertEmbeddings' object has no attribute 'bias' #241

Open dhs29 opened 3 years ago

dhs29 commented 3 years ago

transformers-cli convert --model_type albert \
  --tf_checkpoint $ALBERT_BASE_DIR/model.ckpt-64000 \
  --config $ALBERT_BASE_DIR/albert_config.json \
  --pytorch_dump_output $ALBERT_BASE_DIR/pytorch_model.bin

I am running the script above, and this is the output:

AlbertConfig {
  "attention_probs_dropout_prob": 0,
  "bos_token_id": 2,
  "classifier_dropout_prob": 0.1,
  "down_scale_factor": 1,
  "embedding_size": 128,
  "eos_token_id": 3,
  "gap_size": 0,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "inner_group_num": 1,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "albert",
  "net_structure_type": 0,
  "num_attention_heads": 12,
  "num_hidden_groups": 1,
  "num_hidden_layers": 12,
  "num_memory_blocks": 0,
  "pad_token_id": 0,
  "type_vocab_size": 2,
  "vocab_size": 31990
}
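The shapes reported while the checkpoint loads can be sanity-checked against this configuration. Here is a minimal sketch, assuming the usual ALBERT layout in which the token, position, and segment embeddings use `embedding_size` and are projected to `hidden_size` by `embedding_hidden_mapping_in`. The helper name `expected_embedding_shapes` is hypothetical and not part of transformers.

```python
# Hypothetical helper (not a transformers API): derive the embedding-related
# tensor shapes implied by an ALBERT config. The config keys are the ones
# printed in the AlbertConfig above; the returned labels mirror the TF
# variable names seen in the log.

def expected_embedding_shapes(config: dict) -> dict:
    emb = config["embedding_size"]
    hidden = config["hidden_size"]
    return {
        "word_embeddings": [config["vocab_size"], emb],
        "position_embeddings": [config["max_position_embeddings"], emb],
        "token_type_embeddings": [config["type_vocab_size"], emb],
        "embedding_hidden_mapping_in/kernel": [emb, hidden],
        "embedding_hidden_mapping_in/bias": [hidden],
    }


# Values copied from the AlbertConfig output in this issue.
config = {
    "embedding_size": 128,
    "hidden_size": 768,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
    "vocab_size": 31990,
}

shapes = expected_embedding_shapes(config)
for name, shape in shapes.items():
    print(f"{name}: {shape}")
```

For this config the sketch gives [31990, 128] for word_embeddings and [128, 768] for the mapping kernel, which agrees with the shapes in the log.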

Converting TensorFlow checkpoint from /data/NLP/ALBERT_Inspird_Train/albert_base/model.ckpt-64000
Loading TF weight bert/embeddings/layer_normalization/beta with shape [128]
Loading TF weight bert/embeddings/layer_normalization/beta/adam_m with shape [128]
Loading TF weight bert/embeddings/layer_normalization/beta/adam_v with shape [128]
Loading TF weight bert/embeddings/layer_normalization/gamma with shape [128]
Loading TF weight bert/embeddings/layer_normalization/gamma/adam_m with shape [128]
Loading TF weight bert/embeddings/layer_normalization/gamma/adam_v with shape [128]
Loading TF weight bert/embeddings/position_embeddings with shape [512, 128]
Loading TF weight bert/embeddings/position_embeddings/adam_m with shape [512, 128]
Loading TF weight bert/embeddings/position_embeddings/adam_v with shape [512, 128]
Loading TF weight bert/embeddings/token_type_embeddings with shape [2, 128]
Loading TF weight bert/embeddings/token_type_embeddings/adam_m with shape [2, 128]
Loading TF weight bert/embeddings/token_type_embeddings/adam_v with shape [2, 128]
Loading TF weight bert/embeddings/word_embeddings with shape [31990, 128]
Loading TF weight bert/embeddings/word_embeddings/adam_m with shape [31990, 128]
Loading TF weight bert/embeddings/word_embeddings/adam_v with shape [31990, 128]
Loading TF weight bert/encoder/embedding_hidden_mapping_in/bias with shape [768]
Loading TF weight bert/encoder/embedding_hidden_mapping_in/bias/adam_m with shape [768]
Loading TF weight bert/encoder/embedding_hidden_mapping_in/bias/adam_v with shape [768]
Loading TF weight bert/encoder/embedding_hidden_mapping_in/kernel with shape [128, 768]
Loading TF weight bert/encoder/embedding_hidden_mapping_in/kernel/adam_m with shape [128, 768]
Loading TF weight bert/encoder/embedding_hidden_mapping_in/kernel/adam_v with shape [128, 768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias with shape [768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel with shape [768, 768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel/adam_m with shape [768, 768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel/adam_v with shape [768, 768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias with shape [768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel with shape [768, 768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel/adam_m with shape [768, 768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel/adam_v with shape [768, 768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias with shape [768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel with shape [768, 768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel/adam_m with shape [768, 768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel/adam_v with shape [768, 768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias with shape [768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel with shape [768, 768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel/adam_m with shape [768, 768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel/adam_v with shape [768, 768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias with shape [3072]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias/adam_m with shape [3072]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias/adam_v with shape [3072]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel with shape [768, 3072]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel/adam_m with shape [768, 3072]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel/adam_v with shape [768, 3072]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias with shape [768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel with shape [3072, 768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel/adam_m with shape [3072, 768]
Loading TF weight bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel/adam_v with shape [3072, 768]
Loading TF weight bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_1/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_1/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_1/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_1/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_1/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_1/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_2/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_2/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_2/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_2/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_2/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_2/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_3/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_3/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_3/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_3/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_3/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_3/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_4/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_4/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_4/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_4/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_4/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_4/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_21/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_21/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_21/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_21/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_21/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_21/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_22/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_22/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_22/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_22/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_22/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_22/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_23/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_23/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_23/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_23/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_23/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_23/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_24/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_24/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_24/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_24/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_24/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_24/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_5/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_5/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_5/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_5/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_5/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_5/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_6/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_6/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_6/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_6/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_6/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_6/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_7/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_7/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_7/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_7/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_7/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_7/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_8/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_8/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_8/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_8/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_8/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_8/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_10/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_10/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_10/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_10/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_10/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_10/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_9/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_9/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_9/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_9/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_9/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_9/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_11/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_11/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_11/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_11/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_11/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_11/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_12/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_12/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_12/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_12/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_12/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_12/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_13/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_13/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_13/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_13/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_13/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_13/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_14/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_14/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_14/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_14/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_14/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_14/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_15/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_15/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_15/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_15/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_15/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_15/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_16/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_16/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_16/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_16/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_16/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_16/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_17/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_17/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_17/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_17/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_17/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_17/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_18/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_18/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_18/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_18/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_18/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_18/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_19/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_19/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_19/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_19/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_19/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_19/gamma/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_20/beta with shape [768]
Loading TF weight bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_20/beta/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_20/beta/adam_v with shape [768]
Loading TF weight bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_20/gamma with shape [768]
Loading TF weight bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_20/gamma/adam_m with shape [768]
Loading TF weight bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_20/gamma/adam_v with shape [768]
Loading TF weight bert/pooler/dense/bias with shape [768]
Loading TF weight bert/pooler/dense/bias/adam_m with shape [768]
Loading TF weight bert/pooler/dense/bias/adam_v with shape [768]
Loading TF weight bert/pooler/dense/kernel with shape [768, 768]
Loading TF weight bert/pooler/dense/kernel/adam_m with shape [768, 768]
Loading TF weight bert/pooler/dense/kernel/adam_v with shape [768, 768]
Loading TF weight cls/predictions/output_bias with shape [31990]
Loading TF weight cls/predictions/output_bias/adam_m with shape [31990]
Loading TF weight cls/predictions/output_bias/adam_v with shape [31990]
Loading TF weight cls/predictions/transform/dense/bias with shape [128]
Loading TF weight cls/predictions/transform/dense/bias/adam_m with shape [128]
Loading TF weight cls/predictions/transform/dense/bias/adam_v with shape [128]
Loading TF weight cls/predictions/transform/dense/kernel with shape [768, 128]
Loading TF weight cls/predictions/transform/dense/kernel/adam_m with shape [768, 128]
Loading TF weight cls/predictions/transform/dense/kernel/adam_v with shape [768, 128]
Loading TF weight cls/predictions/transform/layer_normalization_25/beta with shape [128]
Loading TF weight cls/predictions/transform/layer_normalization_25/beta/adam_m with shape [128]
Loading TF weight cls/predictions/transform/layer_normalization_25/beta/adam_v with shape [128]
Loading TF weight cls/predictions/transform/layer_normalization_25/gamma with shape [128]
Loading TF weight cls/predictions/transform/layer_normalization_25/gamma/adam_m with shape [128]
Loading TF weight cls/predictions/transform/layer_normalization_25/gamma/adam_v with shape [128]
Loading TF weight cls/seq_relationship/output_bias with shape [2]
Loading TF weight cls/seq_relationship/output_bias/adam_m with shape [2]
Loading TF weight cls/seq_relationship/output_bias/adam_v with shape [2]
Loading TF weight cls/seq_relationship/output_weights with shape [2, 768]
Loading TF weight cls/seq_relationship/output_weights/adam_m with shape [2, 768]
Loading TF weight cls/seq_relationship/output_weights/adam_v with shape [2, 768]
Loading TF weight global_step with shape []
bert/embeddings/layer_normalization/beta
bert/embeddings/layer_normalization/beta/adam_m
bert/embeddings/layer_normalization/beta/adam_v
bert/embeddings/layer_normalization/gamma
bert/embeddings/layer_normalization/gamma/adam_m
bert/embeddings/layer_normalization/gamma/adam_v
bert/embeddings/position_embeddings
bert/embeddings/position_embeddings/adam_m
bert/embeddings/position_embeddings/adam_v
bert/embeddings/token_type_embeddings
bert/embeddings/token_type_embeddings/adam_m
bert/embeddings/token_type_embeddings/adam_v
bert/embeddings/word_embeddings
bert/embeddings/word_embeddings/adam_m
bert/embeddings/word_embeddings/adam_v
bert/encoder/embedding_hidden_mapping_in/bias
bert/encoder/embedding_hidden_mapping_in/bias/adam_m
bert/encoder/embedding_hidden_mapping_in/bias/adam_v
bert/encoder/embedding_hidden_mapping_in/kernel
bert/encoder/embedding_hidden_mapping_in/kernel/adam_m
bert/encoder/embedding_hidden_mapping_in/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel/adam_v
bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_1/beta
bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_1/beta/adam_m
bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_1/beta/adam_v
bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_1/gamma
bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_1/gamma/adam_m
bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_1/gamma/adam_v
bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_2/beta
bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_2/beta/adam_m
bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_2/beta/adam_v
bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_2/gamma
bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_2/gamma/adam_m
bert/encoder/transformer/group_0/layer_0/inner_group_0/layer_normalization_2/gamma/adam_v
bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_3/beta
bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_3/beta/adam_m
bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_3/beta/adam_v
bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_3/gamma
bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_3/gamma/adam_m
bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_3/gamma/adam_v
bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_4/beta
bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_4/beta/adam_m
bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_4/beta/adam_v
bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_4/gamma
bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_4/gamma/adam_m
bert/encoder/transformer/group_0_1/layer_1/inner_group_0/layer_normalization_4/gamma/adam_v
bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_21/beta
bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_21/beta/adam_m
bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_21/beta/adam_v
bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_21/gamma
bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_21/gamma/adam_m
bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_21/gamma/adam_v
bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_22/beta
bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_22/beta/adam_m
bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_22/beta/adam_v
bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_22/gamma
bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_22/gamma/adam_m
bert/encoder/transformer/group_0_10/layer_10/inner_group_0/layer_normalization_22/gamma/adam_v
bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_23/beta
bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_23/beta/adam_m
bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_23/beta/adam_v
bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_23/gamma
bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_23/gamma/adam_m
bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_23/gamma/adam_v
bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_24/beta
bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_24/beta/adam_m
bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_24/beta/adam_v
bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_24/gamma
bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_24/gamma/adam_m
bert/encoder/transformer/group_0_11/layer_11/inner_group_0/layer_normalization_24/gamma/adam_v
bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_5/beta
bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_5/beta/adam_m
bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_5/beta/adam_v
bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_5/gamma
bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_5/gamma/adam_m
bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_5/gamma/adam_v
bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_6/beta
bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_6/beta/adam_m
bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_6/beta/adam_v
bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_6/gamma
bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_6/gamma/adam_m
bert/encoder/transformer/group_0_2/layer_2/inner_group_0/layer_normalization_6/gamma/adam_v bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_7/beta bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_7/beta/adam_m bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_7/beta/adam_v bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_7/gamma bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_7/gamma/adam_m bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_7/gamma/adam_v bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_8/beta bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_8/beta/adam_m bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_8/beta/adam_v bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_8/gamma bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_8/gamma/adam_m bert/encoder/transformer/group_0_3/layer_3/inner_group_0/layer_normalization_8/gamma/adam_v bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_10/beta bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_10/beta/adam_m bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_10/beta/adam_v bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_10/gamma bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_10/gamma/adam_m bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_10/gamma/adam_v bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_9/beta bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_9/beta/adam_m bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_9/beta/adam_v 
bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_9/gamma bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_9/gamma/adam_m bert/encoder/transformer/group_0_4/layer_4/inner_group_0/layer_normalization_9/gamma/adam_v bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_11/beta bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_11/beta/adam_m bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_11/beta/adam_v bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_11/gamma bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_11/gamma/adam_m bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_11/gamma/adam_v bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_12/beta bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_12/beta/adam_m bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_12/beta/adam_v bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_12/gamma bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_12/gamma/adam_m bert/encoder/transformer/group_0_5/layer_5/inner_group_0/layer_normalization_12/gamma/adam_v bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_13/beta bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_13/beta/adam_m bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_13/beta/adam_v bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_13/gamma bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_13/gamma/adam_m bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_13/gamma/adam_v bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_14/beta 
bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_14/beta/adam_m bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_14/beta/adam_v bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_14/gamma bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_14/gamma/adam_m bert/encoder/transformer/group_0_6/layer_6/inner_group_0/layer_normalization_14/gamma/adam_v bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_15/beta bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_15/beta/adam_m bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_15/beta/adam_v bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_15/gamma bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_15/gamma/adam_m bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_15/gamma/adam_v bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_16/beta bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_16/beta/adam_m bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_16/beta/adam_v bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_16/gamma bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_16/gamma/adam_m bert/encoder/transformer/group_0_7/layer_7/inner_group_0/layer_normalization_16/gamma/adam_v bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_17/beta bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_17/beta/adam_m bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_17/beta/adam_v bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_17/gamma bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_17/gamma/adam_m 
bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_17/gamma/adam_v bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_18/beta bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_18/beta/adam_m bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_18/beta/adam_v bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_18/gamma bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_18/gamma/adam_m bert/encoder/transformer/group_0_8/layer_8/inner_group_0/layer_normalization_18/gamma/adam_v bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_19/beta bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_19/beta/adam_m bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_19/beta/adam_v bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_19/gamma bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_19/gamma/adam_m bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_19/gamma/adam_v bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_20/beta bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_20/beta/adam_m bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_20/beta/adam_v bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_20/gamma bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_20/gamma/adam_m bert/encoder/transformer/group_0_9/layer_9/inner_group_0/layer_normalization_20/gamma/adam_v bert/pooler/dense/bias bert/pooler/dense/bias/adam_m bert/pooler/dense/bias/adam_v bert/pooler/dense/kernel bert/pooler/dense/kernel/adam_m bert/pooler/dense/kernel/adam_v cls/predictions/output_bias cls/predictions/output_bias/adam_m cls/predictions/output_bias/adam_v 
cls/predictions/transform/dense/bias cls/predictions/transform/dense/bias/adam_m cls/predictions/transform/dense/bias/adam_v cls/predictions/transform/dense/kernel cls/predictions/transform/dense/kernel/adam_m cls/predictions/transform/dense/kernel/adam_v cls/predictions/transform/layer_normalization_25/beta cls/predictions/transform/layer_normalization_25/beta/adam_m cls/predictions/transform/layer_normalization_25/beta/adam_v cls/predictions/transform/layer_normalization_25/gamma cls/predictions/transform/layer_normalization_25/gamma/adam_m cls/predictions/transform/layer_normalization_25/gamma/adam_v cls/seq_relationship/output_bias cls/seq_relationship/output_bias/adam_m cls/seq_relationship/output_bias/adam_v cls/seq_relationship/output_weights cls/seq_relationship/output_weights/adam_m cls/seq_relationship/output_weights/adam_v global_step

Skipping albert/embeddings/layer_normalization/beta

Traceback (most recent call last):
  File "/home/dshah/venv/bin/transformers-cli", line 8, in <module>
    sys.exit(main())
  File "/home/dshah/venv/lib64/python3.8/site-packages/transformers/commands/transformers_cli.py", line 33, in main
    service.run()
  File "/home/dshah/venv/lib64/python3.8/site-packages/transformers/commands/convert.py", line 80, in run
    convert_tf_checkpoint_to_pytorch(self._tf_checkpoint, self._config, self._pytorch_dump_output)
  File "/home/dshah/venv/lib64/python3.8/site-packages/transformers/convert_albert_original_tf_checkpoint_to_pytorch.py", line 36, in convert_tf_checkpoint_to_pytorch
    load_tf_weights_in_albert(model, config, tf_checkpoint_path)
  File "/home/dshah/venv/lib64/python3.8/site-packages/transformers/modeling_albert.py", line 163, in load_tf_weights_in_albert
    pointer = getattr(pointer, "bias")
  File "/home/dshah/venv/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 771, in __getattr__
    raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
torch.nn.modules.module.ModuleAttributeError: 'AlbertEmbeddings' object has no attribute 'bias'
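The "Skipping albert/embeddings/layer_normalization/beta" line followed immediately by the missing 'bias' attribute suggests one possible reading of the failure: the loader cannot find a submodule named layer_normalization, stays on the embeddings module, and then asks that module for bias when it reaches beta. The sketch below is a simplified stand-in that reproduces this pattern with plain Python objects. It is not the actual transformers code; ToyModel, ToyEmbeddings, ToyLayerNorm and resolve are hypothetical names, and the assumption that the loader expects a LayerNorm scope name is inferred from the log, not confirmed by it.

```python
class ToyLayerNorm:
    """Hypothetical stand-in for a LayerNorm module: owns weight and bias."""

    def __init__(self):
        self.weight = "LayerNorm.weight"
        self.bias = "LayerNorm.bias"


class ToyEmbeddings:
    """Hypothetical stand-in for the embeddings module: no bias of its own."""

    def __init__(self):
        self.LayerNorm = ToyLayerNorm()


class ToyModel:
    """Hypothetical stand-in for the model being filled with TF weights."""

    def __init__(self):
        self.embeddings = ToyEmbeddings()


def resolve(model, tf_name):
    """Walk a slash-separated TF variable name down the toy module tree.

    Assumed behaviour, modelled on the log above: unknown scope names are
    recorded and skipped, but the walk continues from the same object.
    """
    pointer = model
    skipped = []
    for part in tf_name.split("/"):
        if part in ("kernel", "gamma"):
            pointer = getattr(pointer, "weight")
        elif part in ("beta", "output_bias"):
            # Raises AttributeError if the current object has no bias.
            pointer = getattr(pointer, "bias")
        else:
            try:
                pointer = getattr(pointer, part)
            except AttributeError:
                skipped.append(part)  # analogous to the "Skipping ..." log line
                continue
    return pointer, skipped


# A name that matches the module tree resolves cleanly.
print(resolve(ToyModel(), "embeddings/LayerNorm/beta"))

# A name using the checkpoint's layer_normalization scope fails on beta.
try:
    resolve(ToyModel(), "embeddings/layer_normalization/beta")
except AttributeError as err:
    print("failed:", err)
```

If this reading is right, the checkpoint's variable names (layer_normalization, layer_normalization_1, and so on) do not match the scope names the converter expects, so the names would have to be remapped before conversion, or the checkpoint produced by code that uses the expected naming. Neither fix is confirmed in this thread.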

Ala-Na commented 1 year ago

Hi there!

I'm curious: did you find a solution to this issue?

Thank you