google-research / electra

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Apache License 2.0
2.31k stars 351 forks source link

Restoring ELECTRA-Small checkpoint into HuggingFace transformers model doesn't work properly #94

Open DevKretov opened 3 years ago

DevKretov commented 3 years ago

Environment info

Who can help

Seems like nobody

Information

Hello, I am trying to load electra-small checkpoint model from Google Research (https://github.com/google-research/electra) into HuggingFace's ElectraForMaskedLM object. There were several different ways I tried to achieve that:

Both worked without any exceptions. The first one didn't write anything to the output except for the contents of config.json file and the path the model would be saved to. The second one writes lots of information about skipping several variables and initialising others:

Initialize PyTorch weight ['discriminator_predictions', 'dense', 'bias'] discriminator_predictions/dense/bias Initialize PyTorch weight ['discriminator_predictions', 'dense', 'kernel'] discriminator_predictions/dense/kernel Initialize PyTorch weight ['discriminator_predictions', 'dense_prediction', 'bias'] discriminator_predictions/dense_1/bias Initialize PyTorch weight ['discriminator_predictions', 'dense_prediction', 'kernel'] discriminator_predictions/dense_1/kernel Initialize PyTorch weight ['electra', 'embeddings', 'LayerNorm', 'beta'] electra/embeddings/LayerNorm/beta Initialize PyTorch weight ['electra', 'embeddings', 'LayerNorm', 'gamma'] electra/embeddings/LayerNorm/gamma Initialize PyTorch weight ['electra', 'embeddings', 'position_embeddings'] electra/embeddings/position_embeddings Initialize PyTorch weight ['electra', 'embeddings', 'token_type_embeddings'] electra/embeddings/token_type_embeddings Initialize PyTorch weight ['electra', 'embeddings', 'word_embeddings'] electra/embeddings/word_embeddings Initialize PyTorch weight ['electra', 'embeddings_project', 'bias'] electra/embeddings_project/bias Initialize PyTorch weight ['electra', 'embeddings_project', 'kernel'] electra/embeddings_project/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_0/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_0/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_0/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_0/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_0/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_0/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_0/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_0/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_0/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_0/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'intermediate', 'dense', 'bias'] electra/encoder/layer_0/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_0/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_0/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_0/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'output', 'dense', 'bias'] electra/encoder/layer_0/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'output', 'dense', 'kernel'] electra/encoder/layer_0/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_1/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_1/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_1/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_1/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_1/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_1/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_1/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_1/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_1/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_1/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'intermediate', 'dense', 'bias'] electra/encoder/layer_1/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_1/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_1/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_1/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'output', 'dense', 'bias'] electra/encoder/layer_1/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'output', 'dense', 'kernel'] electra/encoder/layer_1/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_10/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_10/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_10/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_10/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_10/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_10/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_10/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_10/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_10/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_10/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'intermediate', 'dense', 'bias'] electra/encoder/layer_10/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_10/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_10/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_10/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'output', 'dense', 'bias'] electra/encoder/layer_10/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'output', 'dense', 'kernel'] electra/encoder/layer_10/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_11/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_11/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_11/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_11/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_11/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_11/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_11/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_11/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_11/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_11/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'intermediate', 'dense', 'bias'] electra/encoder/layer_11/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_11/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_11/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_11/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'output', 'dense', 'bias'] electra/encoder/layer_11/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'output', 'dense', 'kernel'] electra/encoder/layer_11/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_2/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_2/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_2/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_2/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_2/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_2/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_2/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_2/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_2/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_2/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'intermediate', 'dense', 'bias'] electra/encoder/layer_2/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_2/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_2/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_2/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'output', 'dense', 'bias'] electra/encoder/layer_2/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'output', 'dense', 'kernel'] electra/encoder/layer_2/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_3/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_3/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_3/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_3/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_3/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_3/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_3/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_3/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_3/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_3/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'intermediate', 'dense', 'bias'] electra/encoder/layer_3/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_3/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_3/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_3/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'output', 'dense', 'bias'] electra/encoder/layer_3/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'output', 'dense', 'kernel'] electra/encoder/layer_3/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_4/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_4/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_4/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_4/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_4/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_4/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_4/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_4/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_4/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_4/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'intermediate', 'dense', 'bias'] electra/encoder/layer_4/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_4/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_4/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_4/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'output', 'dense', 'bias'] electra/encoder/layer_4/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'output', 'dense', 'kernel'] electra/encoder/layer_4/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_5/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_5/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_5/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_5/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_5/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_5/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_5/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_5/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_5/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_5/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'intermediate', 'dense', 'bias'] electra/encoder/layer_5/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_5/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_5/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_5/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'output', 'dense', 'bias'] electra/encoder/layer_5/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'output', 'dense', 'kernel'] electra/encoder/layer_5/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_6/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_6/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_6/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_6/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_6/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_6/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_6/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_6/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_6/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_6/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'intermediate', 'dense', 'bias'] electra/encoder/layer_6/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_6/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_6/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_6/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'output', 'dense', 'bias'] electra/encoder/layer_6/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'output', 'dense', 'kernel'] electra/encoder/layer_6/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_7/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_7/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_7/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_7/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_7/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_7/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_7/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_7/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_7/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_7/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'intermediate', 'dense', 'bias'] electra/encoder/layer_7/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_7/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_7/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_7/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'output', 'dense', 'bias'] electra/encoder/layer_7/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'output', 'dense', 'kernel'] electra/encoder/layer_7/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_8/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_8/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_8/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_8/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_8/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_8/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_8/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_8/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_8/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_8/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'intermediate', 'dense', 'bias'] electra/encoder/layer_8/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_8/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_8/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_8/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'output', 'dense', 'bias'] electra/encoder/layer_8/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'output', 'dense', 'kernel'] electra/encoder/layer_8/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_9/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_9/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_9/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_9/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_9/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_9/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_9/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_9/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_9/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_9/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'intermediate', 'dense', 'bias'] electra/encoder/layer_9/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_9/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_9/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_9/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'output', 'dense', 'bias'] electra/encoder/layer_9/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'output', 'dense', 'kernel'] electra/encoder/layer_9/output/dense/kernel Skipping generator/embeddings_project/bias ['generator', 'embeddings_project', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/embeddings_project/kernel ['generator', 'embeddings_project', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_0/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_0', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_0/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_0', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_0/attention/output/dense/bias ['generator', 'encoder', 'layer_0', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_0/attention/output/dense/kernel ['generator', 'encoder', 'layer_0', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_0/attention/self/key/bias ['generator', 'encoder', 'layer_0', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_0/attention/self/key/kernel ['generator', 'encoder', 'layer_0', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_0/attention/self/query/bias ['generator', 'encoder', 'layer_0', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_0/attention/self/query/kernel ['generator', 'encoder', 'layer_0', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_0/attention/self/value/bias ['generator', 'encoder', 'layer_0', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_0/attention/self/value/kernel ['generator', 'encoder', 'layer_0', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_0/intermediate/dense/bias ['generator', 'encoder', 'layer_0', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_0/intermediate/dense/kernel ['generator', 'encoder', 'layer_0', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_0/output/LayerNorm/beta ['generator', 'encoder', 'layer_0', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_0/output/LayerNorm/gamma ['generator', 'encoder', 'layer_0', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_0/output/dense/bias ['generator', 'encoder', 'layer_0', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_0/output/dense/kernel ['generator', 'encoder', 'layer_0', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_1/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_1', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_1/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_1', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_1/attention/output/dense/bias ['generator', 'encoder', 'layer_1', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_1/attention/output/dense/kernel ['generator', 'encoder', 'layer_1', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_1/attention/self/key/bias ['generator', 'encoder', 'layer_1', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_1/attention/self/key/kernel ['generator', 'encoder', 'layer_1', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_1/attention/self/query/bias ['generator', 'encoder', 'layer_1', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_1/attention/self/query/kernel ['generator', 'encoder', 'layer_1', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_1/attention/self/value/bias ['generator', 'encoder', 'layer_1', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_1/attention/self/value/kernel ['generator', 'encoder', 'layer_1', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_1/intermediate/dense/bias ['generator', 'encoder', 'layer_1', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_1/intermediate/dense/kernel ['generator', 'encoder', 'layer_1', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_1/output/LayerNorm/beta ['generator', 'encoder', 'layer_1', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_1/output/LayerNorm/gamma ['generator', 'encoder', 'layer_1', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_1/output/dense/bias ['generator', 'encoder', 'layer_1', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_1/output/dense/kernel ['generator', 'encoder', 'layer_1', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_10/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_10', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_10/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_10', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_10/attention/output/dense/bias ['generator', 'encoder', 'layer_10', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_10/attention/output/dense/kernel ['generator', 'encoder', 'layer_10', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_10/attention/self/key/bias ['generator', 'encoder', 'layer_10', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_10/attention/self/key/kernel ['generator', 'encoder', 'layer_10', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_10/attention/self/query/bias ['generator', 'encoder', 'layer_10', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_10/attention/self/query/kernel ['generator', 'encoder', 'layer_10', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_10/attention/self/value/bias ['generator', 'encoder', 'layer_10', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_10/attention/self/value/kernel ['generator', 'encoder', 'layer_10', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_10/intermediate/dense/bias ['generator', 'encoder', 'layer_10', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_10/intermediate/dense/kernel ['generator', 'encoder', 'layer_10', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_10/output/LayerNorm/beta ['generator', 'encoder', 'layer_10', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_10/output/LayerNorm/gamma ['generator', 'encoder', 'layer_10', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_10/output/dense/bias ['generator', 'encoder', 'layer_10', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_10/output/dense/kernel ['generator', 'encoder', 'layer_10', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_11/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_11', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_11/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_11', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_11/attention/output/dense/bias ['generator', 'encoder', 'layer_11', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_11/attention/output/dense/kernel ['generator', 'encoder', 'layer_11', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_11/attention/self/key/bias ['generator', 'encoder', 'layer_11', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_11/attention/self/key/kernel ['generator', 'encoder', 'layer_11', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_11/attention/self/query/bias ['generator', 'encoder', 'layer_11', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_11/attention/self/query/kernel ['generator', 'encoder', 'layer_11', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_11/attention/self/value/bias ['generator', 'encoder', 'layer_11', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_11/attention/self/value/kernel ['generator', 'encoder', 'layer_11', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_11/intermediate/dense/bias ['generator', 'encoder', 'layer_11', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_11/intermediate/dense/kernel ['generator', 'encoder', 'layer_11', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_11/output/LayerNorm/beta ['generator', 'encoder', 'layer_11', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_11/output/LayerNorm/gamma ['generator', 'encoder', 'layer_11', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_11/output/dense/bias ['generator', 'encoder', 'layer_11', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_11/output/dense/kernel ['generator', 'encoder', 'layer_11', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_2/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_2', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_2/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_2', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_2/attention/output/dense/bias ['generator', 'encoder', 'layer_2', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_2/attention/output/dense/kernel ['generator', 'encoder', 'layer_2', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_2/attention/self/key/bias ['generator', 'encoder', 'layer_2', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_2/attention/self/key/kernel ['generator', 'encoder', 'layer_2', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_2/attention/self/query/bias ['generator', 'encoder', 'layer_2', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_2/attention/self/query/kernel ['generator', 'encoder', 'layer_2', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_2/attention/self/value/bias ['generator', 'encoder', 'layer_2', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_2/attention/self/value/kernel ['generator', 'encoder', 'layer_2', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_2/intermediate/dense/bias ['generator', 'encoder', 'layer_2', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_2/intermediate/dense/kernel ['generator', 'encoder', 'layer_2', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_2/output/LayerNorm/beta ['generator', 'encoder', 'layer_2', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_2/output/LayerNorm/gamma ['generator', 'encoder', 'layer_2', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_2/output/dense/bias ['generator', 'encoder', 'layer_2', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_2/output/dense/kernel ['generator', 'encoder', 'layer_2', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_3/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_3', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_3/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_3', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_3/attention/output/dense/bias ['generator', 'encoder', 'layer_3', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_3/attention/output/dense/kernel ['generator', 'encoder', 'layer_3', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_3/attention/self/key/bias ['generator', 'encoder', 'layer_3', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_3/attention/self/key/kernel ['generator', 'encoder', 'layer_3', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_3/attention/self/query/bias ['generator', 'encoder', 'layer_3', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_3/attention/self/query/kernel ['generator', 'encoder', 'layer_3', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_3/attention/self/value/bias ['generator', 'encoder', 'layer_3', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_3/attention/self/value/kernel ['generator', 'encoder', 'layer_3', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_3/intermediate/dense/bias ['generator', 'encoder', 'layer_3', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_3/intermediate/dense/kernel ['generator', 'encoder', 'layer_3', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_3/output/LayerNorm/beta ['generator', 'encoder', 'layer_3', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_3/output/LayerNorm/gamma ['generator', 'encoder', 'layer_3', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_3/output/dense/bias ['generator', 'encoder', 'layer_3', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_3/output/dense/kernel ['generator', 'encoder', 'layer_3', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_4/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_4', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_4/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_4', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_4/attention/output/dense/bias ['generator', 'encoder', 'layer_4', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_4/attention/output/dense/kernel ['generator', 'encoder', 'layer_4', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_4/attention/self/key/bias ['generator', 'encoder', 'layer_4', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_4/attention/self/key/kernel ['generator', 'encoder', 'layer_4', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_4/attention/self/query/bias ['generator', 'encoder', 'layer_4', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_4/attention/self/query/kernel ['generator', 'encoder', 'layer_4', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_4/attention/self/value/bias ['generator', 'encoder', 'layer_4', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_4/attention/self/value/kernel ['generator', 'encoder', 'layer_4', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_4/intermediate/dense/bias ['generator', 'encoder', 'layer_4', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_4/intermediate/dense/kernel ['generator', 'encoder', 'layer_4', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_4/output/LayerNorm/beta ['generator', 'encoder', 'layer_4', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_4/output/LayerNorm/gamma ['generator', 'encoder', 'layer_4', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_4/output/dense/bias ['generator', 'encoder', 'layer_4', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_4/output/dense/kernel ['generator', 'encoder', 'layer_4', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_5/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_5', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_5/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_5', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_5/attention/output/dense/bias ['generator', 'encoder', 'layer_5', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_5/attention/output/dense/kernel ['generator', 'encoder', 'layer_5', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_5/attention/self/key/bias ['generator', 'encoder', 'layer_5', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_5/attention/self/key/kernel ['generator', 'encoder', 'layer_5', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_5/attention/self/query/bias ['generator', 'encoder', 'layer_5', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_5/attention/self/query/kernel ['generator', 'encoder', 'layer_5', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_5/attention/self/value/bias ['generator', 'encoder', 'layer_5', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_5/attention/self/value/kernel ['generator', 'encoder', 'layer_5', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_5/intermediate/dense/bias ['generator', 'encoder', 'layer_5', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_5/intermediate/dense/kernel ['generator', 'encoder', 'layer_5', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_5/output/LayerNorm/beta ['generator', 'encoder', 'layer_5', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_5/output/LayerNorm/gamma ['generator', 'encoder', 'layer_5', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_5/output/dense/bias ['generator', 'encoder', 'layer_5', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_5/output/dense/kernel ['generator', 'encoder', 'layer_5', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_6/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_6', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_6/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_6', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_6/attention/output/dense/bias ['generator', 'encoder', 'layer_6', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_6/attention/output/dense/kernel ['generator', 'encoder', 'layer_6', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_6/attention/self/key/bias ['generator', 'encoder', 'layer_6', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_6/attention/self/key/kernel ['generator', 'encoder', 'layer_6', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_6/attention/self/query/bias ['generator', 'encoder', 'layer_6', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_6/attention/self/query/kernel ['generator', 'encoder', 'layer_6', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_6/attention/self/value/bias ['generator', 'encoder', 'layer_6', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_6/attention/self/value/kernel ['generator', 'encoder', 'layer_6', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_6/intermediate/dense/bias ['generator', 'encoder', 'layer_6', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_6/intermediate/dense/kernel ['generator', 'encoder', 'layer_6', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_6/output/LayerNorm/beta ['generator', 'encoder', 'layer_6', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_6/output/LayerNorm/gamma ['generator', 'encoder', 'layer_6', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_6/output/dense/bias ['generator', 'encoder', 'layer_6', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_6/output/dense/kernel ['generator', 'encoder', 'layer_6', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_7/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_7', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_7/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_7', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_7/attention/output/dense/bias ['generator', 'encoder', 'layer_7', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_7/attention/output/dense/kernel ['generator', 'encoder', 'layer_7', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_7/attention/self/key/bias ['generator', 'encoder', 'layer_7', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_7/attention/self/key/kernel ['generator', 'encoder', 'layer_7', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_7/attention/self/query/bias ['generator', 'encoder', 'layer_7', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_7/attention/self/query/kernel ['generator', 'encoder', 'layer_7', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_7/attention/self/value/bias ['generator', 'encoder', 'layer_7', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_7/attention/self/value/kernel ['generator', 'encoder', 'layer_7', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_7/intermediate/dense/bias ['generator', 'encoder', 'layer_7', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_7/intermediate/dense/kernel ['generator', 'encoder', 'layer_7', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_7/output/LayerNorm/beta ['generator', 'encoder', 'layer_7', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_7/output/LayerNorm/gamma ['generator', 'encoder', 'layer_7', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_7/output/dense/bias ['generator', 'encoder', 'layer_7', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_7/output/dense/kernel ['generator', 'encoder', 'layer_7', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_8/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_8', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_8/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_8', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_8/attention/output/dense/bias ['generator', 'encoder', 'layer_8', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_8/attention/output/dense/kernel ['generator', 'encoder', 'layer_8', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_8/attention/self/key/bias ['generator', 'encoder', 'layer_8', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_8/attention/self/key/kernel ['generator', 'encoder', 'layer_8', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_8/attention/self/query/bias ['generator', 'encoder', 'layer_8', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_8/attention/self/query/kernel ['generator', 'encoder', 'layer_8', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_8/attention/self/value/bias ['generator', 'encoder', 'layer_8', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_8/attention/self/value/kernel ['generator', 'encoder', 'layer_8', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_8/intermediate/dense/bias ['generator', 'encoder', 'layer_8', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_8/intermediate/dense/kernel ['generator', 'encoder', 'layer_8', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_8/output/LayerNorm/beta ['generator', 'encoder', 'layer_8', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_8/output/LayerNorm/gamma ['generator', 'encoder', 'layer_8', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_8/output/dense/bias ['generator', 'encoder', 'layer_8', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_8/output/dense/kernel ['generator', 'encoder', 'layer_8', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_9/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_9', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_9/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_9', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_9/attention/output/dense/bias ['generator', 'encoder', 'layer_9', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_9/attention/output/dense/kernel ['generator', 'encoder', 'layer_9', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_9/attention/self/key/bias ['generator', 'encoder', 'layer_9', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_9/attention/self/key/kernel ['generator', 'encoder', 'layer_9', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_9/attention/self/query/bias ['generator', 'encoder', 'layer_9', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_9/attention/self/query/kernel ['generator', 'encoder', 'layer_9', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_9/attention/self/value/bias ['generator', 'encoder', 'layer_9', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_9/attention/self/value/kernel ['generator', 'encoder', 'layer_9', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_9/intermediate/dense/bias ['generator', 'encoder', 'layer_9', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_9/intermediate/dense/kernel ['generator', 'encoder', 'layer_9', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_9/output/LayerNorm/beta ['generator', 'encoder', 'layer_9', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_9/output/LayerNorm/gamma ['generator', 'encoder', 'layer_9', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_9/output/dense/bias ['generator', 'encoder', 'layer_9', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator/encoder/layer_9/output/dense/kernel ['generator', 'encoder', 'layer_9', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator' Skipping generator_predictions/LayerNorm/beta ['generator_predictions', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator_predictions' Skipping generator_predictions/LayerNorm/gamma ['generator_predictions', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator_predictions' Skipping generator_predictions/dense/bias ['generator_predictions', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator_predictions' Skipping generator_predictions/dense/kernel ['generator_predictions', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator_predictions' Skipping generator_predictions/output_bias ['generator_lm_head', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator_lm_head'

It seems like OK since the Google's checkpoint consists of both generator and discriminator. However, as soon as I try to make some prediction (e.g. "I love reading [MASK]."), the top-5 most likely words is:

which is pretty random, I guess.

On the other hand, as soon as I initialise the ElectraForMaskedLM model directly from https://huggingface.co/google/electra-small-generator , everything works fantastically!

So my hypothesis is, that there is a bug in checkpoint translation to HF format. Can anybody tell me how I can load my own checkpoint (or at least that Google's to check if the whole thing works correctly)?

To reproduce

Steps to reproduce the behavior:

  1. Download the official ELECTRA-small checkpoint
  2. Try to run CLI script to convert the TF checkpoint to HF .bin model
  3. Run classical prediction and see top-5 words (OR import HF pipeline and run it in "fill-mask" mode)

You will see that the model from HF web works correctly whereas the model from Google's GitHub gives random tokens.

Expected behavior

I expected to see that the model is capable of making basic predictions, so that I know that it has been restored and reformatted correctly.

nemani commented 3 years ago

Oh my god I love you! I was just about to run experiments on why my converted model is not working using hugging-face transformers, and was worried that there was something wrong with my model and I will have to train it again! Thanks for posting this, know I am sure the fault is of the conversion script!!! I would suggest we open an issue on the transformers repository regarding this. Thanks!!

stefan-it commented 3 years ago

I've seen this error before and for my models I had to change the config.json:

https://s3.amazonaws.com/models.huggingface.co/bert/dbmdz/electra-small-turkish-cased-generator/config.json

You need to set "intermediate_size": 256 instead of "intermediate_size": 1024 :)

@LysandreJik gave me that hint :hugs:

nemani commented 3 years ago

I got excited too soon. My models are not working even with this repositories fine tuning code. I tried it on multiple classification tasks, but my model just predicts the same label again and again. Anyone faced this or similar issue?

LysandreJik commented 3 years ago

This issue was answered on the transformers repo: https://github.com/huggingface/transformers/issues/6945