allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0
11.77k stars 2.25k forks source link

Predict doesn't work for Roberta MNLI on 1.0.0rc5 #4331

Closed schmmd closed 4 years ago

schmmd commented 4 years ago
$ echo '{"hypothesis": "Two women are sitting on a blanket near some rocks talking about politics.", "premise": "Two women are wandering along the shore drinking iced tea."}' | allennlp predict --predictor textual-entailment https://storage.googleapis.com/allennlp-public-models/mnli-roberta-large-2020.05.13.tar.gz -
2020-06-05 15:53:08,894 - INFO - transformers.file_utils - PyTorch version 1.5.0 available.
2020-06-05 15:53:09,595 - INFO - allennlp.models.archival - loading archive file https://storage.googleapis.com/allennlp-public-models/mnli-roberta-large-2020.05.13.tar.gz from cache at /home/michaels/.allennlp/cache/6464891350f0d2a96fed729f770a3c95cfdcdfac243c7a016377c0ded406e599.128e3323e8512d3bdda5113ef51fa10fa25624d0213c5c73445832a2f73916f3
2020-06-05 15:53:09,596 - INFO - allennlp.models.archival - extracting archive file /home/michaels/.allennlp/cache/6464891350f0d2a96fed729f770a3c95cfdcdfac243c7a016377c0ded406e599.128e3323e8512d3bdda5113ef51fa10fa25624d0213c5c73445832a2f73916f3 to temp dir /tmp/tmp20uyprv6
2020-06-05 15:53:18,719 - INFO - allennlp.common.params - type = from_instances
2020-06-05 15:53:18,719 - INFO - allennlp.data.vocabulary - Loading token dictionary from /tmp/tmp20uyprv6/vocabulary.
2020-06-05 15:53:18,720 - INFO - allennlp.common.params - model.type = basic_classifier
2020-06-05 15:53:18,720 - INFO - allennlp.common.params - model.regularizer = None
2020-06-05 15:53:18,720 - INFO - allennlp.common.params - model.text_field_embedder.type = basic
2020-06-05 15:53:18,720 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.tokens.type = pretrained_transformer
2020-06-05 15:53:18,721 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.tokens.model_name = roberta-large
2020-06-05 15:53:18,721 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.tokens.max_length = 512
2020-06-05 15:53:19,042 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/michaels/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
2020-06-05 15:53:19,043 - INFO - transformers.configuration_utils - Model config RobertaConfig {
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 1,
  "type_vocab_size": 1,
  "vocab_size": 50265
}

2020-06-05 15:53:19,625 - INFO - transformers.modeling_utils - loading weights file https://cdn.huggingface.co/roberta-large-pytorch_model.bin from cache at /home/michaels/.cache/torch/transformers/2339ac1858323405dffff5156947669fed6f63a0c34cfab35bda4f78791893d2.fc7abf72755ecc4a75d0d336a93c1c63358d2334f5998ed326f3b0da380bf536
2020-06-05 15:53:29,290 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/michaels/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
2020-06-05 15:53:29,291 - INFO - transformers.configuration_utils - Model config RobertaConfig {
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 1,
  "type_vocab_size": 1,
  "vocab_size": 50265
}

2020-06-05 15:53:29,904 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json from cache at /home/michaels/.cache/torch/transformers/1ae1f5b6e2b22b25ccc04c000bb79ca847aa226d0761536b011cf7e5868f0655.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
2020-06-05 15:53:29,905 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt from cache at /home/michaels/.cache/torch/transformers/f8f83199a6270d582d6245dc100e99c4155de81c9745c6248077018fe01abcfb.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
2020-06-05 15:53:30,290 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/michaels/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
2020-06-05 15:53:30,291 - INFO - transformers.configuration_utils - Model config RobertaConfig {
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 1,
  "type_vocab_size": 1,
  "vocab_size": 50265
}

2020-06-05 15:53:30,915 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json from cache at /home/michaels/.cache/torch/transformers/1ae1f5b6e2b22b25ccc04c000bb79ca847aa226d0761536b011cf7e5868f0655.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
2020-06-05 15:53:30,916 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt from cache at /home/michaels/.cache/torch/transformers/f8f83199a6270d582d6245dc100e99c4155de81c9745c6248077018fe01abcfb.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
2020-06-05 15:53:31,005 - INFO - allennlp.common.params - model.seq2vec_encoder.type = cls_pooler
2020-06-05 15:53:31,005 - INFO - allennlp.common.params - model.seq2vec_encoder.embedding_dim = 1024
2020-06-05 15:53:31,005 - INFO - allennlp.common.params - model.seq2vec_encoder.cls_is_last_token = False
2020-06-05 15:53:31,005 - INFO - allennlp.common.params - model.seq2seq_encoder = None
2020-06-05 15:53:31,005 - INFO - allennlp.common.params - model.feedforward.input_dim = 1024
2020-06-05 15:53:31,005 - INFO - allennlp.common.params - model.feedforward.num_layers = 1
2020-06-05 15:53:31,005 - INFO - allennlp.common.params - model.feedforward.hidden_dims = 1024
2020-06-05 15:53:31,006 - INFO - allennlp.common.params - model.feedforward.activations = tanh
2020-06-05 15:53:31,006 - INFO - allennlp.common.params - type = tanh
2020-06-05 15:53:31,006 - INFO - allennlp.common.params - model.feedforward.dropout = 0.0
2020-06-05 15:53:31,012 - INFO - allennlp.common.params - model.dropout = 0.1
2020-06-05 15:53:31,012 - INFO - allennlp.common.params - model.num_labels = None
2020-06-05 15:53:31,012 - INFO - allennlp.common.params - model.label_namespace = labels
2020-06-05 15:53:31,012 - INFO - allennlp.common.params - model.namespace = tags
2020-06-05 15:53:31,012 - INFO - allennlp.common.params - model.initializer = <allennlp.nn.initializers.InitializerApplicator object at 0x7fa4abe41f50>
2020-06-05 15:53:31,012 - INFO - allennlp.nn.initializers - Initializing parameters
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers - Done initializing parameters; the following parameters are using their default initialization from their code
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _classification_layer.bias
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _classification_layer.weight
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _feedforward._linear_layers.0.bias
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _feedforward._linear_layers.0.weight
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.embeddings.LayerNorm.bias
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.embeddings.LayerNorm.weight
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.embeddings.position_embeddings.weight
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.embeddings.token_type_embeddings.weight
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.embeddings.word_embeddings.weight
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.output.LayerNorm.bias
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.output.LayerNorm.weight
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.output.dense.bias
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.output.dense.weight
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.self.key.bias
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.self.key.weight
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.self.query.bias
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.self.query.weight
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.self.value.bias
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.self.value.weight
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.intermediate.dense.bias
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.intermediate.dense.weight
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.output.LayerNorm.bias
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.output.LayerNorm.weight
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.output.dense.bias
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.output.dense.weight
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.output.LayerNorm.bias
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.output.LayerNorm.weight
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.output.dense.bias
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.output.dense.weight
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.self.key.bias
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.self.key.weight
2020-06-05 15:53:31,014 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.self.query.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.self.query.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.self.value.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.self.value.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.intermediate.dense.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.intermediate.dense.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.output.LayerNorm.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.output.LayerNorm.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.output.dense.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.output.dense.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.output.LayerNorm.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.output.LayerNorm.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.output.dense.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.output.dense.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.self.key.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.self.key.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.self.query.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.self.query.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.self.value.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.self.value.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.intermediate.dense.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.intermediate.dense.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.output.LayerNorm.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.output.LayerNorm.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.output.dense.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.output.dense.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.output.LayerNorm.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.output.LayerNorm.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.output.dense.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.output.dense.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.self.key.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.self.key.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.self.query.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.self.query.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.self.value.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.self.value.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.intermediate.dense.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.intermediate.dense.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.output.LayerNorm.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.output.LayerNorm.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.output.dense.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.output.dense.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.output.LayerNorm.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.output.LayerNorm.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.output.dense.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.output.dense.weight
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.self.key.bias
2020-06-05 15:53:31,015 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.self.key.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.self.query.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.self.query.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.self.value.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.self.value.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.intermediate.dense.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.intermediate.dense.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.output.LayerNorm.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.output.LayerNorm.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.output.dense.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.output.dense.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.output.LayerNorm.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.output.LayerNorm.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.output.dense.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.output.dense.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.self.key.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.self.key.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.self.query.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.self.query.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.self.value.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.self.value.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.intermediate.dense.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.intermediate.dense.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.output.LayerNorm.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.output.LayerNorm.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.output.dense.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.output.dense.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.output.LayerNorm.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.output.LayerNorm.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.output.dense.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.output.dense.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.self.key.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.self.key.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.self.query.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.self.query.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.self.value.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.self.value.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.intermediate.dense.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.intermediate.dense.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.output.LayerNorm.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.output.LayerNorm.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.output.dense.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.output.dense.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.output.LayerNorm.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.output.LayerNorm.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.output.dense.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.output.dense.weight
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.self.key.bias
2020-06-05 15:53:31,016 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.self.key.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.self.query.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.self.query.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.self.value.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.self.value.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.intermediate.dense.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.intermediate.dense.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.output.LayerNorm.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.output.LayerNorm.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.output.dense.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.output.dense.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.output.LayerNorm.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.output.LayerNorm.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.output.dense.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.output.dense.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.self.key.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.self.key.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.self.query.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.self.query.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.self.value.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.self.value.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.intermediate.dense.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.intermediate.dense.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.output.LayerNorm.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.output.LayerNorm.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.output.dense.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.output.dense.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.output.LayerNorm.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.output.LayerNorm.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.output.dense.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.output.dense.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.self.key.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.self.key.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.self.query.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.self.query.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.self.value.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.self.value.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.intermediate.dense.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.intermediate.dense.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.output.LayerNorm.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.output.LayerNorm.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.output.dense.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.output.dense.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.output.LayerNorm.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.output.LayerNorm.weight
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.output.dense.bias
2020-06-05 15:53:31,017 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.output.dense.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.self.key.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.self.key.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.self.query.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.self.query.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.self.value.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.self.value.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.intermediate.dense.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.intermediate.dense.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.output.LayerNorm.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.output.LayerNorm.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.output.dense.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.output.dense.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.output.LayerNorm.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.output.LayerNorm.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.output.dense.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.output.dense.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.self.key.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.self.key.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.self.query.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.self.query.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.self.value.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.self.value.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.intermediate.dense.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.intermediate.dense.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.output.LayerNorm.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.output.LayerNorm.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.output.dense.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.output.dense.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.output.LayerNorm.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.output.LayerNorm.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.output.dense.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.output.dense.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.self.key.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.self.key.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.self.query.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.self.query.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.self.value.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.self.value.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.intermediate.dense.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.intermediate.dense.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.output.LayerNorm.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.output.LayerNorm.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.output.dense.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.output.dense.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.output.LayerNorm.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.output.LayerNorm.weight
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.output.dense.bias
2020-06-05 15:53:31,018 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.output.dense.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.self.key.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.self.key.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.self.query.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.self.query.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.self.value.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.self.value.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.intermediate.dense.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.intermediate.dense.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.output.LayerNorm.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.output.LayerNorm.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.output.dense.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.output.dense.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.output.LayerNorm.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.output.LayerNorm.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.output.dense.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.output.dense.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.self.key.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.self.key.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.self.query.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.self.query.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.self.value.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.self.value.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.intermediate.dense.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.intermediate.dense.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.output.LayerNorm.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.output.LayerNorm.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.output.dense.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.output.dense.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.output.LayerNorm.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.output.LayerNorm.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.output.dense.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.output.dense.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.self.key.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.self.key.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.self.query.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.self.query.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.self.value.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.self.value.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.intermediate.dense.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.intermediate.dense.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.output.LayerNorm.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.output.LayerNorm.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.output.dense.bias
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.output.dense.weight
2020-06-05 15:53:31,019 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.output.LayerNorm.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.output.LayerNorm.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.output.dense.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.output.dense.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.self.key.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.self.key.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.self.query.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.self.query.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.self.value.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.self.value.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.intermediate.dense.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.intermediate.dense.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.output.LayerNorm.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.output.LayerNorm.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.output.dense.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.output.dense.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.output.LayerNorm.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.output.LayerNorm.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.output.dense.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.output.dense.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.self.key.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.self.key.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.self.query.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.self.query.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.self.value.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.self.value.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.intermediate.dense.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.intermediate.dense.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.output.LayerNorm.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.output.LayerNorm.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.output.dense.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.output.dense.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.output.LayerNorm.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.output.LayerNorm.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.output.dense.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.output.dense.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.self.key.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.self.key.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.self.query.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.self.query.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.self.value.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.self.value.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.intermediate.dense.bias
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.intermediate.dense.weight
2020-06-05 15:53:31,020 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.output.LayerNorm.bias
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.output.LayerNorm.weight
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.output.dense.bias
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.output.dense.weight
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.output.LayerNorm.bias
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.output.LayerNorm.weight
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.output.dense.bias
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.output.dense.weight
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.self.key.bias
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.self.key.weight
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.self.query.bias
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.self.query.weight
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.self.value.bias
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.self.value.weight
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.intermediate.dense.bias
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.intermediate.dense.weight
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.output.LayerNorm.bias
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.output.LayerNorm.weight
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.output.dense.bias
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.output.dense.weight
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.output.LayerNorm.bias
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.output.LayerNorm.weight
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.output.dense.bias
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.output.dense.weight
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.self.key.bias
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.self.key.weight
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.self.query.bias
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.self.query.weight
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.self.value.bias
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.self.value.weight
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.intermediate.dense.bias
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.intermediate.dense.weight
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.output.LayerNorm.bias
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.output.LayerNorm.weight
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.output.dense.bias
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.output.dense.weight
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.output.LayerNorm.bias
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.output.LayerNorm.weight
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.output.dense.bias
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.output.dense.weight
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.self.key.bias
2020-06-05 15:53:31,021 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.self.key.weight
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.self.query.bias
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.self.query.weight
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.self.value.bias
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.self.value.weight
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.intermediate.dense.bias
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.intermediate.dense.weight
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.output.LayerNorm.bias
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.output.LayerNorm.weight
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.output.dense.bias
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.output.dense.weight
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.output.LayerNorm.bias
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.output.LayerNorm.weight
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.output.dense.bias
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.output.dense.weight
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.self.key.bias
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.self.key.weight
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.self.query.bias
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.self.query.weight
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.self.value.bias
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.self.value.weight
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.intermediate.dense.bias
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.intermediate.dense.weight
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.output.LayerNorm.bias
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.output.LayerNorm.weight
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.output.dense.bias
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.output.dense.weight
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.output.LayerNorm.bias
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.output.LayerNorm.weight
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.output.dense.bias
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.output.dense.weight
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.self.key.bias
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.self.key.weight
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.self.query.bias
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.self.query.weight
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.self.value.bias
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.self.value.weight
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.intermediate.dense.bias
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.intermediate.dense.weight
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.output.LayerNorm.bias
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.output.LayerNorm.weight
2020-06-05 15:53:31,022 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.output.dense.bias
2020-06-05 15:53:31,023 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.output.dense.weight
2020-06-05 15:53:31,023 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.pooler.dense.bias
2020-06-05 15:53:31,023 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.pooler.dense.weight
2020-06-05 15:53:32,133 - INFO - allennlp.common.params - dataset_reader.type = snli
2020-06-05 15:53:32,134 - INFO - allennlp.common.params - dataset_reader.lazy = False
2020-06-05 15:53:32,134 - INFO - allennlp.common.params - dataset_reader.cache_directory = None
2020-06-05 15:53:32,134 - INFO - allennlp.common.params - dataset_reader.max_instances = None
2020-06-05 15:53:32,134 - INFO - allennlp.common.params - dataset_reader.manual_distributed_sharding = False
2020-06-05 15:53:32,134 - INFO - allennlp.common.params - dataset_reader.tokenizer.type = pretrained_transformer
2020-06-05 15:53:32,134 - INFO - allennlp.common.params - dataset_reader.tokenizer.model_name = roberta-large
2020-06-05 15:53:32,134 - INFO - allennlp.common.params - dataset_reader.tokenizer.add_special_tokens = True
2020-06-05 15:53:32,134 - INFO - allennlp.common.params - dataset_reader.tokenizer.max_length = None
2020-06-05 15:53:32,134 - INFO - allennlp.common.params - dataset_reader.tokenizer.stride = 0
2020-06-05 15:53:32,134 - INFO - allennlp.common.params - dataset_reader.tokenizer.truncation_strategy = longest_first
2020-06-05 15:53:32,134 - INFO - allennlp.common.params - dataset_reader.tokenizer.tokenizer_kwargs = None
2020-06-05 15:53:32,437 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/michaels/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
2020-06-05 15:53:32,438 - INFO - transformers.configuration_utils - Model config RobertaConfig {
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 1,
  "type_vocab_size": 1,
  "vocab_size": 50265
}

2020-06-05 15:53:33,058 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json from cache at /home/michaels/.cache/torch/transformers/1ae1f5b6e2b22b25ccc04c000bb79ca847aa226d0761536b011cf7e5868f0655.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
2020-06-05 15:53:33,058 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt from cache at /home/michaels/.cache/torch/transformers/f8f83199a6270d582d6245dc100e99c4155de81c9745c6248077018fe01abcfb.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
2020-06-05 15:53:33,440 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/michaels/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
2020-06-05 15:53:33,441 - INFO - transformers.configuration_utils - Model config RobertaConfig {
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 1,
  "type_vocab_size": 1,
  "vocab_size": 50265
}

2020-06-05 15:53:34,073 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json from cache at /home/michaels/.cache/torch/transformers/1ae1f5b6e2b22b25ccc04c000bb79ca847aa226d0761536b011cf7e5868f0655.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
2020-06-05 15:53:34,074 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt from cache at /home/michaels/.cache/torch/transformers/f8f83199a6270d582d6245dc100e99c4155de81c9745c6248077018fe01abcfb.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
2020-06-05 15:53:34,153 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.type = pretrained_transformer
2020-06-05 15:53:34,154 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.token_min_padding_length = 0
2020-06-05 15:53:34,154 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.model_name = roberta-large
2020-06-05 15:53:34,154 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.namespace = tags
2020-06-05 15:53:34,154 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.max_length = 512
2020-06-05 15:53:35,462 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/michaels/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
2020-06-05 15:53:35,463 - INFO - transformers.configuration_utils - Model config RobertaConfig {
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 1,
  "type_vocab_size": 1,
  "vocab_size": 50265
}

2020-06-05 15:53:36,086 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json from cache at /home/michaels/.cache/torch/transformers/1ae1f5b6e2b22b25ccc04c000bb79ca847aa226d0761536b011cf7e5868f0655.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
2020-06-05 15:53:36,087 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt from cache at /home/michaels/.cache/torch/transformers/f8f83199a6270d582d6245dc100e99c4155de81c9745c6248077018fe01abcfb.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
2020-06-05 15:53:36,505 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/michaels/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
2020-06-05 15:53:36,506 - INFO - transformers.configuration_utils - Model config RobertaConfig {
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 1,
  "type_vocab_size": 1,
  "vocab_size": 50265
}

2020-06-05 15:53:37,181 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json from cache at /home/michaels/.cache/torch/transformers/1ae1f5b6e2b22b25ccc04c000bb79ca847aa226d0761536b011cf7e5868f0655.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
2020-06-05 15:53:37,181 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt from cache at /home/michaels/.cache/torch/transformers/f8f83199a6270d582d6245dc100e99c4155de81c9745c6248077018fe01abcfb.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
2020-06-05 15:53:37,310 - INFO - allennlp.common.params - dataset_reader.combine_input_fields = None
Traceback (most recent call last):
  File "/home/michaels/miniconda2/envs/allennlp-rc5/bin/allennlp", line 8, in <module>
    sys.exit(run())
  File "/home/michaels/miniconda2/envs/allennlp-rc5/lib/python3.7/site-packages/allennlp/__main__.py", line 19, in run
    main(prog="allennlp")
  File "/home/michaels/miniconda2/envs/allennlp-rc5/lib/python3.7/site-packages/allennlp/commands/__init__.py", line 92, in main
    args.func(args)
  File "/home/michaels/miniconda2/envs/allennlp-rc5/lib/python3.7/site-packages/allennlp/commands/predict.py", line 197, in _predict
    predictor = _get_predictor(args)
  File "/home/michaels/miniconda2/envs/allennlp-rc5/lib/python3.7/site-packages/allennlp/commands/predict.py", line 102, in _get_predictor
    archive, args.predictor, dataset_reader_to_load=args.dataset_reader_choice
  File "/home/michaels/miniconda2/envs/allennlp-rc5/lib/python3.7/site-packages/allennlp/predictors/predictor.py", line 294, in from_archive
    dataset_reader = DatasetReader.from_params(dataset_reader_params)
  File "/home/michaels/miniconda2/envs/allennlp-rc5/lib/python3.7/site-packages/allennlp/common/from_params.py", line 580, in from_params
    **extras,
  File "/home/michaels/miniconda2/envs/allennlp-rc5/lib/python3.7/site-packages/allennlp/common/from_params.py", line 611, in from_params
    return constructor_to_call(**kwargs)  # type: ignore
  File "/home/michaels/miniconda2/envs/allennlp-rc5/lib/python3.7/site-packages/allennlp_models/pair_classification/dataset_readers/snli.py", line 51, in __init__
    assert not self._tokenizer._add_special_tokens
AssertionError
2020-06-05 15:53:37,312 - INFO - allennlp.models.archival - removing temporary unarchived model dir at /tmp/tmp20uyprv6
dirkgr commented 4 years ago

I know what this is, but to be sure that the whole model still works, I'm retraining it. It should be done before any of y'all wake up.

dirkgr commented 4 years ago

https://github.com/allenai/allennlp-models/pull/72 is needed for retraining.

schmmd commented 4 years ago

RC5

$ echo '{"hypothesis": "Two women are sitting on a blanket near some rocks talking about politics.", "premise": "Two women are wandering along the shore drinking iced tea."}' | allennlp predict --predictor textual-entailment https://storage.googleapis.com/allennlp-public-models/mnli_roberta-2020.06.09.tar.gz -
2020-06-10 11:20:10,831 - INFO - transformers.file_utils - PyTorch version 1.5.0 available.
2020-06-10 11:20:11,516 - INFO - allennlp.models.archival - loading archive file https://storage.googleapis.com/allennlp-public-models/mnli_roberta-2020.06.09.tar.gz from cache at /home/michaels/.allennlp/cache/a80e50c38b14f28bf5423cef1d8d5e19e9712227b50a4389c30474e8dcb7166f.cbb0061a8d069270fa326bb9a72ad01b7f2f9d9a0017928c64b71be3ddf578c1
2020-06-10 11:20:11,518 - INFO - allennlp.models.archival - extracting archive file /home/michaels/.allennlp/cache/a80e50c38b14f28bf5423cef1d8d5e19e9712227b50a4389c30474e8dcb7166f.cbb0061a8d069270fa326bb9a72ad01b7f2f9d9a0017928c64b71be3ddf578c1 to temp dir /tmp/tmpnk3qeff5
2020-06-10 11:20:20,844 - INFO - allennlp.common.params - type = from_instances
2020-06-10 11:20:20,844 - INFO - allennlp.data.vocabulary - Loading token dictionary from /tmp/tmpnk3qeff5/vocabulary.
2020-06-10 11:20:20,845 - INFO - allennlp.common.params - model.type = basic_classifier
2020-06-10 11:20:20,846 - INFO - allennlp.common.params - model.regularizer = None
2020-06-10 11:20:20,846 - INFO - allennlp.common.params - model.text_field_embedder.type = basic
2020-06-10 11:20:20,846 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.tokens.type = pretrained_transformer
2020-06-10 11:20:20,846 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.tokens.model_name = roberta-large
2020-06-10 11:20:20,846 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.tokens.max_length = 512
2020-06-10 11:20:21,179 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/michaels/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
2020-06-10 11:20:21,180 - INFO - transformers.configuration_utils - Model config RobertaConfig {
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 1,
  "type_vocab_size": 1,
  "vocab_size": 50265
}

2020-06-10 11:20:21,465 - INFO - transformers.modeling_utils - loading weights file https://cdn.huggingface.co/roberta-large-pytorch_model.bin from cache at /home/michaels/.cache/torch/transformers/2339ac1858323405dffff5156947669fed6f63a0c34cfab35bda4f78791893d2.fc7abf72755ecc4a75d0d336a93c1c63358d2334f5998ed326f3b0da380bf536
2020-06-10 11:20:31,188 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/michaels/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
2020-06-10 11:20:31,189 - INFO - transformers.configuration_utils - Model config RobertaConfig {
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 1,
  "type_vocab_size": 1,
  "vocab_size": 50265
}

2020-06-10 11:20:31,870 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json from cache at /home/michaels/.cache/torch/transformers/1ae1f5b6e2b22b25ccc04c000bb79ca847aa226d0761536b011cf7e5868f0655.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
2020-06-10 11:20:31,871 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt from cache at /home/michaels/.cache/torch/transformers/f8f83199a6270d582d6245dc100e99c4155de81c9745c6248077018fe01abcfb.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
2020-06-10 11:20:32,436 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/michaels/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
2020-06-10 11:20:32,437 - INFO - transformers.configuration_utils - Model config RobertaConfig {
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 1,
  "type_vocab_size": 1,
  "vocab_size": 50265
}

2020-06-10 11:20:33,079 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json from cache at /home/michaels/.cache/torch/transformers/1ae1f5b6e2b22b25ccc04c000bb79ca847aa226d0761536b011cf7e5868f0655.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
2020-06-10 11:20:33,079 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt from cache at /home/michaels/.cache/torch/transformers/f8f83199a6270d582d6245dc100e99c4155de81c9745c6248077018fe01abcfb.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
2020-06-10 11:20:33,205 - INFO - allennlp.common.params - model.seq2vec_encoder.type = cls_pooler
2020-06-10 11:20:33,206 - INFO - allennlp.common.params - model.seq2vec_encoder.embedding_dim = 1024
2020-06-10 11:20:33,206 - INFO - allennlp.common.params - model.seq2vec_encoder.cls_is_last_token = False
2020-06-10 11:20:33,206 - INFO - allennlp.common.params - model.seq2seq_encoder = None
2020-06-10 11:20:33,206 - INFO - allennlp.common.params - model.feedforward.input_dim = 1024
2020-06-10 11:20:33,206 - INFO - allennlp.common.params - model.feedforward.num_layers = 1
2020-06-10 11:20:33,206 - INFO - allennlp.common.params - model.feedforward.hidden_dims = 1024
2020-06-10 11:20:33,207 - INFO - allennlp.common.params - model.feedforward.activations = tanh
2020-06-10 11:20:33,207 - INFO - allennlp.common.params - type = tanh
2020-06-10 11:20:33,207 - INFO - allennlp.common.params - model.feedforward.dropout = 0.0
2020-06-10 11:20:33,217 - INFO - allennlp.common.params - model.dropout = 0.1
2020-06-10 11:20:33,217 - INFO - allennlp.common.params - model.num_labels = None
2020-06-10 11:20:33,217 - INFO - allennlp.common.params - model.label_namespace = labels
2020-06-10 11:20:33,218 - INFO - allennlp.common.params - model.namespace = tags
2020-06-10 11:20:33,218 - INFO - allennlp.common.params - model.initializer = <allennlp.nn.initializers.InitializerApplicator object at 0x7f11834de490>
2020-06-10 11:20:33,218 - INFO - allennlp.nn.initializers - Initializing parameters
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers - Done initializing parameters; the following parameters are using their default initialization from their code
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers -    _classification_layer.bias
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers -    _classification_layer.weight
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers -    _feedforward._linear_layers.0.bias
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers -    _feedforward._linear_layers.0.weight
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.embeddings.LayerNorm.bias
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.embeddings.LayerNorm.weight
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.embeddings.position_embeddings.weight
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.embeddings.token_type_embeddings.weight
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.embeddings.word_embeddings.weight
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.output.LayerNorm.bias
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.output.LayerNorm.weight
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.output.dense.bias
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.output.dense.weight
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.self.key.bias
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.self.key.weight
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.self.query.bias
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.self.query.weight
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.self.value.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.self.value.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.intermediate.dense.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.intermediate.dense.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.output.LayerNorm.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.output.LayerNorm.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.output.dense.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.output.dense.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.output.LayerNorm.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.output.LayerNorm.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.output.dense.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.output.dense.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.self.key.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.self.key.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.self.query.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.self.query.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.self.value.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.self.value.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.intermediate.dense.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.intermediate.dense.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.output.LayerNorm.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.output.LayerNorm.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.output.dense.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.output.dense.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.output.LayerNorm.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.output.LayerNorm.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.output.dense.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.output.dense.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.self.key.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.self.key.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.self.query.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.self.query.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.self.value.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.self.value.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.intermediate.dense.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.intermediate.dense.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.output.LayerNorm.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.output.LayerNorm.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.output.dense.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.output.dense.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.output.LayerNorm.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.output.LayerNorm.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.output.dense.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.output.dense.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.self.key.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.self.key.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.self.query.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.self.query.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.self.value.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.self.value.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.intermediate.dense.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.intermediate.dense.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.output.LayerNorm.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.output.LayerNorm.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.output.dense.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.output.dense.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.output.LayerNorm.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.output.LayerNorm.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.output.dense.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.output.dense.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.self.key.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.self.key.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.self.query.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.self.query.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.self.value.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.self.value.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.intermediate.dense.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.intermediate.dense.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.output.LayerNorm.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.output.LayerNorm.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.output.dense.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.output.dense.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.output.LayerNorm.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.output.LayerNorm.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.output.dense.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.output.dense.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.self.key.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.self.key.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.self.query.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.self.query.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.self.value.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.self.value.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.intermediate.dense.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.intermediate.dense.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.output.LayerNorm.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.output.LayerNorm.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.output.dense.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.output.dense.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.output.LayerNorm.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.output.LayerNorm.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.output.dense.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.output.dense.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.self.key.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.self.key.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.self.query.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.self.query.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.self.value.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.self.value.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.intermediate.dense.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.intermediate.dense.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.output.LayerNorm.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.output.LayerNorm.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.output.dense.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.output.dense.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.output.LayerNorm.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.output.LayerNorm.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.output.dense.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.output.dense.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.self.key.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.self.key.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.self.query.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.self.query.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.self.value.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.self.value.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.intermediate.dense.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.intermediate.dense.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.output.LayerNorm.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.output.LayerNorm.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.output.dense.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.output.dense.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.output.LayerNorm.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.output.LayerNorm.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.output.dense.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.output.dense.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.self.key.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.self.key.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.self.query.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.self.query.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.self.value.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.self.value.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.intermediate.dense.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.intermediate.dense.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.output.LayerNorm.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.output.LayerNorm.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.output.dense.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.output.dense.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.output.LayerNorm.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.output.LayerNorm.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.output.dense.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.output.dense.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.self.key.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.self.key.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.self.query.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.self.query.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.self.value.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.self.value.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.intermediate.dense.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.intermediate.dense.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.output.LayerNorm.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.output.LayerNorm.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.output.dense.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.output.dense.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.output.LayerNorm.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.output.LayerNorm.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.output.dense.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.output.dense.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.self.key.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.self.key.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.self.query.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.self.query.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.self.value.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.self.value.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.intermediate.dense.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.intermediate.dense.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.output.LayerNorm.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.output.LayerNorm.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.output.dense.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.output.dense.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.output.LayerNorm.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.output.LayerNorm.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.output.dense.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.output.dense.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.self.key.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.self.key.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.self.query.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.self.query.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.self.value.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.self.value.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.intermediate.dense.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.intermediate.dense.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.output.LayerNorm.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.output.LayerNorm.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.output.dense.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.output.dense.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.output.LayerNorm.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.output.LayerNorm.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.output.dense.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.output.dense.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.self.key.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.self.key.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.self.query.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.self.query.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.self.value.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.self.value.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.intermediate.dense.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.intermediate.dense.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.output.LayerNorm.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.output.LayerNorm.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.output.dense.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.output.dense.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.output.LayerNorm.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.output.LayerNorm.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.output.dense.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.output.dense.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.self.key.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.self.key.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.self.query.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.self.query.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.self.value.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.self.value.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.intermediate.dense.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.intermediate.dense.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.output.LayerNorm.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.output.LayerNorm.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.output.dense.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.output.dense.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.output.LayerNorm.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.output.LayerNorm.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.output.dense.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.output.dense.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.self.key.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.self.key.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.self.query.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.self.query.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.self.value.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.self.value.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.intermediate.dense.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.intermediate.dense.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.output.LayerNorm.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.output.LayerNorm.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.output.dense.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.output.dense.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.output.LayerNorm.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.output.LayerNorm.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.output.dense.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.output.dense.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.self.key.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.self.key.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.self.query.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.self.query.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.self.value.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.self.value.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.intermediate.dense.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.intermediate.dense.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.output.LayerNorm.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.output.LayerNorm.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.output.dense.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.output.dense.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.output.LayerNorm.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.output.LayerNorm.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.output.dense.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.output.dense.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.self.key.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.self.key.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.self.query.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.self.query.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.self.value.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.self.value.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.intermediate.dense.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.intermediate.dense.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.output.LayerNorm.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.output.LayerNorm.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.output.dense.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.output.dense.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.output.LayerNorm.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.output.LayerNorm.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.output.dense.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.output.dense.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.self.key.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.self.key.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.self.query.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.self.query.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.self.value.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.self.value.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.intermediate.dense.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.intermediate.dense.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.output.LayerNorm.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.output.LayerNorm.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.output.dense.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.output.dense.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.output.LayerNorm.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.output.LayerNorm.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.output.dense.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.output.dense.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.self.key.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.self.key.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.self.query.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.self.query.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.self.value.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.self.value.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.intermediate.dense.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.intermediate.dense.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.output.LayerNorm.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.output.LayerNorm.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.output.dense.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.output.dense.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.output.LayerNorm.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.output.LayerNorm.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.output.dense.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.output.dense.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.self.key.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.self.key.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.self.query.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.self.query.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.self.value.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.self.value.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.intermediate.dense.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.intermediate.dense.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.output.LayerNorm.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.output.LayerNorm.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.output.dense.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.output.dense.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.output.LayerNorm.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.output.LayerNorm.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.output.dense.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.output.dense.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.self.key.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.self.key.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.self.query.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.self.query.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.self.value.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.self.value.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.intermediate.dense.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.intermediate.dense.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.output.LayerNorm.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.output.LayerNorm.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.output.dense.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.output.dense.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.output.LayerNorm.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.output.LayerNorm.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.output.dense.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.output.dense.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.self.key.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.self.key.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.self.query.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.self.query.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.self.value.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.self.value.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.intermediate.dense.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.intermediate.dense.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.output.LayerNorm.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.output.LayerNorm.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.output.dense.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.output.dense.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.output.LayerNorm.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.output.LayerNorm.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.output.dense.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.output.dense.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.self.key.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.self.key.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.self.query.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.self.query.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.self.value.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.self.value.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.intermediate.dense.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.intermediate.dense.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.output.LayerNorm.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.output.LayerNorm.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.output.dense.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.output.dense.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.output.LayerNorm.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.output.LayerNorm.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.output.dense.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.output.dense.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.self.key.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.self.key.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.self.query.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.self.query.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.self.value.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.self.value.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.intermediate.dense.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.intermediate.dense.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.output.LayerNorm.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.output.LayerNorm.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.output.dense.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.output.dense.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.pooler.dense.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers -    _text_field_embedder.token_embedder_tokens.transformer_model.pooler.dense.weight
2020-06-10 11:20:34,211 - INFO - allennlp.common.params - dataset_reader.type = snli
2020-06-10 11:20:34,211 - INFO - allennlp.common.params - dataset_reader.lazy = False
2020-06-10 11:20:34,212 - INFO - allennlp.common.params - dataset_reader.cache_directory = None
2020-06-10 11:20:34,212 - INFO - allennlp.common.params - dataset_reader.max_instances = None
2020-06-10 11:20:34,212 - INFO - allennlp.common.params - dataset_reader.manual_distributed_sharding = False
2020-06-10 11:20:34,212 - INFO - allennlp.common.params - dataset_reader.tokenizer.type = pretrained_transformer
2020-06-10 11:20:34,212 - INFO - allennlp.common.params - dataset_reader.tokenizer.model_name = roberta-large
2020-06-10 11:20:34,212 - INFO - allennlp.common.params - dataset_reader.tokenizer.add_special_tokens = False
2020-06-10 11:20:34,212 - INFO - allennlp.common.params - dataset_reader.tokenizer.max_length = None
2020-06-10 11:20:34,212 - INFO - allennlp.common.params - dataset_reader.tokenizer.stride = 0
2020-06-10 11:20:34,212 - INFO - allennlp.common.params - dataset_reader.tokenizer.truncation_strategy = longest_first
2020-06-10 11:20:34,212 - INFO - allennlp.common.params - dataset_reader.tokenizer.tokenizer_kwargs = None
2020-06-10 11:20:34,529 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/michaels/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
2020-06-10 11:20:34,530 - INFO - transformers.configuration_utils - Model config RobertaConfig {
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 1,
  "type_vocab_size": 1,
  "vocab_size": 50265
}

2020-06-10 11:20:35,178 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json from cache at /home/michaels/.cache/torch/transformers/1ae1f5b6e2b22b25ccc04c000bb79ca847aa226d0761536b011cf7e5868f0655.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
2020-06-10 11:20:35,178 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt from cache at /home/michaels/.cache/torch/transformers/f8f83199a6270d582d6245dc100e99c4155de81c9745c6248077018fe01abcfb.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
2020-06-10 11:20:35,589 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/michaels/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
2020-06-10 11:20:35,590 - INFO - transformers.configuration_utils - Model config RobertaConfig {
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 1,
  "type_vocab_size": 1,
  "vocab_size": 50265
}

2020-06-10 11:20:36,218 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json from cache at /home/michaels/.cache/torch/transformers/1ae1f5b6e2b22b25ccc04c000bb79ca847aa226d0761536b011cf7e5868f0655.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
2020-06-10 11:20:36,219 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt from cache at /home/michaels/.cache/torch/transformers/f8f83199a6270d582d6245dc100e99c4155de81c9745c6248077018fe01abcfb.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
2020-06-10 11:20:36,337 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.type = pretrained_transformer
2020-06-10 11:20:36,338 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.token_min_padding_length = 0
2020-06-10 11:20:36,338 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.model_name = roberta-large
2020-06-10 11:20:36,338 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.namespace = tags
2020-06-10 11:20:36,338 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.max_length = 512
2020-06-10 11:20:36,653 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/michaels/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
2020-06-10 11:20:36,654 - INFO - transformers.configuration_utils - Model config RobertaConfig {
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 1,
  "type_vocab_size": 1,
  "vocab_size": 50265
}

2020-06-10 11:20:37,398 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json from cache at /home/michaels/.cache/torch/transformers/1ae1f5b6e2b22b25ccc04c000bb79ca847aa226d0761536b011cf7e5868f0655.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
2020-06-10 11:20:37,399 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt from cache at /home/michaels/.cache/torch/transformers/f8f83199a6270d582d6245dc100e99c4155de81c9745c6248077018fe01abcfb.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
2020-06-10 11:20:37,833 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/michaels/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
2020-06-10 11:20:37,834 - INFO - transformers.configuration_utils - Model config RobertaConfig {
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 1,
  "type_vocab_size": 1,
  "vocab_size": 50265
}

2020-06-10 11:20:38,678 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json from cache at /home/michaels/.cache/torch/transformers/1ae1f5b6e2b22b25ccc04c000bb79ca847aa226d0761536b011cf7e5868f0655.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
2020-06-10 11:20:38,679 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt from cache at /home/michaels/.cache/torch/transformers/f8f83199a6270d582d6245dc100e99c4155de81c9745c6248077018fe01abcfb.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
2020-06-10 11:20:38,806 - INFO - allennlp.common.params - dataset_reader.combine_input_fields = None
input 0:  {"hypothesis": "Two women are sitting on a blanket near some rocks talking about politics.", "premise": "Two women are wandering along the shore drinking iced tea."}
prediction:  {"logits": [-3.2302780151367188, 6.4810333251953125, -2.5054984092712402], "probs": [6.058295912225731e-05, 0.9998144507408142, 0.00012505988706834614], "token_ids": [0, 1596, 390, 32, 26884, 552, 5, 8373, 4835, 1437, 12646, 6845, 4, 2, 2, 1596, 390, 32, 2828, 15, 10, 14165, 583, 103, 10889, 1686, 59, 2302, 4, 2], "label": "contradiction", "tokens": ["<s>", "\u0120Two", "\u0120women", "\u0120are", "\u0120wandering", "\u0120along", "\u0120the", "\u0120shore", "\u0120drinking", "\u0120", "iced", "\u0120tea", ".", "</s>", "</s>", "\u0120Two", "\u0120women", "\u0120are", "\u0120sitting", "\u0120on", "\u0120a", "\u0120blanket", "\u0120near", "\u0120some", "\u0120rocks", "\u0120talking", "\u0120about", "\u0120politics", ".", "</s>"]}

2020-06-10 11:20:39,234 - INFO - allennlp.models.archival - removing temporary unarchived model dir at /tmp/tmpnk3qeff5
schmmd commented 4 years ago

@dirkgr do we want the /us in this output? E.g. \u0120Two

dirkgr commented 4 years ago

I don't know why the tokens are in the output at all. Maybe it's for interpret to work, or a visualization tool? If we have tokens in the output at all, they should have the \u0120. The tokens and token ids are a view into the internals of the model, and so is \u0120.

schmmd commented 4 years ago

Fixed in demo.

matt-gardner commented 4 years ago

I don't know why the tokens are in the output at all. Maybe it's for interpret to work, or a visualization tool?

Yes, it's so that interpretations can actually be understood correctly. We need to know the tokenization, which we can only get from the model.