Closed by schmmd 4 years ago
I know what this is, but to be sure that the whole model still works, I'm retraining it. It should be done before any of y'all wake up.
https://github.com/allenai/allennlp-models/pull/72 is needed for retraining.
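For anyone re-running the smoke test below with other sentence pairs, here is a minimal sketch that builds the JSON input line and the shell pipeline in Python, so the quoting does not have to be done by hand. The archive URL, predictor name, and trailing `-` (read from stdin) are copied from the command in this thread; the helper names `build_input_line` and `build_shell_command` are made up for illustration and are not part of AllenNLP.

```python
import json
import shlex

# Archive URL and predictor name are taken from the command posted in this thread.
ARCHIVE = "https://storage.googleapis.com/allennlp-public-models/mnli_roberta-2020.06.09.tar.gz"


def build_input_line(premise: str, hypothesis: str) -> str:
    """Serialize one premise/hypothesis pair as a single JSON line (hypothetical helper)."""
    return json.dumps({"hypothesis": hypothesis, "premise": premise})


def build_shell_command(json_line: str) -> str:
    """Return the echo | allennlp predict pipeline as a string (hypothetical helper).

    shlex.quote takes care of any quotes or spaces inside the JSON payload.
    The command is only built here, not executed.
    """
    predict = ["allennlp", "predict", "--predictor", "textual-entailment", ARCHIVE, "-"]
    return "echo " + shlex.quote(json_line) + " | " + " ".join(shlex.quote(a) for a in predict)


line = build_input_line(
    premise="Two women are wandering along the shore drinking iced tea.",
    hypothesis="Two women are sitting on a blanket near some rocks talking about politics.",
)
print(build_shell_command(line))
```

The printed string can be pasted into a shell as-is; it only assembles the command and does not download or load the model itself.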
RC5
$ echo '{"hypothesis": "Two women are sitting on a blanket near some rocks talking about politics.", "premise": "Two women are wandering along the shore drinking iced tea."}' | allennlp predict --predictor textual-entailment https://storage.googleapis.com/allennlp-public-models/mnli_roberta-2020.06.09.tar.gz -
2020-06-10 11:20:10,831 - INFO - transformers.file_utils - PyTorch version 1.5.0 available.
2020-06-10 11:20:11,516 - INFO - allennlp.models.archival - loading archive file https://storage.googleapis.com/allennlp-public-models/mnli_roberta-2020.06.09.tar.gz from cache at /home/michaels/.allennlp/cache/a80e50c38b14f28bf5423cef1d8d5e19e9712227b50a4389c30474e8dcb7166f.cbb0061a8d069270fa326bb9a72ad01b7f2f9d9a0017928c64b71be3ddf578c1
2020-06-10 11:20:11,518 - INFO - allennlp.models.archival - extracting archive file /home/michaels/.allennlp/cache/a80e50c38b14f28bf5423cef1d8d5e19e9712227b50a4389c30474e8dcb7166f.cbb0061a8d069270fa326bb9a72ad01b7f2f9d9a0017928c64b71be3ddf578c1 to temp dir /tmp/tmpnk3qeff5
2020-06-10 11:20:20,844 - INFO - allennlp.common.params - type = from_instances
2020-06-10 11:20:20,844 - INFO - allennlp.data.vocabulary - Loading token dictionary from /tmp/tmpnk3qeff5/vocabulary.
2020-06-10 11:20:20,845 - INFO - allennlp.common.params - model.type = basic_classifier
2020-06-10 11:20:20,846 - INFO - allennlp.common.params - model.regularizer = None
2020-06-10 11:20:20,846 - INFO - allennlp.common.params - model.text_field_embedder.type = basic
2020-06-10 11:20:20,846 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.tokens.type = pretrained_transformer
2020-06-10 11:20:20,846 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.tokens.model_name = roberta-large
2020-06-10 11:20:20,846 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.tokens.max_length = 512
2020-06-10 11:20:21,179 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/michaels/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
2020-06-10 11:20:21,180 - INFO - transformers.configuration_utils - Model config RobertaConfig {
"architectures": [
"RobertaForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"bos_token_id": 0,
"eos_token_id": 2,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 1024,
"initializer_range": 0.02,
"intermediate_size": 4096,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 514,
"model_type": "roberta",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"pad_token_id": 1,
"type_vocab_size": 1,
"vocab_size": 50265
}
2020-06-10 11:20:21,465 - INFO - transformers.modeling_utils - loading weights file https://cdn.huggingface.co/roberta-large-pytorch_model.bin from cache at /home/michaels/.cache/torch/transformers/2339ac1858323405dffff5156947669fed6f63a0c34cfab35bda4f78791893d2.fc7abf72755ecc4a75d0d336a93c1c63358d2334f5998ed326f3b0da380bf536
2020-06-10 11:20:31,188 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/michaels/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
2020-06-10 11:20:31,189 - INFO - transformers.configuration_utils - Model config RobertaConfig {
"architectures": [
"RobertaForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"bos_token_id": 0,
"eos_token_id": 2,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 1024,
"initializer_range": 0.02,
"intermediate_size": 4096,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 514,
"model_type": "roberta",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"pad_token_id": 1,
"type_vocab_size": 1,
"vocab_size": 50265
}
2020-06-10 11:20:31,870 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json from cache at /home/michaels/.cache/torch/transformers/1ae1f5b6e2b22b25ccc04c000bb79ca847aa226d0761536b011cf7e5868f0655.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
2020-06-10 11:20:31,871 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt from cache at /home/michaels/.cache/torch/transformers/f8f83199a6270d582d6245dc100e99c4155de81c9745c6248077018fe01abcfb.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
2020-06-10 11:20:32,436 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/michaels/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
2020-06-10 11:20:32,437 - INFO - transformers.configuration_utils - Model config RobertaConfig {
"architectures": [
"RobertaForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"bos_token_id": 0,
"eos_token_id": 2,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 1024,
"initializer_range": 0.02,
"intermediate_size": 4096,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 514,
"model_type": "roberta",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"pad_token_id": 1,
"type_vocab_size": 1,
"vocab_size": 50265
}
2020-06-10 11:20:33,079 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json from cache at /home/michaels/.cache/torch/transformers/1ae1f5b6e2b22b25ccc04c000bb79ca847aa226d0761536b011cf7e5868f0655.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
2020-06-10 11:20:33,079 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt from cache at /home/michaels/.cache/torch/transformers/f8f83199a6270d582d6245dc100e99c4155de81c9745c6248077018fe01abcfb.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
2020-06-10 11:20:33,205 - INFO - allennlp.common.params - model.seq2vec_encoder.type = cls_pooler
2020-06-10 11:20:33,206 - INFO - allennlp.common.params - model.seq2vec_encoder.embedding_dim = 1024
2020-06-10 11:20:33,206 - INFO - allennlp.common.params - model.seq2vec_encoder.cls_is_last_token = False
2020-06-10 11:20:33,206 - INFO - allennlp.common.params - model.seq2seq_encoder = None
2020-06-10 11:20:33,206 - INFO - allennlp.common.params - model.feedforward.input_dim = 1024
2020-06-10 11:20:33,206 - INFO - allennlp.common.params - model.feedforward.num_layers = 1
2020-06-10 11:20:33,206 - INFO - allennlp.common.params - model.feedforward.hidden_dims = 1024
2020-06-10 11:20:33,207 - INFO - allennlp.common.params - model.feedforward.activations = tanh
2020-06-10 11:20:33,207 - INFO - allennlp.common.params - type = tanh
2020-06-10 11:20:33,207 - INFO - allennlp.common.params - model.feedforward.dropout = 0.0
2020-06-10 11:20:33,217 - INFO - allennlp.common.params - model.dropout = 0.1
2020-06-10 11:20:33,217 - INFO - allennlp.common.params - model.num_labels = None
2020-06-10 11:20:33,217 - INFO - allennlp.common.params - model.label_namespace = labels
2020-06-10 11:20:33,218 - INFO - allennlp.common.params - model.namespace = tags
2020-06-10 11:20:33,218 - INFO - allennlp.common.params - model.initializer = <allennlp.nn.initializers.InitializerApplicator object at 0x7f11834de490>
2020-06-10 11:20:33,218 - INFO - allennlp.nn.initializers - Initializing parameters
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers - Done initializing parameters; the following parameters are using their default initialization from their code
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers - _classification_layer.bias
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers - _classification_layer.weight
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers - _feedforward._linear_layers.0.bias
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers - _feedforward._linear_layers.0.weight
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.embeddings.LayerNorm.bias
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.embeddings.LayerNorm.weight
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.embeddings.position_embeddings.weight
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.embeddings.token_type_embeddings.weight
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.embeddings.word_embeddings.weight
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.output.LayerNorm.bias
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.output.LayerNorm.weight
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.output.dense.bias
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.output.dense.weight
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.self.key.bias
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.self.key.weight
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.self.query.bias
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.self.query.weight
2020-06-10 11:20:33,220 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.self.value.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.attention.self.value.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.intermediate.dense.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.intermediate.dense.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.output.LayerNorm.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.output.LayerNorm.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.output.dense.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.0.output.dense.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.output.LayerNorm.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.output.LayerNorm.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.output.dense.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.output.dense.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.self.key.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.self.key.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.self.query.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.self.query.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.self.value.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.attention.self.value.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.intermediate.dense.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.intermediate.dense.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.output.LayerNorm.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.output.LayerNorm.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.output.dense.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.1.output.dense.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.output.LayerNorm.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.output.LayerNorm.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.output.dense.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.output.dense.weight
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.self.key.bias
2020-06-10 11:20:33,221 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.self.key.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.self.query.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.self.query.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.self.value.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.attention.self.value.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.intermediate.dense.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.intermediate.dense.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.output.LayerNorm.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.output.LayerNorm.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.output.dense.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.10.output.dense.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.output.LayerNorm.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.output.LayerNorm.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.output.dense.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.output.dense.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.self.key.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.self.key.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.self.query.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.self.query.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.self.value.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.attention.self.value.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.intermediate.dense.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.intermediate.dense.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.output.LayerNorm.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.output.LayerNorm.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.output.dense.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.11.output.dense.weight
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.output.LayerNorm.bias
2020-06-10 11:20:33,222 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.output.LayerNorm.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.output.dense.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.output.dense.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.self.key.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.self.key.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.self.query.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.self.query.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.self.value.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.attention.self.value.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.intermediate.dense.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.intermediate.dense.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.output.LayerNorm.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.output.LayerNorm.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.output.dense.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.12.output.dense.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.output.LayerNorm.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.output.LayerNorm.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.output.dense.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.output.dense.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.self.key.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.self.key.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.self.query.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.self.query.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.self.value.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.attention.self.value.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.intermediate.dense.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.intermediate.dense.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.output.LayerNorm.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.output.LayerNorm.weight
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.output.dense.bias
2020-06-10 11:20:33,223 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.13.output.dense.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.output.LayerNorm.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.output.LayerNorm.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.output.dense.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.output.dense.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.self.key.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.self.key.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.self.query.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.self.query.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.self.value.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.attention.self.value.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.intermediate.dense.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.intermediate.dense.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.output.LayerNorm.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.output.LayerNorm.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.output.dense.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.14.output.dense.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.output.LayerNorm.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.output.LayerNorm.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.output.dense.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.output.dense.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.self.key.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.self.key.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.self.query.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.self.query.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.self.value.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.attention.self.value.weight
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.intermediate.dense.bias
2020-06-10 11:20:33,224 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.intermediate.dense.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.output.LayerNorm.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.output.LayerNorm.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.output.dense.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.15.output.dense.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.output.LayerNorm.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.output.LayerNorm.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.output.dense.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.output.dense.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.self.key.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.self.key.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.self.query.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.self.query.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.self.value.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.attention.self.value.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.intermediate.dense.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.intermediate.dense.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.output.LayerNorm.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.output.LayerNorm.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.output.dense.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.16.output.dense.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.output.LayerNorm.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.output.LayerNorm.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.output.dense.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.output.dense.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.self.key.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.self.key.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.self.query.bias
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.self.query.weight
2020-06-10 11:20:33,225 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.self.value.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.attention.self.value.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.intermediate.dense.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.intermediate.dense.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.output.LayerNorm.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.output.LayerNorm.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.output.dense.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.17.output.dense.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.output.LayerNorm.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.output.LayerNorm.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.output.dense.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.output.dense.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.self.key.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.self.key.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.self.query.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.self.query.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.self.value.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.attention.self.value.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.intermediate.dense.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.intermediate.dense.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.output.LayerNorm.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.output.LayerNorm.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.output.dense.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.18.output.dense.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.output.LayerNorm.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.output.LayerNorm.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.output.dense.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.output.dense.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.self.key.bias
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.self.key.weight
2020-06-10 11:20:33,226 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.self.query.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.self.query.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.self.value.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.attention.self.value.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.intermediate.dense.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.intermediate.dense.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.output.LayerNorm.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.output.LayerNorm.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.output.dense.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.19.output.dense.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.output.LayerNorm.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.output.LayerNorm.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.output.dense.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.output.dense.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.self.key.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.self.key.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.self.query.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.self.query.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.self.value.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.attention.self.value.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.intermediate.dense.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.intermediate.dense.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.output.LayerNorm.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.output.LayerNorm.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.output.dense.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.2.output.dense.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.output.LayerNorm.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.output.LayerNorm.weight
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.output.dense.bias
2020-06-10 11:20:33,227 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.output.dense.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.self.key.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.self.key.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.self.query.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.self.query.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.self.value.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.attention.self.value.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.intermediate.dense.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.intermediate.dense.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.output.LayerNorm.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.output.LayerNorm.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.output.dense.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.20.output.dense.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.output.LayerNorm.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.output.LayerNorm.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.output.dense.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.output.dense.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.self.key.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.self.key.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.self.query.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.self.query.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.self.value.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.attention.self.value.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.intermediate.dense.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.intermediate.dense.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.output.LayerNorm.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.output.LayerNorm.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.output.dense.bias
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.21.output.dense.weight
2020-06-10 11:20:33,228 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.output.LayerNorm.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.output.LayerNorm.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.output.dense.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.output.dense.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.self.key.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.self.key.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.self.query.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.self.query.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.self.value.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.attention.self.value.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.intermediate.dense.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.intermediate.dense.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.output.LayerNorm.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.output.LayerNorm.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.output.dense.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.22.output.dense.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.output.LayerNorm.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.output.LayerNorm.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.output.dense.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.output.dense.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.self.key.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.self.key.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.self.query.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.self.query.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.self.value.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.attention.self.value.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.intermediate.dense.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.intermediate.dense.weight
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.output.LayerNorm.bias
2020-06-10 11:20:33,229 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.output.LayerNorm.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.output.dense.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.23.output.dense.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.output.LayerNorm.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.output.LayerNorm.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.output.dense.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.output.dense.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.self.key.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.self.key.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.self.query.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.self.query.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.self.value.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.attention.self.value.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.intermediate.dense.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.intermediate.dense.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.output.LayerNorm.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.output.LayerNorm.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.output.dense.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.3.output.dense.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.output.LayerNorm.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.output.LayerNorm.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.output.dense.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.output.dense.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.self.key.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.self.key.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.self.query.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.self.query.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.self.value.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.attention.self.value.weight
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.intermediate.dense.bias
2020-06-10 11:20:33,230 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.intermediate.dense.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.output.LayerNorm.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.output.LayerNorm.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.output.dense.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.4.output.dense.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.output.LayerNorm.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.output.LayerNorm.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.output.dense.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.output.dense.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.self.key.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.self.key.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.self.query.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.self.query.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.self.value.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.attention.self.value.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.intermediate.dense.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.intermediate.dense.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.output.LayerNorm.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.output.LayerNorm.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.output.dense.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.5.output.dense.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.output.LayerNorm.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.output.LayerNorm.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.output.dense.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.output.dense.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.self.key.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.self.key.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.self.query.bias
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.self.query.weight
2020-06-10 11:20:33,231 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.self.value.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.attention.self.value.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.intermediate.dense.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.intermediate.dense.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.output.LayerNorm.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.output.LayerNorm.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.output.dense.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.6.output.dense.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.output.LayerNorm.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.output.LayerNorm.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.output.dense.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.output.dense.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.self.key.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.self.key.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.self.query.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.self.query.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.self.value.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.attention.self.value.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.intermediate.dense.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.intermediate.dense.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.output.LayerNorm.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.output.LayerNorm.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.output.dense.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.7.output.dense.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.output.LayerNorm.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.output.LayerNorm.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.output.dense.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.output.dense.weight
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.self.key.bias
2020-06-10 11:20:33,232 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.self.key.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.self.query.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.self.query.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.self.value.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.attention.self.value.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.intermediate.dense.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.intermediate.dense.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.output.LayerNorm.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.output.LayerNorm.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.output.dense.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.8.output.dense.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.output.LayerNorm.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.output.LayerNorm.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.output.dense.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.output.dense.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.self.key.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.self.key.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.self.query.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.self.query.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.self.value.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.attention.self.value.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.intermediate.dense.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.intermediate.dense.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.output.LayerNorm.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.output.LayerNorm.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.output.dense.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.encoder.layer.9.output.dense.weight
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.pooler.dense.bias
2020-06-10 11:20:33,233 - INFO - allennlp.nn.initializers - _text_field_embedder.token_embedder_tokens.transformer_model.pooler.dense.weight
2020-06-10 11:20:34,211 - INFO - allennlp.common.params - dataset_reader.type = snli
2020-06-10 11:20:34,211 - INFO - allennlp.common.params - dataset_reader.lazy = False
2020-06-10 11:20:34,212 - INFO - allennlp.common.params - dataset_reader.cache_directory = None
2020-06-10 11:20:34,212 - INFO - allennlp.common.params - dataset_reader.max_instances = None
2020-06-10 11:20:34,212 - INFO - allennlp.common.params - dataset_reader.manual_distributed_sharding = False
2020-06-10 11:20:34,212 - INFO - allennlp.common.params - dataset_reader.tokenizer.type = pretrained_transformer
2020-06-10 11:20:34,212 - INFO - allennlp.common.params - dataset_reader.tokenizer.model_name = roberta-large
2020-06-10 11:20:34,212 - INFO - allennlp.common.params - dataset_reader.tokenizer.add_special_tokens = False
2020-06-10 11:20:34,212 - INFO - allennlp.common.params - dataset_reader.tokenizer.max_length = None
2020-06-10 11:20:34,212 - INFO - allennlp.common.params - dataset_reader.tokenizer.stride = 0
2020-06-10 11:20:34,212 - INFO - allennlp.common.params - dataset_reader.tokenizer.truncation_strategy = longest_first
2020-06-10 11:20:34,212 - INFO - allennlp.common.params - dataset_reader.tokenizer.tokenizer_kwargs = None
2020-06-10 11:20:34,529 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/michaels/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
2020-06-10 11:20:34,530 - INFO - transformers.configuration_utils - Model config RobertaConfig {
"architectures": [
"RobertaForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"bos_token_id": 0,
"eos_token_id": 2,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 1024,
"initializer_range": 0.02,
"intermediate_size": 4096,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 514,
"model_type": "roberta",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"pad_token_id": 1,
"type_vocab_size": 1,
"vocab_size": 50265
}
2020-06-10 11:20:35,178 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json from cache at /home/michaels/.cache/torch/transformers/1ae1f5b6e2b22b25ccc04c000bb79ca847aa226d0761536b011cf7e5868f0655.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
2020-06-10 11:20:35,178 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt from cache at /home/michaels/.cache/torch/transformers/f8f83199a6270d582d6245dc100e99c4155de81c9745c6248077018fe01abcfb.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
2020-06-10 11:20:35,589 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/michaels/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
2020-06-10 11:20:35,590 - INFO - transformers.configuration_utils - Model config RobertaConfig {
"architectures": [
"RobertaForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"bos_token_id": 0,
"eos_token_id": 2,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 1024,
"initializer_range": 0.02,
"intermediate_size": 4096,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 514,
"model_type": "roberta",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"pad_token_id": 1,
"type_vocab_size": 1,
"vocab_size": 50265
}
2020-06-10 11:20:36,218 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json from cache at /home/michaels/.cache/torch/transformers/1ae1f5b6e2b22b25ccc04c000bb79ca847aa226d0761536b011cf7e5868f0655.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
2020-06-10 11:20:36,219 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt from cache at /home/michaels/.cache/torch/transformers/f8f83199a6270d582d6245dc100e99c4155de81c9745c6248077018fe01abcfb.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
2020-06-10 11:20:36,337 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.type = pretrained_transformer
2020-06-10 11:20:36,338 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.token_min_padding_length = 0
2020-06-10 11:20:36,338 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.model_name = roberta-large
2020-06-10 11:20:36,338 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.namespace = tags
2020-06-10 11:20:36,338 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.max_length = 512
2020-06-10 11:20:36,653 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/michaels/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
2020-06-10 11:20:36,654 - INFO - transformers.configuration_utils - Model config RobertaConfig {
"architectures": [
"RobertaForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"bos_token_id": 0,
"eos_token_id": 2,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 1024,
"initializer_range": 0.02,
"intermediate_size": 4096,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 514,
"model_type": "roberta",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"pad_token_id": 1,
"type_vocab_size": 1,
"vocab_size": 50265
}
2020-06-10 11:20:37,398 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json from cache at /home/michaels/.cache/torch/transformers/1ae1f5b6e2b22b25ccc04c000bb79ca847aa226d0761536b011cf7e5868f0655.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
2020-06-10 11:20:37,399 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt from cache at /home/michaels/.cache/torch/transformers/f8f83199a6270d582d6245dc100e99c4155de81c9745c6248077018fe01abcfb.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
2020-06-10 11:20:37,833 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/michaels/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
2020-06-10 11:20:37,834 - INFO - transformers.configuration_utils - Model config RobertaConfig {
"architectures": [
"RobertaForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"bos_token_id": 0,
"eos_token_id": 2,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 1024,
"initializer_range": 0.02,
"intermediate_size": 4096,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 514,
"model_type": "roberta",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"pad_token_id": 1,
"type_vocab_size": 1,
"vocab_size": 50265
}
2020-06-10 11:20:38,678 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json from cache at /home/michaels/.cache/torch/transformers/1ae1f5b6e2b22b25ccc04c000bb79ca847aa226d0761536b011cf7e5868f0655.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
2020-06-10 11:20:38,679 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt from cache at /home/michaels/.cache/torch/transformers/f8f83199a6270d582d6245dc100e99c4155de81c9745c6248077018fe01abcfb.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
2020-06-10 11:20:38,806 - INFO - allennlp.common.params - dataset_reader.combine_input_fields = None
input 0: {"hypothesis": "Two women are sitting on a blanket near some rocks talking about politics.", "premise": "Two women are wandering along the shore drinking iced tea."}
prediction: {"logits": [-3.2302780151367188, 6.4810333251953125, -2.5054984092712402], "probs": [6.058295912225731e-05, 0.9998144507408142, 0.00012505988706834614], "token_ids": [0, 1596, 390, 32, 26884, 552, 5, 8373, 4835, 1437, 12646, 6845, 4, 2, 2, 1596, 390, 32, 2828, 15, 10, 14165, 583, 103, 10889, 1686, 59, 2302, 4, 2], "label": "contradiction", "tokens": ["<s>", "\u0120Two", "\u0120women", "\u0120are", "\u0120wandering", "\u0120along", "\u0120the", "\u0120shore", "\u0120drinking", "\u0120", "iced", "\u0120tea", ".", "</s>", "</s>", "\u0120Two", "\u0120women", "\u0120are", "\u0120sitting", "\u0120on", "\u0120a", "\u0120blanket", "\u0120near", "\u0120some", "\u0120rocks", "\u0120talking", "\u0120about", "\u0120politics", ".", "</s>"]}
2020-06-10 11:20:39,234 - INFO - allennlp.models.archival - removing temporary unarchived model dir at /tmp/tmpnk3qeff5
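As a sanity check on the prediction above, the probs should be the softmax of the logits, and the label should correspond to the largest entry. The sketch below recomputes this in plain Python from the logits copied out of the output. The helper name softmax is only for illustration and is not taken from the AllenNLP code, and the index-to-label mapping is inferred from this one output rather than read from the model's vocabulary.

```python
import math

# Logits copied verbatim from the prediction output above.
logits = [-3.2302780151367188, 6.4810333251953125, -2.5054984092712402]


def softmax(xs):
    """Numerically stable softmax over a list of floats (illustrative helper)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(logits)

# Index of the highest-probability class. In the output above this class
# is reported with the label "contradiction".
best = max(range(len(probs)), key=probs.__getitem__)

print(probs, best)
```

The recomputed values agree with the reported probs up to floating-point precision (the model most likely computes in float32, so the last digits can differ).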
@dirkgr do we want the \u escapes in this output? E.g. \u0120Two
I don't know why the tokens are in the output at all. Maybe it's for interpret to work, or a visualization tool? If we have tokens in the output at all, they should have the \u0120. The tokens and token ids are a view into the internals of the model, and so is \u0120.
Fixed in demo.
I don't know why the tokens are in the output at all. Maybe it's for interpret to work, or a visualization tool?
Yes, it's so that interpretations can actually be understood correctly. We need to know the tokenization, which we can only get from the model.
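For anyone consuming this output in a demo or visualization: \u0120 is the character "Ġ", which RoBERTa's byte-level BPE vocabulary uses to stand for a leading space. A minimal sketch of turning the raw tokens back into readable text for display is below. The token list is copied from the prediction above; the function name readable_segments and the SPECIAL_TOKENS set are made up for this example and are not part of AllenNLP. It only handles the space marker, not the other bytes that byte-level BPE remaps, so treat it as a display helper rather than a full decoder.

```python
# Tokens copied from the prediction output above ("\u0120" is the character "Ġ").
tokens = [
    "<s>", "\u0120Two", "\u0120women", "\u0120are", "\u0120wandering",
    "\u0120along", "\u0120the", "\u0120shore", "\u0120drinking", "\u0120",
    "iced", "\u0120tea", ".", "</s>", "</s>", "\u0120Two", "\u0120women",
    "\u0120are", "\u0120sitting", "\u0120on", "\u0120a", "\u0120blanket",
    "\u0120near", "\u0120some", "\u0120rocks", "\u0120talking", "\u0120about",
    "\u0120politics", ".", "</s>",
]

# Sequence delimiters as they appear in the output above (assumed set).
SPECIAL_TOKENS = {"<s>", "</s>"}


def readable_segments(tokens):
    """Split on special tokens and map the "Ġ" space marker back to a space."""
    segments, current = [], []
    for tok in tokens:
        if tok in SPECIAL_TOKENS:
            if current:
                segments.append("".join(current).strip())
                current = []
            continue
        current.append(tok.replace("\u0120", " "))
    if current:
        segments.append("".join(current).strip())
    return segments


for segment in readable_segments(tokens):
    print(segment)
```

This keeps the raw tokens in the model output, so interpretations still line up with the real tokenization, and moves the prettifying to the display layer.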