allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0

RuntimeError: index out of range at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:191 #2928

Closed: dmtrkl closed this issue 5 years ago

dmtrkl commented 5 years ago

Question: How can I overcome the following issue?

I'm trying to add POS/NER and dependency-label embeddings, as used in the configuration tests/fixtures/encoder_decoder/simple_seq2seq/experiment.json, in order to train a model on the Event2Mind dataset.

File: config_tags.json

```json
{
    "dataset_reader": {
        "type": "event2mind",
        "source_tokenizer": {
            "type": "word",
            "word_splitter": {
                "type": "spacy",
                "pos_tags": true,
                "parse": true,
                "ner": true
            }
        },
        "target_tokenizer": {
            "type": "word"
        },
        "source_token_indexers": {
            "tokens": {
                "type": "single_id",
                "namespace": "source_tokens"
            },
            "pos_tags": {
                "type": "pos_tag",
                "namespace": "pos"
            },
            "dependency_label": {
                "type": "dependency_label"
            },
            "ner_tags": {
                "type": "ner_tag",
                "namespace": "ner"
            }
        }
    },
    "train_data_path": "https://raw.githubusercontent.com/uwnlp/event2mind/master/docs/data/train.csv",
    "validation_data_path": "https://raw.githubusercontent.com/uwnlp/event2mind/master/docs/data/dev.csv",
    "test_data_path": "https://raw.githubusercontent.com/uwnlp/event2mind/master/docs/data/test.csv",
    "model": {
        "type": "event2mind",
        "source_embedder": {
            "token_embedders": {
                "tokens": {
                    "type": "embedding",
                    "embedding_dim": 300,
                    "trainable": false,
                    "vocab_namespace": "source_tokens",
                    "pretrained_file": "https://s3-us-west-2.amazonaws.com/allennlp/datasets/glove/glove.6B.300d.txt.gz"
                },
                "pos_tags": {
                    "type": "embedding",
                    "embedding_dim": 20,
                    "vocab_namespace": "pos"
                },
                "ner_tags": {
                    "type": "embedding",
                    "embedding_dim": 20,
                    "vocab_namespace": "ner"
                },
                "dependency_label": {
                    "type": "embedding",
                    "embedding_dim": 20,
                    "vocab_namespace": "dependencies"
                }
            },
            "allow_unmatched_keys": true
        },
        "embedding_dropout": 0.2,
        "encoder": {
            "type": "gru",
            "input_size": 360,
            "hidden_size": 50,
            "bidirectional": true
        },
        "max_decoding_steps": 10
    },
    "iterator": {
        "type": "bucket",
        "sorting_keys": [
            [
                "source",
                "num_tokens"
            ]
        ],
        "padding_noise": 0,
        "batch_size": 64
    },
    "trainer": {
        "optimizer": {
            "type": "adam"
        },
        "patience": 10,
        "num_epochs": 20
    }
}
```
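As a sanity check on the config, the encoder's `input_size` of 360 should equal the sum of the output dims of the four token embedders that the `source_embedder` concatenates. A quick sketch of that arithmetic (dims taken from the config above):

```python
# Verify that the concatenated token-embedder dims match the GRU input_size.
embedding_dims = {
    "tokens": 300,             # GloVe 300d, source_tokens namespace
    "pos_tags": 20,
    "ner_tags": 20,
    "dependency_label": 20,
}
encoder_input_size = 360

total = sum(embedding_dims.values())
assert total == encoder_input_size, (total, encoder_input_size)
print(total)  # 360
```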

But I am getting this error: `RuntimeError: index out of range at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:191`
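For context, PyTorch raises this error when an embedding lookup receives an index that is at least as large as the embedding table. One way this can happen in AllenNLP is a vocabulary-namespace mismatch: the table is sized from one namespace while the indices are produced against another. (Note that in the config the `dependency_label` embedder uses `vocab_namespace: "dependencies"`, whereas the log later shows the `dependency_label` indexer defaulting to the `dep_labels` namespace.) A minimal stdlib sketch of that failure mode, with illustrative names and not using torch:

```python
# Illustrative sketch (not AllenNLP code): a lookup table sized from one
# vocabulary namespace, with indices generated against a different one.
vocab = {
    "dependencies": ["@@PADDING@@", "@@UNKNOWN@@"],          # nearly empty namespace
    "dep_labels": ["@@PADDING@@", "nsubj", "dobj", "prep"],  # namespace the indexer filled
}

# Embedding "table" sized from the wrong namespace: only 2 rows.
table = [[0.0] * 20 for _ in vocab["dependencies"]]

# Indices produced against the populated namespace can exceed the table size.
indices = [1, 3]  # 3 is out of range for a 2-row table

try:
    rows = [table[i] for i in indices]
except IndexError as err:
    print("lookup failed:", err)  # analogous to the RuntimeError from torch
```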

joelgrus commented 5 years ago

can you share the full stack trace

dmtrkl commented 5 years ago

> can you share the full stack trace


Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
2019-06-05 11:22:38,622 - INFO - allennlp.common.params - random_seed = 13370
2019-06-05 11:22:38,622 - INFO - allennlp.common.params - numpy_seed = 1337
2019-06-05 11:22:38,622 - INFO - allennlp.common.params - pytorch_seed = 133
2019-06-05 11:22:38,623 - INFO - allennlp.common.checks - Pytorch version: 1.0.1.post2
2019-06-05 11:22:38,624 - INFO - allennlp.common.params - evaluate_on_test = False
2019-06-05 11:22:38,625 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.data.dataset_readers.dataset_reader.DatasetReader'> from params {'source_token_indexers': {'dependency_label': {'type': 'dependency_label'}, 'ner_tags': {'namespace': 'ner', 'type': 'ner_tag'}, 'pos_tags': {'namespace': 'pos', 'type': 'pos_tag'}, 'tokens': {'namespace': 'source_tokens', 'type': 'single_id'}}, 'source_tokenizer': {'type': 'word', 'word_splitter': {'ner': True, 'parse': True, 'pos_tags': True, 'type': 'spacy'}}, 'target_token_indexers': {'tokens': {'namespace': 'target_tokens'}}, 'target_tokenizer': {'type': 'word'}, 'type': 'event2mind'} and extras set()
2019-06-05 11:22:38,625 - INFO - allennlp.common.params - dataset_reader.type = event2mind
2019-06-05 11:22:38,625 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.data.dataset_readers.event2mind.Event2MindDatasetReader'> from params {'source_token_indexers': {'dependency_label': {'type': 'dependency_label'}, 'ner_tags': {'namespace': 'ner', 'type': 'ner_tag'}, 'pos_tags': {'namespace': 'pos', 'type': 'pos_tag'}, 'tokens': {'namespace': 'source_tokens', 'type': 'single_id'}}, 'source_tokenizer': {'type': 'word', 'word_splitter': {'ner': True, 'parse': True, 'pos_tags': True, 'type': 'spacy'}}, 'target_token_indexers': {'tokens': {'namespace': 'target_tokens'}}, 'target_tokenizer': {'type': 'word'}} and extras set()
2019-06-05 11:22:38,626 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.data.tokenizers.tokenizer.Tokenizer'> from params {'type': 'word', 'word_splitter': {'ner': True, 'parse': True, 'pos_tags': True, 'type': 'spacy'}} and extras set()
2019-06-05 11:22:38,626 - INFO - allennlp.common.params - dataset_reader.source_tokenizer.type = word
2019-06-05 11:22:38,626 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.data.tokenizers.word_tokenizer.WordTokenizer'> from params {'word_splitter': {'ner': True, 'parse': True, 'pos_tags': True, 'type': 'spacy'}} and extras set()
2019-06-05 11:22:38,626 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.data.tokenizers.word_splitter.WordSplitter'> from params {'ner': True, 'parse': True, 'pos_tags': True, 'type': 'spacy'} and extras set()
2019-06-05 11:22:38,626 - INFO - allennlp.common.params - dataset_reader.source_tokenizer.word_splitter.type = spacy
2019-06-05 11:22:38,626 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.data.tokenizers.word_splitter.SpacyWordSplitter'> from params {'ner': True, 'parse': True, 'pos_tags': True} and extras set()
2019-06-05 11:22:38,627 - INFO - allennlp.common.params - dataset_reader.source_tokenizer.word_splitter.language = en_core_web_sm
2019-06-05 11:22:38,627 - INFO - allennlp.common.params - dataset_reader.source_tokenizer.word_splitter.pos_tags = True
2019-06-05 11:22:38,627 - INFO - allennlp.common.params - dataset_reader.source_tokenizer.word_splitter.parse = True
2019-06-05 11:22:38,627 - INFO - allennlp.common.params - dataset_reader.source_tokenizer.word_splitter.ner = True
2019-06-05 11:22:38,627 - INFO - allennlp.common.params - dataset_reader.source_tokenizer.word_splitter.keep_spacy_tokens = False
2019-06-05 11:22:38,937 - INFO - allennlp.common.params - dataset_reader.source_tokenizer.start_tokens = None
2019-06-05 11:22:38,937 - INFO - allennlp.common.params - dataset_reader.source_tokenizer.end_tokens = None
2019-06-05 11:22:38,937 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.data.tokenizers.tokenizer.Tokenizer'> from params {'type': 'word'} and extras set()
2019-06-05 11:22:38,938 - INFO - allennlp.common.params - dataset_reader.target_tokenizer.type = word
2019-06-05 11:22:38,938 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.data.tokenizers.word_tokenizer.WordTokenizer'> from params {} and extras set()
2019-06-05 11:22:38,938 - INFO - allennlp.common.params - dataset_reader.target_tokenizer.start_tokens = None
2019-06-05 11:22:38,938 - INFO - allennlp.common.params - dataset_reader.target_tokenizer.end_tokens = None
2019-06-05 11:22:39,046 - INFO - allennlp.common.from_params - instantiating class allennlp.data.token_indexers.token_indexer.TokenIndexer from params {'type': 'dependency_label'} and extras set()
2019-06-05 11:22:39,046 - INFO - allennlp.common.params - dataset_reader.source_token_indexers.dependency_label.type = dependency_label
2019-06-05 11:22:39,047 - INFO - allennlp.common.from_params - instantiating class allennlp.data.token_indexers.dep_label_indexer.DepLabelIndexer from params {} and extras set()
2019-06-05 11:22:39,047 - INFO - allennlp.common.params - dataset_reader.source_token_indexers.dependency_label.namespace = dep_labels
2019-06-05 11:22:39,047 - INFO - allennlp.common.from_params - instantiating class allennlp.data.token_indexers.token_indexer.TokenIndexer from params {'namespace': 'ner', 'type': 'ner_tag'} and extras set()
2019-06-05 11:22:39,047 - INFO - allennlp.common.params - dataset_reader.source_token_indexers.ner_tags.type = ner_tag
2019-06-05 11:22:39,048 - INFO - allennlp.common.from_params - instantiating class allennlp.data.token_indexers.ner_tag_indexer.NerTagIndexer from params {'namespace': 'ner'} and extras set()
2019-06-05 11:22:39,048 - INFO - allennlp.common.params - dataset_reader.source_token_indexers.ner_tags.namespace = ner
2019-06-05 11:22:39,048 - INFO - allennlp.common.from_params - instantiating class allennlp.data.token_indexers.token_indexer.TokenIndexer from params {'namespace': 'pos', 'type': 'pos_tag'} and extras set()
2019-06-05 11:22:39,048 - INFO - allennlp.common.params - dataset_reader.source_token_indexers.pos_tags.type = pos_tag
2019-06-05 11:22:39,049 - INFO - allennlp.common.from_params - instantiating class allennlp.data.token_indexers.pos_tag_indexer.PosTagIndexer from params {'namespace': 'pos'} and extras set()
2019-06-05 11:22:39,049 - INFO - allennlp.common.params - dataset_reader.source_token_indexers.pos_tags.namespace = pos
2019-06-05 11:22:39,049 - INFO - allennlp.common.params - dataset_reader.source_token_indexers.pos_tags.coarse_tags = False
2019-06-05 11:22:39,049 - INFO - allennlp.common.from_params - instantiating class allennlp.data.token_indexers.token_indexer.TokenIndexer from params {'namespace': 'source_tokens', 'type': 'single_id'} and extras set()
2019-06-05 11:22:39,050 - INFO - allennlp.common.params - dataset_reader.source_token_indexers.tokens.type = single_id
2019-06-05 11:22:39,050 - INFO - allennlp.common.from_params - instantiating class allennlp.data.token_indexers.single_id_token_indexer.SingleIdTokenIndexer from params {'namespace': 'source_tokens'} and extras set()
2019-06-05 11:22:39,050 - INFO - allennlp.common.params - dataset_reader.source_token_indexers.tokens.namespace = source_tokens
2019-06-05 11:22:39,050 - INFO - allennlp.common.params - dataset_reader.source_token_indexers.tokens.lowercase_tokens = False
2019-06-05 11:22:39,050 - INFO - allennlp.common.params - dataset_reader.source_token_indexers.tokens.start_tokens = None
2019-06-05 11:22:39,051 - INFO - allennlp.common.params - dataset_reader.source_token_indexers.tokens.end_tokens = None
2019-06-05 11:22:39,051 - INFO - allennlp.common.from_params - instantiating class allennlp.data.token_indexers.token_indexer.TokenIndexer from params {'namespace': 'target_tokens'} and extras set()
2019-06-05 11:22:39,051 - INFO - allennlp.common.params - dataset_reader.target_token_indexers.tokens.type = single_id
2019-06-05 11:22:39,051 - INFO - allennlp.common.from_params - instantiating class allennlp.data.token_indexers.single_id_token_indexer.SingleIdTokenIndexer from params {'namespace': 'target_tokens'} and extras set()
2019-06-05 11:22:39,051 - INFO - allennlp.common.params - dataset_reader.target_token_indexers.tokens.namespace = target_tokens
2019-06-05 11:22:39,051 - INFO - allennlp.common.params - dataset_reader.target_token_indexers.tokens.lowercase_tokens = False
2019-06-05 11:22:39,051 - INFO - allennlp.common.params - dataset_reader.target_token_indexers.tokens.start_tokens = None
2019-06-05 11:22:39,051 - INFO - allennlp.common.params - dataset_reader.target_token_indexers.tokens.end_tokens = None
2019-06-05 11:22:39,051 - INFO - allennlp.common.params - dataset_reader.source_add_start_token = True
2019-06-05 11:22:39,052 - INFO - allennlp.common.params - dataset_reader.dummy_instances_for_vocab_generation = False
2019-06-05 11:22:39,052 - INFO - allennlp.common.params - dataset_reader.lazy = False
2019-06-05 11:22:39,052 - INFO - allennlp.common.params - validation_dataset_reader = None
2019-06-05 11:22:39,052 - INFO - allennlp.common.params - train_data_path = ./data/train.csv
2019-06-05 11:22:39,052 - INFO - allennlp.training.util - Reading training data from ./data/train.csv
0it [00:00, ?it/s]2019-06-05 11:22:39,052 - INFO - allennlp.data.dataset_readers.event2mind - Reading instances from lines in file at: ./data/train.csv
116184it [49:47, 38.89it/s]
2019-06-05 12:12:26,489 - INFO - allennlp.common.params - validation_data_path = ./data/dev.csv
2019-06-05 12:12:26,489 - INFO - allennlp.training.util - Reading validation data from ./data/dev.csv
0it [00:00, ?it/s]2019-06-05 12:12:26,489 - INFO - allennlp.data.dataset_readers.event2mind - Reading instances from lines in file at: ./data/dev.csv
13279it [02:37, 84.21it/s]
2019-06-05 12:15:04,177 - INFO - allennlp.common.params - test_data_path = ./data/test.csv
2019-06-05 12:15:04,177 - INFO - allennlp.training.util - Reading test data from ./data/test.csv
0it [00:00, ?it/s]2019-06-05 12:15:04,177 - INFO - allennlp.data.dataset_readers.event2mind - Reading instances from lines in file at: ./data/test.csv
12833it [02:39, 80.36it/s]
2019-06-05 12:17:43,869 - INFO - allennlp.training.trainer - From dataset instances, validation, train, test will be considered for vocabulary creation.
2019-06-05 12:17:43,869 - INFO - allennlp.common.params - vocabulary.type = None
2019-06-05 12:17:43,869 - INFO - allennlp.common.params - vocabulary.extend = False
2019-06-05 12:17:43,869 - INFO - allennlp.common.params - vocabulary.directory_path = None
2019-06-05 12:17:43,869 - INFO - allennlp.common.params - vocabulary.max_vocab_size = None
2019-06-05 12:17:43,869 - INFO - allennlp.common.params - vocabulary.non_padded_namespaces = ('*tags', '*labels')
2019-06-05 12:17:43,869 - INFO - allennlp.common.params - vocabulary.min_pretrained_embeddings = None
2019-06-05 12:17:43,869 - INFO - allennlp.common.params - vocabulary.only_include_pretrained_words = False
2019-06-05 12:17:43,869 - INFO - allennlp.common.params - vocabulary.tokens_to_add = None
2019-06-05 12:17:43,869 - INFO - allennlp.data.vocabulary - Fitting token dictionary from dataset.
0it [00:00, ?it/s]2019-06-05 12:17:43,869 - WARNING - allennlp.data.token_indexers.dep_label_indexer - Token had no dependency label: @start@
2019-06-05 12:17:43,869 - WARNING - allennlp.data.token_indexers.dep_label_indexer - Token had no dependency label: @end@
2019-06-05 12:17:43,869 - WARNING - allennlp.data.token_indexers.pos_tag_indexer - Token had no POS tag: @start@
2019-06-05 12:17:43,869 - WARNING - allennlp.data.token_indexers.pos_tag_indexer - Token had no POS tag: @end@
142296it [00:02, 53413.53it/s]
2019-06-05 12:17:46,570 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.models.model.Model'> from params {'embedding_dropout': 0.2, 'encoder': {'bidirectional': True, 'hidden_size': 50, 'input_size': 360, 'type': 'gru'}, 'max_decoding_steps': 10, 'source_embedder': {'allow_unmatched_keys': True, 'token_embedders': {'dependency_label': {'embedding_dim': 20, 'type': 'embedding', 'vocab_namespace': 'dependencies'}, 'ner_tags': {'embedding_dim': 20, 'type': 'embedding', 'vocab_namespace': 'ner'}, 'pos_tags': {'embedding_dim': 20, 'type': 'embedding', 'vocab_namespace': 'pos'}, 'tokens': {'embedding_dim': 300, 'pretrained_file': './data/glove.6B.300d.txt', 'trainable': False, 'type': 'embedding', 'vocab_namespace': 'source_tokens'}}}, 'target_namespace': 'target_tokens', 'type': 'event2mind'} and extras {'vocab'}
2019-06-05 12:17:46,570 - INFO - allennlp.common.params - model.type = event2mind
2019-06-05 12:17:46,570 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.models.event2mind.Event2Mind'> from params {'embedding_dropout': 0.2, 'encoder': {'bidirectional': True, 'hidden_size': 50, 'input_size': 360, 'type': 'gru'}, 'max_decoding_steps': 10, 'source_embedder': {'allow_unmatched_keys': True, 'token_embedders': {'dependency_label': {'embedding_dim': 20, 'type': 'embedding', 'vocab_namespace': 'dependencies'}, 'ner_tags': {'embedding_dim': 20, 'type': 'embedding', 'vocab_namespace': 'ner'}, 'pos_tags': {'embedding_dim': 20, 'type': 'embedding', 'vocab_namespace': 'pos'}, 'tokens': {'embedding_dim': 300, 'pretrained_file': './data/glove.6B.300d.txt', 'trainable': False, 'type': 'embedding', 'vocab_namespace': 'source_tokens'}}}, 'target_namespace': 'target_tokens'} and extras {'vocab'}
2019-06-05 12:17:46,570 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.modules.text_field_embedders.text_field_embedder.TextFieldEmbedder'> from params {'allow_unmatched_keys': True, 'token_embedders': {'dependency_label': {'embedding_dim': 20, 'type': 'embedding', 'vocab_namespace': 'dependencies'}, 'ner_tags': {'embedding_dim': 20, 'type': 'embedding', 'vocab_namespace': 'ner'}, 'pos_tags': {'embedding_dim': 20, 'type': 'embedding', 'vocab_namespace': 'pos'}, 'tokens': {'embedding_dim': 300, 'pretrained_file': './data/glove.6B.300d.txt', 'trainable': False, 'type': 'embedding', 'vocab_namespace': 'source_tokens'}}} and extras {'vocab'}
2019-06-05 12:17:46,571 - INFO - allennlp.common.params - model.source_embedder.type = basic
2019-06-05 12:17:46,571 - INFO - allennlp.common.params - model.source_embedder.embedder_to_indexer_map = None
2019-06-05 12:17:46,571 - INFO - allennlp.common.params - model.source_embedder.allow_unmatched_keys = True
2019-06-05 12:17:46,571 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.modules.token_embedders.token_embedder.TokenEmbedder'> from params {'embedding_dim': 20, 'type': 'embedding', 'vocab_namespace': 'dependencies'} and extras {'vocab'}
2019-06-05 12:17:46,571 - INFO - allennlp.common.params - model.source_embedder.token_embedders.dependency_label.type = embedding
2019-06-05 12:17:46,571 - INFO - allennlp.common.params - model.source_embedder.token_embedders.dependency_label.num_embeddings = None
2019-06-05 12:17:46,571 - INFO - allennlp.common.params - model.source_embedder.token_embedders.dependency_label.vocab_namespace = dependencies
2019-06-05 12:17:46,571 - INFO - allennlp.common.params - model.source_embedder.token_embedders.dependency_label.embedding_dim = 20
2019-06-05 12:17:46,571 - INFO - allennlp.common.params - model.source_embedder.token_embedders.dependency_label.pretrained_file = None
2019-06-05 12:17:46,571 - INFO - allennlp.common.params - model.source_embedder.token_embedders.dependency_label.projection_dim = None
2019-06-05 12:17:46,571 - INFO - allennlp.common.params - model.source_embedder.token_embedders.dependency_label.trainable = True
2019-06-05 12:17:46,571 - INFO - allennlp.common.params - model.source_embedder.token_embedders.dependency_label.padding_index = None
2019-06-05 12:17:46,571 - INFO - allennlp.common.params - model.source_embedder.token_embedders.dependency_label.max_norm = None
2019-06-05 12:17:46,571 - INFO - allennlp.common.params - model.source_embedder.token_embedders.dependency_label.norm_type = 2.0
2019-06-05 12:17:46,571 - INFO - allennlp.common.params - model.source_embedder.token_embedders.dependency_label.scale_grad_by_freq = False
2019-06-05 12:17:46,571 - INFO - allennlp.common.params - model.source_embedder.token_embedders.dependency_label.sparse = False
2019-06-05 12:17:46,572 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.modules.token_embedders.token_embedder.TokenEmbedder'> from params {'embedding_dim': 20, 'type': 'embedding', 'vocab_namespace': 'ner'} and extras {'vocab'}
2019-06-05 12:17:46,572 - INFO - allennlp.common.params - model.source_embedder.token_embedders.ner_tags.type = embedding
2019-06-05 12:17:46,572 - INFO - allennlp.common.params - model.source_embedder.token_embedders.ner_tags.num_embeddings = None
2019-06-05 12:17:46,572 - INFO - allennlp.common.params - model.source_embedder.token_embedders.ner_tags.vocab_namespace = ner
2019-06-05 12:17:46,572 - INFO - allennlp.common.params - model.source_embedder.token_embedders.ner_tags.embedding_dim = 20
2019-06-05 12:17:46,572 - INFO - allennlp.common.params - model.source_embedder.token_embedders.ner_tags.pretrained_file = None
2019-06-05 12:17:46,572 - INFO - allennlp.common.params - model.source_embedder.token_embedders.ner_tags.projection_dim = None
2019-06-05 12:17:46,572 - INFO - allennlp.common.params - model.source_embedder.token_embedders.ner_tags.trainable = True
2019-06-05 12:17:46,572 - INFO - allennlp.common.params - model.source_embedder.token_embedders.ner_tags.padding_index = None
2019-06-05 12:17:46,572 - INFO - allennlp.common.params - model.source_embedder.token_embedders.ner_tags.max_norm = None
2019-06-05 12:17:46,572 - INFO - allennlp.common.params - model.source_embedder.token_embedders.ner_tags.norm_type = 2.0
2019-06-05 12:17:46,572 - INFO - allennlp.common.params - model.source_embedder.token_embedders.ner_tags.scale_grad_by_freq = False
2019-06-05 12:17:46,573 - INFO - allennlp.common.params - model.source_embedder.token_embedders.ner_tags.sparse = False
2019-06-05 12:17:46,573 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.modules.token_embedders.token_embedder.TokenEmbedder'> from params {'embedding_dim': 20, 'type': 'embedding', 'vocab_namespace': 'pos'} and extras {'vocab'}
2019-06-05 12:17:46,573 - INFO - allennlp.common.params - model.source_embedder.token_embedders.pos_tags.type = embedding
2019-06-05 12:17:46,573 - INFO - allennlp.common.params - model.source_embedder.token_embedders.pos_tags.num_embeddings = None
2019-06-05 12:17:46,573 - INFO - allennlp.common.params - model.source_embedder.token_embedders.pos_tags.vocab_namespace = pos
2019-06-05 12:17:46,573 - INFO - allennlp.common.params - model.source_embedder.token_embedders.pos_tags.embedding_dim = 20
2019-06-05 12:17:46,573 - INFO - allennlp.common.params - model.source_embedder.token_embedders.pos_tags.pretrained_file = None
2019-06-05 12:17:46,573 - INFO - allennlp.common.params - model.source_embedder.token_embedders.pos_tags.projection_dim = None
2019-06-05 12:17:46,573 - INFO - allennlp.common.params - model.source_embedder.token_embedders.pos_tags.trainable = True
2019-06-05 12:17:46,573 - INFO - allennlp.common.params - model.source_embedder.token_embedders.pos_tags.padding_index = None
2019-06-05 12:17:46,573 - INFO - allennlp.common.params - model.source_embedder.token_embedders.pos_tags.max_norm = None
2019-06-05 12:17:46,573 - INFO - allennlp.common.params - model.source_embedder.token_embedders.pos_tags.norm_type = 2.0
2019-06-05 12:17:46,573 - INFO - allennlp.common.params - model.source_embedder.token_embedders.pos_tags.scale_grad_by_freq = False
2019-06-05 12:17:46,573 - INFO - allennlp.common.params - model.source_embedder.token_embedders.pos_tags.sparse = False
2019-06-05 12:17:46,573 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.modules.token_embedders.token_embedder.TokenEmbedder'> from params {'embedding_dim': 300, 'pretrained_file': './data/glove.6B.300d.txt', 'trainable': False, 'type': 'embedding', 'vocab_namespace': 'source_tokens'} and extras {'vocab'}
2019-06-05 12:17:46,573 - INFO - allennlp.common.params - model.source_embedder.token_embedders.tokens.type = embedding
2019-06-05 12:17:46,574 - INFO - allennlp.common.params - model.source_embedder.token_embedders.tokens.num_embeddings = None
2019-06-05 12:17:46,574 - INFO - allennlp.common.params - model.source_embedder.token_embedders.tokens.vocab_namespace = source_tokens
2019-06-05 12:17:46,574 - INFO - allennlp.common.params - model.source_embedder.token_embedders.tokens.embedding_dim = 300
2019-06-05 12:17:46,574 - INFO - allennlp.common.params - model.source_embedder.token_embedders.tokens.pretrained_file = ./data/glove.6B.300d.txt
2019-06-05 12:17:46,574 - INFO - allennlp.common.params - model.source_embedder.token_embedders.tokens.projection_dim = None
2019-06-05 12:17:46,574 - INFO - allennlp.common.params - model.source_embedder.token_embedders.tokens.trainable = False
2019-06-05 12:17:46,574 - INFO - allennlp.common.params - model.source_embedder.token_embedders.tokens.padding_index = None
2019-06-05 12:17:46,574 - INFO - allennlp.common.params - model.source_embedder.token_embedders.tokens.max_norm = None
2019-06-05 12:17:46,574 - INFO - allennlp.common.params - model.source_embedder.token_embedders.tokens.norm_type = 2.0
2019-06-05 12:17:46,574 - INFO - allennlp.common.params - model.source_embedder.token_embedders.tokens.scale_grad_by_freq = False
2019-06-05 12:17:46,574 - INFO - allennlp.common.params - model.source_embedder.token_embedders.tokens.sparse = False
2019-06-05 12:17:46,575 - INFO - allennlp.modules.token_embedders.embedding - Reading pretrained embeddings from file
400000it [00:01, 273421.39it/s]
2019-06-05 12:17:48,050 - INFO - allennlp.modules.token_embedders.embedding - Initializing pre-trained embedding layer
2019-06-05 12:17:48,098 - INFO - allennlp.modules.token_embedders.embedding - Pretrained embeddings were found for 5542 out of 5595 tokens
2019-06-05 12:17:48,100 - INFO - allennlp.common.params - model.embedding_dropout = 0.2
2019-06-05 12:17:48,100 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.modules.seq2vec_encoders.seq2vec_encoder.Seq2VecEncoder'> from params {'bidirectional': True, 'hidden_size': 50, 'input_size': 360, 'type': 'gru'} and extras {'vocab'}
2019-06-05 12:17:48,100 - INFO - allennlp.common.params - model.encoder.type = gru
2019-06-05 12:17:48,100 - INFO - allennlp.common.params - model.encoder.batch_first = True
2019-06-05 12:17:48,100 - INFO - allennlp.common.params - Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
2019-06-05 12:17:48,101 - INFO - allennlp.common.params - CURRENTLY DEFINED PARAMETERS: 
2019-06-05 12:17:48,101 - INFO - allennlp.common.params - model.encoder.bidirectional = True
2019-06-05 12:17:48,101 - INFO - allennlp.common.params - model.encoder.hidden_size = 50
2019-06-05 12:17:48,101 - INFO - allennlp.common.params - model.encoder.input_size = 360
2019-06-05 12:17:48,101 - INFO - allennlp.common.params - model.encoder.batch_first = True
2019-06-05 12:17:48,102 - INFO - allennlp.common.params - model.max_decoding_steps = 10
2019-06-05 12:17:48,102 - INFO - allennlp.common.params - model.beam_size = 10
2019-06-05 12:17:48,102 - INFO - allennlp.common.params - model.target_names = None
2019-06-05 12:17:48,102 - INFO - allennlp.common.params - model.target_namespace = target_tokens
2019-06-05 12:17:48,102 - INFO - allennlp.common.params - model.target_embedding_dim = None
2019-06-05 12:17:48,211 - INFO - root - Loading a model trained before embedding extension was implemented; pass an explicit vocab namespace if you want to extend the vocabulary.
2019-06-05 12:17:48,211 - INFO - root - Loading a model trained before embedding extension was implemented; pass an explicit vocab namespace if you want to extend the vocabulary.
2019-06-05 12:17:48,211 - INFO - root - Loading a model trained before embedding extension was implemented; pass an explicit vocab namespace if you want to extend the vocabulary.
2019-06-05 12:17:48,247 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.data.iterators.data_iterator.DataIterator'> from params {'batch_size': 64, 'padding_noise': 0, 'sorting_keys': [['source', 'num_tokens']], 'type': 'bucket'} and extras set()
2019-06-05 12:17:48,247 - INFO - allennlp.common.params - iterator.type = bucket
2019-06-05 12:17:48,247 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.data.iterators.bucket_iterator.BucketIterator'> from params {'batch_size': 64, 'padding_noise': 0, 'sorting_keys': [['source', 'num_tokens']]} and extras set()
2019-06-05 12:17:48,247 - INFO - allennlp.common.params - iterator.sorting_keys = [['source', 'num_tokens']]
2019-06-05 12:17:48,247 - INFO - allennlp.common.params - iterator.padding_noise = 0
2019-06-05 12:17:48,247 - INFO - allennlp.common.params - iterator.biggest_batch_first = False
2019-06-05 12:17:48,247 - INFO - allennlp.common.params - iterator.batch_size = 64
2019-06-05 12:17:48,247 - INFO - allennlp.common.params - iterator.instances_per_epoch = None
2019-06-05 12:17:48,247 - INFO - allennlp.common.params - iterator.max_instances_in_memory = None
2019-06-05 12:17:48,247 - INFO - allennlp.common.params - iterator.cache_instances = False
2019-06-05 12:17:48,247 - INFO - allennlp.common.params - iterator.track_epoch = False
2019-06-05 12:17:48,247 - INFO - allennlp.common.params - iterator.maximum_samples_per_batch = None
2019-06-05 12:17:48,247 - INFO - allennlp.common.params - validation_iterator = None
2019-06-05 12:17:48,248 - INFO - allennlp.common.params - trainer.no_grad = ()
2019-06-05 12:17:48,248 - INFO - allennlp.training.trainer - Following parameters are Frozen  (without gradient):
2019-06-05 12:17:48,248 - INFO - allennlp.training.trainer - _source_embedder.token_embedder_tokens.weight
2019-06-05 12:17:48,248 - INFO - allennlp.training.trainer - Following parameters are Tunable (with gradient):
2019-06-05 12:17:48,248 - INFO - allennlp.training.trainer - _source_embedder.token_embedder_dependency_label.weight
2019-06-05 12:17:48,248 - INFO - allennlp.training.trainer - _source_embedder.token_embedder_ner_tags.weight
2019-06-05 12:17:48,248 - INFO - allennlp.training.trainer - _source_embedder.token_embedder_pos_tags.weight
2019-06-05 12:17:48,248 - INFO - allennlp.training.trainer - _encoder._module.weight_ih_l0
2019-06-05 12:17:48,248 - INFO - allennlp.training.trainer - _encoder._module.weight_hh_l0
2019-06-05 12:17:48,248 - INFO - allennlp.training.trainer - _encoder._module.bias_ih_l0
2019-06-05 12:17:48,248 - INFO - allennlp.training.trainer - _encoder._module.bias_hh_l0
2019-06-05 12:17:48,248 - INFO - allennlp.training.trainer - _encoder._module.weight_ih_l0_reverse
2019-06-05 12:17:48,248 - INFO - allennlp.training.trainer - _encoder._module.weight_hh_l0_reverse
2019-06-05 12:17:48,248 - INFO - allennlp.training.trainer - _encoder._module.bias_ih_l0_reverse
2019-06-05 12:17:48,248 - INFO - allennlp.training.trainer - _encoder._module.bias_hh_l0_reverse
2019-06-05 12:17:48,248 - INFO - allennlp.training.trainer - _states.xintent.embedder.weight
2019-06-05 12:17:48,248 - INFO - allennlp.training.trainer - _states.xintent.decoder_cell.weight_ih
2019-06-05 12:17:48,248 - INFO - allennlp.training.trainer - _states.xintent.decoder_cell.weight_hh
2019-06-05 12:17:48,248 - INFO - allennlp.training.trainer - _states.xintent.decoder_cell.bias_ih
2019-06-05 12:17:48,248 - INFO - allennlp.training.trainer - _states.xintent.decoder_cell.bias_hh
2019-06-05 12:17:48,249 - INFO - allennlp.training.trainer - _states.xintent.output_projection_layer.weight
2019-06-05 12:17:48,249 - INFO - allennlp.training.trainer - _states.xintent.output_projection_layer.bias
2019-06-05 12:17:48,249 - INFO - allennlp.training.trainer - _states.xreact.embedder.weight
2019-06-05 12:17:48,249 - INFO - allennlp.training.trainer - _states.xreact.decoder_cell.weight_ih
2019-06-05 12:17:48,249 - INFO - allennlp.training.trainer - _states.xreact.decoder_cell.weight_hh
2019-06-05 12:17:48,249 - INFO - allennlp.training.trainer - _states.xreact.decoder_cell.bias_ih
2019-06-05 12:17:48,249 - INFO - allennlp.training.trainer - _states.xreact.decoder_cell.bias_hh
2019-06-05 12:17:48,249 - INFO - allennlp.training.trainer - _states.xreact.output_projection_layer.weight
2019-06-05 12:17:48,249 - INFO - allennlp.training.trainer - _states.xreact.output_projection_layer.bias
2019-06-05 12:17:48,249 - INFO - allennlp.training.trainer - _states.oreact.embedder.weight
2019-06-05 12:17:48,249 - INFO - allennlp.training.trainer - _states.oreact.decoder_cell.weight_ih
2019-06-05 12:17:48,249 - INFO - allennlp.training.trainer - _states.oreact.decoder_cell.weight_hh
2019-06-05 12:17:48,249 - INFO - allennlp.training.trainer - _states.oreact.decoder_cell.bias_ih
2019-06-05 12:17:48,249 - INFO - allennlp.training.trainer - _states.oreact.decoder_cell.bias_hh
2019-06-05 12:17:48,249 - INFO - allennlp.training.trainer - _states.oreact.output_projection_layer.weight
2019-06-05 12:17:48,249 - INFO - allennlp.training.trainer - _states.oreact.output_projection_layer.bias
2019-06-05 12:17:48,249 - INFO - allennlp.common.params - trainer.patience = 10
2019-06-05 12:17:48,249 - INFO - allennlp.common.params - trainer.validation_metric = -loss
2019-06-05 12:17:48,249 - INFO - allennlp.common.params - trainer.shuffle = True
2019-06-05 12:17:48,249 - INFO - allennlp.common.params - trainer.num_epochs = 40
2019-06-05 12:17:48,249 - INFO - allennlp.common.params - trainer.cuda_device = -1
2019-06-05 12:17:48,249 - INFO - allennlp.common.params - trainer.grad_norm = None
2019-06-05 12:17:48,249 - INFO - allennlp.common.params - trainer.grad_clipping = None
2019-06-05 12:17:48,250 - INFO - allennlp.common.params - trainer.learning_rate_scheduler = None
2019-06-05 12:17:48,250 - INFO - allennlp.common.params - trainer.momentum_scheduler = None
2019-06-05 12:17:48,250 - INFO - allennlp.common.params - trainer.optimizer.type = adam
2019-06-05 12:17:48,250 - INFO - allennlp.common.params - trainer.optimizer.parameter_groups = None
2019-06-05 12:17:48,250 - INFO - allennlp.training.optimizers - Number of trainable parameters: 15636005
2019-06-05 12:17:48,250 - INFO - allennlp.common.params - trainer.optimizer.infer_type_and_cast = True
2019-06-05 12:17:48,251 - INFO - allennlp.common.params - Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
2019-06-05 12:17:48,251 - INFO - allennlp.common.params - CURRENTLY DEFINED PARAMETERS: 
2019-06-05 12:17:48,251 - INFO - allennlp.common.params - trainer.num_serialized_models_to_keep = 20
2019-06-05 12:17:48,251 - INFO - allennlp.common.params - trainer.keep_serialized_model_every_num_seconds = None
2019-06-05 12:17:48,251 - INFO - allennlp.common.params - trainer.model_save_interval = None
2019-06-05 12:17:48,251 - INFO - allennlp.common.params - trainer.summary_interval = 100
2019-06-05 12:17:48,251 - INFO - allennlp.common.params - trainer.histogram_interval = None
2019-06-05 12:17:48,251 - INFO - allennlp.common.params - trainer.should_log_parameter_statistics = True
2019-06-05 12:17:48,251 - INFO - allennlp.common.params - trainer.should_log_learning_rate = False
2019-06-05 12:17:48,251 - INFO - allennlp.common.params - trainer.log_batch_size_period = None
2019-06-05 12:17:48,275 - INFO - allennlp.training.trainer - Beginning training.
2019-06-05 12:17:48,275 - INFO - allennlp.training.trainer - Epoch 0/39
2019-06-05 12:17:48,275 - INFO - allennlp.training.trainer - Peak CPU memory usage MB: 2081.108
2019-06-05 12:17:48,325 - INFO - allennlp.training.trainer - Training
  0%|          | 0/1816 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/home/hydrofire/miniconda3/envs/allennlp/bin/allennlp", line 10, in <module>
    sys.exit(run())
  File "/home/hydrofire/miniconda3/envs/allennlp/lib/python3.6/site-packages/allennlp/run.py", line 18, in run
    main(prog="allennlp")
  File "/home/hydrofire/miniconda3/envs/allennlp/lib/python3.6/site-packages/allennlp/commands/__init__.py", line 101, in main
    args.func(args)
  File "/home/hydrofire/miniconda3/envs/allennlp/lib/python3.6/site-packages/allennlp/commands/train.py", line 103, in train_model_from_args
    args.force)
  File "/home/hydrofire/miniconda3/envs/allennlp/lib/python3.6/site-packages/allennlp/commands/train.py", line 136, in train_model_from_file
    return train_model(params, serialization_dir, file_friendly_logging, recover, force)
  File "/home/hydrofire/miniconda3/envs/allennlp/lib/python3.6/site-packages/allennlp/commands/train.py", line 204, in train_model
    metrics = trainer.train()
  File "/home/hydrofire/miniconda3/envs/allennlp/lib/python3.6/site-packages/allennlp/training/trainer.py", line 480, in train
    train_metrics = self._train_epoch(epoch)
  File "/home/hydrofire/miniconda3/envs/allennlp/lib/python3.6/site-packages/allennlp/training/trainer.py", line 322, in _train_epoch
    loss = self.batch_loss(batch_group, for_training=True)
  File "/home/hydrofire/miniconda3/envs/allennlp/lib/python3.6/site-packages/allennlp/training/trainer.py", line 263, in batch_loss
    output_dict = self.model(**batch)
  File "/home/hydrofire/miniconda3/envs/allennlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hydrofire/miniconda3/envs/allennlp/lib/python3.6/site-packages/allennlp/models/event2mind.py", line 160, in forward
    embedded_input = self._embedding_dropout(self._source_embedder(source))
  File "/home/hydrofire/miniconda3/envs/allennlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hydrofire/miniconda3/envs/allennlp/lib/python3.6/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 123, in forward
    token_vectors = embedder(*tensors)
  File "/home/hydrofire/miniconda3/envs/allennlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hydrofire/miniconda3/envs/allennlp/lib/python3.6/site-packages/allennlp/modules/token_embedders/embedding.py", line 139, in forward
    sparse=self.sparse)
  File "/home/hydrofire/miniconda3/envs/allennlp/lib/python3.6/site-packages/torch/nn/functional.py", line 1454, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: index out of range at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:191
joelgrus commented 5 years ago

the error is in the call to embedding, so most likely it's getting a token id that's somehow out of range (it's also possible the mask is out of range).

can you print the input tensors for the failing example + get the text of that example? that would help to isolate what's going wrong.
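For intuition, here is a minimal pure-Python sketch of the failure mode (not AllenNLP or PyTorch code; all names are illustrative): an embedding lookup crashes as soon as any id is greater than or equal to the number of rows in the embedding table.

```python
def embed(token_ids, table):
    """Look up each token id in a (num_embeddings x dim) embedding table."""
    vectors = []
    for idx in token_ids:
        if not 0 <= idx < len(table):
            # Mirrors the "index out of range" RuntimeError raised by
            # torch.embedding when an id >= num_embeddings.
            raise IndexError(
                f"index {idx} out of range for table of size {len(table)}"
            )
        vectors.append(table[idx])
    return vectors

# Table sized for a 10-entry vocabulary (embedding_dim = 2):
table = [[0.0, 0.0] for _ in range(10)]

embed([0, 2, 9], table)   # fine: all ids < 10
# embed([0, 31], table)   # raises: id 31 >= num_embeddings (10)
```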

dmtrkl commented 5 years ago

How should I do this?

joelgrus commented 5 years ago

it seems like it's failing on the first batch, so I would probably just hack the file

File "/home/hydrofire/miniconda3/envs/allennlp/lib/python3.6/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 123, in forward
    token_vectors = embedder(*tensors)

and add a print(tensors) before that line

presumably then you can see which one has an invalid value in it
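A sketch of that debugging step (the helper name is made up, not AllenNLP API): instead of only printing the raw tensors, you can compare the largest id in each indexer's output against the size of the corresponding embedding table.

```python
def check_indices(name, token_ids, num_embeddings):
    """Report whether any id would fall outside a table with num_embeddings rows."""
    worst = max(token_ids)
    if worst >= num_embeddings:
        print(f"{name}: max id {worst} >= num_embeddings {num_embeddings}")
        return False
    return True

# Hypothetical example: ids produced under one namespace, but a table
# sized from a different (smaller) namespace:
check_indices("dependency_label", [0, 2, 31, 1, 0], num_embeddings=10)  # -> False
check_indices("pos_tags", [0, 5, 3], num_embeddings=20)                 # -> True
```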

dmtrkl commented 5 years ago

key: dependency_label

[tensor([[ 0, 2, 2, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 4, 1, 10, 0], [ 0, 4, 1, 10, 0], [ 0, 4, 1, 10, 0], [ 0, 2, 1, 15, 0], [ 0, 2, 1, 15, 0], [ 0, 2, 1, 15, 0], [ 0, 2, 1, 15, 0], [ 0, 2, 1, 15, 0], [ 0, 2, 1, 15, 0], [ 0, 2, 1, 15, 0], [ 0, 2, 1, 15, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 
10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 4, 1, 28, 0], [ 0, 4, 1, 28, 0], [ 0, 4, 1, 28, 0], [ 0, 4, 1, 28, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 15, 0], [ 0, 4, 1, 31, 0], [ 0, 4, 1, 31, 0], [ 0, 4, 1, 31, 0], [ 0, 4, 1, 31, 0], [ 0, 4, 1, 31, 0], [ 0, 4, 1, 31, 0], [ 0, 27, 24, 1, 0], [ 0, 27, 24, 1, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 0, 2, 10, 1, 0], [ 
0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 4, 1, 17, 0], [ 0, 4, 1, 17, 0], [ 0, 4, 1, 17, 0], [ 0, 4, 1, 17, 0], [ 0, 4, 1, 17, 0], [ 0, 4, 1, 17, 0], [ 0, 4, 1, 17, 0], [ 0, 4, 1, 17, 0], [ 0, 4, 1, 17, 0], [ 0, 4, 1, 17, 0], [ 0, 4, 1, 17, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 3, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0], [ 0, 2, 1, 10, 0]])]

joelgrus commented 5 years ago

I think the issue is probably that in your dependency label token indexer you don't specify a namespace, so it uses the default, which is dep_labels

https://github.com/allenai/allennlp/blob/master/allennlp/data/token_indexers/dep_label_indexer.py#L28

but then in the corresponding embedder you say the namespace is "dependencies". The Embedding module looks up this namespace to determine the vocabulary size (its input dimension); if the namespace is wrong, it builds a table that is too small, so the ids produced by the indexer can exceed it and crash.

try changing the embedder's namespace to dep_labels, or explicitly setting the indexer's namespace to "dependencies"
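A sketch of the second option (a config fragment in the style of the config above; the embedding_dim value and the embedder entry's exact placement are illustrative, since the model half of the config isn't shown here): both the indexer and the embedder name the same namespace explicitly.

```json
"source_token_indexers": {
    "dependency_label": {
        "type": "dependency_label",
        "namespace": "dependencies"
    }
},

"dependency_label": {
    "type": "embedding",
    "embedding_dim": 50,
    "vocab_namespace": "dependencies"
}
```

With both sides pointing at "dependencies", the embedding table is sized from the same vocabulary that produced the ids, so no id can fall out of range.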

dmtrkl commented 5 years ago

Yes, that solved my issue. I appreciate your support.