allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0

Recent commit seems to break Bert token indexing #2858

Closed: dwadden closed this issue 5 years ago

dwadden commented 5 years ago

I have a model that uses BERT embeddings. It worked fine before commit https://github.com/allenai/allennlp/commit/425f2af310e1000453ba94d4206166cbca1a971b. After pulling the latest AllenNLP code, I get an error with this stack trace:

2019-05-17 10:07:02,263 - INFO - allennlp.training.trainer - Training
  0%|          | 0/687 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/data/dwadden/anaconda3/envs/events/bin/allennlp", line 11, in <module>
    load_entry_point('allennlp', 'console_scripts', 'allennlp')()
  File "/data/dwadden/proj/src/allennlp/allennlp/run.py", line 18, in run
    main(prog="allennlp")
  File "/data/dwadden/proj/src/allennlp/allennlp/commands/__init__.py", line 102, in main
    args.func(args)
  File "/data/dwadden/proj/src/allennlp/allennlp/commands/train.py", line 116, in train_model_from_args
    args.cache_prefix)
  File "/data/dwadden/proj/src/allennlp/allennlp/commands/train.py", line 160, in train_model_from_file
    cache_directory, cache_prefix)
  File "/data/dwadden/proj/src/allennlp/allennlp/commands/train.py", line 243, in train_model
    metrics = trainer.train()
  File "/data/dwadden/proj/src/allennlp/allennlp/training/trainer.py", line 480, in train
    train_metrics = self._train_epoch(epoch)
  File "/data/dwadden/proj/src/allennlp/allennlp/training/trainer.py", line 315, in _train_epoch
    for batch_group in train_generator_tqdm:
  File "/data/dwadden/anaconda3/envs/events/lib/python3.6/site-packages/tqdm/_tqdm.py", line 1022, in __iter__
    for obj in iterable:
  File "/data/dwadden/proj/src/allennlp/allennlp/common/util.py", line 104, in <lambda>
    return iter(lambda: list(islice(iterator, 0, group_size)), [])
  File "/data/dwadden/proj/src/allennlp/allennlp/data/iterators/data_iterator.py", line 143, in __call__
    for batch in batches:
  File "/data/dwadden/proj/src/allennlp/allennlp/data/iterators/bucket_iterator.py", line 117, in _create_batches
    self._padding_noise)
  File "/data/dwadden/proj/src/allennlp/allennlp/data/iterators/bucket_iterator.py", line 30, in sort_by_padding
    padding_lengths = cast(Dict[str, Dict[str, float]], instance.get_padding_lengths())
  File "/data/dwadden/proj/src/allennlp/allennlp/data/instance.py", line 81, in get_padding_lengths
    lengths[field_name] = field.get_padding_lengths()
  File "/data/dwadden/proj/src/allennlp/allennlp/data/fields/text_field.py", line 122, in get_padding_lengths
    indexer.get_token_min_padding_length())
  File "/data/dwadden/proj/src/allennlp/allennlp/data/token_indexers/token_indexer.py", line 82, in get_token_min_padding_length
    return self._token_min_padding_length
AttributeError: 'PretrainedBertIndexer' object has no attribute '_token_min_padding_length'

I'll dig into this more later, but wanted to let you know.
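For context, an AttributeError of this shape ("object has no attribute '_token_min_padding_length'") typically arises when a base class gains a new attribute in its `__init__` but a subclass overrides `__init__` without calling `super().__init__()`, so the attribute is never set on subclass instances. The sketch below uses hypothetical class names (`TokenIndexerBase`, `LegacyIndexer`, `FixedIndexer`) to illustrate the failure mode; it is not the actual AllenNLP code.

```python
class TokenIndexerBase:
    """Stand-in for a base class that recently gained a new attribute."""

    def __init__(self, token_min_padding_length: int = 0) -> None:
        # Attribute introduced by a later commit to the base class.
        self._token_min_padding_length = token_min_padding_length

    def get_token_min_padding_length(self) -> int:
        return self._token_min_padding_length


class LegacyIndexer(TokenIndexerBase):
    """Bug: overrides __init__ without calling super().__init__(),
    so _token_min_padding_length is never set."""

    def __init__(self) -> None:
        pass


class FixedIndexer(TokenIndexerBase):
    """Fix: delegate to the base class so the new attribute exists."""

    def __init__(self) -> None:
        super().__init__()


try:
    LegacyIndexer().get_token_min_padding_length()
except AttributeError as err:
    # Mirrors the reported failure: the attribute was never initialized.
    print(err)

print(FixedIndexer().get_token_min_padding_length())  # prints 0
```

If that is the cause here, the fix would be ensuring the indexer's `__init__` chain passes through the base class constructor.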

schmmd commented 5 years ago

@dwadden thanks for the heads up and for digging in a bit deeper. Many of us are heads down and focused on the EMNLP deadline but we'll look forward to your follow up.

schmmd commented 5 years ago

@dwadden was the issue caused by 425f2af in particular?

kl2806 commented 5 years ago

@dwadden any luck digging into this more?

DeNeutoy commented 5 years ago

Closing as irreproducible.