allenai / allennlp-reading-comprehension


test_forward_pass_runs_correctly in bidaf_test.py fails intermittently #16

Closed loopylangur closed 4 years ago

loopylangur commented 4 years ago

Describe the bug The test BidirectionalAttentionFlowTest::test_forward_pass_runs_correctly in tests/models/bidaf_test.py fails intermittently with the following error:

>       assert (metrics['f1'] > 0)
E       AssertionError: assert 0.0 > 0

tests/models/bidaf_test.py:40: AssertionError

Full error log:

=============================================================================================================================== FAILURES ================================================================================================================================
____________________________________________________________________________________________________ BidirectionalAttentionFlowTest.test_forward_pass_runs_correctly ____________________________________________________________________________________________________

self = <tests.models.bidaf_test.BidirectionalAttentionFlowTest testMethod=test_forward_pass_runs_correctly>

    def test_forward_pass_runs_correctly(self):
        batch = Batch(self.instances)
        batch.index_instances(self.vocab)
        training_tensors = batch.as_tensor_dict()
        output_dict = self.model(**training_tensors)

        metrics = self.model.get_metrics(reset=True)
        # We've set up the data such that there's a fake answer that consists of the whole
        # paragraph.  _Any_ valid prediction for that question should produce an F1 of greater than
        # zero, while if we somehow haven't been able to load the evaluation data, or there was an
        # error with using the evaluation script, this will fail.  This makes sure that we've
        # loaded the evaluation data correctly and have hooked things up to the official evaluation
        # script.
>       assert metrics["f1"] > 0
E       AssertionError: assert 0.0 > 0

tests/models/bidaf_test.py:40: AssertionError
------------------------------------------------------------------------------------------------------------------------- Captured stderr call --------------------------------------------------------------------------------------------------------------------------
5it [00:00, 682.13it/s]
100%|██████████| 5/5 [00:00<00:00, 1918.01it/s]
--------------------------------------------------------------------------------------------------------------------------- Captured log call ---------------------------------------------------------------------------------------------------------------------------
16:00:48 - INFO - allennlp.common.checks - Pytorch version: 1.3.1
16:00:48 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.data.dataset_readers.dataset_reader.DatasetReader'> from params {'token_indexers': {'token_characters': {'character_tokenizer': {'byte_encoding': 'utf-8'}, 'min_padding_length': 5, 'type': 'characters'}, 'tokens': {'lowercase_tokens': True, 'type': 'single_id'}}, 'type': 'squad'} and extras set()
16:00:48 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp_rc.dataset_readers.squad.SquadReader'> from params {'token_indexers': {'token_characters': {'character_tokenizer': {'byte_encoding': 'utf-8'}, 'min_padding_length': 5, 'type': 'characters'}, 'tokens': {'lowercase_tokens': True, 'type': 'single_id'}}} and extras set()
16:00:48 - INFO - allennlp.common.from_params - instantiating class allennlp.data.token_indexers.token_indexer.TokenIndexer from params {'character_tokenizer': {'byte_encoding': 'utf-8'}, 'min_padding_length': 5, 'type': 'characters'} and extras set()
16:00:48 - INFO - allennlp.common.from_params - instantiating class allennlp.data.token_indexers.token_characters_indexer.TokenCharactersIndexer from params {'character_tokenizer': {'byte_encoding': 'utf-8'}, 'min_padding_length': 5} and extras set()
16:00:48 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.data.tokenizers.character_tokenizer.CharacterTokenizer'> from params {'byte_encoding': 'utf-8'} and extras set()
16:00:48 - INFO - allennlp.common.from_params - instantiating class allennlp.data.token_indexers.token_indexer.TokenIndexer from params {'lowercase_tokens': True, 'type': 'single_id'} and extras set()
16:00:48 - INFO - allennlp.common.from_params - instantiating class allennlp.data.token_indexers.single_id_token_indexer.SingleIdTokenIndexer from params {'lowercase_tokens': True} and extras set()
16:00:48 - INFO - allennlp_rc.dataset_readers.squad - Reading file at allennlp-reading-comprehension/test_fixtures/data/squad.json
16:00:48 - INFO - allennlp_rc.dataset_readers.squad - Reading the dataset
16:00:48 - DEBUG - allennlp_rc.dataset_readers.util - Bad labelling or tokenization - end offset doesn't match
16:00:48 - DEBUG - allennlp_rc.dataset_readers.squad - Passage: In 1882, Albert Zahm (John Zahm's brother) built an early wind tunnel used to compare lift to drag of aeronautical models. Around 1899, Professor Jerome Green became the first American to send a wireless message. In 1931, Father Julius Nieuwland performed early work on basic reactions that was used to create neoprene. Study of nuclear physics at the university began with the building of a nuclear accelerator in 1936, and continues now partly through a partnership in the Joint Institute for Nuclear Astrophysics.
16:00:48 - DEBUG - allennlp_rc.dataset_readers.squad - Passage tokens: [In, 1882, ,, Albert, Zahm, (, John, Zahm, 's, brother, ), built, an, early, wind, tunnel, used, to, compare, lift, to, drag, of, aeronautical, models, ., Around, 1899, ,, Professor, Jerome, Green, became, the, first, American, to, send, a, wireless, message, ., In, 1931, ,, Father, Julius, Nieuwland, performed, early, work, on, basic, reactions, that, was, used, to, create, neoprene, ., Study, of, nuclear, physics, at, the, university, began, with, the, building, of, a, nuclear, accelerator, in, 1936, ,, and, continues, now, partly, through, a, partnership, in, the, Joint, Institute, for, Nuclear, Astrophysics, .]
16:00:48 - DEBUG - allennlp_rc.dataset_readers.squad - Question text: Which individual worked on projects at Notre Dame that eventually created neoprene?
16:00:48 - DEBUG - allennlp_rc.dataset_readers.squad - Answer span: (222, 242)
16:00:48 - DEBUG - allennlp_rc.dataset_readers.squad - Token span: (45, 47)
16:00:48 - DEBUG - allennlp_rc.dataset_readers.squad - Tokens in answer: [Father, Julius, Nieuwland]
16:00:48 - DEBUG - allennlp_rc.dataset_readers.squad - Answer: Father Julius Nieuwl
16:00:48 - INFO - allennlp.data.vocabulary - Fitting token dictionary from dataset.
16:00:48 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.models.model.Model'> from params {'modeling_layer': {'hidden_size': 10, 'input_size': 40, 'num_layers': 1, 'type': 'lstm'}, 'num_highway_layers': 1, 'phrase_layer': {'hidden_size': 10, 'input_size': 10, 'num_layers': 1, 'type': 'lstm'}, 'similarity_function': {'combination': 'x,y,x*y', 'tensor_1_dim': 10, 'tensor_2_dim': 10, 'type': 'linear'}, 'span_end_encoder': {'hidden_size': 10, 'input_size': 70, 'num_layers': 1, 'type': 'lstm'}, 'text_field_embedder': {'token_embedders': {'token_characters': {'embedding': {'embedding_dim': 8, 'num_embeddings': 260}, 'encoder': {'embedding_dim': 8, 'ngram_filter_sizes': [5], 'num_filters': 8, 'type': 'cnn'}, 'type': 'character_encoding'}, 'tokens': {'embedding_dim': 2, 'trainable': False, 'type': 'embedding'}}}, 'type': 'bidaf'} and extras {'vocab'}
16:00:48 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp_rc.models.bidaf.BidirectionalAttentionFlow'> from params {'modeling_layer': {'hidden_size': 10, 'input_size': 40, 'num_layers': 1, 'type': 'lstm'}, 'num_highway_layers': 1, 'phrase_layer': {'hidden_size': 10, 'input_size': 10, 'num_layers': 1, 'type': 'lstm'}, 'similarity_function': {'combination': 'x,y,x*y', 'tensor_1_dim': 10, 'tensor_2_dim': 10, 'type': 'linear'}, 'span_end_encoder': {'hidden_size': 10, 'input_size': 70, 'num_layers': 1, 'type': 'lstm'}, 'text_field_embedder': {'token_embedders': {'token_characters': {'embedding': {'embedding_dim': 8, 'num_embeddings': 260}, 'encoder': {'embedding_dim': 8, 'ngram_filter_sizes': [5], 'num_filters': 8, 'type': 'cnn'}, 'type': 'character_encoding'}, 'tokens': {'embedding_dim': 2, 'trainable': False, 'type': 'embedding'}}}} and extras {'vocab'}
16:00:48 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.modules.text_field_embedders.text_field_embedder.TextFieldEmbedder'> from params {'token_embedders': {'token_characters': {'embedding': {'embedding_dim': 8, 'num_embeddings': 260}, 'encoder': {'embedding_dim': 8, 'ngram_filter_sizes': [5], 'num_filters': 8, 'type': 'cnn'}, 'type': 'character_encoding'}, 'tokens': {'embedding_dim': 2, 'trainable': False, 'type': 'embedding'}}} and extras {'vocab'}
16:00:48 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.modules.token_embedders.token_embedder.TokenEmbedder'> from params {'embedding': {'embedding_dim': 8, 'num_embeddings': 260}, 'encoder': {'embedding_dim': 8, 'ngram_filter_sizes': [5], 'num_filters': 8, 'type': 'cnn'}, 'type': 'character_encoding'} and extras {'vocab'}
16:00:48 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.modules.seq2vec_encoders.seq2vec_encoder.Seq2VecEncoder'> from params {'embedding_dim': 8, 'ngram_filter_sizes': [5], 'num_filters': 8, 'type': 'cnn'} and extras set()
16:00:48 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.modules.seq2vec_encoders.cnn_encoder.CnnEncoder'> from params {'embedding_dim': 8, 'ngram_filter_sizes': [5], 'num_filters': 8} and extras set()
16:00:48 - DEBUG - allennlp.common.registrable - instantiating registered subclass relu of <class 'allennlp.nn.activations.Activation'>
16:00:48 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.modules.token_embedders.token_embedder.TokenEmbedder'> from params {'embedding_dim': 2, 'trainable': False, 'type': 'embedding'} and extras {'vocab'}
16:00:48 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder'> from params {'hidden_size': 10, 'input_size': 10, 'num_layers': 1, 'type': 'lstm'} and extras {'vocab'}
16:00:48 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.modules.similarity_functions.similarity_function.SimilarityFunction'> from params {'combination': 'x,y,x*y', 'tensor_1_dim': 10, 'tensor_2_dim': 10, 'type': 'linear'} and extras {'vocab'}
16:00:48 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.modules.similarity_functions.linear.LinearSimilarity'> from params {'combination': 'x,y,x*y', 'tensor_1_dim': 10, 'tensor_2_dim': 10} and extras {'vocab'}
16:00:48 - DEBUG - allennlp.common.registrable - instantiating registered subclass linear of <class 'allennlp.nn.activations.Activation'>
16:00:48 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder'> from params {'hidden_size': 10, 'input_size': 40, 'num_layers': 1, 'type': 'lstm'} and extras {'vocab'}
16:00:48 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder'> from params {'hidden_size': 10, 'input_size': 70, 'num_layers': 1, 'type': 'lstm'} and extras {'vocab'}
=========================================================================================================================== 1 failed in 1.89s =======================================================================================================================
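
For context on what an F1 of exactly 0.0 means here: the metric is the token-overlap F1 used by the official SQuAD evaluation. A simplified sketch of that computation (not the official script) shows why any prediction built from passage tokens should score above zero when the fake gold answer spans the whole paragraph:

```python
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 in the spirit of the official SQuAD metric (simplified sketch)."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# If the gold answer covers the whole paragraph, any prediction made of passage
# tokens overlaps it, so an F1 of exactly 0.0 points at the metric wiring rather
# than at model quality.
print(round(token_f1("Father Julius Nieuwland",
                     "Father Julius Nieuwland performed early work"), 2))  # 0.67
```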

To Reproduce The error can be reproduced by setting the seed below in the setUp function:

    def setUp(self):
        # Seed torch's default RNG before the rest of the test setup runs.
        torch.manual_seed(1480684806298202300)
        super().setUp()
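
The single test can be invoked directly via its node id, `pytest tests/models/bidaf_test.py::BidirectionalAttentionFlowTest::test_forward_pass_runs_correctly`. A minimal sketch of how a failing run's seed can be captured for later replay, as a drop-in for the existing setUp (assuming torch's default generator is the only source of randomness):

```python
    def setUp(self):
        # Sketch: log torch's current seed at the start of each run so a failure
        # can later be replayed with torch.manual_seed(<logged value>).
        print("torch initial seed:", torch.initial_seed())
        super().setUp()
```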

Expected behavior The test should pass regardless of the random seed.

System (please complete the following information):

OS: Ubuntu 18.04
torch==1.3.1
numpy==1.18.0
pytest==5.3.2
Python 3.6.9
allennlp: installed from source; commit 95ef61
allennlp-reading-comprehension: installed from source; commit 7509b01

Additional context The test failed in at least 1 of the 15 runs I tried, each run using a different seed.
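
An illustrative seed-scanning harness (not from the original report) that reproduces this kind of check; the import path assumes it is run from the allennlp-reading-comprehension repository root, and the run count of 15 is arbitrary:

```python
import random
import unittest

import torch

from tests.models.bidaf_test import BidirectionalAttentionFlowTest

# Repeatedly re-seed torch, run the failing test, and record the seeds that
# make it fail.  The test's own setUp does not reseed, so the state set here
# is the one used when the model is built.
failing_seeds = []
for _ in range(15):
    seed = random.randrange(2 ** 63)
    torch.manual_seed(seed)
    test = BidirectionalAttentionFlowTest("test_forward_pass_runs_correctly")
    result = unittest.TestResult()
    test.run(result)
    if not result.wasSuccessful():
        failing_seeds.append(seed)

print(f"{len(failing_seeds)}/15 runs failed; failing seeds: {failing_seeds}")
```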

loopylangur commented 4 years ago

Tagging @matt-gardner for visibility

matt-gardner commented 4 years ago

Ah, sorry I dropped this. I think the right thing to do here is just to mark the test as flaky. I don't think there's an easier fix for this one.
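
For reference, one common way to mark a pytest test as flaky is the `flaky` decorator; whether this project uses the `flaky` package (and whether the test's base class is ModelTestCase) is an assumption, not confirmed in this thread:

```python
from flaky import flaky

from allennlp.common.testing import ModelTestCase  # assumed base class of the existing test


class BidirectionalAttentionFlowTest(ModelTestCase):
    @flaky(max_runs=3, min_passes=1)  # rerun up to 3 times; the test passes if any run passes
    def test_forward_pass_runs_correctly(self):
        ...
```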