allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0

Tests run twice as slow in TeamCity as locally #1798

Closed schmmd closed 5 years ago

schmmd commented 6 years ago

We moved to TeamCity in part because tests ran much faster than on Travis, but they still run about twice as slowly as they do locally.

CI:

60.16s call allennlp/tests/models/sniff_test.py::SniffTest::test_textual_entailment                 
57.02s call allennlp/tests/models/sniff_test.py::SniffTest::test_semantic_role_labeling             
52.75s call allennlp/tests/models/sniff_test.py::SniffTest::test_ner                                
46.75s call allennlp/tests/models/sniff_test.py::SniffTest::test_constituency_parsing               
16.80s call allennlp/tests/models/semantic_parsing/wikitables/wikitables_erm_semantic_parser_test.py::WikiTablesErmSemanticParserTest::test_model_can_train_save_and_load
16.04s call allennlp/tests/training/trainer_test.py::TestTrainer::test_trainer_respects_keep_serialized_model_every_num_seconds
14.89s call allennlp/tests/models/reading_comprehension/dialog_qa_test.py::DialogQATest::test_model_can_train_save_and_load
11.55s call allennlp/tests/data/dataset_readers/semantic_parsing/atis_test.py::TestAtisReader::test_atis_read_from_file
10.46s call allennlp/tests/models/semantic_parsing/wikitables/wikitables_mml_semantic_parser_test.py::WikiTablesMmlSemanticParserTest::test_elmo_no_features_can_train_save_and_load
9.80s call allennlp/tests/models/sniff_test.py::SniffTest::test_dependency_parsing                  
9.60s call allennlp/tests/models/semantic_parsing/wikitables/wikitables_mml_semantic_parser_test.py::WikiTablesMmlSemanticParserTest::test_model_can_train_save_and_load
9.52s call allennlp/tests/models/semantic_parsing/nlvr/nlvr_coverage_semantic_parser_test.py::NlvrCoverageSemanticParserTest::test_forward_with_epoch_num_changes_cost_weight
8.06s call allennlp/tests/models/semantic_parsing/nlvr/nlvr_coverage_semantic_parser_test.py::NlvrCoverageSemanticParserTest::test_model_can_train_save_and_load
7.82s call allennlp/tests/models/semantic_parsing/wikitables/wikitables_mml_semantic_parser_test.py::WikiTablesMmlSemanticParserTest::test_mixture_no_features_model_can_train_save_and_load
7.04s call allennlp/tests/models/bimpm_test.py::TestBiMPM::test_model_can_train_save_and_load       
6.90s call allennlp/tests/models/crf_tagger_test.py::CrfTaggerTest::test_simple_tagger_can_train_save_and_conll2000
6.00s call allennlp/tests/models/semantic_parsing/nlvr/nlvr_coverage_semantic_parser_test.py::NlvrCoverageSemanticParserTest::test_ungrouped_model_can_train_save_and_load
5.46s call allennlp/tests/notebooks_test.py::TestNotebooks::test_data_pipeline_tutorial             
5.38s call allennlp/tests/models/sniff_test.py::SniffTest::test_coreference_resolution              
4.97s call allennlp/tests/models/semantic_parsing/nlvr/nlvr_coverage_semantic_parser_test.py::NlvrCoverageSemanticParserTest::test_mml_initialized_model_can_train_save_and_load

Locally:

30.39s call allennlp/tests/models/sniff_test.py::SniffTest::test_textual_entailment                 
28.83s call allennlp/tests/models/sniff_test.py::SniffTest::test_semantic_role_labeling             
25.56s call allennlp/tests/models/sniff_test.py::SniffTest::test_ner                                
24.61s call allennlp/tests/models/sniff_test.py::SniffTest::test_constituency_parsing               
15.32s call allennlp/tests/training/trainer_test.py::TestTrainer::test_trainer_respects_keep_serialized_model_every_num_seconds
8.60s call allennlp/tests/models/semantic_parsing/wikitables/wikitables_erm_semantic_parser_test.py::WikiTablesErmSemanticParserTest::test_model_can_train_save_and_load
6.18s call allennlp/tests/models/crf_tagger_test.py::CrfTaggerTest::test_simple_tagger_can_train_save_and_conll2000
6.11s call allennlp/tests/data/dataset_readers/semantic_parsing/atis_test.py::TestAtisReader::test_atis_read_from_file
6.07s call allennlp/tests/models/semantic_parsing/wikitables/wikitables_mml_semantic_parser_test.py::WikiTablesMmlSemanticParserTest::test_elmo_no_features_can_train_save_and_load
4.85s call allennlp/tests/models/semantic_parsing/wikitables/wikitables_mml_semantic_parser_test.py::WikiTablesMmlSemanticParserTest::test_model_can_train_save_and_load
4.59s call allennlp/tests/models/bimpm_test.py::TestBiMPM::test_model_can_train_save_and_load       
4.34s call allennlp/tests/models/crf_tagger_test.py::CrfTaggerTest::test_simple_tagger_can_train_save_and_load_ccgbank
4.15s call allennlp/tests/notebooks_test.py::TestNotebooks::test_data_pipeline_tutorial             
4.09s call allennlp/tests/models/semantic_parsing/nlvr/nlvr_coverage_semantic_parser_test.py::NlvrCoverageSemanticParserTest::test_model_can_train_save_and_load
3.66s call allennlp/tests/notebooks_test.py::TestNotebooks::test_embedding_tokens_tutorial          
3.56s call allennlp/tests/models/semantic_parsing/wikitables/wikitables_mml_semantic_parser_test.py::WikiTablesMmlSemanticParserTest::test_mixture_no_features_model_can_train_save_and_load
3.56s call allennlp/tests/notebooks_test.py::TestNotebooks::test_vocabulary_tutorial                
3.28s call allennlp/tests/models/crf_tagger_test.py::CrfTaggerTest::test_simple_tagger_can_train_save_and_load
2.95s call allennlp/tests/models/semantic_parsing/nlvr/nlvr_coverage_semantic_parser_test.py::NlvrCoverageSemanticParserTest::test_ungrouped_model_can_train_save_and_load
2.85s call allennlp/tests/models/reading_comprehension/dialog_qa_test.py::DialogQATest::test_model_can_train_save_and_load

Initially I was concerned that downloading models slowed down the sniff test significantly, but the above data doesn't show that. We still could mount the cache directory when we run tests, which would avoid duplicate downloads--although it's not clear how much we would gain.
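To put a number on "twice as slow", the two listings above can be compared test by test. The following is a minimal sketch that copies a handful of the durations by hand; the shortened test names and the `slowdown` helper are illustrative and not part of the AllenNLP code base.

```python
# Durations in seconds, copied from the CI and local listings above.
# Only tests that appear in both listings are included; names are shortened.
ci = {
    "sniff_test::test_textual_entailment": 60.16,
    "sniff_test::test_semantic_role_labeling": 57.02,
    "sniff_test::test_ner": 52.75,
    "sniff_test::test_constituency_parsing": 46.75,
    "trainer_test::test_trainer_respects_keep_serialized_model_every_num_seconds": 16.04,
    "dialog_qa_test::test_model_can_train_save_and_load": 14.89,
}
local = {
    "sniff_test::test_textual_entailment": 30.39,
    "sniff_test::test_semantic_role_labeling": 28.83,
    "sniff_test::test_ner": 25.56,
    "sniff_test::test_constituency_parsing": 24.61,
    "trainer_test::test_trainer_respects_keep_serialized_model_every_num_seconds": 15.32,
    "dialog_qa_test::test_model_can_train_save_and_load": 2.85,
}


def slowdown(ci_times, local_times):
    """Return the CI/local duration ratio for every test present in both runs."""
    return {
        name: round(ci_times[name] / local_times[name], 2)
        for name in ci_times
        if name in local_times
    }


ratios = slowdown(ci, local)
# Print the largest slowdowns first.
for name, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{ratio:5.2f}x  {name}")
```

On these numbers the four sniff tests come out at roughly 1.9x to 2.1x, while the slowdown is far from uniform across the other tests: the `keep_serialized_model_every_num_seconds` trainer test is nearly unchanged and the dialog QA test is more than 5x slower.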

matt-gardner commented 6 years ago

Do the sniff tests actually give us anything? Given that the ELMo sniff tests are our four longest tests by a very big margin, I wonder if we can do something to just get rid of them. I think these tests have two purposes: (1) make sure that the models backing our demo still work, and (2) make sure that current code can still run old models.

I think tests for (1) should be in the demo repository, not here, and we could do (2) with a much faster test that uses a simple test fixture.

joelgrus commented 6 years ago

as someone who is currently mucking with the ELMo code itself, the sniff tests give me a lot! 😀

that doesn't mean they need to run every commit though.

schmmd commented 6 years ago

Not taking a particular position here, but it's easy to exclude them, e.g.:

        exit_code = pytest.main([test_dir, '--color=no', '-k', 'not sniff_test and not notebooks_test',
                                 '-m', 'not java'])

matt-gardner commented 6 years ago

@joelgrus, that's fair, I'm just wondering if we could get the same value out of tests that run much faster. If we can't, we should definitely keep them.
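One way to reconcile the two positions is to keep the slow tests but make them opt-in, so they run on a schedule or on demand rather than on every commit. The sketch below only builds the argument list passed to `pytest.main`; the `build_pytest_args` helper, the `run_slow` flag, and the `allennlp/tests` directory argument are assumptions for illustration, while the `-k` and `-m` expressions are the ones from the snippet above.

```python
from typing import List

# The -k expression from the snippet above: deselect the sniff and notebook tests.
SLOW_TEST_FILTER = "not sniff_test and not notebooks_test"


def build_pytest_args(test_dir: str, run_slow: bool = False) -> List[str]:
    """Build pytest arguments, skipping the slow tests unless run_slow is set.

    Hypothetical helper, not part of AllenNLP.
    """
    args = [test_dir, "--color=no"]
    if not run_slow:
        # -k deselects tests whose names match the expression.
        args += ["-k", SLOW_TEST_FILTER]
    # -m deselects tests carrying the "java" marker in both modes.
    args += ["-m", "not java"]
    return args


# Per-commit build: slow tests deselected.
print(build_pytest_args("allennlp/tests"))
# Scheduled or manual build: everything except the java-marked tests.
print(build_pytest_args("allennlp/tests", run_slow=True))
```

The resulting list would be passed on as `pytest.main(build_pytest_args(test_dir))`, with `run_slow=True` set only in whichever build configuration is meant to exercise the sniff tests.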

schmmd commented 6 years ago

Some more timings:

build-docs: 70s (locally), 97s (TC)
mypy: 25s (locally), 37s (TC)
pylint: 163s (locally), 240s (TC)

So I'm pretty sure some of the time difference is due to the cloud VMs, although the size of the difference surprises me. Some of the pylint slowdown could be because it uses 4 threads locally, whereas on the shared server, if many jobs are running at once, the multithreading may actually slow things down. Locally, pylint takes 245s with jobs=1.
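The same ratio can be worked out for these tool timings. This is a small sketch using only the figures quoted in this comment; the `slowdowns` helper is illustrative.

```python
# (local seconds, TeamCity seconds) for each tool, from the timings above.
timings = {
    "build-docs": (70, 97),
    "mypy": (25, 37),
    "pylint": (163, 240),
}

# pylint run locally with jobs=1, also from the comment above.
PYLINT_LOCAL_SINGLE_JOB = 245


def slowdowns(tool_timings):
    """Return the TeamCity/local duration ratio per tool, rounded to 2 places."""
    return {
        tool: round(tc_s / local_s, 2)
        for tool, (local_s, tc_s) in tool_timings.items()
    }


tool_ratios = slowdowns(timings)
for tool, ratio in tool_ratios.items():
    print(f"{tool}: {ratio:.2f}x")

# Compare TeamCity pylint against the local single-job run.
pylint_vs_single_job = round(timings["pylint"][1] / PYLINT_LOCAL_SINGLE_JOB, 2)
print(f"pylint (TC) vs local jobs=1: {pylint_vs_single_job:.2f}x")
```

The tools are about 1.4x to 1.5x slower on TeamCity, less than the roughly 2x seen for the sniff tests, and the TeamCity pylint time is almost the same as the local single-job run, which fits the guess that the extra threads are not helping on the shared server.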

matt-gardner commented 5 years ago

Given the last comment here, I'm not sure there's much we can do about this, so I'm going to close it. @schmmd, feel free to re-open if you disagree.