Closed chukarsten closed 3 years ago
@angela97lin I remember you did some unit test runtime profiling. Do you have any code to share for that, and/or thoughts?
Primary goal: speed up CI runtime. So I'd recommend starting with the longest-running CI job (build_conda_package) and finding ways to speed that up.
@dsherry Yup, I had to do some unit test runtime profiling for the WW PRs. I ended up writing a simple script that parses the XML files generated in our artifacts (example here) and compares the one on the branch with the one in main. Then I just printed out the difference in time between the two for each test. Here's the script; it's messy and specific to what I was doing, but maybe a good start for this:
import xml.etree.ElementTree as ET


def parse_xml(xmlfile):
    """Return {test name: time} for tests slower than 1 second in a JUnit XML report."""
    tree = ET.parse(xmlfile)
    root = tree.getroot()
    results = {}
    # iter() finds <testcase> at any depth, whether the root is <testsuite> or <testsuites>
    for test_case in root.iter('testcase'):
        attributes = test_case.attrib
        name = attributes['name']
        time = attributes['time']
        if float(time) > 1:
            results[name] = time
    return results


def compare_results(main_results, other_results):
    """Return {test name: other time - main time} for tests present in both reports."""
    timing_results_dict = {}
    for test_case_name in main_results:
        try:
            time_diff = float(other_results[test_case_name]) - float(main_results[test_case_name])
            timing_results_dict[test_case_name] = time_diff
        except KeyError:
            # Test is missing from the other report (or was under the 1 second cutoff there)
            continue
    return timing_results_dict


def main():
    main_results = parse_xml('main.xml')
    woodwork_results = parse_xml('woodwork_36.xml')
    time_diffs = compare_results(main_results, woodwork_results)
    # Sort ascending by time difference, so the biggest slowdowns print last
    sorted_diffs = sorted(time_diffs.items(), key=lambda x: x[1])
    for diff in sorted_diffs:
        print(diff)


if __name__ == "__main__":
    main()
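To make the input format concrete, here is a minimal, self-contained sketch of the same comparison run on two tiny in-memory reports. The XML snippets, test names, and timings are made up for illustration, and the helper names `read_times` and `diff_times` are hypothetical; real pytest reports (written with `--junitxml`) carry more attributes than shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical JUnit-style reports; names and timings are invented for illustration.
MAIN_XML = """<testsuites><testsuite name="pytest">
  <testcase classname="tests.test_a" name="test_fast" time="0.20"/>
  <testcase classname="tests.test_a" name="test_slow" time="12.50"/>
  <testcase classname="tests.test_b" name="test_removed" time="3.00"/>
</testsuite></testsuites>"""

BRANCH_XML = """<testsuites><testsuite name="pytest">
  <testcase classname="tests.test_a" name="test_fast" time="0.25"/>
  <testcase classname="tests.test_a" name="test_slow" time="20.00"/>
</testsuite></testsuites>"""


def read_times(xml_text):
    """Map each test name to its duration in seconds."""
    root = ET.fromstring(xml_text)
    return {tc.attrib["name"]: float(tc.attrib["time"]) for tc in root.iter("testcase")}


def diff_times(main_times, branch_times):
    """Return (name, branch - main) pairs for tests in both reports, largest slowdown first."""
    shared = main_times.keys() & branch_times.keys()
    diffs = [(name, branch_times[name] - main_times[name]) for name in shared]
    return sorted(diffs, key=lambda item: item[1], reverse=True)


diffs = diff_times(read_times(MAIN_XML), read_times(BRANCH_XML))
for name, delta in diffs:
    print(f"{delta:+.2f}s  {name}")
```

Tests that appear in only one report (`test_removed` here) are dropped from the comparison, which matches the `KeyError` handling in the script above.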
I created a dummy draft PR to identify the 25 longest-running unit tests on both Windows and Linux. I ran it twice to see if the measurements were stable. Here are the results:
FIRST RUN
========================== slowest 25 test durations ==========================
160.21s call evalml/tests/automl_tests/test_automl_dask.py::TestAutoMLSearchDask::test_automl
150.91s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_graph_partial_dependence_multiclass
125.13s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_partial_dependence_more_categories_than_grid_resolution
86.23s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_partial_dependence_multiclass
84.73s call evalml/tests/automl_tests/test_automl.py::test_automl_tuner_exception
75.75s call evalml/tests/automl_tests/test_automl.py::test_automl_best_pipeline
72.69s call evalml/tests/automl_tests/test_automl_dask.py::TestAutoMLSearchDask::test_automl_max_iterations
65.64s call evalml/tests/pipeline_tests/test_pipelines.py::test_targets_data_types_classification_pipelines[float64-binary-np]
59.10s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_partial_dependence_datetime[binary]
58.59s call evalml/tests/component_tests/test_stacked_ensemble_classifier.py::test_stacked_fit_predict_classification[binary]
53.48s call evalml/tests/automl_tests/test_automl.py::test_max_batches_works[regression-False-20]
52.80s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_partial_dependence_datetime[multiclass]
52.51s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_partial_dependence_datetime[regression]
52.30s call evalml/tests/model_understanding_tests/test_permutation_importance.py::test_fast_permutation_importance_matches_sklearn_output[DagReuseFeatures-parameters9]
50.31s call evalml/tests/model_understanding_tests/test_permutation_importance.py::test_fast_permutation_importance_matches_sklearn_output[LinearPipelineTwoEncoders-parameters3]
45.72s call evalml/tests/component_tests/test_stacked_ensemble_classifier.py::test_stacked_fit_predict_classification[multiclass]
44.53s call evalml/tests/model_understanding_tests/test_permutation_importance.py::test_fast_permutation_importance_matches_sklearn_output[DagTwoEncoders-parameters8]
44.17s call evalml/tests/model_understanding_tests/test_permutation_importance.py::test_fast_permutation_importance_matches_sklearn_output[LinearPipelineWithTextFeatures-parameters4]
43.03s call evalml/tests/automl_tests/test_iterative_algorithm.py::test_iterative_algorithm_results[False]
41.73s call evalml/tests/model_understanding_tests/test_permutation_importance.py::test_fast_permutation_importance_matches_sklearn_output[LinearPipelineWithImputer-parameters1]
41.24s call evalml/tests/automl_tests/test_automl.py::test_max_batches_works[binary-False-20]
41.14s call evalml/tests/component_tests/test_stacked_ensemble_regressor.py::test_stacked_fit_predict_regression
39.99s call evalml/tests/component_tests/test_utils.py::test_scikit_learn_wrapper
39.98s call evalml/tests/automl_tests/test_automl_search_classification.py::test_automl_multiclass_nonlinear_pipeline_search_more_iterations
39.95s call evalml/tests/automl_tests/test_automl.py::test_max_batches_works[regression-True-20]
SECOND RUN
========================== slowest 25 test durations ==========================
282.88s call evalml/tests/automl_tests/test_automl_dask.py::TestAutoMLSearchDask::test_automl
200.48s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_graph_partial_dependence_multiclass
184.32s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_partial_dependence_more_categories_than_grid_resolution
104.49s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_partial_dependence_multiclass
92.91s call evalml/tests/automl_tests/test_automl.py::test_automl_best_pipeline
92.24s call evalml/tests/automl_tests/test_automl.py::test_automl_tuner_exception
87.31s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_partial_dependence_datetime[multiclass]
86.06s call evalml/tests/automl_tests/test_automl_dask.py::TestAutoMLSearchDask::test_automl_max_iterations
85.57s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_partial_dependence_datetime[binary]
82.75s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_partial_dependence_datetime[regression]
62.02s call evalml/tests/automl_tests/test_automl.py::test_max_batches_works[regression-False-20]
60.87s call evalml/tests/model_understanding_tests/test_permutation_importance.py::test_fast_permutation_importance_matches_sklearn_output[LinearPipelineTwoEncoders-parameters3]
60.50s call evalml/tests/component_tests/test_stacked_ensemble_classifier.py::test_stacked_fit_predict_classification[binary]
59.69s call evalml/tests/model_understanding_tests/test_permutation_importance.py::test_fast_permutation_importance_matches_sklearn_output[DagReuseFeatures-parameters9]
59.35s call evalml/tests/pipeline_tests/test_pipelines.py::test_targets_data_types_classification_pipelines[Int64-binary-pd]
55.71s call evalml/tests/model_understanding_tests/test_graphs.py::test_jupyter_graph_check
55.21s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_graph_partial_dependence
53.26s call evalml/tests/component_tests/test_stacked_ensemble_regressor.py::test_stacked_fit_predict_regression
51.78s call evalml/tests/automl_tests/test_iterative_algorithm.py::test_iterative_algorithm_passes_pipeline_params[False]
51.50s call evalml/tests/component_tests/test_stacked_ensemble_classifier.py::test_stacked_fit_predict_classification[multiclass]
50.91s call evalml/tests/automl_tests/test_iterative_algorithm.py::test_iterative_algorithm_passes_pipeline_params[True]
50.53s call evalml/tests/model_understanding_tests/test_permutation_importance.py::test_fast_permutation_importance_matches_sklearn_output[DagTwoEncoders-parameters8]
50.10s call evalml/tests/model_understanding_tests/test_permutation_importance.py::test_fast_permutation_importance_matches_sklearn_output[LinearPipelineWithImputer-parameters1]
49.42s call evalml/tests/automl_tests/test_iterative_algorithm.py::test_iterative_algorithm_results[False]
49.42s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_graph_two_way_partial_dependence
FIRST RUN
========================== slowest 25 test durations ===========================
133.58s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_graph_partial_dependence_multiclass
131.61s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_partial_dependence_more_categories_than_grid_resolution
112.35s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_partial_dependence_multiclass
102.16s call evalml/tests/automl_tests/test_automl.py::test_automl_best_pipeline
80.70s call evalml/tests/automl_tests/test_automl_dask.py::TestAutoMLSearchDask::test_automl
67.40s call evalml/tests/automl_tests/test_iterative_algorithm.py::test_iterative_algorithm_passes_pipeline_params[False]
63.33s call evalml/tests/automl_tests/test_automl.py::test_max_batches_works[regression-False-20]
62.44s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_partial_dependence_datetime[multiclass]
61.97s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_partial_dependence_datetime[regression]
61.42s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_partial_dependence_datetime[binary]
57.55s call evalml/tests/model_understanding_tests/test_permutation_importance.py::test_fast_permutation_importance_matches_sklearn_output[LinearPipelineTwoEncoders-parameters3]
57.42s call evalml/tests/model_understanding_tests/test_permutation_importance.py::test_fast_permutation_importance_matches_sklearn_output[DagReuseFeatures-parameters9]
55.39s call evalml/tests/model_understanding_tests/test_graphs.py::test_jupyter_graph_check
54.22s call evalml/tests/automl_tests/test_automl.py::test_max_batches_works[binary-False-20]
53.05s call evalml/tests/automl_tests/test_iterative_algorithm.py::test_iterative_algorithm_passes_pipeline_params[True]
50.63s call evalml/tests/automl_tests/test_iterative_algorithm.py::test_iterative_algorithm_results[True]
49.44s call evalml/tests/model_understanding_tests/test_graphs.py::test_cost_benefit_matrix_vs_threshold[np]
49.14s call evalml/tests/automl_tests/test_iterative_algorithm.py::test_iterative_algorithm_results[False]
49.13s call evalml/tests/automl_tests/test_automl.py::test_automl_ensembling_false
48.80s call evalml/tests/model_understanding_tests/test_graphs.py::test_binary_objective_vs_threshold[np]
48.41s call evalml/tests/model_understanding_tests/test_permutation_importance.py::test_fast_permutation_importance_matches_sklearn_output[DagTwoEncoders-parameters8]
47.64s call evalml/tests/automl_tests/test_automl_dask.py::TestAutoMLSearchDask::test_automl_max_iterations
47.55s call evalml/tests/model_understanding_tests/test_permutation_importance.py::test_fast_permutation_importance_matches_sklearn_output[LinearPipelineWithTextFeatures-parameters4]
47.39s call evalml/tests/model_understanding_tests/test_permutation_importance.py::test_fast_permutation_importance_matches_sklearn_output[LinearPipelineWithImputer-parameters1]
47.04s call evalml/tests/automl_tests/test_automl.py::test_max_batches_works[regression-True-20]
SECOND RUN
========================== slowest 25 test durations ===========================
117.78s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_partial_dependence_more_categories_than_grid_resolution
114.98s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_graph_partial_dependence_multiclass
101.67s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_partial_dependence_multiclass
87.55s call evalml/tests/automl_tests/test_automl.py::test_automl_best_pipeline
65.42s call evalml/tests/automl_tests/test_automl_dask.py::TestAutoMLSearchDask::test_automl
57.63s call evalml/tests/automl_tests/test_automl.py::test_max_batches_works[regression-False-20]
57.36s call evalml/tests/automl_tests/test_iterative_algorithm.py::test_iterative_algorithm_passes_pipeline_params[False]
53.32s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_partial_dependence_datetime[binary]
53.08s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_partial_dependence_datetime[multiclass]
52.75s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_partial_dependence_datetime[regression]
50.27s call evalml/tests/model_understanding_tests/test_permutation_importance.py::test_fast_permutation_importance_matches_sklearn_output[DagReuseFeatures-parameters9]
48.90s call evalml/tests/model_understanding_tests/test_graphs.py::test_jupyter_graph_check
47.88s call evalml/tests/model_understanding_tests/test_permutation_importance.py::test_fast_permutation_importance_matches_sklearn_output[LinearPipelineTwoEncoders-parameters3]
47.15s call evalml/tests/automl_tests/test_automl.py::test_max_batches_works[binary-False-20]
46.17s call evalml/tests/automl_tests/test_iterative_algorithm.py::test_iterative_algorithm_passes_pipeline_params[True]
44.58s call evalml/tests/model_understanding_tests/test_graphs.py::test_cost_benefit_matrix_vs_threshold[np]
44.30s call evalml/tests/automl_tests/test_automl.py::test_automl_ensembling_false
43.54s call evalml/tests/automl_tests/test_iterative_algorithm.py::test_iterative_algorithm_results[True]
42.98s call evalml/tests/automl_tests/test_iterative_algorithm.py::test_iterative_algorithm_results[False]
42.38s call evalml/tests/model_understanding_tests/test_permutation_importance.py::test_fast_permutation_importance_matches_sklearn_output[DagTwoEncoders-parameters8]
42.18s call evalml/tests/automl_tests/test_automl.py::test_max_batches_works[regression-True-20]
41.51s call evalml/tests/model_understanding_tests/test_graphs.py::test_binary_objective_vs_threshold[np]
40.86s call evalml/tests/model_understanding_tests/test_graphs.py::test_cost_benefit_matrix_vs_threshold[pd]
40.20s call evalml/tests/model_understanding_tests/test_permutation_importance.py::test_fast_permutation_importance_matches_sklearn_output[LinearPipelineWithImputer-parameters1]
39.42s call evalml/tests/automl_tests/test_automl_dask.py::TestAutoMLSearchDask::test_automl_max_iterations
Although the Windows unit tests are slower in general, the same unit tests take the longest on both Windows and Linux: AutoML with Dask, partial dependence and permutation importance, and some of the iterative algorithm tests.
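One way to check how stable these measurements are between runs is to parse the durations reports and compare them test by test. The following is a rough sketch, not the method used for the numbers above: the regular expression and the helper names `parse_durations` and `compare_runs` are assumptions for illustration, and the sample lines are three entries copied from each of the first pair of runs above.

```python
import re

# Matches lines such as "160.21s call path/to/test_file.py::test_name[param]".
DURATION_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s+(setup|call|teardown)\s+(.+?)\s*$")


def parse_durations(report_text):
    """Return {test id: seconds} for the 'call' phase lines in a durations report."""
    durations = {}
    for line in report_text.splitlines():
        match = DURATION_LINE.match(line)
        if match and match.group(2) == "call":
            durations[match.group(3)] = float(match.group(1))
    return durations


def compare_runs(first, second):
    """Return (test id, first, second, second / first) for tests listed in both runs."""
    rows = []
    for test_id in first.keys() & second.keys():
        if first[test_id] > 0:  # avoid dividing by zero on very fast tests
            rows.append((test_id, first[test_id], second[test_id], second[test_id] / first[test_id]))
    # Largest relative change first
    return sorted(rows, key=lambda row: row[3], reverse=True)


FIRST_RUN = """
160.21s call evalml/tests/automl_tests/test_automl_dask.py::TestAutoMLSearchDask::test_automl
150.91s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_graph_partial_dependence_multiclass
75.75s call evalml/tests/automl_tests/test_automl.py::test_automl_best_pipeline
"""
SECOND_RUN = """
282.88s call evalml/tests/automl_tests/test_automl_dask.py::TestAutoMLSearchDask::test_automl
200.48s call evalml/tests/model_understanding_tests/test_partial_dependence.py::test_graph_partial_dependence_multiclass
92.91s call evalml/tests/automl_tests/test_automl.py::test_automl_best_pipeline
"""

rows = compare_runs(parse_durations(FIRST_RUN), parse_durations(SECOND_RUN))
for test_id, first_s, second_s, ratio in rows:
    print(f"{ratio:.2f}x  {first_s:7.2f}s -> {second_s:7.2f}s  {test_id}")
```

Because each report lists only the slowest 25 tests, a test that drops out of the top 25 in one run is silently excluded from the comparison; the JUnit XML reports cover every test and avoid that gap.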
These are some next steps I think we should do based on this:
test_graph_partial_dependence_multiclass takes 40 seconds on my laptop as opposed to ~120 on the workers
-n 2
the tests will run faster?
This spike is intended to track the runtime of the unit tests. Not sure whether it's worth tracking Linux vs. Windows separately here. Unit tests currently take ~20 minutes to complete, both locally and in the CircleCI checks. Even though we're moving to GitHub Actions, that shouldn't really make the problem any better or any worse.
The intended outcome of this spike is the following: