catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Speed up unit tests #3097

Open bendnorman opened 11 months ago

bendnorman commented 11 months ago

Our unit tests take almost a minute to complete, and my little distracto brain loses track of what it's working on when running pre-commit hooks. Just three tests account for almost 74% of the unit test run time:

============================= slowest 20 durations =============================
19.83s call     test/unit/settings_test.py::test_partitions_for_datasource_table
12.29s call     test/unit/io_managers_test.py::test_filter_for_freshest_data
7.11s call     test/unit/io_managers_test.py::test_migrations_match_metadata
0.65s call     test/unit/analysis/plant_parts_eia_test.py::test_one_to_many
0.39s call     test/unit/helpers_test.py::test_convert_df_to_excel_file
0.38s call     test/unit/analysis/allocate_gen_fuel_test.py::test_allocate_gen_fuel_dfo_ratios_match[extra_esc_in_gf]
0.38s call     test/unit/analysis/allocate_gen_fuel_test.py::test_allocate_gen_fuel_sums_match[base_case]
0.38s call     test/unit/analysis/allocate_gen_fuel_test.py::test_allocate_gen_fuel_dfo_ratios_match[base_case]
0.38s call     test/unit/analysis/allocate_gen_fuel_test.py::test_allocate_gen_fuel_sums_match[extra_esc_in_gf]
0.37s call     test/unit/analysis/allocate_gen_fuel_test.py::test_allocate_gen_fuel_sums_match[extra_pm_in_bf]
0.32s call     test/unit/analysis/allocate_gen_fuel_test.py::test_allocate_gen_fuel_by_generator_drops_pm_data
0.30s call     test/unit/io_managers_test.py::test_pudl_sqlite_io_manager_delete_stmt
0.29s call     test/unit/io_managers_test.py::test_sqlite_io_manager_delete_stmt
0.23s call     test/unit/helpers_test.py::test_sql_asset_factory_missing_file
0.16s call     test/unit/io_managers_test.py::test_foreign_key_failure
0.14s call     test/unit/analysis/spatial_test.py::test_overlay
0.13s call     test/unit/transform/glue_test.py::test_epacamd_eia_subplant_ids
0.11s call     test/unit/analysis/timeseries_cleaning_test.py::test_flags_and_imputes_anomalies[7088438834-382046123]
0.11s call     test/unit/analysis/timeseries_cleaning_test.py::test_flags_and_imputes_anomalies[11357816575-18413484987]
0.11s call     test/unit/analysis/timeseries_cleaning_test.py::test_flags_and_imputes_anomalies[16662093832-741013840]
=========== 280 passed, 1 skipped, 9 xfailed, 10 warnings in 53.35s ============

I wonder if these could be refactored or just added to our integration tests.
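
The 74% figure can be reproduced from the `--durations` report itself. Below is a minimal sketch, assuming the report text is available as a string; the helper name `slowest_share` and the regular expressions are illustrative and not part of PUDL or pytest.

```python
import re

# Matches report lines such as "19.83s call     test/unit/foo_test.py::test_bar".
DURATION_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s+(?:call|setup|teardown)\s+\S+")
# Matches the total wall time in the summary line, e.g. "... in 53.35s ===".
TOTAL_TIME = re.compile(r"in (\d+(?:\.\d+)?)s")


def slowest_share(report: str, top_n: int) -> float:
    """Return the fraction of total run time spent in the top_n slowest entries."""
    durations = []
    total = None
    for line in report.splitlines():
        if DURATION_LINE.match(line):
            durations.append(float(line.split("s", 1)[0]))
        elif line.lstrip().startswith("="):
            found = TOTAL_TIME.search(line)
            if found:
                total = float(found.group(1))
    if total is None:
        raise ValueError("no summary line with a total run time was found")
    return sum(sorted(durations, reverse=True)[:top_n]) / total


# A shortened copy of the report above: the three slowest tests plus one fast one.
REPORT = """\
19.83s call     test/unit/settings_test.py::test_partitions_for_datasource_table
12.29s call     test/unit/io_managers_test.py::test_filter_for_freshest_data
7.11s call     test/unit/io_managers_test.py::test_migrations_match_metadata
0.65s call     test/unit/analysis/plant_parts_eia_test.py::test_one_to_many
=========== 280 passed, 1 skipped, 9 xfailed, 10 warnings in 53.35s ============
"""

share = slowest_share(REPORT, top_n=3)
print(f"{share:.1%}")  # 73.5%
```

The share is measured against the total wall time in the summary line, which also includes collection and fixture overhead, so it slightly understates how much of the pure test time these three tests use.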

bendnorman commented 2 months ago

Our unit tests are back at 53s for me. Four tests now account for about half (roughly 52%) of the unit test time:

=============================================================================================== slowest 20 durations ===============================================================================================
8.19s call     test/unit/io_managers_test.py::test_filter_for_freshest_data
7.24s call     test/unit/settings_test.py::test_partitions_for_datasource_table
7.07s call     test/unit/io_managers_test.py::test_migrations_match_metadata
5.14s call     test/unit/analysis/ml_tools_test.py::test_create_experiment_tracker[True-test_run]
0.90s call     test/unit/analysis/plant_parts_eia_test.py::test_one_to_many
0.67s call     test/unit/analysis/allocate_gen_fuel_test.py::test_allocate_gen_fuel_sums_match[extra_esc_in_gf]
0.63s call     test/unit/analysis/allocate_gen_fuel_test.py::test_allocate_gen_fuel_dfo_ratios_match[extra_esc_in_gf]
0.54s call     test/unit/analysis/allocate_gen_fuel_test.py::test_allocate_gen_fuel_sums_match[base_case]
0.54s call     test/unit/analysis/allocate_gen_fuel_test.py::test_allocate_gen_fuel_dfo_ratios_match[base_case]
0.53s call     test/unit/analysis/allocate_gen_fuel_test.py::test_allocate_gen_fuel_sums_match[extra_pm_in_bf]
0.50s call     test/unit/extract/xbrl_test.py::test_xbrl2sqlite[settings1-forms1]
0.50s call     test/unit/analysis/allocate_gen_fuel_test.py::test_allocate_gen_fuel_by_generator_drops_pm_data
0.49s call     test/unit/io_managers_test.py::test_pudl_sqlite_io_manager_delete_stmt
0.40s call     test/unit/metadata_test.py::test_resource_descriptors_valid
0.40s call     test/unit/io_managers_test.py::test_sqlite_io_manager_delete_stmt
0.38s call     test/unit/output/ferc1_test.py::TestTagPropagation::test_prop_no_tags
0.37s call     src/pudl/analysis/ml_tools/experiment_tracking.py::pudl.analysis.ml_tools.experiment_tracking._flatten_model_config
0.33s call     test/unit/helpers_test.py::test_convert_df_to_excel_file
0.28s call     test/unit/metadata_test.py::test_get_sorted_resources
0.25s call     test/unit/extract/xbrl_test.py::test_xbrl2sqlite[settings0-forms0]
============================================================================ 1646 passed, 1 skipped, 9 xfailed, 327 warnings in 53.03s =============================================================================

I'll open a PR to skip the slow tests when running pre-commit hooks locally.
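
One common way to skip slow tests by default is a custom marker plus an opt-in command-line flag, adapted from the pattern in the pytest documentation. This is only a sketch and not necessarily what the PR does; the marker name `slow` and the flag `--runslow` are assumptions chosen for illustration.

```python
# conftest.py (sketch): skip tests marked "slow" unless --runslow is given.
import pytest


def pytest_addoption(parser):
    # Register the opt-in flag; the name "--runslow" is an assumption.
    parser.addoption(
        "--runslow",
        action="store_true",
        default=False,
        help="also run tests marked as slow",
    )


def pytest_configure(config):
    # Declare the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line("markers", "slow: mark a test as slow to run")


def pytest_collection_modifyitems(config, items):
    if config.getoption("--runslow"):
        # The flag was given: leave every collected test untouched.
        return
    skip_slow = pytest.mark.skip(reason="needs --runslow to run")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```

With this in place, decorating a test such as `test_partitions_for_datasource_table` with `@pytest.mark.slow` would skip it in a plain local `pytest` run, while a run with `pytest --runslow` (for example in CI) would still execute everything.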