Describe the feature or idea you want to propose

Currently test_all_forecasters takes 45 minutes on my machine (down from over an hour after https://github.com/aeon-toolkit/aeon/pull/568). I think two further things to look at are:

1. splitting tests that test all combinations into single tests, and
2. combining tests to reduce the number of calls to fit (as described in #418, for all estimators).
A list of the tests in test_all_forecasters follows.
Describe your proposed solution
Tests in test_all_forecasters. There are a lot of TODOs, which have probably been there a long time.
test_get_fitted_params: fits all estimators and tests whether get_fitted_params returns a dict. This should be done in test_all_estimators and combined with another test.
test_raises_not_fitted_error: calls the forecasting-specific update and update_predict. In principle fine, but it also creates a cv sliding window; not sure if this is required.
test_y_multivariate_raises_error: seems fine, but I am not sure why it has this tagged on at the end, after fit has been called. It seems pointless to me, coming directly before the end of the function:

```python
if estimator_instance.get_tag("scitype:y") in ["univariate", "both"]:
    # this should pass since "both" allows any number of variables
    # and "univariate" automatically vectorizes, behaves multivariate
    pass
```
test_y_invalid_type_raises_error: "Test that invalid y input types raise error." Tested on two invalid types, which seems excessive:

```python
INVALID_y_INPUT_TYPES = [list("bar"), tuple()]
```
test_X_invalid_type_raises_error: "Test that invalid X input types raise error." Tested on two invalid types, which seems excessive:

```python
INVALID_X_INPUT_TYPES = [list("foo"), tuple()]
```
test_predict_residuals: "Check that predict_residuals method works as expected."

```python
@pytest.mark.parametrize(
    "index_fh_comb", VALID_INDEX_FH_COMBINATIONS, ids=index_fh_comb_names
)
@pytest.mark.parametrize("fh_int", TEST_FHS, ids=[f"fh={fh}" for fh in TEST_FHS])
```

This is a slow one, I think, tested on a lot of combinations (I can't quite work out how many!). It seems to test that correct indexes are returned, via _assert_correct_pred_time_index, and it calls fit, predict and predict_residuals.
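On the "can't quite work out how many" point: stacked pytest.mark.parametrize decorators generate the Cartesian product of their parameter lists, so the run count per forecaster is just the product of the list lengths. A quick sanity check, using placeholder lengths rather than aeon's real fixture values:

```python
# Stacked @pytest.mark.parametrize decorators take the Cartesian product:
# the test body runs once per element of (index_fh_comb x fh_int).
# The lengths below are illustrative placeholders, not aeon's real values.
n_index_fh_combs = 9  # e.g. len(VALID_INDEX_FH_COMBINATIONS)
n_fhs = 3             # e.g. len(TEST_FHS)

runs_per_forecaster = n_index_fh_combs * n_fhs
print(runs_per_forecaster)  # 27 runs of the test body per forecaster
```

Multiplying by the number of registered forecasters then gives the suite-wide count.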
test_predict_time_index_with_X: "Check that predicted time index matches forecasting horizon." Called 27 times per forecaster (the 9 valid index/FH combinations times 3 others), approx 700 times in total. Another big one, using the nine VALID_INDEX_FH_COMBINATIONS in combination with other things:

```python
@pytest.mark.parametrize(
    "index_fh_comb", VALID_INDEX_FH_COMBINATIONS, ids=index_fh_comb_names
)
@pytest.mark.parametrize(
    "fh_int_oos", TEST_OOS_FHS, ids=[f"fh={fh}" for fh in TEST_OOS_FHS]
)
```

It checks _assert_correct_pred_time_index and _assert_correct_columns.
test_predict_time_index_in_sample_full: "Check that predicted time index equals fh for full in-sample predictions." Called a whopping 54 times per estimator (the 9 valid index/FH combinations times 6 others), 1650 times in total:

```python
@pytest.mark.parametrize(
    "index_fh_comb", VALID_INDEX_FH_COMBINATIONS, ids=index_fh_comb_names
)
```

So it could in principle be subsumed into another test.
test_predict_interval: "Check prediction intervals returned by predict." Another slow one, 24 runs per estimator, and it seems quite a weird test:

```python
@pytest.mark.parametrize(
    "coverage", TEST_ALPHAS, ids=[f"alpha={a}" for a in TEST_ALPHAS]
)
@pytest.mark.parametrize(
    "fh_int_oos", TEST_OOS_FHS, ids=[f"fh={fh}" for fh in TEST_OOS_FHS]
)
```
test_predict_quantiles: "Check prediction quantiles returned by predict." Called 27 times per forecaster (the 9 valid index/FH combinations and 3 others), approx 700 times in total:

```python
@pytest.mark.parametrize(
    "alpha", TEST_ALPHAS, ids=[f"alpha={a}" for a in TEST_ALPHAS]
)
@pytest.mark.parametrize(
    "fh_int_oos", TEST_OOS_FHS, ids=[f"fh={fh}" for fh in TEST_OOS_FHS]
)
```

It calls predict_quantiles and compares the output via the rather confusing _check_predict_quantiles.
test_pred_int_tag: checks whether the capability:pred_int tag is correctly set. No parameters; it seems to test whether a method has been implemented.
test_score: "Check score method."

```python
@pytest.mark.parametrize(
    "fh_int_oos", TEST_OOS_FHS, ids=[f"fh={fh}" for fh in TEST_OOS_FHS]
)
```

Is it necessary to test so many combinations?
test_update_predict_single:

```python
@pytest.mark.parametrize(
    "fh_int_oos", TEST_OOS_FHS, ids=[f"fh={fh}" for fh in TEST_OOS_FHS]
)
```

Can this be combined with 12?
test_update_predict_predicted_index: "Check predicted index in update_predict." Called a massive 72 times per estimator, over 2000 times in total:

```python
@pytest.mark.parametrize(
    "fh_int_oos", TEST_OOS_FHS, ids=[f"fh={fh}" for fh in TEST_OOS_FHS]
)
```
These test functions take multiple arguments; I am not sure where these are set.
test__y_and_cutoff: "Check cutoff and _y" (not the most informative comment). Where is n_columns set?
test__y_when_refitting: "Test that _y is updated when forecaster is refitted." Combine with 14? Surely the two calls to predict are redundant?
test_fh_not_passed_error_handling: "Check that not passing fh in fit/predict raises correct error." Seems fine.
test_different_fh_in_fit_and_predict_error_handling: "Check that fh different in fit and predict raises correct error." Combine with 17?
test_hierarchical_with_exogeneous: "Check that hierarchical forecasting works, also see bug #3961." I wonder what bug #3961 is?
Describe alternatives you've considered, if relevant
No response
Additional context
No response