Ensemble order not invariant with configurations order

Currently, evaluating an ensemble gives different results depending on the order in which the configurations are presented. This is unexpected to me: ensemble performance should be invariant to the order of the input configurations, since they form a set.

For instance, the following code evaluates two ensembles built from the same configurations, presented in different orders. It outputs:

[[0.00642857]]
[[0.00714286]]

from autogluon_zeroshot.repository.evaluation_repository import load
from autogluon_zeroshot.utils.cache import cache_function
repo = cache_function(lambda: load(version="BAG_D244_F10_C608_FULL"), cache_name="repo")

configs = [
    'ExtraTrees_r19_BAG_L1',
    'LightGBM_r158_BAG_L1',
    'RandomForest_r5_BAG_L1',
    'LightGBM_r118_BAG_L1',
    'LightGBM_r97_BAG_L1',
    'LightGBM_r111_BAG_L1',
    'LightGBM_r71_BAG_L1',
    'NeuralNetFastAI_r82_BAG_L1',
    'NeuralNetFastAI_r25_BAG_L1',
    'NeuralNetFastAI_r145_BAG_L1',
    'NeuralNetFastAI_r128_BAG_L1',
    'NeuralNetFastAI_r121_BAG_L1',
    'NeuralNetFastAI_r173_BAG_L1',
    'CatBoost_r16_BAG_L1',
    'NeuralNetFastAI_r169_BAG_L1',
    'CatBoost_r42_BAG_L1',
    'CatBoost_r93_BAG_L1',
    'CatBoost_r2_BAG_L1',
    'CatBoost_r79_BAG_L1',
    'CatBoost_r57_BAG_L1'
]
# Same task, fold, and ensemble size for both calls; only the config order differs.
common_kwargs = dict(tids=[3704], folds=[8], ensemble_size=50, rank=False)
print(repo.evaluate_ensemble(config_names=configs, **common_kwargs))
print(repo.evaluate_ensemble(config_names=sorted(configs), **common_kwargs))
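One possible explanation, which I have not verified against the tabrepo code, is tie-breaking in greedy ensemble selection: if two candidates give exactly the same validation loss at some step and the first one in the list wins, the selected weights depend on the input order. The sketch below is a toy, self-contained illustration of that mechanism only. The function `greedy_ensemble_weights`, the models `A` and `B`, and their predictions are all hypothetical and are not taken from the repository; exact fractions are used so that the ties are real and not a floating-point accident.

```python
from fractions import Fraction


def mse(pred, y):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def greedy_ensemble_weights(val_preds, y_val, names, ensemble_size):
    """Toy greedy forward selection with replacement (hypothetical helper).

    At each step, add the model whose inclusion gives the lowest validation
    loss of the averaged prediction. Ties go to the earliest name in `names`,
    which is what makes the result depend on the input order.
    """
    counts = {name: 0 for name in names}
    running_sum = [Fraction(0)] * len(y_val)
    for step in range(1, ensemble_size + 1):
        best_name, best_loss = None, None
        for name in names:
            candidate = [(s + p) / step for s, p in zip(running_sum, val_preds[name])]
            loss = mse(candidate, y_val)
            # Strict '<' means the first candidate wins an exact tie.
            if best_loss is None or loss < best_loss:
                best_name, best_loss = name, loss
        counts[best_name] += 1
        running_sum = [s + p for s, p in zip(running_sum, val_preds[best_name])]
    return {name: Fraction(c, ensemble_size) for name, c in counts.items()}


# Two made-up models with identical validation loss but different predictions.
y_val = [Fraction(1), Fraction(0)]
val_preds = {
    "A": [Fraction(1), Fraction(1)],
    "B": [Fraction(0), Fraction(0)],
}

w_ab = greedy_ensemble_weights(val_preds, y_val, ["A", "B"], ensemble_size=3)
w_ba = greedy_ensemble_weights(val_preds, y_val, ["B", "A"], ensemble_size=3)
print(w_ab)  # A gets weight 2/3, B gets 1/3
print(w_ba)  # B gets weight 2/3, A gets 1/3
```

If ties like this are indeed the cause, putting the configurations into a canonical order (for example, sorting the names) before selection would make the result reproducible, although it would not guarantee that the better of the two tied ensembles is chosen.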

autogluon / tabrepo

Ensemble order not invariant with configurations order #16