Closed: simonprovost closed this issue 11 months ago
Hi @simonprovost thanks for the information.
> How are inactive hyperparameters in the dataset provided to the random forest's surrogate in SMAC managed precisely? Are they handled as described above? Following this GitHub issue, would you be open to a PR in the FAQ explaining how they are handled?
The main reason we impute `NaN` values for the RF is that our RF surrogate model is built on the pyrfr package, which is written in C++ and wrapped with SWIG. `NaN` values might not be easily transferred to the corresponding C++ types through SWIG; therefore, we need to impute those values.
> Why is the number of options the choices offer used to represent inactive categorical hyperparameters? What is the logic behind this decision?
In ConfigSpace, categorical HPs are encoded numerically (`[0, 1, 2, ..., n_opts - 1]`), as shown in this line. Therefore, a categorical HP will never take the vector value `n_opts` if it is an active HP.
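To make that concrete, here is a small illustrative sketch (not SMAC's actual code; the `encode` helper is hypothetical) of how an index-based encoding leaves `n_opts` free as a placeholder for inactive categorical HPs:

```python
# Categorical HPs are encoded by option index 0 .. n_opts - 1, so an active
# HP can never produce the value n_opts; it is therefore safe as a
# placeholder meaning "inactive".
options = ["decision_tree", "random_forest"]
n_opts = len(options)

def encode(choice):
    # Hypothetical helper: None marks an inactive hyperparameter.
    if choice is None:
        return float(n_opts)             # placeholder, outside {0, 1}
    return float(options.index(choice))

print(encode("decision_tree"))   # 0.0
print(encode("random_forest"))   # 1.0
print(encode(None))              # 2.0 == n_opts
```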
> Why is -1 used for inactive float/integer hyperparameters, and what effect does this decision have on the model? Is -1 not regarded as one of the options? Or, as I have observed elsewhere, are float/integer hyperparameters rescaled? If so, could you please explain further how this type of inactive hyperparameter is handled?
Similar to categorical HPs, numerical HPs (float & int) are represented as vectors within `[0, 1]` (this normalization is also used in the GP models). Therefore, they will never take the value `-1` if they are active.
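A matching sketch for numerical HPs (again illustrative; the `impute` helper is hypothetical): since active values are min-max normalized into [0, 1], -1 lies strictly outside the active range:

```python
def normalize(value, lo, hi):
    # Min-max normalization into [0, 1], as used in the vector representation.
    return (value - lo) / (hi - lo)

def impute(value, lo, hi):
    # Hypothetical helper: None marks an inactive hyperparameter.
    if value is None:
        return -1.0                      # placeholder, outside [0, 1]
    return normalize(value, lo, hi)

print(impute(1, 1, 20))     # 0.0  (lower bound)
print(impute(20, 1, 20))    # 1.0  (upper bound)
print(impute(None, 1, 20))  # -1.0 (inactive)
```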
> Were there any considerations for modifying the decision tree splitting criteria to handle inactive hyperparameters based on a flag or some other mechanism, as opposed to using placeholders that mislead the decision tree with almost no information gain for these hyperparameters?
As long as our surrogate models are based on pyrfr, this would not be easy to implement.
> Would you confirm that following https://github.com/scikit-learn/scikit-learn/pull/23595 is not going to interfere with these potential inactive hyperparameters, which can actually be seen as missing values? Given your extra layer of missing value imputation, I reckon this is not an issue, but it is always great to double-check.
We are also considering reimplementing our RF models based on scikit-learn's models; however, I cannot promise exactly when that will happen.
> Is there a method to print the input data given to the surrogate using the API, so that we can inspect it visually? If not, could we be directed to a good starting point for printing it in the code after forking SMAC?
For a configuration, you can simply call `config.get_array()` to get its numerical representation.
Hope that answers all your questions!
Hi @dengdifan,
I greatly appreciate your detailed response; it has helped clarify numerous aspects. I understand that placeholder values outside the range of their active counterparts are assigned to inactive hyperparameters to prevent them from significantly influencing the surrogate model.
From the discussion, it appears that the placeholder values are unlikely to be selected or to lead to meaningful splits in the decision trees of the surrogate model. For example, because these placeholders are uniform and uncorrelated with the target values, the information gain from splitting on them is typically low, particularly when working with a densely populated configuration-based dataset.
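One way to see the "low information gain" point is a sketch under the assumption that the placeholder column is constant, using variance reduction as the regression splitting criterion (the helper below is illustrative, not SMAC's code):

```python
import numpy as np

def best_variance_reduction(x, y):
    # Best achievable variance reduction when splitting targets y on
    # feature x at any threshold between distinct values of x.
    parent = y.var()
    best = 0.0
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        child = (len(left) * left.var() + len(right) * right.var()) / len(y)
        best = max(best, parent - child)
    return best

rng = np.random.default_rng(0)
y = rng.normal(size=50)            # observed costs
placeholder = np.full(50, -1.0)    # column where the HP is always inactive

# A constant placeholder column admits no split at all, hence zero gain:
print(best_variance_reduction(placeholder, y))  # 0.0
```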
Nonetheless, I am curious about two things:
In the meantime, thank you again for your insights, and I eagerly await your response to the final question. Future readers should also find this useful for understanding, and possibly improving, the handling of inactive hyperparameters in SMAC. To that end, here is a snippet, based on your unit tests, that visualises the missing-value imputation performed by SMAC's surrogate RF, so that you can see roughly how it is done, although this is a very simplified example:
```python
# Imports (ConfigSpace, SMAC, Rich, NumPy)
import numpy as np
from ConfigSpace import ConfigurationSpace
from ConfigSpace.conditions import EqualsCondition
from ConfigSpace.hyperparameters import (
    CategoricalHyperparameter,
    UniformIntegerHyperparameter,
)
from rich.console import Console
from rich.table import Table
from smac.model.random_forest import RandomForest  # SMAC 2.x import path


def display_hyperparameter_configurations(size=10):
    def convert_configurations_to_array(configs):
        return np.array([config.get_array() for config in configs])

    # Define the configuration space
    cs = ConfigurationSpace(seed=0)

    # Algorithm hyperparameter
    algorithm = cs.add_hyperparameter(
        CategoricalHyperparameter("algorithm", ["decision_tree", "random_forest"])
    )

    # Decision Tree hyperparameters
    criterion = cs.add_hyperparameter(
        CategoricalHyperparameter("criterion", ["gini", "entropy"])
    )
    max_depth = cs.add_hyperparameter(UniformIntegerHyperparameter("max_depth", 1, 20))

    # Conditions for Decision Tree hyperparameters
    cs.add_condition(EqualsCondition(criterion, algorithm, "decision_tree"))
    cs.add_condition(EqualsCondition(max_depth, algorithm, "decision_tree"))

    # Random Forest hyperparameters
    n_estimators = cs.add_hyperparameter(
        UniformIntegerHyperparameter("n_estimators", 10, 200)
    )
    max_features = cs.add_hyperparameter(
        CategoricalHyperparameter("max_features", ["auto", "sqrt", "log2"])
    )

    # Conditions for Random Forest hyperparameters
    cs.add_condition(EqualsCondition(n_estimators, algorithm, "random_forest"))
    cs.add_condition(EqualsCondition(max_features, algorithm, "random_forest"))

    # Sample configurations and impute the inactive hyperparameters
    configs = cs.sample_configuration(size=size)
    config_array = convert_configurations_to_array(configs)
    model = RandomForest(configspace=cs)
    config_array = model._impute_inactive(config_array)

    # Render the imputed vector representations as a table
    hp_names = [hp.name for hp in cs.get_hyperparameters()]
    console = Console()
    table = Table(show_header=True, header_style="bold magenta")
    for name in hp_names:
        table.add_column(name)
    for config in config_array:
        table.add_row(*map(str, config))
    console.print(table)


# Call the function to display the configurations
display_hyperparameter_configurations(size=50)
```
Following your answer @dengdifan, consider this issue done 👍 Cheers.
Given the low priority of the remaining queries, I'll close this to let more important queries pass first. Please feel free to reopen if you have time, or if any reader wishes to learn more about the two most recent questions posed.
Cheers!
I would like to express my appreciation for the outstanding work on the new iteration of the SMAC framework; it is a very useful framework. However, I would like more information regarding how inactive hyperparameters are managed when input data is provided to the random forest regression surrogate model during the SMAC procedure.
Description
I encountered the concept of inactive hyperparameters about a year ago while reading numerous papers and engaging in discussions. By inactive hyperparameters (while this is a popular name for them, let's be precise) I mean conditional hyperparameters whose conditions are not met by a given configuration. For instance, while the decision tree and random forest algorithms share some hyperparameters, the `n_estimators` hyperparameter is unique to the random forest algorithm; therefore, it should never be sampled for the decision tree algorithm, nor used by the surrogate in SMAC's process. While I am certain that SMAC does this, I am much more perplexed as to how SMAC manages these inactive hyperparameters. My current understanding is that SMAC leaves their imputation to the surrogate model itself, since they could be managed differently by Gaussian processes than by random forests, for instance. Consequently, the surrogate model, here a regression random forest, receives input data in which rows represent configurations whose cost values are known and columns represent hyperparameters from the search space. If I am not mistaken, inactive hyperparameters for given configurations are pre-represented (before anything is run by the surrogate model) by specific placeholder values such as `NaN`. Within the surrogate itself, in this case the random forest regression, I have observed that inactive categorical hyperparameters use the number of possible choices of the hyperparameter as the placeholder, whereas float/integer hyperparameters use `-1` and constant hyperparameters use `1`, respectively.

Questions:
Most important:

- How are inactive hyperparameters in the dataset provided to the random forest's surrogate in SMAC managed precisely? Are they handled as described above? Following this GitHub issue, would you be open to a PR in the FAQ explaining how they are handled?
- Why is the number of options the choices offer used to represent inactive categorical hyperparameters? What is the logic behind this decision?
- Why is `-1` used for inactive float/integer hyperparameters, and what effect does this decision have on the model? Is `-1` not regarded as one of the options? Or, as I have observed elsewhere, are float/integer hyperparameters rescaled? If so, could you please explain further how this type of inactive hyperparameter is handled?

Less important:

- Were there any considerations for modifying the decision tree splitting criteria to handle inactive hyperparameters based on a flag or some other mechanism, as opposed to using `placeholders` that mislead the decision tree with almost no information gain for these hyperparameters?
- Would you confirm that following https://github.com/scikit-learn/scikit-learn/pull/23595 is not going to interfere with these potential inactive hyperparameters, which can actually be seen as `missing values`? Given your extra layer of missing value imputation, I reckon this is not an issue, but it is always great to double-check.
- Is there a method to print the input data given to the surrogate using the API, so that we can inspect it visually? If not, could we be directed to a good starting point for printing it in the code after forking SMAC?

Steps/Code to Reproduce
_Note that my initial confusion stems from this docstring, which says "Impute inactive hyperparameters in configurations with their default", yet as per my understanding the code is not imputing; rather, it just fills out a configuration (i.e., a list) of the hyperparameters: `return np.array([config.get_array() for config in configs], dtype=np.float64)`_
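To illustrate the distinction being asked about here, a mimic (not SMAC's actual implementation; the column layout and `impute_inactive` helper are hypothetical): `get_array()` leaves inactive HPs as `NaN` in the vector representation, and the RF model's imputation step then replaces each `NaN` with its placeholder (`n_opts` for categoricals, `-1` for normalized numericals):

```python
import numpy as np

# Hypothetical vector for one configuration: columns 1 and 2 are inactive
# (NaN); column 1 is a categorical with 2 options, the rest are numerical.
vector = np.array([0.0, np.nan, np.nan, 0.5, 1.0])
n_opts_per_categorical_col = {1: 2}

def impute_inactive(row):
    out = row.copy()
    for i, v in enumerate(out):
        if np.isnan(v):
            # n_opts for categoricals, -1 for (normalized) numericals
            out[i] = n_opts_per_categorical_col.get(i, -1.0)
    return out

print(impute_inactive(vector))  # values: 0.0, 2.0, -1.0, 0.5, 1.0
```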
_Versions