aimclub / FEDOT

Automated modeling and machine learning framework FEDOT
https://fedot.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Default parameters for RANSACRegressor not being initialized #388

Closed (bacalfa closed this issue 3 years ago)

bacalfa commented 3 years ago

I'm testing the AutoML approach with Fedot on a dataset with 11 rows and 66 columns (never mind that there are many more columns than rows in this case). The default parameters for the strategy (ransac_non_lin_reg) aren't being initialized. For example, min_samples is None, and I'm getting this error:

Traceback (most recent call last):
  ....
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\api\main.py", line 206, in fit
    return self._obtain_model(is_composing_required)
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\api\main.py", line 156, in _obtain_model
    self.current_model, self.best_models, self.history = compose_fedot_model(**execution_params)
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\api\api_utils.py", line 190, in compose_fedot_model
    chain_gp_composed = gp_composer.compose_chain(data=train_data)
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\core\composer\gp_composer\gp_composer.py", line 140, in compose_chain
    best_chain = self.optimiser.optimise(metric_function_for_nodes,
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\core\composer\optimisers\gp_comp\param_free_gp_optimiser.py", line 123, in optimise
    self._evaluate_individuals(new_population, objective_function, timer=t)
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\core\composer\optimisers\gp_comp\gp_optimiser.py", line 383, in _evaluate_individuals
    evaluate_individuals(individuals_set=individuals_set, objective_function=objective_function, timer=timer,
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\core\composer\optimisers\gp_comp\gp_operators.py", line 85, in evaluate_individuals
    ind.fitness = calculate_objective(ind.chain, objective_function, is_multi_objective)
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\core\composer\optimisers\gp_comp\gp_operators.py", line 100, in calculate_objective
    calculated_fitness = objective_function(ind)
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\core\operations\cross_validation.py", line 20, in cross_validation
    chain.fit(train_data)
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\core\chains\chain.py", line 173, in fit
    train_predicted = self._fit(input_data=copied_input_data, use_cache=use_cache)
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\core\chains\chain.py", line 142, in _fit
    train_predicted = self.root_node.fit(input_data=input_data)
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\core\chains\node.py", line 241, in fit
    secondary_input = self._input_from_parents(input_data=input_data, parent_operation='fit')
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\core\chains\node.py", line 272, in _input_from_parents
    parent_results, target = _combine_parents(parent_nodes, input_data,
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\core\chains\node.py", line 303, in _combine_parents
    prediction = parent.fit(input_data=input_data)
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\core\chains\node.py", line 241, in fit
    secondary_input = self._input_from_parents(input_data=input_data, parent_operation='fit')
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\core\chains\node.py", line 272, in _input_from_parents
    parent_results, target = _combine_parents(parent_nodes, input_data,
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\core\chains\node.py", line 303, in _combine_parents
    prediction = parent.fit(input_data=input_data)
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\core\chains\node.py", line 174, in fit
    return super().fit(input_data)
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\core\chains\node.py", line 96, in fit
    self.fitted_operation, operation_predict = self.operation.fit(data=input_data,
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\core\operations\operation.py", line 86, in fit
    fitted_operation = self._eval_strategy.fit(train_data=data)
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\core\operations\evaluation\regression.py", line 65, in fit
    operation_implementation.fit(train_data)
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\fedot\core\operations\evaluation\operation_implementations\data_operations\sklearn_filters.py", line 25, in fit
    self.operation.fit(input_data.features, input_data.target)
  File "C:\Users\username\Miniconda3\envs\myenv\lib\site-packages\sklearn\linear_model\_ransac.py", line 281, in fit
    raise ValueError("`min_samples` may not be larger than number "
ValueError: `min_samples` may not be larger than number of samples: n_samples = 11.

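With min_samples left as None, RANSACRegressor assumes a linear model and falls back to the number of input columns plus one. In this case that is much larger than the number of rows, and RANSACRegressor doesn't allow that. How do I make sure min_samples is properly initialized?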

Dreamlone commented 3 years ago

Dear Bruno, hi! Thank you very much for your message. Our team and I also recently discovered this error and have already fixed it. The bug should no longer be present in the master branch.

See this closed pull request for more information. As confirmation, you can look at the unit test test_ransac_with_invalid_params_fit_correctly, which checks for this error.

A brief explanation: FEDOT includes a JSON file that stores default hyperparameter values for different operations. We added such defaults for the RANSAC algorithm, where min_samples is given as a relative number (a fraction between 0 and 1) rather than an absolute count. For now, it's 0.4. So no matter how many rows and columns the dataset has, FEDOT will now be able to initialize the RANSAC operation correctly.
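To make the difference concrete, the scikit-learn documentation for RANSACRegressor describes how min_samples is resolved: None falls back to the number of features plus one (assuming a linear model), a float strictly between 0 and 1 is treated as a fraction of the rows and rounded up, and a value >= 1 is used as an absolute count. The sketch below reproduces that rule in plain Python. The helper resolve_min_samples is hypothetical (it is not a FEDOT or scikit-learn function), and the example assumes the RANSAC filter receives all 66 columns; parent nodes in a chain may change the column count.

```python
import math
from typing import Optional, Union


def resolve_min_samples(min_samples: Optional[Union[int, float]],
                        n_samples: int, n_features: int) -> int:
    """Hypothetical helper: convert a RANSAC-style min_samples setting into
    an absolute sample count, following the rule in the scikit-learn docs."""
    if min_samples is None:
        # Default: assume a linear model, i.e. one sample per feature plus one.
        resolved = n_features + 1
    elif 0 < min_samples < 1:
        # Relative value: a fraction of the rows, rounded up.
        resolved = math.ceil(min_samples * n_samples)
    elif min_samples >= 1:
        # Absolute value: used as-is.
        resolved = int(min_samples)
    else:
        raise ValueError("min_samples must be positive")
    if resolved > n_samples:
        # Same condition that produced the error in the traceback.
        raise ValueError("`min_samples` may not be larger than number "
                         f"of samples: n_samples = {n_samples}.")
    return resolved


# Dataset shape from the report: 11 rows, 66 columns.
try:
    resolve_min_samples(None, n_samples=11, n_features=66)  # 67 > 11, so this raises
except ValueError as err:
    print(err)

print(resolve_min_samples(0.4, n_samples=11, n_features=66))  # ceil(4.4) = 5
```

Because the relative value scales with the number of rows, it never exceeds the row count, however many columns the data has.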

We will prepare a new version of FEDOT soon, and you can use all the above changes by installing FEDOT via "pip install". But if you want to try a quicker fix, use the framework version from the master branch.

So, now it works :)

bacalfa commented 3 years ago

Thank you! I'll wait for this new version.

nicl-nno commented 3 years ago

Hi! Did the new version (0.4.0) fix this bug for your script?

bacalfa commented 3 years ago

Yes, it works now. Thank you!

I'm getting several "Fit pipeline from scratch" messages printed to the console. How do I avoid that?

nicl-nno commented 3 years ago

You can set verbose_level=1 in the Fedot class constructor.

nicl-nno commented 3 years ago

The issue described in the first post is resolved.