aimclub / FEDOT

Automated modeling and machine learning framework FEDOT
https://fedot.readthedocs.io
BSD 3-Clause "New" or "Revised" License

[Bug]: ValueError: [...] the array at index 0 has size 894365 and the array at index 1 has size 1117957 #1296

Open DRMPN opened 1 month ago

DRMPN commented 1 month ago

Expected Behavior

Auto preprocessing should work correctly. Pipeline should be fitted.

Current Behavior

FEDOT fails to fit the catboostreg model when the use_auto_preprocessing=True option is set.

PS C:\Users\nnikitin-user\Desktop\automl_may> & C:/Users/nnikitin-user/AppData/Local/Programs/Python/Python310/python.exe c:/Users/nnikitin-user/Desktop/automl_may/flood_1.py
2024-05-16 13:16:58,812 - ApiDataProcessor - Preprocessing data
2024-05-16 13:16:58,812 - ApiDataProcessor - Train Data (Original) Memory Usage: 452.05 MB Data Shapes: ((1117957, 53), (1117957, 1))
2024-05-16 13:22:54,236 - ApiDataProcessor - Train Data (Processed) Memory Usage: 1.05 GB Data Shape: ((1117957, 126), (1117957, 1))
2024-05-16 13:22:54,236 - ApiDataProcessor - Data preprocessing runtime = 0:05:55.423210
2024-05-16 13:22:55,149 - AssumptionsHandler - Initial pipeline fitting started
2024-05-16 13:23:21,260 - PipelineNode - Trying to fit pipeline node with operation: catboostreg
2024-05-16 13:23:22,181 - AssumptionsHandler - Initial pipeline fit was failed due to: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 894365 and the array at index 1 has size 1117957.
Traceback (most recent call last):
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\api_utils\assumptions\assumptions_handler.py", line 71, in fit_assumption_and_check_correctness
    pipeline.fit(data_train, n_jobs=eval_n_jobs)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\pipelines\pipeline.py", line 197, in fit
    train_predicted = self._fit(input_data=copied_input_data)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\pipelines\pipeline.py", line 112, in _fit
    train_predicted = self.root_node.fit(input_data=input_data)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\pipelines\node.py", line 200, in fit
    self.fitted_operation, operation_predict = self.operation.fit(params=self._parameters,
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\operation.py", line 87, in fit
    self.fitted_operation = self._eval_strategy.fit(train_data=data)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\evaluation\boostings.py", line 33, in fit
    operation_implementation.fit(train_data)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\evaluation\operation_implementations\models\boostings_implementations.py", line 28, in fit
    input_data = input_data.get_not_encoded_data()
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\data\data.py", line 628, in get_not_encoded_data
    new_features = np.hstack((num_features, cat_features))
  File "<__array_function__ internals>", line 200, in hstack
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy\core\shape_base.py", line 370, in hstack
    return _nx.concatenate(arrs, 1, dtype=dtype, casting=casting)
  File "<__array_function__ internals>", line 200, in concatenate
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 894365 and the array at index 1 has size 1117957

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\nnikitin-user\Desktop\automl_may\flood_1.py", line 85, in <module>
    auto_model.fit(features=train, target="FloodProbability")
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\main.py", line 181, in fit
    self.current_pipeline, self.best_models, self.history = self.api_composer.obtain_model(self.train_data)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\api_utils\api_composer.py", line 63, in obtain_model
    initial_assumption, fitted_assumption = self.propose_and_fit_initial_assumption(train_data)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\api_utils\api_composer.py", line 107, in propose_and_fit_initial_assumption
    assumption_handler.fit_assumption_and_check_correctness(deepcopy(initial_assumption[0]),
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\api_utils\assumptions\assumptions_handler.py", line 86, in fit_assumption_and_check_correctness
    self._raise_evaluating_exception(ex)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\api_utils\assumptions\assumptions_handler.py", line 94, in _raise_evaluating_exception
    raise ValueError(advice_info)
ValueError: Initial pipeline fit was failed due to: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 894365 and the array at index 1 has size 1117957. Check pipeline structure and the correctness of the data
PS C:\Users\nnikitin-user\Desktop\automl_may>

Possible Solution

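Some data appear to be dropped during auto preprocessing: when get_not_encoded_data calls np.hstack, the numerical feature array has 894365 rows while the categorical feature array still has 1117957 rows, a difference of 223592. Perhaps it is related to how categorical features are handled. Debug at the breakpoints shown in the attached screenshot to find and fix the problem. [image]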
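To make the failure mode concrete, here is a minimal sketch with toy arrays. It is not FEDOT's code: the helper name hstack_checked and the array contents are hypothetical, and the only assumption taken from the traceback is that two feature blocks with different row counts are passed to np.hstack. The sketch triggers the same kind of ValueError and shows how an explicit row-count check could name the mismatch before the concatenation.

```python
import numpy as np


def hstack_checked(num_features: np.ndarray, cat_features: np.ndarray) -> np.ndarray:
    """Stack two feature blocks column-wise, failing early with a clear message.

    Hypothetical helper for illustration only; not part of FEDOT.
    """
    n_num, n_cat = num_features.shape[0], cat_features.shape[0]
    if n_num != n_cat:
        raise ValueError(
            f"row count mismatch: numerical block has {n_num} rows, "
            f"categorical block has {n_cat} rows "
            f"(difference of {abs(n_num - n_cat)})"
        )
    return np.hstack((num_features, cat_features))


# Toy stand-ins for the real arrays: 5 original rows, one numerical row lost.
num_features = np.zeros((4, 3))
cat_features = np.full((5, 2), "a", dtype=object)

# Plain np.hstack fails with the same family of error as in the traceback.
numpy_message = None
try:
    np.hstack((num_features, cat_features))
except ValueError as err:
    numpy_message = str(err)

# The checked variant reports which block is short and by how much.
checked_message = None
try:
    hstack_checked(num_features, cat_features)
except ValueError as err:
    checked_message = str(err)

# With matching row counts the blocks are stacked side by side.
stacked = hstack_checked(np.zeros((5, 3)), cat_features)
```

A check like this only reports the symptom. The actual fix would still have to find which preprocessing step changes the row count of one block but not the other.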

Steps to Reproduce

  1. Download code and data from https://www.kaggle.com/code/eliyahusanti/fedot-nss-lab-automl-catboost-0-8676
  2. Set FEDOT parameter use_auto_preprocessing=True
  3. Run the code

Context [OPTIONAL]

Participating in a Kaggle competition.