aimclub / FEDOT

Automated modeling and machine learning framework FEDOT
https://fedot.readthedocs.io
BSD 3-Clause "New" or "Revised" License

[Bug]: The truth value of a DataFrame is ambiguous. #1298

Open DRMPN opened 1 month ago

DRMPN commented 1 month ago

Expected Behavior

The pipeline should start tuning with the provided input data:

tuned_pipiline = auto_model.tune(input_data=train, timeout=10, cv_folds=10, n_jobs=4)

Current Behavior

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[17], line 1
----> 1 tuned_pipiline = auto_model.tune(input_data=train, timeout=10, cv_folds=10, n_jobs=4)

File c:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\main.py:230, in Fedot.tune(self, input_data, metric_name, iterations, timeout, cv_folds, n_jobs, show_progress)
    227     raise ValueError(NOT_FITTED_ERR_MSG)
    229 with fedot_composer_timer.launch_tuning('post'):
--> 230     if not input_data: 
    231         input_data = self.train_data
    232     cv_folds = cv_folds or self.params.get('cv_folds')

File c:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\generic.py:1527, in NDFrame.__nonzero__(self)
   1525 @final
   1526 def __nonzero__(self) -> NoReturn:
-> 1527     raise ValueError(
   1528         f"The truth value of a {type(self).__name__} is ambiguous. "
   1529         "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1530     )

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
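The error can be reproduced without FEDOT. The sketch below is an illustration only: `truthiness_fails` is a hypothetical helper name, and the small DataFrame stands in for the competition data. It shows that applying `not` to any pandas DataFrame raises this ValueError, regardless of its contents.

```python
import pandas as pd


def truthiness_fails(frame: pd.DataFrame) -> bool:
    """Return True if evaluating `not frame` raises ValueError."""
    try:
        # Same kind of check as `if not input_data:` in the traceback above.
        not frame
    except ValueError:
        return True
    return False


# Stand-in data; the real report used the Kaggle train.csv.
sample = pd.DataFrame({"feature": [1, 2, 3], "Target": [0, 1, 0]})
raised_for_sample = truthiness_fails(sample)
raised_for_empty = truthiness_fails(pd.DataFrame())
```

Both flags end up True, which is why the check fails as soon as a DataFrame is passed as input_data.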

Possible Solution

Change line 230 in fedot/api/main.py to compare against None explicitly instead of evaluating the truth value of input_data:

if input_data is None:
    input_data = self.train_data
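A standalone sketch of an explicit None check, which never asks the DataFrame for its truth value. `resolve_input_data` is a hypothetical helper written for this illustration and is not part of FEDOT; the two small DataFrames are stand-ins for the user's data and the stored training data.

```python
import pandas as pd


def resolve_input_data(input_data, default):
    """Return `default` only when no input was supplied (hypothetical helper)."""
    # `is None` is an identity comparison, so pandas never evaluates
    # the truth value of the DataFrame here.
    if input_data is None:
        return default
    return input_data


# Stand-in objects for illustration.
user_data = pd.DataFrame({"feature": [1, 2, 3]})
stored_train_data = pd.DataFrame({"feature": [0]})

chosen_with_input = resolve_input_data(user_data, stored_train_data)
chosen_without_input = resolve_input_data(None, stored_train_data)
```

With this pattern, a passed DataFrame is used as-is and the stored data is used only when the argument is omitted.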

Steps to Reproduce

Data from https://www.kaggle.com/competitions/playground-series-s4e6

from fedot.api.main import Fedot
import pandas as pd

train = pd.read_csv("/automl-june/playground-series-s4e6/train.csv")
test = pd.read_csv("/automl-june/playground-series-s4e6/test.csv")

train.drop(columns=["id"], inplace=True)
test.drop(columns=["id"], inplace=True)

auto_model = Fedot(
    problem="classification",
    metric=["precision", "accuracy", "roc_auc"],
    preset="best_quality",
    with_tuning=True,
    timeout=60,
    cv_folds=10,
    seed=42,
    n_jobs=1,
    logging_level=10,
    use_pipelines_cache=False,
    use_auto_preprocessing=False,
)

auto_model.fit(features=train, target="Target")

prediction = auto_model.predict(features=test, save_predictions=True)

print(auto_model.return_report().head(10))

print(auto_model.get_metrics(target=train.Target))

tuned_pipiline = auto_model.tune(input_data=train, timeout=10, cv_folds=10, n_jobs=4)

Context

Participating in a Kaggle competition PS4E6.