fugue-project / fugue

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
https://fugue-tutorials.readthedocs.io/
Apache License 2.0

[BUG] Lambda function for passing data frame to the setup function in pycaret doesn't work #527

Closed rcshetty3 closed 6 months ago

rcshetty3 commented 7 months ago

Minimal Code To Reproduce

from pycaret.classification import *

setup(data=lambda: get_data("juice", verbose=False, profile=False), target = 'Purchase', session_id=0, n_jobs=1);

Error message:

      1 from pycaret.classification import *
----> 3 setup(data=lambda: get_data("juice", verbose=False, profile=False), target = 'Purchase', session_id=0, n_jobs=1)

File c:\ProgramData\anaconda3\envs\pycaretenv\lib\site-packages\pycaret\classification\functional.py:595, in setup(data, data_func, target, index, train_size, test_data, ordinal_features, numeric_features, categorical_features, date_features, text_features, ignore_features, keep_features, preprocess, create_date_columns, imputation_type, numeric_imputation, categorical_imputation, iterative_imputation_iters, numeric_iterative_imputer, categorical_iterative_imputer, text_features_method, max_encoding_ohe, encoding_method, rare_to_value, rare_value, polynomial_features, polynomial_degree, low_variance_threshold, group_features, drop_groups, remove_multicollinearity, multicollinearity_threshold, bin_numeric_features, remove_outliers, outliers_method, outliers_threshold, fix_imbalance, fix_imbalance_method, transformation, transformation_method, normalize, normalize_method, pca, pca_method, pca_components, feature_selection, feature_selection_method, feature_selection_estimator, n_features_to_select, custom_pipeline, custom_pipeline_position, data_split_shuffle, data_split_stratify, fold_strategy, fold, fold_shuffle, fold_groups, n_jobs, use_gpu, html, session_id, system_log, log_experiment, experiment_name, experiment_custom_tags, log_plots, log_profile, log_data, verbose, memory, profile, profile_kwargs)
    593 exp = _EXPERIMENT_CLASS()
    594 set_current_experiment(exp)
--> 595 return exp.setup(
    596     data=data,
    597     data_func=data_func,
    598     target=target,
    599     index=index,
    600     train_size=train_size,
    601     test_data=test_data,
    602     ordinal_features=ordinal_features,
    603     numeric_features=numeric_features,
    604     categorical_features=categorical_features,
    605     date_features=date_features,
    606     text_features=text_features,
    607     ignore_features=ignore_features,
    608     keep_features=keep_features,
    609     preprocess=preprocess,
    610     create_date_columns=create_date_columns,
...
     93 if data is not None:
     94     if not isinstance(data, pd.DataFrame):
     95         # Assign default column names (dict already has column names)

TypeError: 'function' object is not subscriptable

Describe the bug

Lambda function for passing data frame to the setup function in pycaret doesn't work

Expected behavior

The setup function should succeed

Environment (please complete the following information):

kvnkho commented 7 months ago

Hi @rcshetty3, this doesn't seem like a Fugue issue, right? It looks like pure pycaret code, specifically the pycaret setup() function. Looking at their code, setup() accepts either data or data_func, so a callable has to be passed as data_func rather than data.

Maybe you can try:

setup(data_func=lambda: get_data("juice", verbose=False, profile=False), target = 'Purchase', session_id=0, n_jobs=1);

I think this should be posted on the pycaret GitHub, though.
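To make the difference between the two arguments concrete, here is a minimal sketch in plain Python. It does not use pycaret: toy_setup and load_toy_juice are hypothetical stand-ins, and the column values are made-up toy data. It only assumes what the traceback suggests, namely that data is indexed directly as a container while data_func is called first to produce the data.

```python
def load_toy_juice():
    """Hypothetical loader returning a dict of columns (toy values only)."""
    return {"Purchase": ["CH", "MM"], "WeekofPurchase": [237, 239]}


def toy_setup(data=None, data_func=None, target=None):
    """Hypothetical stand-in for setup(); not pycaret's implementation."""
    if (data is None) == (data_func is None):
        raise ValueError("pass exactly one of data or data_func")
    if data_func is not None:
        # A callable passed as data_func is invoked to produce the data.
        data = data_func()
    # From here on, data is treated as a container and indexed directly.
    return data[target]


# Passing the callable as data_func works: it is called before indexing.
print(toy_setup(data_func=lambda: load_toy_juice(), target="Purchase"))

# Passing the same callable as data fails: a function cannot be indexed.
try:
    toy_setup(data=lambda: load_toy_juice(), target="Purchase")
except TypeError as exc:
    print(f"TypeError: {exc}")
```

The second call fails for the same reason as the report above: the lambda itself, not its return value, reaches the code that subscripts data.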