alteryx / evalml

EvalML is an AutoML library written in Python.
https://evalml.alteryx.com
BSD 3-Clause "New" or "Revised" License

Segmentation issue keeps crashing the kernel #4444

Open enfeizhan opened 3 weeks ago

enfeizhan commented 3 weeks ago

AutoMLSearch keeps crashing on the attached simple dataset (issue.csv). Running the code in a terminal produces a segmentation fault. When run in a Jupyter Notebook, the kernel crashes and restarts.

import pandas as pd
import evalml

fm = pd.read_csv('issue.csv')
fm.ww.init()

fm.ww.describe()

y = fm.ww.pop('label')

automl = evalml.AutoMLSearch(
    X_train=fm,
    y_train=y,
    problem_type='binary',
    random_seed=3,
    max_batches=5
)
automl.search()

The data contains no infinite or null values. In principle, the search shouldn't crash the kernel, even if it doesn't produce a great model.

enfeizhan commented 3 weeks ago

The search completes once its scope is limited to random forest and linear models: allowed_model_families=["random_forest", "linear_model"]. Further investigation shows the problem is with lightgbm. As long as lightgbm is excluded, the search runs fine.
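
For anyone trying to narrow this down further, one way to test a single model family without losing the notebook kernel is to run the suspect code in a child process and look at how it exits. The following is only a sketch, assuming a POSIX system: `run_isolated` is a hypothetical helper (not part of evalml), and the snippet passed to it is a placeholder for code that runs the search with one model family.

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=600):
    """Run a Python snippet in a child process and describe how it exited."""
    # -X faulthandler makes the child dump a Python traceback on a fatal signal.
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        return "killed by " + signal.Signals(-proc.returncode).name
    return "exited with code {}".format(proc.returncode)


if __name__ == "__main__":
    # Placeholder snippet; replace with code that runs the search for one model family.
    print(run_isolated("print('ok')"))
```

A result of "killed by SIGSEGV" for one family and a clean exit for the others would point at that family's native code, and the traceback that faulthandler writes to the child's stderr shows which Python call was active at the time.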

eccabay commented 2 weeks ago

Thanks for reporting and investigating @enfeizhan. Could you share what evalml and lightgbm versions you're running with, as well as a bit more information about your data (types, size, etc)?
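
If it helps, the version information can be collected with the standard library alone. This is a minimal sketch: `package_versions` is a hypothetical helper, and the packages listed beyond evalml and lightgbm are an assumption about what else might be relevant.

```python
import platform
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python", platform.python_version(), "on", platform.platform())
    # evalml and lightgbm are the ones asked about; the rest are assumed to be relevant.
    names = ["evalml", "lightgbm", "woodwork", "pandas", "numpy"]
    for name, version in package_versions(names).items():
        print(name, version or "not installed")
```

For the data itself, the output of `fm.shape` and `fm.dtypes` on the loaded DataFrame would cover size and types.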