autogluon / autogluon

Fast and Accurate ML in 3 Lines of Code
https://auto.gluon.ai/
Apache License 2.0

[BUG] ValueError: Input contains infinity or a value too large for dtype('float32') when calling predict #4347

Open TreeOfLearning opened 2 months ago

TreeOfLearning commented 2 months ago

Describe the bug

I get ValueError: Input contains infinity or a value too large for dtype('float32') when calling predict on a successfully trained TabularPredictor. I have triple-checked that the data I provide is sanitised: it contains no values beyond the float32 limit, no infinities, and no NaNs. Therefore, the problem must be within the predictor somewhere.

Based on the error, it looks like it could be an issue with the y scaler. Could it be that the scaler is being fit to the training data but that then produces values too large for the predictor to actually use? If this is the case I'd expect those values to be clipped to an acceptable range rather than just failing to predict.

For what it's worth, if I use a different subset of data, I do not have this issue, so clearly there are some values causing an issue.
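
To back up the claim that the inputs are clean, the data can be checked directly before calling predict. The helper below is a minimal sketch; `find_non_finite_columns` is a hypothetical name, not part of AutoGluon. It reports every numeric column holding NaN, positive or negative infinity, or a magnitude beyond the float32 range.

```python
import numpy as np
import pandas as pd


def find_non_finite_columns(df: pd.DataFrame) -> dict:
    """Return {column: count} for numeric columns that hold NaN, +/-inf,
    or values whose magnitude exceeds the float32 range."""
    float32_max = np.finfo(np.float32).max
    problems = {}
    for col in df.select_dtypes(include="number").columns:
        values = df[col].to_numpy(dtype=np.float64, na_value=np.nan)
        bad = ~np.isfinite(values) | (np.abs(values) > float32_max)
        if bad.any():
            problems[col] = int(bad.sum())
    return problems


# Small illustrative frame; the column names are made up.
example = pd.DataFrame(
    {
        "ok": [1.0, 2.0, 3.0],
        "has_inf": [1.0, np.inf, -np.inf],
        "too_big": [1.0, 1e39, -1e39],
        "has_nan": [np.nan, 0.0, 0.0],
    }
)
print(find_non_finite_columns(example))
```

An empty dictionary for both the training and test frames would support the conclusion that the non-finite values arise inside the model rather than in the inputs.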

Expected behavior

I expect to be able to train on a subset of my data and then predict on the remaining data, and for that prediction to succeed.

To Reproduce

I can't provide the data as it is proprietary and sensitive. However, here is the code with which I am training the predictor and then calling predict:

import numpy as np
from autogluon.tabular import TabularPredictor
from sklearn.model_selection import train_test_split

def sanitize_dataframe(df):
    # Largest finite value representable as float32
    float32_max = np.finfo(np.float32).max

    # Replace infinities and values greater than float32_max with 1
    df = df.replace([np.inf, -np.inf], 1)
    df = df.applymap(lambda x: 1 if x > float32_max else x)

    return df

# df has been loaded from a csv and preprocessed 
X = df[feature_indices]
y = df[INDICATOR]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

train_data = X_train.copy()
train_data["target"] = y_train
test_data = X_test.copy()
test_data["target"] = y_test

metric = "rmse"
predictor = TabularPredictor(
    label="target", eval_metric=metric, path=f"models/l1/{INDICATOR}_{fold_idx}"
).fit(train_data, time_limit=args.time_limit, presets="best_quality")

y_pred_ag = predictor.predict(test_data) # <- This is where the failure occurs

Screenshots / Logs

AutoGluon training complete, total runtime = 310.33s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 126.1 rows/s (37 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("models/l1/OM LOI_2")
Traceback (most recent call last):
  File "/root/data-processor/eda-l1.py", line 458, in <module>
    y_pred_ag = predictor.predict(test_data)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/data-processor/lib/python3.11/site-packages/autogluon/tabular/predictor/predictor.py", line 2117, in predict
    return self._learner.predict(X=data, model=model, as_pandas=as_pandas, transform_features=transform_features, decision_threshold=decision_threshold)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/data-processor/lib/python3.11/site-packages/autogluon/tabular/learner/abstract_learner.py", line 208, in predict
    y_pred_proba = self.predict_proba(
                   ^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/data-processor/lib/python3.11/site-packages/autogluon/tabular/learner/abstract_learner.py", line 189, in predict_proba
    y_pred_proba = self.load_trainer().predict_proba(X, model=model)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/data-processor/lib/python3.11/site-packages/autogluon/core/trainer/abstract_trainer.py", line 837, in predict_proba
    return self._predict_proba_model(X, model, cascade=cascade)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/data-processor/lib/python3.11/site-packages/autogluon/core/trainer/abstract_trainer.py", line 2611, in _predict_proba_model
    return self.get_pred_proba_from_model(model=model, X=X, model_pred_proba_dict=model_pred_proba_dict, cascade=cascade)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/data-processor/lib/python3.11/site-packages/autogluon/core/trainer/abstract_trainer.py", line 851, in get_pred_proba_from_model
    model_pred_proba_dict = self.get_model_pred_proba_dict(X=X, models=models, model_pred_proba_dict=model_pred_proba_dict, cascade=cascade)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/data-processor/lib/python3.11/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1100, in get_model_pred_proba_dict
    model_pred_proba_dict[model_name] = model.predict_proba(X, **preprocess_kwargs)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/data-processor/lib/python3.11/site-packages/autogluon/core/models/abstract/abstract_model.py", line 968, in predict_proba
    y_pred_proba = self._predict_proba_internal(X=X, normalize=normalize, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/data-processor/lib/python3.11/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 459, in _predict_proba_internal
    y_pred_proba += model.predict_proba(X=X, preprocess_nonadaptive=False, normalize=normalize)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/data-processor/lib/python3.11/site-packages/autogluon/core/models/abstract/abstract_model.py", line 968, in predict_proba
    y_pred_proba = self._predict_proba_internal(X=X, normalize=normalize, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/data-processor/lib/python3.11/site-packages/autogluon/core/models/abstract/abstract_model.py", line 982, in _predict_proba_internal
    y_pred_proba = self._predict_proba(X=X, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/data-processor/lib/python3.11/site-packages/autogluon/tabular/models/fastainn/tabular_nn_fastai.py", line 491, in _predict_proba
    return self.y_scaler.inverse_transform(preds.numpy()).reshape(-1)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/data-processor/lib/python3.11/site-packages/sklearn/pipeline.py", line 948, in inverse_transform
    Xt = transform.inverse_transform(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/data-processor/lib/python3.11/site-packages/sklearn/preprocessing/_data.py", line 1085, in inverse_transform
    X = check_array(
        ^^^^^^^^^^^^
  File "/root/.pyenv/versions/data-processor/lib/python3.11/site-packages/sklearn/utils/validation.py", line 1003, in check_array
    _assert_all_finite(
  File "/root/.pyenv/versions/data-processor/lib/python3.11/site-packages/sklearn/utils/validation.py", line 126, in _assert_all_finite
    _assert_all_finite_element_wise(
  File "/root/.pyenv/versions/data-processor/lib/python3.11/site-packages/sklearn/utils/validation.py", line 175, in _assert_all_finite_element_wise
    raise ValueError(msg_err)
ValueError: Input contains infinity or a value too large for dtype('float32').
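
The traceback passes through `bagged_ensemble_model.py` and `tabular_nn_fastai.py`, which suggests a single bagged FastAI neural network is the source rather than the whole ensemble. One way to confirm this is to call predict once per trained model and record which ones raise. The sketch below assumes `predictor.model_names()` and the `model=` argument of `predict`, as available in AutoGluon 1.x; `find_failing_models` and the stub class are hypothetical and exist only so the example runs without a trained predictor.

```python
def find_failing_models(predictor, data):
    """Call predict once per trained model and collect the ones that raise."""
    failures = {}
    for model_name in predictor.model_names():
        try:
            predictor.predict(data, model=model_name)
        except ValueError as exc:
            failures[model_name] = str(exc)
    return failures


class _StubPredictor:
    """Hypothetical stand-in for a trained TabularPredictor."""

    def model_names(self):
        # Illustrative names only, not taken from the report.
        return ["LightGBM_BAG_L1", "NeuralNetFastAI_BAG_L1"]

    def predict(self, data, model=None):
        if model == "NeuralNetFastAI_BAG_L1":
            raise ValueError(
                "Input contains infinity or a value too large for dtype('float32')."
            )
        return [0.0]


failures = find_failing_models(_StubPredictor(), data=None)
print(failures)
```

With a real predictor, the call would be `find_failing_models(predictor, test_data)`; models that do not appear in the result predicted without error.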

Installed Versions

```python
INSTALLED VERSIONS
------------------
date : 2024-07-29
time : 17:37:36.254279
python : 3.11.9.final.0
OS : Linux
OS-release : 5.15.0-100-generic
Version : #110-Ubuntu SMP Wed Feb 7 13:27:48 UTC 2024
machine : x86_64
processor : x86_64
num_cores : 32
cpu_ram_mb : 128492.08984375
cuda version : None
num_gpus : 0
gpu_ram_mb : []
avail_disk_size_mb : 1567129
accelerate : 0.21.0
autogluon : 1.1.1
autogluon.common : 1.1.1
autogluon.core : 1.1.1
autogluon.features : 1.1.1
autogluon.multimodal : 1.1.1
autogluon.tabular : 1.1.1
autogluon.timeseries : 1.1.1
boto3 : 1.34.145
catboost : 1.2.5
defusedxml : 0.7.1
evaluate : 0.4.2
fastai : 2.7.15
gluonts : 0.15.1
hyperopt : 0.2.7
imodels : None
jinja2 : 3.1.3
joblib : 1.4.0
jsonschema : 4.21.1
lightgbm : 4.3.0
lightning : 2.3.3
matplotlib : 3.8.4
mlforecast : 0.10.0
networkx : 3.3
nlpaug : 1.1.11
nltk : 3.8.1
nptyping : 2.4.1
numpy : 1.26.4
nvidia-ml-py3 : 7.352.0
omegaconf : 2.2.3
onnxruntime-gpu : None
openmim : 0.3.9
optimum : 1.18.1
optimum-intel : None
orjson : 3.10.6
pandas : 2.2.2
pdf2image : 1.17.0
Pillow : 10.3.0
psutil : 5.9.8
pytesseract : 0.3.10
pytorch-lightning : 2.3.3
pytorch-metric-learning: 2.3.0
ray : 2.10.0
requests : 2.32.3
scikit-image : 0.20.0
scikit-learn : 1.4.0
scikit-learn-intelex : None
scipy : 1.12.0
seqeval : 1.2.2
setuptools : 69.5.1
skl2onnx : None
statsforecast : 1.4.0
tabpfn : None
tensorboard : 2.16.2
text-unidecode : 1.3
timm : 0.9.16
torch : 2.3.1
torchmetrics : 1.2.1
torchvision : 0.18.1
tqdm : 4.66.4
transformers : 4.39.3
utilsforecast : 0.0.10
vowpalwabbit : None
xgboost : 2.0.3
```

Innixma commented 2 months ago

Hi @TreeOfLearning, thanks for creating an issue! While I understand you can't provide the data, please try to provide a synthetic data example that reproduces the issue. Otherwise, it will be much harder for us to verify whether the issue still exists and how to fix it.
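
A possible starting point for such a synthetic example is sketched below. It rests on an unconfirmed assumption: that the failure is triggered by test rows whose feature values are finite and within the float32 range but far outside anything seen in training. The function name, column names, and `outlier_scale` value are all hypothetical, and this data is not known to reproduce the error.

```python
import numpy as np
import pandas as pd


def make_synthetic_regression(n_rows=500, n_features=5, outlier_scale=1e30, seed=0):
    """Build a linear regression dataset and split it 80/20, then place a few
    extreme (but still finite, float32-representable) values in the test split."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_rows, n_features))
    y = X @ rng.normal(size=n_features) + rng.normal(scale=0.1, size=n_rows)

    df = pd.DataFrame(X, columns=[f"f{i}" for i in range(n_features)])
    df["target"] = y

    split = int(n_rows * 0.8)
    train, test = df.iloc[:split].copy(), df.iloc[split:].copy()

    # Assumption under test: out-of-range inputs make the network's output overflow.
    test.loc[test.index[:5], "f0"] = outlier_scale
    return train, test


train, test = make_synthetic_regression()
print(train.shape, test.shape)

# With AutoGluon installed, the reported call sequence would then be:
# from autogluon.tabular import TabularPredictor
# predictor = TabularPredictor(label="target", eval_metric="rmse").fit(
#     train, presets="best_quality"
# )
# predictor.predict(test)
```

If this variant does not fail, varying `outlier_scale` or the number of affected rows and columns may help narrow down which property of the real data matters.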

Innixma commented 2 months ago

Please also provide the training logs of the models for reference.