Nixtla / mlforecast

Scalable machine 🤖 learning for time series forecasting.
https://nixtlaverse.nixtla.io/mlforecast
Apache License 2.0

LightGBM + GroupedArray._tail() + num_threads>1 #350

Closed michal-mmm closed 3 weeks ago

michal-mmm commented 3 weeks ago

What happened + What you expected to happen

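An error occurs in the _tail method of the GroupedArray class in coreforecast/grouped_array.py with certain configurations. Specifically, this error is encountered when all of the following hold: lightgbm is imported, at least one target transform is set, and num_threads is greater than 1 (see the comments in the reproduction script).

I think the error occurs in the following method: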

def _tail(self, k: int) -> np.ndarray:
    ...
    _LIB[f"{self.prefix}_Tail"](
        self._handle,
        ctypes.c_int(k),
        _data_as_void_ptr(out),
    )

Versions / Dependencies

requirements.txt (only installed lightgbm and mlforecast):

alembic==1.13.1
cloudpickle==3.0.0
colorlog==6.8.2
coreforecast==0.0.9
fsspec==2024.6.0
joblib==1.4.2
lightgbm==4.3.0
llvmlite==0.42.0
Mako==1.3.5
MarkupSafe==2.1.5
mlforecast==0.13.0
numba==0.59.1
numpy==1.26.4
optuna==3.6.1
packaging==24.1
pandas==2.2.2
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML==6.0.1
scikit-learn==1.5.0
scipy==1.13.1
six==1.16.0
SQLAlchemy==2.0.30
threadpoolctl==3.5.0
tqdm==4.66.4
typing_extensions==4.12.2
tzdata==2024.1
utilsforecast==0.1.10
window_ops==0.0.15

System:
Python 3.10.13
OS: macOS 14.5 23F79 arm64 (Sonoma)
Shell: zsh 5.9
pip 23.0.1

Reproduction script

import pandas as pd
import lightgbm as lgb  # must import this
from mlforecast import MLForecast
from mlforecast.target_transforms import Differences
from sklearn.linear_model import LinearRegression

df = pd.read_csv('https://datasets-nixtla.s3.amazonaws.com/air-passengers.csv', parse_dates=['ds'])

models = {"lin": LinearRegression()}

fcst = MLForecast(
    models=models,
    freq='MS',
    lags=[12],
    target_transforms=[Differences([1])],  # at least one target_transforms
    num_threads=2,  # num_threads must be >1
)

fcst.preprocess(df)
print("Success!")

Issue Severity

Low: It annoys or frustrates me.

jmoralez commented 3 weeks ago

Hey @michal-mmm, thanks for using mlforecast. This is most likely an issue with OpenMP. Can you please provide the following information?

michal-mmm commented 3 weeks ago

hey @jmoralez, can confirm:

jmoralez commented 3 weeks ago

Thanks for reporting back! Conda ships its own OpenMP runtime and the coreforecast wheel from pip has a different one, so the process ends up loading two copies of OpenMP, which produces the segfault. Glad you were able to solve it and sorry for the trouble.
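For anyone hitting a similar crash, here is a minimal sketch of one way to check whether a process has loaded more than one OpenMP runtime. It relies on threadpoolctl (already in the requirements list above) and its threadpool_info() function. The helper names find_openmp_runtimes and report_openmp_runtimes are hypothetical and only for illustration. This assumes threadpoolctl recognizes the runtimes involved, which may not hold for every build.

```python
def find_openmp_runtimes(pool_info):
    """Return the sorted, distinct file paths of OpenMP runtimes.

    `pool_info` is a list of dicts shaped like the output of
    threadpoolctl.threadpool_info() (keys such as "user_api" and "filepath").
    """
    paths = {
        entry.get("filepath")
        for entry in pool_info
        if entry.get("user_api") == "openmp"
    }
    return sorted(p for p in paths if p is not None)


def report_openmp_runtimes():
    """Print every OpenMP runtime currently loaded in this process."""
    # Imported lazily so find_openmp_runtimes works without threadpoolctl.
    from threadpoolctl import threadpool_info

    runtimes = find_openmp_runtimes(threadpool_info())
    for path in runtimes:
        print(path)
    if len(runtimes) > 1:
        # Two different shared objects providing OpenMP is the situation
        # described in the comment above.
        print("More than one OpenMP runtime is loaded in this process.")
    return runtimes
```

To use it in the affected environment, import lightgbm and mlforecast first so that their shared libraries are loaded, then call report_openmp_runtimes(). If two different paths show up (for example one inside the conda environment and one inside a pip-installed package), that is consistent with the explanation above.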