autogluon / autogluon

Fast and Accurate ML in 3 Lines of Code
https://auto.gluon.ai/
Apache License 2.0

[BUG] Pandas groupby error occurred while fitting the training data. #2843

Closed. SebastienZh closed this issue 1 year ago.

SebastienZh commented 1 year ago

Describe the bug

I'm using autogluon 0.6.2 to forecast time series data. While fitting on the training data, the error "TypeError: n needs to be an int or a list/set/tuple of ints" occurred. Part of the traceback follows:

File D:\ProgramFiles\Anaconda3\envs\ag_39\lib\site-packages\autogluon\timeseries\dataset\ts_dataframe.py:572, in TimeSeriesDataFrame.slice_by_timestep(self, start_index, end_index)
    569 end_index = time_step_slice.stop
    571 time_step_slice = slice(start_index, end_index)
--> 572 result = self.groupby(level=ITEMID, sort=False, as_index=False).nth(time_step_slice)
    573 result.static_features = self.static_features
    574 result._cached_freq = self._cached_freq

File ~\AppData\Roaming\Python\Python39\site-packages\pandas\core\groupby\groupby.py:2304, in GroupBy.nth(self, n, dropna)
   2302 valid_containers = (set, list, tuple)
   2303 if not isinstance(n, (valid_containers, int)):
-> 2304     raise TypeError("n needs to be an int or a list/set/tuple of ints")
   2306 if not dropna:
   2308     if isinstance(n, int):

TypeError: n needs to be an int or a list/set/tuple of ints

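The error seems to come from pandas' groupby: AutoGluon passes a slice object to GroupBy.nth(), but the installed pandas (1.3.5) only accepts an int or a list/set/tuple of ints there.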
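For illustration, the selection that slice_by_timestep requests (for every item, the rows whose position falls in slice(start_index, end_index)) can also be expressed with GroupBy.cumcount, which pandas 1.3.5 already supports. The helper below is a hypothetical sketch, not AutoGluon's implementation, and it assumes the item level of the index is named "item_id" (the id_column used in the reproduction code) with rows ordered by time within each item.

```python
import numpy as np
import pandas as pd


def slice_by_timestep_compat(df: pd.DataFrame, start_index=None, end_index=None) -> pd.DataFrame:
    """Keep, for every item, the rows whose position is in slice(start_index, end_index).

    Hypothetical helper (not part of AutoGluon). It never passes a slice to
    GroupBy.nth, so it also runs on pandas 1.3.x. Assumes the item level of
    the index is named "item_id" and rows are time-ordered within each item.
    """
    grouped = df.groupby(level="item_id", sort=False)
    pos = grouped.cumcount().to_numpy()                 # 0-based position from the start of each item
    rev = grouped.cumcount(ascending=False).to_numpy()  # 0-based position from the end of each item
    length = pos + rev + 1                              # number of rows in the item each row belongs to

    mask = np.ones(len(df), dtype=bool)
    if start_index is not None:
        # Negative bounds count from the end of each item, as in a Python slice.
        lower = start_index if start_index >= 0 else length + start_index
        mask &= pos >= lower
    if end_index is not None:
        upper = end_index if end_index >= 0 else length + end_index
        mask &= pos < upper
    return df.loc[mask]
```

For example, slice_by_timestep_compat(df, None, -2) drops the last two time steps of every item. Computing positions with cumcount keeps the result in the original row order.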

To Reproduce

The dataset was constructed as shown in the documentation, and four known covariates were added to it:

    from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

    # build_labels and train_data_clean are the reporter's own helper and data (not shown here)
    ts_dataframe = TimeSeriesDataFrame.from_data_frame(
        build_labels(train_data_clean[4], 100, "sensor_5"),
        id_column="item_id",
        timestamp_column="timestamp",
    )
    # Four known covariates
    ts_dataframe["value_sr1"] = train_data_clean[0].values
    ts_dataframe["value_sr2"] = train_data_clean[1].values
    ts_dataframe["value_sr3"] = train_data_clean[2].values
    ts_dataframe["value_sr4"] = train_data_clean[3].values

The predictor was constructed and fitted as follows:

    predictor = TimeSeriesPredictor(
        prediction_length=10000,
        target="value",
        known_covariates_names=["value_sr1", "value_sr2", "value_sr3", "value_sr4"],
        eval_metric="MSE",
        path="problem_1",
    )
    # train_data is the training TimeSeriesDataFrame, e.g. ts_dataframe built above
    predictor.fit(train_data, presets="best_quality")

The training data has a length of 990,001, and I want to use it to predict the next 10,000 time steps.

Installed Versions

date : 2023-02-06
time : 17:53:29.839255
python : 3.9.13.final.0
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 141 Stepping 1, GenuineIntel
num_cores : 16
cpu_ram_mb : 16021
cuda version : None
num_gpus : 0
gpu_ram_mb : [6009]
avail_disk_size_mb : None

accelerate : 0.13.2
albumentations : 1.1.0
autogluon.common : 0.6.2
autogluon.core : 0.6.2
autogluon.features : 0.6.2
autogluon.multimodal : 0.6.2
autogluon.tabular : 0.6.2
autogluon.text : 0.6.2
autogluon.timeseries : 0.6.2
autogluon.vision : 0.6.2
boto3 : 1.24.62
catboost : 1.0.6
dask : 2021.11.2
defusedxml : 0.7.1
distributed : 2021.11.2
evaluate : 0.3.0
fairscale : 0.4.6
fastai : 2.7.9
gluoncv : 0.11.0
gluonts : 0.11.6
hyperopt : 0.2.7
joblib : 1.1.0
jsonschema : 4.4.0
lightgbm : 3.3.2
matplotlib : 3.5.2
networkx : 2.8.6
nlpaug : 1.1.10
nltk : 3.7
nptyping : 1.4.4
numpy : 1.22.4
omegaconf : 2.1.2
openmim : None
pandas : 1.3.5
PIL : 9.3.0
psutil : 5.9.4
pytorch-metric-learning: None
pytorch_lightning : 1.7.7
ray : 2.0.1
requests : 2.28.1
scipy : 1.7.3
sentencepiece : None
seqeval : None
setuptools : 65.3.0
skimage : 0.19.3
sklearn : 1.0.2
smart_open : 5.2.1
statsmodels : 0.13.5
text-unidecode : None
timm : 0.5.4
torch : 1.12.0+cpu
torchmetrics : 0.8.2
torchtext : 0.13.0
torchvision : 0.13.0+cpu
tqdm : 4.64.0
transformers : 4.23.1
xgboost : 1.7.2

Innixma commented 1 year ago

This can be fixed by upgrading pandas to version 1.4.1 or later: the lower bound of the pandas version range allowed by AutoGluon v0.6.2 does not work with autogluon.timeseries. This has been fixed in the upcoming v0.7 release.

You can fix this right now with:

pip install -U pandas
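After upgrading, it may be worth checking which copy of pandas is actually imported: the traceback shows pandas being loaded from the per-user site-packages (~\AppData\Roaming\Python\Python39) rather than from the ag_39 conda environment, so an upgrade installed into the environment might not replace that copy. The sketch below is a minimal check assuming the >=1.4.1 lower bound stated above; parse_version and pandas_is_new_enough are hypothetical helpers, and the parser ignores pre-release and local suffixes.

```python
import re

# Lower bound taken from the maintainer's reply (pandas >= 1.4.1).
MIN_PANDAS = (1, 4, 1)


def parse_version(version: str) -> tuple:
    """Return the leading numeric release components, e.g. "1.3.5" -> (1, 3, 5).

    Simplified parser: suffixes such as "rc1" or "+cpu" are dropped.
    """
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def pandas_is_new_enough(version: str) -> bool:
    """True if the version meets the lower bound (tuples compare element by element)."""
    return parse_version(version) >= MIN_PANDAS


if __name__ == "__main__":
    import pandas as pd

    # pd.__file__ shows which site-packages directory pandas was imported from.
    print("pandas", pd.__version__, "loaded from", pd.__file__)
    if not pandas_is_new_enough(pd.__version__):
        print("pandas is older than 1.4.1; slice_by_timestep is expected to raise the TypeError above")
```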