abhishekkrthakur / autoxgb

XGBoost + Optuna
Apache License 2.0

module 'pyarrow.lib' has no attribute 'MonthDayNanoIntervalArray' #3

Closed: Ankitkalauni closed this issue 2 years ago

Ankitkalauni commented 2 years ago

I am getting an error while using the TPS November data in the Kaggle conda environment (GPU enabled):

https://www.kaggle.com/yogeshkalauni/tps-nov-21-auto-xgboost-error

The error happens after installing autoxgb with pip in the Kaggle kernel. Install output:

Collecting autoxgb
  Downloading autoxgb-0.2.1-py3-none-any.whl (20 kB)
Collecting scikit-learn==1.0.1
  Downloading scikit_learn-1.0.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (23.2 MB)
     |████████████████████████████████| 23.2 MB 1.3 MB/s eta 0:00:01
Requirement already satisfied: optuna==2.10.0 in /opt/conda/lib/python3.7/site-packages (from autoxgb) (2.10.0)
Collecting pyarrow==6.0.0
  Downloading pyarrow-6.0.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25.5 MB)
     |████████████████████████████████| 25.5 MB 43.9 MB/s eta 0:00:01
Requirement already satisfied: pydantic==1.8.2 in /opt/conda/lib/python3.7/site-packages (from autoxgb) (1.8.2)
Collecting loguru==0.5.3
  Downloading loguru-0.5.3-py3-none-any.whl (57 kB)
     |████████████████████████████████| 57 kB 4.9 MB/s  eta 0:00:01
Collecting xgboost==1.5.0
  Downloading xgboost-1.5.0-py3-none-manylinux2014_x86_64.whl (173.5 MB)
     |████████████████████████████████| 173.5 MB 66 kB/s eta 0:00:01
Collecting pandas==1.3.4
  Downloading pandas-1.3.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.3 MB)
     |████████████████████████████████| 11.3 MB 46.0 MB/s eta 0:00:01
Requirement already satisfied: fastapi==0.70.0 in /opt/conda/lib/python3.7/site-packages (from autoxgb) (0.70.0)
Requirement already satisfied: uvicorn==0.15.0 in /opt/conda/lib/python3.7/site-packages (from autoxgb) (0.15.0)
Collecting numpy==1.21.3
  Downloading numpy-1.21.3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)
     |████████████████████████████████| 15.7 MB 39.9 MB/s eta 0:00:01
Collecting joblib==1.1.0
  Downloading joblib-1.1.0-py2.py3-none-any.whl (306 kB)
     |████████████████████████████████| 306 kB 39.9 MB/s eta 0:00:01
Requirement already satisfied: starlette==0.16.0 in /opt/conda/lib/python3.7/site-packages (from fastapi==0.70.0->autoxgb) (0.16.0)
Requirement already satisfied: scipy!=1.4.0 in /opt/conda/lib/python3.7/site-packages (from optuna==2.10.0->autoxgb) (1.7.1)
Requirement already satisfied: cliff in /opt/conda/lib/python3.7/site-packages (from optuna==2.10.0->autoxgb) (3.9.0)
Requirement already satisfied: colorlog in /opt/conda/lib/python3.7/site-packages (from optuna==2.10.0->autoxgb) (6.5.0)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.7/site-packages (from optuna==2.10.0->autoxgb) (21.0)
Requirement already satisfied: tqdm in /opt/conda/lib/python3.7/site-packages (from optuna==2.10.0->autoxgb) (4.62.3)
Requirement already satisfied: alembic in /opt/conda/lib/python3.7/site-packages (from optuna==2.10.0->autoxgb) (1.7.4)
Requirement already satisfied: cmaes>=0.8.2 in /opt/conda/lib/python3.7/site-packages (from optuna==2.10.0->autoxgb) (0.8.2)
Requirement already satisfied: sqlalchemy>=1.1.0 in /opt/conda/lib/python3.7/site-packages (from optuna==2.10.0->autoxgb) (1.4.25)
Requirement already satisfied: PyYAML in /opt/conda/lib/python3.7/site-packages (from optuna==2.10.0->autoxgb) (5.4.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /opt/conda/lib/python3.7/site-packages (from pandas==1.3.4->autoxgb) (2.8.0)
Requirement already satisfied: pytz>=2017.3 in /opt/conda/lib/python3.7/site-packages (from pandas==1.3.4->autoxgb) (2021.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.7/site-packages (from pydantic==1.8.2->autoxgb) (3.10.0.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn==1.0.1->autoxgb) (2.2.0)
Requirement already satisfied: anyio<4,>=3.0.0 in /opt/conda/lib/python3.7/site-packages (from starlette==0.16.0->fastapi==0.70.0->autoxgb) (3.3.0)
Requirement already satisfied: click>=7.0 in /opt/conda/lib/python3.7/site-packages (from uvicorn==0.15.0->autoxgb) (8.0.1)
Requirement already satisfied: asgiref>=3.4.0 in /opt/conda/lib/python3.7/site-packages (from uvicorn==0.15.0->autoxgb) (3.4.1)
Requirement already satisfied: h11>=0.8 in /opt/conda/lib/python3.7/site-packages (from uvicorn==0.15.0->autoxgb) (0.12.0)
Requirement already satisfied: sniffio>=1.1 in /opt/conda/lib/python3.7/site-packages (from anyio<4,>=3.0.0->starlette==0.16.0->fastapi==0.70.0->autoxgb) (1.2.0)
Requirement already satisfied: idna>=2.8 in /opt/conda/lib/python3.7/site-packages (from anyio<4,>=3.0.0->starlette==0.16.0->fastapi==0.70.0->autoxgb) (2.10)
Requirement already satisfied: importlib-metadata in /opt/conda/lib/python3.7/site-packages (from click>=7.0->uvicorn==0.15.0->autoxgb) (4.8.1)
Requirement already satisfied: pyparsing>=2.0.2 in /opt/conda/lib/python3.7/site-packages (from packaging>=20.0->optuna==2.10.0->autoxgb) (2.4.7)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas==1.3.4->autoxgb) (1.16.0)
Requirement already satisfied: greenlet!=0.4.17 in /opt/conda/lib/python3.7/site-packages (from sqlalchemy>=1.1.0->optuna==2.10.0->autoxgb) (1.1.1)
Requirement already satisfied: Mako in /opt/conda/lib/python3.7/site-packages (from alembic->optuna==2.10.0->autoxgb) (1.1.5)
Requirement already satisfied: importlib-resources in /opt/conda/lib/python3.7/site-packages (from alembic->optuna==2.10.0->autoxgb) (5.2.2)
Requirement already satisfied: PrettyTable>=0.7.2 in /opt/conda/lib/python3.7/site-packages (from cliff->optuna==2.10.0->autoxgb) (2.2.0)
Requirement already satisfied: cmd2>=1.0.0 in /opt/conda/lib/python3.7/site-packages (from cliff->optuna==2.10.0->autoxgb) (2.2.0)
Requirement already satisfied: autopage>=0.4.0 in /opt/conda/lib/python3.7/site-packages (from cliff->optuna==2.10.0->autoxgb) (0.4.0)
Requirement already satisfied: stevedore>=2.0.1 in /opt/conda/lib/python3.7/site-packages (from cliff->optuna==2.10.0->autoxgb) (3.4.0)
Requirement already satisfied: pbr!=2.1.0,>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from cliff->optuna==2.10.0->autoxgb) (5.6.0)
Requirement already satisfied: colorama>=0.3.7 in /opt/conda/lib/python3.7/site-packages (from cmd2>=1.0.0->cliff->optuna==2.10.0->autoxgb) (0.4.4)
Requirement already satisfied: attrs>=16.3.0 in /opt/conda/lib/python3.7/site-packages (from cmd2>=1.0.0->cliff->optuna==2.10.0->autoxgb) (21.2.0)
Requirement already satisfied: pyperclip>=1.6 in /opt/conda/lib/python3.7/site-packages (from cmd2>=1.0.0->cliff->optuna==2.10.0->autoxgb) (1.8.2)
Requirement already satisfied: wcwidth>=0.1.7 in /opt/conda/lib/python3.7/site-packages (from cmd2>=1.0.0->cliff->optuna==2.10.0->autoxgb) (0.2.5)
Requirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata->click>=7.0->uvicorn==0.15.0->autoxgb) (3.5.0)
Requirement already satisfied: MarkupSafe>=0.9.2 in /opt/conda/lib/python3.7/site-packages (from Mako->alembic->optuna==2.10.0->autoxgb) (2.0.1)
Installing collected packages: numpy, joblib, xgboost, scikit-learn, pyarrow, pandas, loguru, autoxgb
  Attempting uninstall: numpy
    Found existing installation: numpy 1.19.5
    Uninstalling numpy-1.19.5:
      Successfully uninstalled numpy-1.19.5
  Attempting uninstall: joblib
    Found existing installation: joblib 1.0.1
    Uninstalling joblib-1.0.1:
      Successfully uninstalled joblib-1.0.1
  Attempting uninstall: xgboost
    Found existing installation: xgboost 1.4.2
    Uninstalling xgboost-1.4.2:
      Successfully uninstalled xgboost-1.4.2
  Attempting uninstall: scikit-learn
    Found existing installation: scikit-learn 0.23.2
    Uninstalling scikit-learn-0.23.2:
      Successfully uninstalled scikit-learn-0.23.2
  Attempting uninstall: pyarrow
    Found existing installation: pyarrow 5.0.0
    Uninstalling pyarrow-5.0.0:
      Successfully uninstalled pyarrow-5.0.0
  Attempting uninstall: pandas
    Found existing installation: pandas 1.3.3
    Uninstalling pandas-1.3.3:
      Successfully uninstalled pandas-1.3.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-io 0.18.0 requires tensorflow-io-gcs-filesystem==0.18.0, which is not installed.
explainable-ai-sdk 1.3.2 requires xai-image-widget, which is not installed.
dask-cudf 21.8.3 requires cupy-cuda114, which is not installed.
cudf 21.8.3 requires cupy-cuda110, which is not installed.
beatrix-jupyterlab 3.1.1 requires google-cloud-bigquery-storage, which is not installed.
yellowbrick 1.3.post1 requires numpy<1.20,>=1.16.0, but you have numpy 1.21.3 which is incompatible.
tfx-bsl 1.3.0 requires absl-py<0.13,>=0.9, but you have absl-py 0.14.0 which is incompatible.
tfx-bsl 1.3.0 requires numpy<1.20,>=1.16, but you have numpy 1.21.3 which is incompatible.
tfx-bsl 1.3.0 requires pyarrow<3,>=1, but you have pyarrow 6.0.0 which is incompatible.
tensorflow 2.6.0 requires numpy~=1.19.2, but you have numpy 1.21.3 which is incompatible.
tensorflow 2.6.0 requires six~=1.15.0, but you have six 1.16.0 which is incompatible.
tensorflow 2.6.0 requires typing-extensions~=3.7.4, but you have typing-extensions 3.10.0.2 which is incompatible.
tensorflow-transform 1.3.0 requires absl-py<0.13,>=0.9, but you have absl-py 0.14.0 which is incompatible.
tensorflow-transform 1.3.0 requires numpy<1.20,>=1.16, but you have numpy 1.21.3 which is incompatible.
tensorflow-transform 1.3.0 requires pyarrow<3,>=1, but you have pyarrow 6.0.0 which is incompatible.
tensorflow-io 0.18.0 requires tensorflow<2.6.0,>=2.5.0, but you have tensorflow 2.6.0 which is incompatible.
pdpbox 0.2.1 requires matplotlib==3.1.1, but you have matplotlib 3.4.3 which is incompatible.
numba 0.54.0 requires numpy<1.21,>=1.17, but you have numpy 1.21.3 which is incompatible.
matrixprofile 1.1.10 requires protobuf==3.11.2, but you have protobuf 3.18.1 which is incompatible.
hypertools 0.7.0 requires scikit-learn!=0.22,<0.24,>=0.19.1, but you have scikit-learn 1.0.1 which is incompatible.
dask-cudf 21.8.3 requires dask<=2021.07.1,>=2021.6.0, but you have dask 2021.9.1 which is incompatible.
dask-cudf 21.8.3 requires pandas<1.3.0dev0,>=1.0, but you have pandas 1.3.4 which is incompatible.
cudf 21.8.3 requires pandas<1.3.0dev0,>=1.0, but you have pandas 1.3.4 which is incompatible.
apache-beam 2.32.0 requires dill<0.3.2,>=0.3.1.1, but you have dill 0.3.4 which is incompatible.
apache-beam 2.32.0 requires numpy<1.21.0,>=1.14.3, but you have numpy 1.21.3 which is incompatible.
apache-beam 2.32.0 requires pyarrow<5.0.0,>=0.15.1, but you have pyarrow 6.0.0 which is incompatible.
apache-beam 2.32.0 requires typing-extensions<3.8.0,>=3.7.0, but you have typing-extensions 3.10.0.2 which is incompatible.
Successfully installed autoxgb-0.2.1 joblib-1.1.0 loguru-0.5.3 numpy-1.21.3 pandas-1.3.4 pyarrow-6.0.0 scikit-learn-1.0.1 xgboost-1.5.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
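
For context, the log above shows pip replacing the preinstalled pyarrow 5.0.0 (along with numpy, pandas, scikit-learn and xgboost) inside the Kaggle conda environment with the versions pinned by autoxgb, while several preinstalled packages still require older ranges. A quick way to see what is left on disk afterwards (a hypothetical check, not something from the original report):

```python
# Hypothetical check (not from the original report): after pip swaps the
# conda-provided pyarrow 5.0.0 for 6.0.0, list what is left on disk.
import glob

site = "/opt/conda/lib/python3.7/site-packages"  # path taken from the log above
for entry in sorted(glob.glob(f"{site}/pyarrow*")):
    print(entry)
# One pyarrow/ package directory plus a single pyarrow-6.0.0.dist-info is the
# healthy outcome; extra leftovers would point to a mixed 5.0.0/6.0.0 install.
```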
from autoxgb import AutoXGB

# required parameters:
train_filename = "../input/tabular-playground-series-nov-2021/train.csv"
output = "outputt"

# optional parameters
test_filename = '../input/tabular-playground-series-nov-2021/test.csv'
task = 'classification'
idx = None
targets = ["target"]
features = None
categorical_features = None
use_gpu = True
num_folds = 5
seed = 42
num_trials = 100
time_limit = 7*60*60
fast = False

# Now it's time to train the model!
axgb = AutoXGB(
    train_filename=train_filename,
    output=output,
    test_filename=test_filename,
    task=task,
    idx=idx,
    targets=targets,
    features=features,
    categorical_features=categorical_features,
    use_gpu=use_gpu,
    num_folds=num_folds,
    seed=seed,
    num_trials=num_trials,
    time_limit=time_limit,
    fast=fast,
)
axgb.train()
2021-11-01 07:03:06.106 | INFO     | autoxgb.autoxgb:__post_init__:42 - Output directory: outputt
2021-11-01 07:03:06.108 | WARNING  | autoxgb.autoxgb:__post_init__:49 - No id column specified. Will default to `id`.
2021-11-01 07:03:06.110 | INFO     | autoxgb.autoxgb:_process_data:149 - Reading training data
2021-11-01 07:03:22.502 | INFO     | autoxgb.utils:reduce_memory_usage:50 - Mem. usage decreased to 117.30 Mb (74.9% reduction)
2021-11-01 07:03:22.583 | INFO     | autoxgb.autoxgb:_determine_problem_type:140 - Problem type: binary_classification
2021-11-01 07:03:38.131 | INFO     | autoxgb.utils:reduce_memory_usage:50 - Mem. usage decreased to 105.06 Mb (74.8% reduction)
2021-11-01 07:03:38.132 | INFO     | autoxgb.autoxgb:_create_folds:58 - Creating folds
2021-11-01 07:03:38.248 | INFO     | autoxgb.autoxgb:_process_data:170 - Encoding target(s)
2021-11-01 07:03:38.282 | INFO     | autoxgb.autoxgb:_process_data:195 - Found 0 categorical features.
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_38/3565386527.py in <module>
     37     fast=fast,
     38 )
---> 39 axgb.train()

/opt/conda/lib/python3.7/site-packages/autoxgb/autoxgb.py in train(self)
    244 
    245     def train(self):
--> 246         self._process_data()
    247         best_params = train_model(self.model_config)
    248         logger.info("Training complete")

/opt/conda/lib/python3.7/site-packages/autoxgb/autoxgb.py in _process_data(self)
    210                     test_fold[categorical_features] = ord_encoder.transform(test_fold[categorical_features].values)
    211                 categorical_encoders[fold] = ord_encoder
--> 212             fold_train.to_feather(os.path.join(self.output, f"train_fold_{fold}.feather"))
    213             fold_valid.to_feather(os.path.join(self.output, f"valid_fold_{fold}.feather"))
    214             if self.test_filename is not None:

/opt/conda/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    205                 else:
    206                     kwargs[new_arg_name] = new_arg_value
--> 207             return func(*args, **kwargs)
    208 
    209         return cast(F, wrapper)

/opt/conda/lib/python3.7/site-packages/pandas/core/frame.py in to_feather(self, path, **kwargs)
   2517         from pandas.io.feather_format import to_feather
   2518 
-> 2519         to_feather(self, path, **kwargs)
   2520 
   2521     @doc(

/opt/conda/lib/python3.7/site-packages/pandas/io/feather_format.py in to_feather(df, path, storage_options, **kwargs)
     44     """
     45     import_optional_dependency("pyarrow")
---> 46     from pyarrow import feather
     47 
     48     if not isinstance(df, DataFrame):

/opt/conda/lib/python3.7/site-packages/pyarrow/feather.py in <module>
     23                          concat_tables, schema)
     24 import pyarrow.lib as ext
---> 25 from pyarrow import _feather
     26 from pyarrow._feather import FeatherError  # noqa: F401
     27 from pyarrow.vendored.version import Version

/opt/conda/lib/python3.7/site-packages/pyarrow/_feather.pyx in init pyarrow._feather()

AttributeError: module 'pyarrow.lib' has no attribute 'MonthDayNanoIntervalArray'
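
MonthDayNanoIntervalArray appears only in pyarrow 6.0.0 and later, so the AttributeError above suggests that the compiled pyarrow.lib extension being imported is older than the 6.0.0 Python code that expects it. A hypothetical diagnostic (not part of the original report) to see which pyarrow build the kernel is actually loading:

```python
# Hypothetical diagnostic (not from the original report): check which pyarrow
# build the running kernel actually imports, and whether it has the attribute
# that pyarrow._feather expects.
import pyarrow
import pyarrow.lib

print(pyarrow.__version__)   # expected 6.0.0 after the upgrade in the install log
print(pyarrow.lib.__file__)  # location of the compiled extension being loaded
print(hasattr(pyarrow.lib, "MonthDayNanoIntervalArray"))
```

If the version prints 6.0.0 but the attribute check is False, the old compiled extension from the previously installed pyarrow 5.0.0 is most likely still being picked up, and restarting the kernel so nothing imported before the upgrade stays cached is worth trying first.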
abhishekkrthakur commented 2 years ago

could you please share the Notebook link?

Ankitkalauni commented 2 years ago

> could you please share the Notebook link?

https://www.kaggle.com/yogeshkalauni/tps-nov-21-auto-xgboost-error

abhishekkrthakur commented 2 years ago

If you run this locally, it will work fine. There seems to be some kind of conflict between pyarrow versions on Kaggle. Please try installing autoxgb without its dependencies: `pip install --no-deps autoxgb`. Lemme know if that solves your issue :)
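
In a Kaggle notebook this is usually just `!pip install --no-deps autoxgb` in a cell. A minimal sketch of the same thing in plain Python, assuming the image already ships workable versions of the packages autoxgb imports at runtime (most show up as "Requirement already satisfied" in the log above; anything that does not, such as loguru, would need its own install):

```python
# Minimal sketch of the suggested workaround: install autoxgb without letting pip
# touch the pinned dependencies that clash with the Kaggle image.
import subprocess
import sys

subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "--no-deps", "autoxgb"]
)
```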

Tharunkumar01 commented 2 years ago

It works fine without installing dependencies

> If you run this locally, it will work fine. There seems to be some kind of conflict between pyarrow versions on Kaggle. Please try installing autoxgb without its dependencies: `pip install --no-deps autoxgb`. Lemme know if that solves your issue :)