AxeldeRomblay / MLBox

MLBox is a powerful Automated Machine Learning Python library.
https://mlbox.readthedocs.io/en/latest/

Trying to install and getting xgboost errors #63

Closed TimusLetap closed 5 years ago

TimusLetap commented 5 years ago

The system is a Kaggle kernel, which is Ubuntu-based and seems to be the intended environment.

I ran this:

!apt-get install build-essential
!pip install cmake
!pip install xgboost>=0.6a2
!pip install lightgbm>=2.0.2
!pip install mlbox

Resulting in this:

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-xib6_1h7/xgboost/

Can you please help me out? I see your examples are also Kaggle-based, but they don't include the install steps. Do you somehow install the packages from the setup file within the kernel?

TimusLetap commented 5 years ago

When I re-run it several times, the error changes to different xgboost errors, and I'm not sure what that means. I've been trying to replicate your Titanic Kaggle kernel to see how to use the package and make it work; then I can confidently apply it to other datasets.

AxeldeRomblay commented 5 years ago

I'll work on the setup (to replace xgboost with LightGBM). Meanwhile, you can try installing xgboost==0.6a2 manually.
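A note on the install commands at the top of this thread: on a notebook `!` line, the shell treats an unquoted `>=` as output redirection. `!pip install xgboost>=0.6a2` therefore installs the latest xgboost and writes pip's output to a file named `=0.6a2`, and the `lightgbm>=2.0.2` line behaves the same way. Quoting the specifier (`!pip install "xgboost==0.6a2"`) avoids this. The sketch below does the same from Python without going through a shell at all. The helper names `pip_install_command` and `pip_install` are hypothetical, and the versions are only the ones mentioned in this thread.

```python
import subprocess
import sys


def pip_install_command(requirements):
    """Build a pip argv list. Each specifier stays one argument, so
    characters such as '>' or '=' are never interpreted by a shell."""
    return [sys.executable, "-m", "pip", "install", *requirements]


def pip_install(requirements, dry_run=True):
    cmd = pip_install_command(requirements)
    if dry_run:
        # Only show what would run; nothing is installed.
        print(" ".join(cmd))
        return None
    # No shell=True: the list goes straight to the pip process.
    return subprocess.run(cmd, check=True)


# Hypothetical usage with the versions discussed in this thread.
pip_install(["xgboost==0.6a2", "lightgbm>=2.0.2"])
```

Pass `dry_run=False` to actually install. `check=True` makes a failed install raise `subprocess.CalledProcessError` instead of failing silently.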

AxeldeRomblay commented 5 years ago

see https://github.com/AxeldeRomblay/MLBox/issues/55

TimusLetap commented 5 years ago

The same error(s) come up. I think you may be right that xgboost is causing the problem, so replacing it might help. LightGBM doesn't seem to cause any issues. Then again, I haven't been able to run the actual package, so I don't know what its limitations are.

TimusLetap commented 5 years ago

Could you perhaps share a screenshot of how you installed it in a Kaggle kernel to get your examples working?

AxeldeRomblay commented 5 years ago

Did you manage to install xgboost first? If not, please refer to: https://xgboost.readthedocs.io/en/latest/build.html

TimusLetap commented 5 years ago

Are you able to share screenshots of how you install and import these libraries in your Kaggle examples? That would be very useful and would most likely address issues like this. I was trying to follow your Kaggle examples, but without being able to actually run the package it has proven quite difficult. I am able to install xgboost, and xgboost on its own runs fine; the xgboost-specific errors only arise when I install mlbox.
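When a package works on its own but a second install fails on it, posting the exact versions pip sees makes the report easier to act on. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print(installed_versions(["xgboost", "lightgbm", "mlbox"]))
```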

TimusLetap commented 5 years ago

I hope I am being clear and concise. If you could recreate your use of MLBox in your Kaggle examples and share screenshots of how you loaded and ran the package in that environment, it would be highly appreciated and useful for reproducing the process. Thank you.

richinvest commented 5 years ago

I have the same issue as Timus. No issues with xgboost itself, only when installing mlbox.

AxeldeRomblay commented 5 years ago

OK, I have just removed XGBoost from the setup file (and the imports in the code). Can you try reinstalling MLBox from pip or from GitHub, please? It should work now; see https://github.com/AxeldeRomblay/MLBox/tree/master/python-package. Thanks! PS: I will share screenshots next week; I just need more time.
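After reinstalling, one way to confirm that the installed MLBox no longer asks for xgboost is to read the requirements recorded in its package metadata. This is a sketch assuming the reinstalled MLBox declares its dependencies in standard pip metadata; the helper names are hypothetical. Requirements that only apply to optional extras also appear in that list (with an `extra == ...` marker), and this sketch counts them too.

```python
import re
from importlib.metadata import PackageNotFoundError, requires


def _normalize(name):
    # PEP 503 normalization: runs of '-', '_', '.' become '-', lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(spec):
    """Extract the distribution name from a requirement string such as
    'xgboost>=0.6a2' or 'numpy (>=1.11) ; extra == "test"'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", spec)
    return _normalize(match.group(1)) if match else None


def declares_dependency(dist, dep):
    """True/False if `dist` is installed, None if it is not installed."""
    try:
        reqs = requires(dist) or []
    except PackageNotFoundError:
        return None
    return any(requirement_name(r) == _normalize(dep) for r in reqs)


print(declares_dependency("mlbox", "xgboost"))
```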

TimusLetap commented 5 years ago

No worries! I understand. I just want to be able to use and spread the news of the package for others! Take your time. It's better to have this working than not. I'll post as soon as I run some tests.

TimusLetap commented 5 years ago

I ran MLBox, and it seems to import and run OK; I'm just having issues loading the dataset. Are there parameters we can pass to the read function? Or is it compatible with the tqdm wrapper?
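This thread doesn't document MLBox's own read parameters, so the following is a plain-pandas workaround rather than an MLBox feature: read the CSV in chunks with an optional tqdm progress bar, choosing smaller dtypes up front. The helper name `read_csv_in_chunks` is hypothetical. Chunking shows progress and applies the dtype to each piece, but the final `pd.concat` still holds the whole table in memory.

```python
import pandas as pd

try:
    from tqdm import tqdm  # optional progress bar
except ImportError:
    def tqdm(iterable, **kwargs):
        # Fallback: iterate without a progress bar.
        return iterable


def read_csv_in_chunks(path, chunksize=1_000_000, dtype=None):
    """Read a large CSV piece by piece and concatenate the chunks.
    A smaller dtype (e.g. {"feature": "float32"}) halves float64 columns."""
    chunks = []
    reader = pd.read_csv(path, chunksize=chunksize, dtype=dtype)
    for chunk in tqdm(reader, desc="reading"):
        chunks.append(chunk)
    return pd.concat(chunks, ignore_index=True)
```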

TimusLetap commented 5 years ago

Is there a way to pass DataFrames (pandas, to be specific) directly to MLBox for processing?

AxeldeRomblay commented 5 years ago

OK, great :) Can you please open a new issue with screenshots or snippets of the code and the error message? For the moment, you have to dump your DataFrames to files and then read and preprocess them with MLBox. In the next release, reading and preprocessing will be two separate tasks, so you can skip the reading step if your data is already loaded.
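The "dump, then read with MLBox" workaround described above might look like this. It assumes the `Reader(sep=",").train_test_split(paths, target_name)` interface shown in MLBox's documentation; `dump_frames`, `read_with_mlbox`, the output directory and the toy data are all hypothetical. MLBox is imported lazily, so the dumping part runs without it.

```python
import os

import pandas as pd


def dump_frames(frames, out_dir="mlbox_input"):
    """Write each DataFrame to <out_dir>/<name>.csv and return the paths.
    index=False keeps pandas' row index out of the files."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for name, df in frames.items():
        path = os.path.join(out_dir, f"{name}.csv")
        df.to_csv(path, index=False)
        paths.append(path)
    return paths


def read_with_mlbox(paths, target_name):
    # Assumed MLBox interface (see lead-in); imported only when called.
    from mlbox.preprocessing import Reader
    return Reader(sep=",").train_test_split(paths, target_name)


# Hypothetical data: a training frame with the target, a test frame without it.
train_df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [0, 1, 0]})
test_df = pd.DataFrame({"x": [4.0, 5.0]})
paths = dump_frames({"train": train_df, "test": test_df})
# data = read_with_mlbox(paths, "y")
```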

TimusLetap commented 5 years ago

What size of CSV file can it handle at the moment? I am having trouble reading in a large dataset.

AxeldeRomblay commented 5 years ago

How many rows and features do you have?

TimusLetap commented 5 years ago

I have 2 columns with 64M rows: one column is the feature and the other is the target variable.
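For scale, a back-of-the-envelope estimate of the raw numeric data: rows × columns × bytes per value. pandas parses numeric CSV columns as 64-bit (8 bytes) by default. This is a lower bound. String (object) columns take considerably more, and parsing a CSV needs extra temporary memory on top of the final table. The helper name is hypothetical.

```python
def in_memory_bytes(rows, cols, bytes_per_value=8):
    """Lower-bound size of a purely numeric table: rows * cols * bytes per value."""
    return rows * cols * bytes_per_value


rows, cols = 64_000_000, 2
print(in_memory_bytes(rows, cols) / 1e9)     # float64: 1.024 (GB)
print(in_memory_bytes(rows, cols, 4) / 1e9)  # float32: 0.512 (GB)
```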

TimusLetap commented 5 years ago

Axel,

I don't understand why you closed the thread without reaching a resolution, unless you pushed an update? Is there a way to parse the data? tqdm is integrated, isn't it?

On Fri, May 10, 2019 at 11:35 AM Axel notifications@github.com wrote:

Closed #63 https://github.com/AxeldeRomblay/MLBox/issues/63.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/AxeldeRomblay/MLBox/issues/63#event-2333909652, or mute the thread https://github.com/notifications/unsubscribe-auth/AG3WQ4XXJ5MGCUMAINZNJP3PUW57NANCNFSM4HGYOJSA .

AxeldeRomblay commented 5 years ago

Hello! The install problem is solved (xgboost has been removed in the latest release). For the reading issue, I think your dataset may just be too large. Maybe you can open a new issue with a screenshot? (Is it the reading or the preprocessing that takes a lot of time?) Thanks!
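To answer whether reading or preprocessing is the slow step, each one can be timed separately. A minimal sketch with a hypothetical `timed` helper. The commented MLBox usage assumes the `Reader` and `Drift_thresholder` names from MLBox's documentation, and `paths`/`"target"` are placeholders.

```python
import time


def timed(label, fn, *args, **kwargs):
    """Run fn(*args, **kwargs), print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result


# Hypothetical usage with MLBox (names assumed from its documentation):
# data = timed("reading", Reader(sep=",").train_test_split, paths, "target")
# data = timed("preprocessing", Drift_thresholder().fit_transform, data)
```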