Closed: GinoWoz1 closed this issue 6 years ago
Hmm, I tested those codes in a fresh conda test environment and the error was not reproduced. But I used an easy way to install fancyimpute via the commands below. Could you please build a conda environment for a test?
conda create -n test_env python=3.6
activate test_env
pip install missingno
conda install -y -c anaconda ecos
conda install -y -c conda-forge lapack
conda install -y -c cvxgrp cvxpy
conda install -y -c cimcb fancyimpute
pip install rfpimp
conda install -y py-xgboost
pip install tpot msgpack dask[delayed] dask-ml
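After creating the environment, one quick way to confirm that the packages actually import is a small helper like the one below (a generic sketch, not part of the thread's setup; note that pip/conda package names sometimes differ from import names, e.g. py-xgboost imports as xgboost):

```python
import importlib

def check_imports(module_names):
    """Return the list of module names that fail to import."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

# Import names for the packages installed above.
print(check_imports(["missingno", "fancyimpute", "xgboost", "tpot", "dask"]))
```

An empty list means every package is importable in the active environment.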
Another suggestion about the customized scorer in your code: it may be more stable if the function returns infinity instead of raising a ValueError, as in the example below:
def rmsle_loss(y_true, y_pred):
    # Requires `import math`; y_true and y_pred are numpy arrays.
    assert len(y_true) == len(y_pred)
    try:
        terms_to_sum = [(math.log(y_pred[i] + 1) - math.log(y_true[i] + 1)) ** 2.0
                        for i, pred in enumerate(y_pred)]
    except ValueError:
        # math.log raises ValueError for non-positive arguments.
        return float('inf')
    if not ((y_true >= 0).all() and (y_pred >= 0).all()):
        # Return inf if either array contains negative values.
        return float('inf')
    return (sum(terms_to_sum) * (1.0 / len(y_true))) ** 0.5
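For what it's worth, a quick sanity check of the suggested scorer's behavior (restated here with the negative-value guard written so that a negative value in either array yields infinity, and with the bare except narrowed to ValueError):

```python
import math
import numpy as np

def rmsle_loss(y_true, y_pred):
    """RMSLE that returns inf instead of raising on invalid inputs."""
    assert len(y_true) == len(y_pred)
    try:
        terms_to_sum = [(math.log(y_pred[i] + 1) - math.log(y_true[i] + 1)) ** 2.0
                        for i, pred in enumerate(y_pred)]
    except ValueError:
        # math.log raises ValueError for non-positive arguments.
        return float('inf')
    if not ((y_true >= 0).all() and (y_pred >= 0).all()):
        return float('inf')
    return (sum(terms_to_sum) * (1.0 / len(y_true))) ** 0.5

perfect = rmsle_loss(np.array([1.0, 2.0]), np.array([1.0, 2.0]))  # -> 0.0
invalid = rmsle_loss(np.array([1.0]), np.array([-2.0]))           # -> inf
```

A perfect prediction scores 0.0, and a negative prediction yields infinity rather than an exception, so the optimizer can simply discard that pipeline.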
Thanks Weixuan. Quick question: how do I run the Python script out of the conda environment? I am just used to opening up the script on my desktop and running it there.
Nevermind on the python script question. I was able to setup on my laptop.
Any idea why this install process breaks the verbosity argument? Everything else seems to be working fine. Thanks a ton for your help.
Sincerely, Justin
You're welcome. Do you mean there is no confirmation during installation of packages via conda? If so, the -y in the commands is for this purpose.
The progress bar doesn't show up.
Hmm, I think the progress bar may be hard to spot among the many warning messages when dask=True, but it did show up in my test (stdout below). We need to refine this warning behavior when dask=True.
**self._backend_args)
D:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py:547: UserWarning: Multiprocessing-backed parallel loops cannot be nested below threads, setting n_jobs=1
**self._backend_args)
D:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py:547: UserWarning: Multiprocessing-backed parallel loops cannot be nested below threads, setting n_jobs=1
**self._backend_args)
Generation 1 - Current best internal CV score: -5.969518794583038e-15
Optimization Progress: 4%|█▉ | 101/2550 [01:08<51:22, 1.26s/pipeline]
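As a side note, the joblib warning flooding shown in the log above can be quieted with a plain Python warnings filter (a generic approach, not a TPOT-specific switch), which makes the progress bar easier to see:

```python
import warnings

# Ignore the specific joblib nesting warning quoted in the log above;
# the message argument is matched as a regex against the warning text.
warnings.filterwarnings(
    "ignore",
    message="Multiprocessing-backed parallel loops cannot be nested",
    category=UserWarning,
)

def nested_loop_warning():
    """Re-emit the warning text from the log above, for demonstration."""
    warnings.warn(
        "Multiprocessing-backed parallel loops cannot be nested below "
        "threads, setting n_jobs=1",
        UserWarning,
    )
```

With the filter installed, calls that would emit this warning stay silent while other warnings still get through.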
Thanks, no problem. I can live without it for now, as long as the periodic checkpoints are being saved. You can close this. Thanks again!
Hmm, after the first generation, the same error came up in the virtual environment. Were you able to finish one generation and save a pipeline? I did exactly as you suggested with the virtual env.
Hmm, did you also update rmsle_loss in your code? Can you please provide a random_state to reproduce the issue?
def rmsle_loss(y_true, y_pred):
    assert len(y_true) == len(y_pred)
    try:
        terms_to_sum = [(math.log(y_pred[i] + 1) - math.log(y_true[i] + 1)) ** 2.0
                        for i, pred in enumerate(y_pred)]
    except ValueError:
        return float('inf')
    if not ((y_true >= 0).all() and (y_pred >= 0).all()):
        return float('inf')
    return (sum(terms_to_sum) * (1.0 / len(y_true))) ** 0.5
Thanks, I did. Sorry for the bother; it looks like a user error on my side with my virtual environment. I really hate to inconvenience you. I am going to do an overview of TPOT soon for some individuals in my area at a meetup, so this will help greatly! I'll make sure to give a shout-out to you and your team.
Sincerely, Justin
On Fri, Sep 14, 2018 at 8:11 AM Weixuan Fu notifications@github.com wrote:
Hmm, did you also update rmsle_loss in your codes? Can you provide a random_state to reproduce the issue?
Hey Weixuan,
With the same exact setup, I am now getting the error below. Any idea? I am unable to get TPOT to finish a single run.
conda create -n test_env python=3.6
activate test_env
pip install missingno
conda install -y -c anaconda ecos
conda install -y -c conda-forge lapack
conda install -y -c cvxgrp cvxpy
conda install -y -c cimcb fancyimpute
pip install rfpimp
conda install -y py-xgboost
pip install tpot msgpack dask[delayed] dask-ml
Hmm, it seems to be an xgboost API issue. I tried to reproduce it via the demo below, but the error didn't show up. I recently updated xgboost to 0.80 via conda install -c anaconda py-xgboost, so maybe updating xgboost will help.
from sklearn.metrics import make_scorer
from tpot import TPOTRegressor
import warnings
import pandas as pd
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import math

warnings.filterwarnings('ignore')

housing = load_boston()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target,
                                                    train_size=0.75, test_size=0.25)

def rmsle_loss(y_true, y_pred):
    assert len(y_true) == len(y_pred)
    try:
        terms_to_sum = [(math.log(y_pred[i] + 1) - math.log(y_true[i] + 1)) ** 2.0
                        for i, pred in enumerate(y_pred)]
    except ValueError:
        return float('inf')
    if not ((y_true >= 0).all() and (y_pred >= 0).all()):
        return float('inf')
    return (sum(terms_to_sum) * (1.0 / len(y_true))) ** 0.5

tpot = TPOTRegressor(verbosity=3, scoring=rmsle_loss, generations=50,
                     population_size=50, offspring_size=50,
                     max_eval_time_mins=10, warm_start=True, use_dask=True)
tpot.fit(X_train, y_train)
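An aside on the scoring argument: if I recall the 0.9.x docs correctly, TPOT treats a custom callable whose name contains "loss" or "error" as lower-is-better; the equivalent explicit form wraps the metric with sklearn's make_scorer(..., greater_is_better=False), which simply negates the loss so that maximizing the score minimizes the loss. A minimal sketch of that semantics, using a hypothetical ConstantModel stand-in estimator (not from the thread):

```python
import numpy as np
from sklearn.metrics import make_scorer

def abs_loss(y_true, y_pred):
    """Mean absolute error as a plain (y_true, y_pred) metric."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# greater_is_better=False makes the scorer return the negated loss.
scorer = make_scorer(abs_loss, greater_is_better=False)

class ConstantModel:
    """Tiny stand-in estimator that always predicts a fixed value."""
    def __init__(self, c):
        self.c = c
    def fit(self, X, y):
        return self
    def predict(self, X):
        return np.full(len(X), self.c)

model = ConstantModel(1.0).fit(None, None)
X = np.zeros((4, 1))
y = np.array([1.0, 2.0, 1.0, 2.0])
score = scorer(model, X, y)  # negated mean absolute error -> -0.5
```

Because the scorer negates the metric, a smaller loss produces a larger (less negative) score, which is what maximizers like TPOT expect.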
I got the same issue. I can't use a conda environment. Whenever I use use_dask=True,

tpot = TPOTRegressor(verbosity=3, scoring=rmsle_loss, generations=50, population_size=50, offspring_size=50, max_eval_time_mins=10, warm_start=True, use_dask=True)
tpot.fit(X_train, y_train)

fails with the following error:

RuntimeError: A pipeline has not yet been optimized. Please call fit() first.
I have tried on an Azure Databricks cluster as well as on my local machine.
@GuillaumeLab which version of dask is installed in your environment?
dask 2.24.0.
Thanks for your answer. I also get another error message: distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting.
I checked this thread: https://github.com/dask/distributed/issues/2297, and it does not really help solve the issue. TPOT works fine on a single device with no memory issue. Why would distributing it across several devices cause a memory issue?
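For reference, the 95% threshold in that nanny warning comes from dask's worker memory management settings, which can be tuned in the dask config (e.g. ~/.config/dask/distributed.yaml). The fractions below are the documented defaults as I remember them; verify them against the dask.distributed docs for your version:

```yaml
distributed:
  worker:
    memory:
      target: 0.60     # start spilling excess data to disk
      spill: 0.70      # spill more aggressively
      pause: 0.80      # pause worker threads
      terminate: 0.95  # nanny restarts the worker (the warning above)
```

Each worker enforces these fractions against its own memory limit, so a workload that fits comfortably on one machine can still trip the per-worker limit once it is split across several smaller workers.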
I cannot use multiple cores, and therefore my jobs run extremely slowly.
Context of the issue
In 0.9.4 a fix was put in to use use_dask=True or to import dask manually. Both methods return the error:
File "C:\Users\jstnjc\Anaconda3\lib\site-packages\tpot\base.py", line 684, in fit
    self._update_top_pipeline()
File "C:\Users\jstnjc\Anaconda3\lib\site-packages\tpot\base.py", line 758, in _update_top_pipeline
    raise RuntimeError('A pipeline has not yet been optimized. Please call fit() first.')
RuntimeError: A pipeline has not yet been optimized. Please call fit() first.
Process to reproduce the issue
(I've tested this on 3 different computers, including a cloud service)
Install Anaconda 3.6 for Windows 64-bit
pip install missingno
pip install these .whl files manually (needed for fancyimpute):
- ecos-2.0.5-cp36-cp36m-win_amd64.whl
- cvxpy-1.0.8-cp36-cp36m-win_amd64.whl
pip install fancyimpute
pip install rfpimp (used for my custom functions import file)
conda install py-xgboost
pip install tpot
pip install msgpack
pip install dask[delayed] dask-ml
With the above in place, execute the code below:
Expected result
Expect the process to run and to use all cores.
Current result