lpatruno opened this issue 6 years ago
Hmm, that is strange. Could you please share the dataset, or code that generates a simulated dataset, so we can reproduce this issue?
Unfortunately, I cannot share the dataset as it is proprietary. However, here are some summary stats from one of the training sets:
```
        count mean       std       min       25%       50%        75%        max
col_0   410.0 0.314634   0.464937  0.000000  0.000000  0.000000   1.000000   1.000000
col_1   410.0 85.010912  52.331006 26.908183 49.232402 68.898860  100.652436 456.152245
col_2   410.0 68.839790  49.453209 8.106944  38.153472 52.422222  77.461285  422.319444
col_3   410.0 123.073782 73.341779 34.219294 72.132352 102.015683 156.204155 521.180868
col_4   410.0 16.171122  20.801288 0.013542  4.021528  10.857558  21.449815  229.918414
col_5   410.0 54.233992  55.214856 2.070602  20.907862 35.166267  65.538912  367.913113
col_6   410.0 38.062870  51.508699 0.011088  7.069893  20.134896  45.943032  353.163113
col_7   178.0 3.629213   2.958783  1.000000  1.250000  3.000000   5.000000   17.000000
col_8   410.0 1.621951   1.363516  0.000000  1.000000  1.000000   2.000000   9.000000
col_9   410.0 0.097561   0.297083  0.000000  0.000000  0.000000   0.000000   1.000000
col_10  410.0 0.546341   0.498456  0.000000  0.000000  1.000000   1.000000   1.000000
col_11  410.0 0.348780   0.477167  0.000000  0.000000  0.000000   1.000000   1.000000
col_12  410.0 0.014634   0.120230  0.000000  0.000000  0.000000   0.000000   1.000000
col_13  410.0 0.034146   0.181827  0.000000  0.000000  0.000000   0.000000   1.000000
col_14  410.0 0.004878   0.069758  0.000000  0.000000  0.000000   0.000000   1.000000
col_15  410.0 0.004878   0.069758  0.000000  0.000000  0.000000   0.000000   1.000000
col_16  410.0 0.002439   0.049386  0.000000  0.000000  0.000000   0.000000   1.000000
col_17  410.0 0.002439   0.049386  0.000000  0.000000  0.000000   0.000000   1.000000
col_18  410.0 0.002439   0.049386  0.000000  0.000000  0.000000   0.000000   1.000000
col_19  410.0 0.004878   0.069758  0.000000  0.000000  0.000000   0.000000   1.000000
col_20  410.0 0.002439   0.049386  0.000000  0.000000  0.000000   0.000000   1.000000
col_21  410.0 0.036585   0.187971  0.000000  0.000000  0.000000   0.000000   1.000000
col_22  410.0 0.002439   0.049386  0.000000  0.000000  0.000000   0.000000   1.000000
col_23  410.0 0.002439   0.049386  0.000000  0.000000  0.000000   0.000000   1.000000
col_24  410.0 0.012195   0.109890  0.000000  0.000000  0.000000   0.000000   1.000000
col_25  410.0 0.002439   0.049386  0.000000  0.000000  0.000000   0.000000   1.000000
col_26  410.0 0.002439   0.049386  0.000000  0.000000  0.000000   0.000000   1.000000
col_27  410.0 0.004878   0.069758  0.000000  0.000000  0.000000   0.000000   1.000000
col_28  410.0 0.039024   0.193890  0.000000  0.000000  0.000000   0.000000   1.000000
col_29  410.0 0.029268   0.168764  0.000000  0.000000  0.000000   0.000000   1.000000
col_30  410.0 0.009756   0.098410  0.000000  0.000000  0.000000   0.000000   1.000000
col_31  410.0 0.009756   0.098410  0.000000  0.000000  0.000000   0.000000   1.000000
col_32  410.0 0.004878   0.069758  0.000000  0.000000  0.000000   0.000000   1.000000
col_33  410.0 0.248780   0.432834  0.000000  0.000000  0.000000   0.000000   1.000000
col_34  410.0 0.109756   0.312967  0.000000  0.000000  0.000000   0.000000   1.000000
col_35  410.0 0.009756   0.098410  0.000000  0.000000  0.000000   0.000000   1.000000
col_36  410.0 0.007317   0.085330  0.000000  0.000000  0.000000   0.000000   1.000000
col_37  410.0 0.017073   0.129702  0.000000  0.000000  0.000000   0.000000   1.000000
col_38  410.0 0.002439   0.049386  0.000000  0.000000  0.000000   0.000000   1.000000
col_39  410.0 0.002439   0.049386  0.000000  0.000000  0.000000   0.000000   1.000000
col_40  410.0 0.002439   0.049386  0.000000  0.000000  0.000000   0.000000   1.000000
col_41  410.0 0.024390   0.154446  0.000000  0.000000  0.000000   0.000000   1.000000
col_42  410.0 0.097561   0.297083  0.000000  0.000000  0.000000   0.000000   1.000000
col_43  410.0 0.004878   0.069758  0.000000  0.000000  0.000000   0.000000   1.000000
col_44  410.0 0.002439   0.049386  0.000000  0.000000  0.000000   0.000000   1.000000
col_45  410.0 0.397561   0.489992  0.000000  0.000000  0.000000   1.000000   1.000000
col_46  410.0 0.014634   0.120230  0.000000  0.000000  0.000000   0.000000   1.000000
col_47  410.0 0.397561   0.489992  0.000000  0.000000  0.000000   1.000000   1.000000
```
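Since the real data is proprietary, a rough stand-in with the same shape can be simulated for reproduction attempts: 410 rows, one partially missing count column (178 of 410 values present, like `col_7`), a few right-skewed continuous columns, and many rare binary indicators. The distributions below (lognormal, Bernoulli) are assumptions chosen only to mimic the summary stats, not the actual data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 410

data = {}
# col_1..col_6: right-skewed continuous columns (lognormal is an assumption).
for i in range(1, 7):
    data[f"col_{i}"] = rng.lognormal(mean=3.5, sigma=0.8, size=n)
# col_7: small positive counts with only 178 of 410 values present.
col_7 = rng.integers(1, 18, size=n).astype(float)
col_7[rng.permutation(n)[: n - 178]] = np.nan
data["col_7"] = col_7
# col_0 plus col_8..col_47: binary indicators, mostly rare.
data["col_0"] = rng.binomial(1, 0.31, size=n)
for i in range(8, 48):
    data[f"col_{i}"] = rng.binomial(1, 0.05, size=n)

df = pd.DataFrame(data)
print(df.shape)              # (410, 48)
print(df["col_7"].count())   # 178 non-missing values
```

A dataset like this at least reproduces the dimensionality and sparsity pattern, which is what matters for the memory hypothesis discussed below.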
I have a similar issue. I'm working with Anaconda and the Spyder IDE; after TPOT runs a few generations I get a message saying that the kernel died. I have tried this with PyCharm and the same thing happens (although PyCharm returns the following error: Process finished with exit code -1073741819 (0xC0000005)). The environment I'm working on has:
I have worked with both large (> 10'000k datapoints) and small (< 1000k datapoints) datasets. I'm attaching a copy of part of the dataset: 2018_08_22-XSLX_SnipForGithub.xlsx
Below is my code:

```python
import numpy as np
import pandas as pd
import os
from tpot import TPOTRegressor

dataset = pd.read_csv(r'2018_08_22-XSLX_SnipForGithub.csv', index_col=0)
values = dataset.values

n_train_hours = 4000
train = values[:n_train_hours, :]
test = values[n_train_hours:, :]
train_X, train_y = train[:, 1:], train[:, 0]
test_X, test_y = test[:, 1:], test[:, 0]
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

tpot = TPOTRegressor(scoring='neg_mean_absolute_error',
                     max_time_mins=100,
                     n_jobs=1,  # Sometimes I run it with -1 and it also crashes.
                     verbosity=2,
                     cv=5,
                     warm_start=True,
                     periodic_checkpoint_folder='C:/mydir')
tpot.fit(train_X, train_y)
tpot.export('2018_08_22-tpot_exported_pipeline.py')
```
Note: I ran the same process on another PC and it ran longer, but it still crashed after 12h or so. I have also run the program on Google Colab and Kaggle, and it does not seem to crash there. Note 2: I do not have admin rights on the PCs I am using; maybe that matters. Thank you, and sorry if this has already been resolved, I did not find the answer.
@lpatruno I still suspect the dataset may need more memory (~2 GB): some pipelines (especially those using `PolynomialFeatures`, which can more than double the number of features in intermediate steps) require more memory, and that may cause a crash due to running out of memory. The `top` command has a refresh rate of maybe 2-3 seconds, so it may not accurately capture the peak memory usage during optimization.
Could you please try running the dataset on a machine with more memory, or use the TPOT light configuration via `config_dict='TPOT light'`?
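To make the memory point concrete: with a 48-column training set like the one summarized above, a single degree-2 `PolynomialFeatures` step expands the feature count far beyond double. This is a generic scikit-learn illustration, not TPOT's exact pipeline:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Same shape as the summary stats above: 410 rows, 48 columns.
X = np.zeros((410, 48))
X2 = PolynomialFeatures(degree=2).fit_transform(X)
print(X.shape[1], "->", X2.shape[1])  # 48 -> 1225
```

An intermediate array roughly 25x wider than the input is one reason a search restricted to simpler operators (`config_dict='TPOT light'`) can finish where the default search runs out of memory.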
@g-vega-cl Hmm, it seems the crash only happened on your local PCs but not on Linux (Colab/Kaggle) in your case, right?
@weixuanfu Yes indeed, Windows 10
@weixuanfu Thanks for your tips. I will rerun with those parameters and report back.
@weixuanfu Running with config_dict='TPOT light'
allows the script to complete. I am rerunning now with a larger number of generations and population_size. Thanks for the help.
I am facing the same issue. I have tried running it on Windows 7 and on Ubuntu 16.04 (an AWS EC2 c5.4xlarge instance with 32 GB RAM). I have always used n_jobs=1. On my Windows machine it runs successfully, but when I use the Spyder IDE on Ubuntu it unexpectedly quits, saying the kernel unexpectedly stopped. I tried running it through the terminal and got a 'Segmentation fault'.
My dataset is not that large either: around 335 samples x 80 features.
@weixuanfu I'm having this issue too, I'm fairly certain I didn't run out of memory.
Also happening for me with a sparse dataset only (288176x28) and 128GB memory
I'm facing the same problem, running on Alpine 3.9 with 8 GB of memory (which should suffice given the data) and n_jobs=1; any new insights?
I've found that if I use the dask backend (`use_dask=True`) then everything runs smoothly.
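For anyone trying this workaround, it looks roughly like the following sketch (requires the `dask` and `dask-ml` packages to be installed; the generation and population values here are illustrative, not prescriptive):

```python
from tpot import TPOTClassifier

# use_dask=True hands pipeline evaluation to the dask scheduler,
# which several commenters report avoids the kernel death / segfault.
tpot = TPOTClassifier(generations=5,
                      population_size=20,
                      n_jobs=1,
                      use_dask=True,
                      verbosity=2)
# tpot.fit(X_train, y_train)  # X_train / y_train are your own data
```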
Also happening for me, regardless of whether it is in a Jupyter kernel or not.
Hi,

I'm running into several issues when fitting instances of `TPOTClassifier`. When I run training in a Jupyter notebook, the kernel dies after a few rounds of training. The dataset is quite small (<1000 instances, <30 features). I also notice that the kernel dies more quickly when I increase the `population_size` and `generations` arguments beyond `20`. I'm setting `n_jobs=1`, as I've read other people have this same issue when that parameter is anything but `1`. Here is the call:

I've also run the same code as a Python script. This results in a segmentation fault each time I run the script.

I've run the script while having `top` open in another bash shell, and the memory consumption of the process does not exceed 1% of the overall available memory, so I don't think it's a memory issue.

I'm running this code in a Kubernetes pod with the following resources:

Here is the version of Python:

Also, here is the result of calling `pip freeze`:

If there are any other tests I can perform to help debug, let me know!
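One quick test along these lines: record the process's peak resident set size from inside the script rather than eyeballing `top`, which only refreshes every few seconds and can miss a short memory spike. This uses only the standard library (Unix only; note the units of `ru_maxrss` differ by platform):

```python
import resource

def peak_rss_mb():
    """Peak resident set size of this process in MB.

    On Linux ru_maxrss is reported in kilobytes; on macOS it is in bytes,
    so divide by 1024 * 1024 there instead.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

# Call after (or periodically during) tpot.fit() to see the true high-water mark:
print(f"peak RSS: {peak_rss_mb():.1f} MB")
```

If the reported peak stays well below the pod's memory limit right up to the segfault, that would strengthen the case that this is not an out-of-memory kill.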