Closed edubu2 closed 1 year ago
Conda list output:
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_kmp_llvm conda-forge
_py-xgboost-mutex 2.0 cpu_0 conda-forge
abseil-cpp 20211102.0 h93e1e8c_2 conda-forge
arrow-cpp 8.0.0 py310h3098874_0
aws-c-common 0.4.57 he6710b0_1
aws-c-event-stream 0.1.6 h2531618_5
aws-checksums 0.1.9 he6710b0_0
aws-sdk-cpp 1.8.185 hce553d0_0
blas 1.0 mkl
bokeh 2.4.3 pyhd8ed1ab_3 conda-forge
boost-cpp 1.78.0 he72f1d9_0 conda-forge
boto3 1.24.28 py310h06a4308_0 anaconda
botocore 1.27.28 py310h06a4308_0 anaconda
brotli 1.0.9 h166bdaf_7 conda-forge
brotli-bin 1.0.9 h166bdaf_7 conda-forge
brotlipy 0.7.0 py310h5764c6d_1004 conda-forge
bzip2 1.0.8 h7b6447c_0
c-ares 1.18.1 h7f98852_0 conda-forge
ca-certificates 2022.9.24 ha878542_0 conda-forge
certifi 2022.9.24 pyhd8ed1ab_0 conda-forge
cffi 1.15.1 py310h74dc2b5_0
charset-normalizer 2.1.1 pyhd8ed1ab_0 conda-forge
click 8.1.3 py310hff52083_0 conda-forge
cloudpickle 2.2.0 pyhd8ed1ab_0 conda-forge
colorama 0.4.5 pyhd8ed1ab_0 conda-forge
cryptography 38.0.2 py310h597c629_1 conda-forge
cudatoolkit 11.3.1 h2bc3f7f_2
cytoolz 0.12.0 py310h5764c6d_0 conda-forge
dask 2022.10.0 pyhd8ed1ab_2 conda-forge
dask-core 2022.10.0 pyhd8ed1ab_1 conda-forge
dask-glm 0.2.0 py_1 conda-forge
dask-ml 2022.5.27 pyhd8ed1ab_0 conda-forge
deap 1.3.3 py310h769672d_0 conda-forge
distributed 2022.10.0 pyhd8ed1ab_2 conda-forge
fftw 3.3.9 h27cfd23_1
freetype 2.10.4 h0708190_1 conda-forge
fsspec 2022.10.0 pyhd8ed1ab_0 conda-forge
gflags 2.2.2 he1b5a44_1004 conda-forge
giflib 5.2.1 h36c2ea0_2 conda-forge
glog 0.6.0 h6f12383_0 conda-forge
greenlet 1.1.1 py310h295c915_0 anaconda
grpc-cpp 1.46.1 h33aed49_0
heapdict 1.0.1 py_0 conda-forge
icu 70.1 h27087fc_0 conda-forge
idna 3.4 pyhd8ed1ab_0 conda-forge
intel-openmp 2021.4.0 h06a4308_3561
jinja2 3.1.2 pyhd8ed1ab_1 conda-forge
jmespath 0.10.0 pyhd3eb1b0_0 anaconda
joblib 1.2.0 pyhd8ed1ab_0 conda-forge
jpeg 9e h166bdaf_2 conda-forge
krb5 1.19.2 hac12032_0 anaconda
lcms2 2.12 hddcbb42_0 conda-forge
ld_impl_linux-64 2.38 h1181459_1
lerc 3.0 h9c3ff4c_0 conda-forge
libabseil 20211102.0 cxx17_h48a1fff_2 conda-forge
libbrotlicommon 1.0.9 h166bdaf_7 conda-forge
libbrotlidec 1.0.9 h166bdaf_7 conda-forge
libbrotlienc 1.0.9 h166bdaf_7 conda-forge
libcurl 7.84.0 h91b91d3_0
libdeflate 1.8 h7f8727e_5
libedit 3.1.20210910 h7f8727e_0 anaconda
libev 4.33 h516909a_1 conda-forge
libevent 2.1.10 h9b69904_4 conda-forge
libffi 3.3 he6710b0_2
libgcc-ng 12.2.0 h65d4601_19 conda-forge
libgfortran-ng 11.2.0 h00389a5_1
libgfortran5 11.2.0 h1234567_1
libllvm11 11.1.0 hf817b99_2 conda-forge
libnghttp2 1.46.0 hce63b2e_0
libpng 1.6.37 hbc83047_0
libpq 12.9 h16c4e8d_3 anaconda
libprotobuf 3.20.1 h4ff587b_0
libssh2 1.10.0 ha56f1ee_2 conda-forge
libstdcxx-ng 12.2.0 h46fd767_19 conda-forge
libthrift 0.15.0 he6d91bd_0 conda-forge
libtiff 4.4.0 hecacb30_0
libuuid 1.0.3 h7f8727e_2
libwebp 1.2.4 h522a892_0 conda-forge
libwebp-base 1.2.4 h166bdaf_0 conda-forge
libxgboost 1.6.2 cpu_ha3b9936_1 conda-forge
llvm-openmp 14.0.6 h9e868ea_0
llvmlite 0.39.1 py310he621ea3_0
locket 1.0.0 pyhd8ed1ab_0 conda-forge
lz4 4.0.0 py310h5d5e884_2 conda-forge
lz4-c 1.9.3 h9c3ff4c_1 conda-forge
markupsafe 2.1.1 py310h5764c6d_1 conda-forge
mkl 2021.4.0 h06a4308_640
mkl-service 2.4.0 py310h7f8727e_0
mkl_fft 1.3.1 py310hd6ae3a3_0
mkl_random 1.2.2 py310h00e6091_0
msgpack-python 1.0.4 py310hbf28c38_0 conda-forge
multipledispatch 0.6.0 py_0 conda-forge
ncurses 6.3 h5eee18b_3
numba 0.56.3 py310ha5257ce_0 conda-forge
numpy 1.22.3 py310hfa59a62_0
numpy-base 1.22.3 py310h9585f30_0
openssl 1.1.1q h166bdaf_1 conda-forge
orc 1.7.4 h07ed6aa_0
packaging 21.3 pyhd8ed1ab_0 conda-forge
pandas 1.5.1 py310h769672d_0 conda-forge
partd 1.3.0 pyhd8ed1ab_0 conda-forge
pillow 9.2.0 py310hace64e9_1
pip 22.2.2 py310h06a4308_0
psutil 5.9.3 py310h5764c6d_0 conda-forge
psycopg2 2.8.6 py310h8f2d780_1 anaconda
py-xgboost 1.6.2 cpu_py310hd1aba9c_1 conda-forge
pyarrow 8.0.0 py310h468efa6_0
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pyopenssl 22.1.0 pyhd8ed1ab_0 conda-forge
pyparsing 3.0.9 pyhd8ed1ab_0 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
python 3.10.6 haa1d7c7_1
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python_abi 3.10 2_cp310 conda-forge
pytorch 1.12.1 py3.10_cuda11.3_cudnn8.3.2_0 pytorch
pytorch-mutex 1.0 cuda pytorch
pytz 2022.5 pyhd8ed1ab_0 conda-forge
pyyaml 6.0 py310h5764c6d_4 conda-forge
re2 2022.04.01 h27087fc_0 conda-forge
readline 8.1.2 h7f8727e_1
requests 2.28.1 pyhd8ed1ab_1 conda-forge
s3transfer 0.6.0 py310h06a4308_0 anaconda
scikit-learn 1.1.2 py310h6a678d5_0
scipy 1.9.1 py310hd5efca6_0
setuptools 63.4.1 py310h06a4308_0
six 1.16.0 pyhd3eb1b0_1
snappy 1.1.9 hbd366e4_1 conda-forge
sortedcontainers 2.4.0 pyhd8ed1ab_0 conda-forge
sqlalchemy 1.4.39 py310h5eee18b_0 anaconda
sqlite 3.39.3 h5082296_0
stopit 1.1.2 py_0 conda-forge
tblib 1.7.0 pyhd8ed1ab_0 conda-forge
threadpoolctl 3.1.0 pyh8a188c0_0 conda-forge
tk 8.6.12 h1ccaba5_0
toolz 0.12.0 pyhd8ed1ab_0 conda-forge
tornado 6.1 py310h5764c6d_3 conda-forge
tpot 0.11.7 pyhd8ed1ab_1 conda-forge
tqdm 4.64.1 pyhd8ed1ab_0 conda-forge
typing_extensions 4.4.0 pyha770c72_0 conda-forge
tzdata 2022e h04d1e81_0
update_checker 0.18.0 pyh9f0ad1d_0 conda-forge
urllib3 1.26.11 pyhd8ed1ab_0 conda-forge
utf8proc 2.6.1 h27cfd23_0
wheel 0.37.1 pyhd3eb1b0_0
xz 5.2.6 h5eee18b_0
yaml 0.2.5 h7f98852_2 conda-forge
zict 2.2.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.13 h5eee18b_0
zstd 1.5.2 ha4553b6_0
Log output thus far (it has been stuck on pipeline 55 since I checked this morning):
$ cat tpot_log.log
_pre_test decorator: _random_mutation_operator: num_test=0 Expected n_neighbors <= n_samples, but n_samples = 50, n_neighbors = 85.
_pre_test decorator: _random_mutation_operator: num_test=0 Expected n_neighbors <= n_samples, but n_samples = 50, n_neighbors = 80.
_pre_test decorator: _random_mutation_operator: num_test=0 manhattan was provided as affinity. Ward can only work with euclidean distances..
_pre_test decorator: _random_mutation_operator: num_test=0 Unsupported set of arguments: The combination of penalty='l2' and loss='epsilon_insensitive' are not supported when dual=False, Parameters: penalty='l2', loss='epsilon_insensitive', dual=False.
_pre_test decorator: _random_mutation_operator: num_test=0 Expected n_neighbors <= n_samples, but n_samples = 50, n_neighbors = 54.
Optimization Progress: 2%|▏ | 54/2550 [8:57:48<314:58:55, 454.30s/pipeline]
Seems to be working. Disregard.
I started TPOTRegressor yesterday on a large dataset (8M rows x 40 features) on a large ML server (Linux RHEL) with 16 CPUs (2 threads per core) and 256 GiB memory (no GPU, no PyTorch NNs). Last night, after I started it, it was running consistently at 3200% CPU (one worker per thread, as intended). However, when I returned to check on it this morning, total CPU utilization had dropped to 100%, occasionally jumping to 200% but no higher, and it has stayed that way for at least 4 hours. Nothing else is running on the machine. Perhaps it's trying a model for which multiprocessing isn't possible, but I feel TPOT should still be using the available resources for the next pipeline.
Context of the issue
  PID USER     PR NI  VIRT   RES   SHR    S %CPU  %MEM  TIME+    COMMAND
17946 ec2-user 20  0  42.3g  14.0g 111504 S 100.3 5.6   25959:06 python
As it's only 2% complete after 12 hours, it's not a viable option for my pipeline tuning and model selection. Downsampling is not ideal for my use case, but I am still using it to reduce the dataset size by 40% to increase speed. For comparison, I'm able to preprocess and run a LightGBM model on my local machine (8 cores/16 GB RAM, macOS) on the same data (with no downsampling) in about 5 minutes.
I've one-hot encoded my categorical features and imputed values for all NaN records.
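For reference, the preprocessing described above (downsampling by 40%, one-hot encoding categoricals, imputing NaNs) can be sketched roughly as follows. This is a minimal illustration, not the actual script: the column names, the toy DataFrame, and the choice of median imputation are all placeholders I'm assuming for the example.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real 8M-row x 40-feature dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "num_a": rng.normal(size=1000),
    "num_b": rng.normal(size=1000),
    "cat": rng.choice(["x", "y", None], size=1000),
})
df.loc[df.sample(frac=0.05, random_state=0).index, "num_a"] = np.nan

# Downsample by 40% (i.e. keep 60% of rows) to speed up the TPOT search.
df = df.sample(frac=0.6, random_state=42)

# One-hot encode categoricals, then impute remaining NaNs
# (median imputation used here as a placeholder strategy).
df = pd.get_dummies(df, columns=["cat"], dummy_na=False)
df = df.fillna(df.median(numeric_only=True))
```

After these steps the frame has 60% of the original rows, dummy columns in place of "cat", and no remaining NaNs.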
Another thing to point out (likely not useful, but maybe): the progress bar displayed 0% for at least 6 hours after starting, while CPU was at 3200%. When I checked this morning, it was 2% complete with 100-200% CPU utilization.
Process to reproduce the issue
The code below is part of my main() function, invoked at the command line with:
nohup run.py &
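The actual code block appears to be missing from the issue as captured. A minimal sketch of what such a main() might look like: the generations=50 / population_size=50 values are an assumption inferred from the 2550-pipeline total in the progress bar (50 initial + 50 generations x 50 offspring), and the file names, target column, and n_jobs=-1 are placeholders, not the reporter's actual settings.

```python
def main():
    # Imports kept inside main() so the sketch is self-contained even
    # where tpot is not installed; a real script would put them at the top.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from tpot import TPOTRegressor

    # Placeholder for loading the real 8M-row x 40-feature dataset.
    df = pd.read_csv("data.csv")
    X = df.drop(columns=["target"])
    y = df["target"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    tpot = TPOTRegressor(
        generations=50,        # assumed: 50 + 50*50 = 2550 pipelines total
        population_size=50,
        n_jobs=-1,             # one worker per logical core (3200% on 32 threads)
        verbosity=3,           # prints the _pre_test warnings seen in the log
        random_state=42,
    )
    tpot.fit(X_train, y_train)
    print(tpot.score(X_test, y_test))
    tpot.export("best_pipeline.py")
```

Note that between generations TPOT evaluates and selects pipelines in a serial step, which is one plausible explanation for the periods of 100% CPU between bursts of full parallel utilization.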
Update:
After 3-4 hours, it's now back up to 3200%. This is likely not an issue with multiprocessing, but I'm really curious about what processing could take so long.