dask / dask-yarn

Deploy dask on YARN clusters
http://yarn.dask.org
BSD 3-Clause "New" or "Revised" License

EMR 6.3.0 Bootstrap Action BOOTSTRAP_FAILURE : Python 3.9 support? #151

Closed Nogbit closed 3 years ago

Nogbit commented 3 years ago

EMR doesn't start because it fails at the bootstrapping step. It looks like the EC2 instances currently used with EMR 6.3.0 all have Python 3.9, which might be too new. I've tried all the 6.x EMR versions, 5.3x, and 5.20.0.

According to the docs, "Each Amazon EMR release version is 'locked' to the Amazon Linux AMI version to maintain compatibility." That is not what I'm experiencing, though. On every cluster start I get the same error below, even on EMR versions that came out before the official release of Python 3.9.

UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:

Specifications:

  - conda-pack -> python[version='>=2.7,<2.8.0a0|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0|>=3.5,<3.6.0a0']
  - dask-yarn -> python[version='>=2.7,<2.8.0a0|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0|>=3.8,<3.9.0a0|>=3.5,<3.6.0a0']

Your python: python=3.9
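
To make the error above easier to read, here is a minimal sketch that checks a Python version against the `python[version='...']` constraints conda printed. The helper names (`parse_version`, `satisfies`) are made up for illustration and are not part of conda; the comparison treats a pre-release suffix like `a0` as plain `0`, which is a simplification of how conda orders versions.

```python
import operator
import re

# Only the two operators that appear in the error message are handled.
OPS = {">=": operator.ge, "<": operator.lt}


def parse_version(text):
    """Turn '3.8.0a0' into a comparable tuple such as (3, 8, 0)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad to three components so that '3.9' compares equal to '3.9.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def satisfies(python_version, spec):
    """Return True if python_version matches any '|'-separated alternative."""
    current = parse_version(python_version)
    for alternative in spec.split("|"):
        clauses = [re.fullmatch(r"(>=|<)(.+)", c.strip())
                   for c in alternative.split(",")]
        if all(m and OPS[m.group(1)](current, parse_version(m.group(2)))
               for m in clauses):
            return True
    return False


# Constraint strings copied from the UnsatisfiableError above.
CONDA_PACK = ">=2.7,<2.8.0a0|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0|>=3.5,<3.6.0a0"
DASK_YARN = (">=2.7,<2.8.0a0|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0"
             "|>=3.8,<3.9.0a0|>=3.5,<3.6.0a0")

print(satisfies("3.9", DASK_YARN))  # False: no alternative admits 3.9
print(satisfies("3.8", DASK_YARN))  # True: matches '>=3.8,<3.9.0a0'
```

None of the listed alternatives admits 3.9, which is why the solver gives up when the environment's existing Python is 3.9.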

Release label: emr-6.3.0
Hadoop distribution: Amazon 3.2.1
Applications: Hive 3.1.2, Pig 0.17.0, Hue 4.9.0
Log URI: s3://cooldask-emr/logs/

Logs: s3://cooldask-emr/logs/j-1X4JY6LKIYQFZ/node/i-059b9f6dbcce4c3e5/bootstrap-actions/1/controller.gz

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 63.6M  100 63.6M    0     0   170M      0 --:--:-- --:--:-- --:--:--  169M

  0%|          | 0/38 [00:00<?, ?it/s]
Extracting : _libgcc_mutex-0.1-main.conda:   0%|          | 0/38 [00:00<?, ?it/s]
Extracting : _libgcc_mutex-0.1-main.conda:   3%|▎         | 1/38 [00:00<00:04,  8.04it/s]
Extracting : ca-certificates-2021.7.5-h06a4308_1.conda:   3%|▎         | 1/38 [00:00<00:04,  8.04it/s]
Extracting : ld_impl_linux-64-2.35.1-h7274673_9.conda:   5%|▌         | 2/38 [00:00<00:04,  8.04it/s] 
Extracting : libstdcxx-ng-9.3.0-hd4cf53a_17.conda:   8%|▊         | 3/38 [00:00<00:04,  8.04it/s]    
Extracting : tzdata-2021a-h52ac0ba_0.conda:  11%|█         | 4/38 [00:00<00:04,  8.04it/s]       
Extracting : libgomp-9.3.0-h5101ec6_17.conda:  13%|█▎        | 5/38 [00:00<00:04,  8.04it/s]
Extracting : libgcc-ng-9.3.0-h5101ec6_17.conda:  16%|█▌        | 6/38 [00:00<00:03,  8.04it/s]
Extracting : libffi-3.3-he6710b0_2.conda:  18%|█▊        | 7/38 [00:00<00:03,  8.04it/s]      
Extracting : ncurses-6.2-he6710b0_1.conda:  21%|██        | 8/38 [00:00<00:03,  8.04it/s]
Extracting : ncurses-6.2-he6710b0_1.conda:  24%|██▎       | 9/38 [00:00<00:02, 10.74it/s]
Extracting : openssl-1.1.1k-h27cfd23_0.conda:  24%|██▎       | 9/38 [00:00<00:02, 10.74it/s]
Extracting : xz-5.2.5-h7b6447c_0.conda:  26%|██▋       | 10/38 [00:00<00:02, 10.74it/s]     
Extracting : yaml-0.2.5-h7b6447c_0.conda:  29%|██▉       | 11/38 [00:00<00:02, 10.74it/s]
Extracting : zlib-1.2.11-h7b6447c_3.conda:  32%|███▏      | 12/38 [00:00<00:02, 10.74it/s]
Extracting : readline-8.1-h27cfd23_0.conda:  34%|███▍      | 13/38 [00:00<00:02, 10.74it/s]
Extracting : tk-8.6.10-hbc83047_0.conda:  37%|███▋      | 14/38 [00:00<00:02, 10.74it/s]   
Extracting : sqlite-3.36.0-hc218d9a_0.conda:  39%|███▉      | 15/38 [00:00<00:02, 10.74it/s]
Extracting : certifi-2021.5.30-py39h06a4308_0.conda:  42%|████▏     | 16/38 [00:00<00:02, 10.74it/s]
Extracting : chardet-4.0.0-py39h06a4308_1003.conda:  45%|████▍     | 17/38 [00:00<00:01, 10.74it/s] 
Extracting : pycosat-0.6.3-py39h27cfd23_0.conda:  47%|████▋     | 18/38 [00:00<00:01, 10.74it/s]   
Extracting : pycparser-2.20-py_2.conda:  50%|█████     | 19/38 [00:00<00:01, 10.74it/s]         
Extracting : pysocks-1.7.1-py39h06a4308_0.conda:  53%|█████▎    | 20/38 [00:00<00:01, 10.74it/s]
Extracting : ruamel_yaml-0.15.100-py39h27cfd23_0.conda:  55%|█████▌    | 21/38 [00:00<00:01, 10.74it/s]
Extracting : six-1.16.0-pyhd3eb1b0_0.conda:  58%|█████▊    | 22/38 [00:00<00:01, 10.74it/s]            
Extracting : tqdm-4.61.2-pyhd3eb1b0_1.conda:  61%|██████    | 23/38 [00:00<00:01, 10.74it/s]
Extracting : wheel-0.36.2-pyhd3eb1b0_0.conda:  63%|██████▎   | 24/38 [00:00<00:01, 10.74it/s]
Extracting : cffi-1.14.6-py39h400218f_0.conda:  66%|██████▌   | 25/38 [00:00<00:01, 10.74it/s]
Extracting : conda-package-handling-1.7.3-py39h27cfd23_1.conda:  68%|██████▊   | 26/38 [00:00<00:01, 10.74it/s]
Extracting : setuptools-52.0.0-py39h06a4308_0.conda:  71%|███████   | 27/38 [00:00<00:01, 10.74it/s]           
Extracting : brotlipy-0.7.0-py39h27cfd23_1003.conda:  74%|███████▎  | 28/38 [00:00<00:00, 10.74it/s]
Extracting : cryptography-3.4.7-py39hd23ed53_0.conda:  76%|███████▋  | 29/38 [00:00<00:00, 10.74it/s]
Extracting : pip-21.1.3-py39h06a4308_0.conda:  79%|███████▉  | 30/38 [00:00<00:00, 10.74it/s]        
Extracting : pyopenssl-20.0.1-pyhd3eb1b0_1.conda:  82%|████████▏ | 31/38 [00:00<00:00, 10.74it/s]
Extracting : urllib3-1.26.6-pyhd3eb1b0_1.conda:  84%|████████▍ | 32/38 [00:00<00:00, 10.74it/s]  
Extracting : requests-2.25.1-pyhd3eb1b0_0.conda:  87%|████████▋ | 33/38 [00:00<00:00, 10.74it/s]
Extracting : python-3.9.5-h12debd9_4.tar.bz2:  89%|████████▉ | 34/38 [00:03<00:00, 10.74it/s]   
Extracting : python-3.9.5-h12debd9_4.tar.bz2:  92%|█████████▏| 35/38 [00:03<00:00, 10.01it/s]
Extracting : _openmp_mutex-4.5-1_gnu.tar.bz2:  92%|█████████▏| 35/38 [00:03<00:00, 10.01it/s]
Extracting : idna-2.10-pyhd3eb1b0_0.tar.bz2:  95%|█████████▍| 36/38 [00:03<00:00, 10.01it/s] 
Extracting : conda-4.10.3-py39h06a4308_0.tar.bz2:  97%|█████████▋| 37/38 [00:03<00:00, 10.01it/s]

Building graph of deps:   0%|          | 0/10 [00:00<?, ?it/s]
Examining pyarrow:   0%|          | 0/10 [00:00<?, ?it/s]     
Examining @/linux-64::__archspec==1=x86_64:  10%|█         | 1/10 [01:52<16:54, 112.75s/it]
Examining @/linux-64::__archspec==1=x86_64:  20%|██        | 2/10 [01:52<07:31, 56.38s/it] 
Examining dask-yarn:  20%|██        | 2/10 [01:52<07:31, 56.38s/it]                       
Examining @/linux-64::__glibc==2.26=0:  30%|███       | 3/10 [01:59<06:34, 56.38s/it]
Examining @/linux-64::__glibc==2.26=0:  40%|████      | 4/10 [01:59<02:30, 25.07s/it]
Examining @/linux-64::__linux==4.14.225=0:  40%|████      | 4/10 [01:59<02:30, 25.07s/it]
Examining @/linux-64::__unix==0=0:  50%|█████     | 5/10 [01:59<02:05, 25.07s/it]        
Examining tornado=5:  60%|██████    | 6/10 [01:59<01:40, 25.07s/it]              
Examining s3fs:  70%|███████   | 7/10 [01:59<01:15, 25.07s/it]     
Examining s3fs:  80%|████████  | 8/10 [01:59<00:18,  9.46s/it]
Examining conda-pack:  80%|████████  | 8/10 [02:07<00:18,  9.46s/it]
Examining conda-pack:  90%|█████████ | 9/10 [02:07<00:09,  9.13s/it]
Examining python=3.9:  90%|█████████ | 9/10 [02:08<00:09,  9.13s/it]
Examining python=3.9: 100%|██████████| 10/10 [02:08<00:00,  7.39s/it]

Determining conflicts:   0%|          | 0/10 [00:00<?, ?it/s]
Examining conflict for pyarrow dask-yarn tornado s3fs conda-pack python:   0%|          | 0/10 [00:00<?, ?it/s]
Examining conflict for pyarrow dask-yarn:  10%|█         | 1/10 [02:10<19:31, 130.15s/it]                      
Examining conflict for pyarrow dask-yarn:  20%|██        | 2/10 [02:10<08:40, 65.07s/it] 
Examining conflict for pyarrow dask-yarn tornado s3fs conda-pack:  20%|██        | 2/10 [06:31<08:40, 65.07s/it]
Examining conflict for pyarrow dask-yarn tornado s3fs conda-pack:  30%|███       | 3/10 [06:31<17:06, 146.71s/it]
Examining conflict for tornado pyarrow python:  30%|███       | 3/10 [15:09<17:06, 146.71s/it]                   
Examining conflict for tornado pyarrow python:  40%|████      | 4/10 [15:09<28:32, 285.44s/it]
Examining conflict for pyarrow tornado s3fs conda-pack python:  40%|████      | 4/10 [17:18<28:32, 285.44s/it]
Examining conflict for pyarrow tornado s3fs conda-pack python:  50%|█████     | 5/10 [17:18<19:14, 230.94s/it]
Examining conflict for conda-pack pyarrow dask-yarn python:  50%|█████     | 5/10 [19:28<19:14, 230.94s/it]   
Examining conflict for conda-pack pyarrow dask-yarn python:  60%|██████    | 6/10 [19:28<13:10, 197.53s/it]
Examining conflict for tornado pyarrow dask-yarn:  60%|██████    | 6/10 [19:32<13:10, 197.53s/it]          
Examining conflict for tornado pyarrow dask-yarn:  70%|███████   | 7/10 [19:32<06:45, 135.11s/it]
Examining conflict for tornado pyarrow __glibc python:  70%|███████   | 7/10 [19:35<06:45, 135.11s/it]
Examining conflict for tornado pyarrow __glibc python:  80%|████████  | 8/10 [19:35<03:07, 93.68s/it] 
Examining conflict for pyarrow s3fs:  80%|████████  | 8/10 [21:43<03:07, 93.68s/it]                  
Examining conflict for pyarrow s3fs:  90%|█████████ | 9/10 [21:43<01:44, 104.29s/it]
Examining conflict for pyarrow conda-pack:  90%|█████████ | 9/10 [23:52<01:44, 104.29s/it]
Examining conflict for pyarrow conda-pack: 100%|██████████| 10/10 [23:52<00:00, 111.98s/it]
Examining conflict for tornado dask-yarn: 100%|██████████| 10/10 [26:00<00:00, 111.98s/it] 
Examining conflict for tornado dask-yarn: : 11it [26:00, 116.91s/it]                      
Examining conflict for dask-yarn s3fs: : 11it [26:01, 116.91s/it]   
Examining conflict for dask-yarn s3fs: : 12it [26:01, 81.60s/it] 

UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:

Specifications:

  - conda-pack -> python[version='>=2.7,<2.8.0a0|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0|>=3.5,<3.6.0a0']
  - dask-yarn -> python[version='>=2.7,<2.8.0a0|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0|>=3.8,<3.9.0a0|>=3.5,<3.6.0a0']

Your python: python=3.9

If python is on the left-most side of the chain, that's the version you've asked for.
When python appears to the right, that indicates that the thing on the left is somehow
not available for the python version you are constrained to. Note that conda will not
change your python version to a different minor version unless you explicitly specify
that.

The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

Package tzdata conflicts for:
s3fs -> python[version='>=3.6'] -> tzdata
tornado=5 -> python[version='>=3.9,<3.10.0a0'] -> tzdata
python=3.9 -> tzdata
pyarrow -> python[version='>=3.9,<3.10.0a0'] -> tzdata
conda-pack -> python -> tzdata

Package setuptools conflicts for:
dask-yarn -> distributed[version='>=2021.1.0'] -> setuptools
python=3.9 -> pip -> setuptools
conda-pack -> setuptools
pyarrow -> setuptools

Package _libgcc_mutex conflicts for:
python=3.9 -> libgcc-ng[version='>=9.3.0'] -> _libgcc_mutex[version='*|0.1',build='main|main|conda_forge']
pyarrow -> libgcc-ng[version='>=9.4.0'] -> _libgcc_mutex[version='*|0.1',build='main|main|conda_forge']
tornado=5 -> libgcc-ng[version='>=7.3.0'] -> _libgcc_mutex[version='*|0.1|0.1',build='main|main|conda_forge']

Package pypy3.7 conflicts for:
pyarrow -> numpy[version='>=1.16,<2.0a0'] -> pypy3.7[version='7.3.3.*|7.3.4.*|7.3.5.*|>=7.3.3|>=7.3.4|>=7.3.5']
dask-yarn -> distributed[version='>=2021.1.0'] -> pypy3.7[version='7.3.3.*|7.3.4.*|7.3.5.*|>=7.3.3|>=7.3.4|>=7.3.5']
s3fs -> python[version='>=3.6'] -> pypy3.7[version='7.3.3.*|7.3.4.*|7.3.5.*']
conda-pack -> python -> pypy3.7[version='7.3.3.*|7.3.4.*|7.3.5.*|>=7.3.5|>=7.3.3']
tornado=5 -> python[version='>=3.7,<3.8.0a0'] -> pypy3.7[version='7.3.3.*|7.3.4.*|7.3.5.*']

Package c-ares conflicts for:
pyarrow -> arrow-cpp==5.0.0=py37ha37954a_3_cpu -> c-ares[version='>=1.16.1,<2.0a0|>=1.17.1,<2.0a0|>=1.17.2,<2.0a0']
dask-yarn -> grpcio[version='>=1.14.0'] -> c-ares[version='>=1.14.0,<2.0a0|>=1.15.0,<2.0a0|>=1.16.1,<2.0a0|>=1.17.1,<2.0a0']

Package python_abi conflicts for:
pyarrow -> python_abi[version='3.6.*|3.7.*|3.9.*|3.8.*',build='*_cp39|*_cp37m|*_cp38|*_cp36m']
pyarrow -> numpy[version='>=1.16,<2.0a0'] -> python_abi[version='2.7.*|3.6|3.7',build='*_pypy36_pp73|*_pypy37_pp73|*_cp27mu']

Package fsspec conflicts for:
dask-yarn -> dask-core[version='>=2021.1.0'] -> fsspec[version='>=0.4.0|>=0.5.1|>=0.6.0']
s3fs -> fsspec[version='2021.04.0|2021.05.0|2021.06.0|2021.06.1|2021.7.0|>=0.9.0|>=0.8.0|>=0.6.0|2021.7.0.*|2021.6.0.*']

Package numpy conflicts for:
pyarrow -> pandas -> numpy[version='>=1.11|>=1.11.*|>=1.12.1,<2.0a0|>=1.14.6,<2.0a0|>=1.15.4,<2.0a0|>=1.16.5,<2.0a0|>=1.18.5,<2.0a0|>=1.19.4,<2.0a0|>=1.19.2,<2.0a0|>=1.18.4,<2.0a0|>=1.18.1,<2.0a0|>=1.9.3,<2.0a0|>=1.9.*|>=1.9|>=1.8|>=1.7|>=1.20.3,<2.0a0|>=1.20.2,<2.0a0|>=1.13.3,<2.0a0']
pyarrow -> numpy[version='1.10.*|1.11.*|1.12.*|1.13.*|>=1.10|>=1.10,<1.20.0a0|>=1.14,<1.20.0a0|>=1.16,<1.20.0a0|>=1.16,<2.0a0|>=1.17.5,<2.0a0|>=1.19.5,<2.0a0|>=1.16.6,<2.0a0|>=1.11,<2.0a0|>=1.14,<2.0a0|>=1.15.3,<2.0a0|>=1.15.2,<2.0a0|>=1.11.3,<2.0a0']

Package futures conflicts for:
dask-yarn -> grpcio[version='>=1.14.0'] -> futures[version='>=2.2.0']
pyarrow -> futures
tornado=5 -> futures

Package certifi conflicts for:
pyarrow -> setuptools -> certifi[version='>=2016.09|>=2016.9.26']
conda-pack -> setuptools -> certifi[version='>=2016.09|>=2016.9.26']

Package python-dateutil conflicts for:
pyarrow -> pandas -> python-dateutil[version='>=2.5.*|>=2.6.1|>=2.7.3']
s3fs -> botocore -> python-dateutil[version='>=2.1,<2.7.0|>=2.1,<2.8.1|>=2.1,<3.0.0']

The following specifications were found to be incompatible with your system:

  - feature:/linux-64::__glibc==2.26=0
  - feature:|@/linux-64::__glibc==2.26=0
  - pyarrow -> libgcc-ng[version='>=9.3.0'] -> __glibc[version='>=2.17']
  - tornado=5 -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']

Your installed version is: 2.26

Nogbit commented 3 years ago

Sorry, let me edit that. That log was from the worker, not the master; much more info to come in a sec.

Nogbit commented 3 years ago

Updated the original description and title

Nogbit commented 3 years ago

The Python issue above was because conda was using latest, which brings Python 3.9 with it. So that was not the AMI's fault.
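
One way to avoid picking up Python 3.9 is to pin the Python version explicitly in the install step, since conda's own message says it will not change the minor version unless asked. The sketch below only builds the command line; the helper name `conda_install_command` is hypothetical, the package list is taken from the solver log above, and whether the linked bootstrap script does it this way is an assumption.

```python
def conda_install_command(packages, python_version="3.8", conda="conda"):
    """Build an argv list that pins Python explicitly alongside the packages."""
    return [conda, "install", "--yes", f"python={python_version}", *packages]


# Package specs as they appear in the "Examining ..." lines of the log.
PACKAGES = ["dask-yarn", "pyarrow", "s3fs", "conda-pack", "tornado=5"]

command = conda_install_command(PACKAGES)
print(" ".join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` inside a bootstrap script, or simply echoed into a shell script.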

However, even when using conda with Python 3.8, you will still have issues, since the AWS Amazon Linux 2 AMI does not have initctl and uses systemctl instead.
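
A bootstrap script that has to run on both older and newer AMIs could detect which service manager is present instead of hard-coding one. This is a rough sketch, not what the linked script necessarily does; the function names and the service name `example-service` are placeholders.

```python
import shutil


def pick_service_manager(which=shutil.which):
    """Return 'systemctl', 'initctl', or None depending on what is on PATH."""
    for tool in ("systemctl", "initctl"):
        if which(tool):
            return tool
    return None


def restart_command(service, manager):
    """Build the argv list that restarts a service with the given manager."""
    if manager in ("systemctl", "initctl"):
        return [manager, "restart", service]
    raise RuntimeError("no supported service manager found")


manager = pick_service_manager()
if manager is not None:
    # 'example-service' is a placeholder, not a real EMR service name.
    print(restart_command("example-service", manager))
```

Passing `which` as a parameter keeps the detection easy to exercise without touching the real PATH.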

Also, you will need to make the boot volume larger than the 10GB default: the bootstrap action will finish, but Hive will fail to install afterwards because you will run out of disk space. 20GB will suffice.
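
If the cluster is created through the EMR API rather than the console, the root volume size can be set with the `EbsRootVolumeSize` parameter of RunJobFlow. The sketch below only assembles part of the request as a dict; the helper name, cluster name, and S3 path are placeholders, and a real request also needs `Instances` and the IAM role settings.

```python
def cluster_request(name, bootstrap_script_s3_path, root_volume_gb=20):
    """Assemble a partial RunJobFlow request with a larger root volume."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.3.0",
        # Root (boot) volume size in GiB; the default was too small here.
        "EbsRootVolumeSize": root_volume_gb,
        "BootstrapActions": [
            {
                "Name": "dask-yarn bootstrap",
                "ScriptBootstrapAction": {"Path": bootstrap_script_s3_path},
            }
        ],
    }


# Placeholder values; replace with your own bucket and script.
request = cluster_request("dask-cluster", "s3://example-bucket/bootstrap-dask.sh")
print(request["EbsRootVolumeSize"])
```

With boto3 installed, these keys could be merged with the remaining required arguments and passed as `boto3.client("emr").run_job_flow(**kwargs)`.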

This bootstrap worked for me. https://gist.github.com/Nogbit/f15e1c2be59bcc4ad122171b2e56cdeb