ResidentMario opened this issue 4 years ago
Thank you for the error report @ResidentMario. My apologies for the delayed response; the folks who maintain this repository have been busy lately.
Do you have any interest in submitting a patch to resolve this issue?
I might be able to look into it, but no promises.
I am trying to debug some other issues with this and found that using the EMR release emr-5.29.0 instead of emr-5.30.1 resolves the problem. It looks like something in the new image is causing it. Thought that bit of intel might help.
Apparently, from emr-5.30 onwards, EMR only supports systemd and no longer supports upstart.
I think it has to do with Amazon Linux 2:
Amazon Linux 2 support – In EMR version 5.30.0 and later, EMR uses Amazon Linux 2 OS. New custom AMIs (Amazon Machine Image) must be based on the Amazon Linux 2 AMI. For more information, see Using a Custom AMI.
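Since the systemd-vs-upstart question hinges on which OS generation the node runs, a quick hedged check (run on the EMR node itself) is:

```shell
# Which OS generation is this node running?
grep PRETTY_NAME /etc/os-release
# Amazon Linux 2 is systemd-based; if systemctl is absent, you are on an
# older upstart-era image:
if command -v systemctl >/dev/null 2>&1; then INIT=systemd; else INIT=other; fi
echo "init system: $INIT"
```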
I tried to make it work with systemd and updated it with the following:
# -----------------------------------------------------------------------------
# 10. Configure Jupyter Notebook
# -----------------------------------------------------------------------------
echo "Configuring Jupyter"
mkdir -p $HOME/.jupyter
HASHED_PASSWORD=`python -c "from notebook.auth import passwd; print(passwd('$JUPYTER_PASSWORD'))"`
cat <<EOF >> $HOME/.jupyter/jupyter_notebook_config.py
c.NotebookApp.password = u'$HASHED_PASSWORD'
c.NotebookApp.open_browser = False
c.NotebookApp.ip = '0.0.0.0'
c.NotebookApp.port = 8889
EOF
# -----------------------------------------------------------------------------
# 11. Define a systemd service for the Jupyter Notebook Server
#
# This sets the notebook server up to properly run as a background service.
# -----------------------------------------------------------------------------
echo "Configuring Jupyter Notebook Systemd Service"
cat <<EOF > /tmp/jupyter-notebook.service
[Unit]
Description=Jupyter Notebook
[Service]
ExecStart=$HOME/miniconda/bin/jupyter-notebook --config=$HOME/.jupyter/jupyter_notebook_config.py
Type=simple
PIDFile=/run/jupyter.pid
WorkingDirectory=$HOME
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
sudo mv /tmp/jupyter-notebook.service /etc/init.d/
sudo systemctl enable /etc/init.d/jupyter-notebook.service
# -----------------------------------------------------------------------------
# 12. Start the Jupyter Notebook Server
# -----------------------------------------------------------------------------
echo "Starting Jupyter Notebook Server"
sudo systemctl daemon-reload
sudo systemctl restart jupyter-notebook.service
Note: I added a port for the notebook.
This runs on bootstrap, but nothing is listening on port 8889. When I run the ExecStart command manually via ssh, the notebook opens, so I'm not sure what I'm doing wrong. I also get the following problem: https://github.com/dask/dask-yarn/issues/124
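Two things worth checking for this symptom. First, `systemctl enable /etc/init.d/jupyter-notebook.service` registers the file from `/etc/init.d/`, which systemd treats as a SysV init-script directory; native unit files normally belong in `/etc/systemd/system/`. Second, to confirm whether anything is actually listening on the port, a small self-contained probe (here a throwaway `python3 -m http.server` stands in for the notebook server so the probe has something to hit):

```shell
# Throwaway listener standing in for the notebook server:
python3 -m http.server 8889 --bind 127.0.0.1 >/dev/null 2>&1 &
LISTENER=$!
sleep 1
# Probe the port the way you would on the real node:
if curl -s -o /dev/null "http://127.0.0.1:8889/"; then
  PROBE=listening
else
  PROBE=nothing
fi
echo "port 8889: $PROBE"
kill $LISTENER 2>/dev/null
```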
Sources for the new script:
EMR version 5.31.0, Hadoop distribution: Amazon 2.10.0, Python: 3.7.9
Hi @hiqbal2, your script for systemd was super helpful. I got it to work by making a few changes to this script.
I hope this helps
@hegde-anish thanks for the help. EMR seems to bootstrap properly now. However, I'm not sure if you got this error when trying to start a dask cluster:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-762253d83df2> in <module>
1 # Create a cluster
----> 2 cluster = YarnCluster()
3
4 # Connect to the cluster
5 client = Client(cluster)
/home/hadoop/miniconda/lib/python3.8/site-packages/dask_yarn/core.py in __init__(self, environment, n_workers, worker_vcores, worker_memory, worker_restarts, worker_env, scheduler_vcores, scheduler_memory, deploy_mode, name, queue, tags, user, host, port, dashboard_address, skein_client, asynchronous, loop)
366 loop=None,
367 ):
--> 368 spec = _make_specification(
369 environment=environment,
370 n_workers=n_workers,
/home/hadoop/miniconda/lib/python3.8/site-packages/dask_yarn/core.py in _make_specification(**kwargs)
184 "See http://yarn.dask.org/environments.html for more information."
185 )
--> 186 raise ValueError(msg)
187
188 n_workers = lookup(kwargs, "n_workers", "yarn.worker.count")
ValueError: You must provide a path to a Python environment for the workers.
This may be one of the following:
- A conda environment archived with conda-pack
- A virtual environment archived with venv-pack
- A path to a conda environment, specified as conda://...
- A path to a virtual environment, specified as venv://...
- A path to a python binary to use, specified as python://...
See http://yarn.dask.org/environments.html for more information.
Not sure why this is happening.
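For reference, the first option in that error message (a conda-pack archive) can be wired up through dask's configuration file, which dask-yarn reads from `~/.config/dask/yarn.yaml`. The archive path below is hypothetical; note this doesn't address the root cause later identified in this thread (the server running as root):

```shell
# Point dask-yarn at a conda-pack archive via its config file.
# The archive path is a placeholder, not from this thread.
mkdir -p "$HOME/.config/dask"
cat <<'EOF' > "$HOME/.config/dask/yarn.yaml"
yarn:
  environment: /home/hadoop/environment.tar.gz
EOF
grep environment "$HOME/.config/dask/yarn.yaml"
```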
I am also not sure whether there is a difference in behaviour between calling `$HOME/miniconda/bin/jupyter-notebook` directly and the original script's `exec su - hadoop -c "jupyter notebook"`. When I try the old command, I get an error that `hadoop -c` does not exist.
I don't have any experience with hadoop or dask, so I am a little lost on debugging this.
This modified bootstrap script worked for me, with a few additional fixes:

- `conda pack` failed with `python=3.8.5` (see #133), so I specified a 3.7 version.
- I used `tornado` 6.1, which I found worked with `jupyter-server-proxy` 1.5.2 without issue (despite the comment in the script saying otherwise).
- The image aliases `python -> /usr/bin/python3` and `pip -> /usr/bin/pip3` in `/etc/bashrc` (which gets imported into `$HOME/.bashrc`). This interferes with conda, since we want `python -> ~/miniconda/bin/python`.
- The `ValueError: You must provide a path to a Python environment for the workers` issue that @hiqbal2 encountered. The root cause (no pun intended) is that the notebook server is running as `root` instead of the `hadoop` user.

To fix the latter two issues, I added `unalias` commands to `~/.bashrc` before `source`ing it, which feels like a bit of a hack:
# -----------------------------------------------------------------------------
# 2. Install Miniconda
# -----------------------------------------------------------------------------
echo "Installing Miniconda"
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o /tmp/miniconda.sh
bash /tmp/miniconda.sh -b -p $HOME/miniconda
rm /tmp/miniconda.sh
echo -e 'unalias python || true' >> $HOME/.bashrc
echo -e 'unalias pip || true' >> $HOME/.bashrc
echo -e '\nexport PATH=$HOME/miniconda/bin:$PATH' >> $HOME/.bashrc
source $HOME/.bashrc
conda update conda -y
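The `|| true` in those `unalias` lines matters: `unalias` exits nonzero when the alias doesn't exist, which would abort a bootstrap running under `set -e`. A minimal sketch of the idiom:

```shell
set -e                       # bootstrap scripts commonly run with errexit
alias demo='echo aliased'
unalias demo || true         # alias exists: removed cleanly
unalias demo || true         # alias already gone: unalias fails, || true absorbs it
STATUS=survived
echo "$STATUS"               # → survived
```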
I also specified a `User` in the systemd `[Service]` section (which let me remove the `--allow-root` flag that @hegde-anish suggested), and I had to export the `JAVA_HOME` environment variable:
# -----------------------------------------------------------------------------
# 11. Define a systemd service for the Jupyter Notebook Server
#
# This sets the notebook server up to properly run as a background service.
# -----------------------------------------------------------------------------
echo "Configuring Jupyter Notebook Systemd Service"
cat <<EOF > /tmp/jupyter-notebook.service
[Unit]
Description=Jupyter Notebook
[Service]
User=hadoop
ExecStart=$HOME/miniconda/bin/jupyter-notebook --config=$HOME/.jupyter/jupyter_notebook_config.py
Environment=JAVA_HOME=$JAVA_HOME
Type=simple
PIDFile=/run/jupyter.pid
WorkingDirectory=$HOME
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
sudo mv /tmp/jupyter-notebook.service /etc/systemd/system/
sudo systemctl enable jupyter-notebook
# -----------------------------------------------------------------------------
# 12. Start the Jupyter Notebook Server
# -----------------------------------------------------------------------------
echo "Starting Jupyter Notebook Server"
sudo systemctl daemon-reload
sudo systemctl start jupyter-notebook
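On the `Environment=JAVA_HOME=$JAVA_HOME` line above: if `JAVA_HOME` isn't already set in the bootstrap's environment, one way to derive it is from the `java` binary itself. This sketch demonstrates the derivation against a fake JDK layout under `mktemp`; on a real node you would resolve `$(command -v java)` instead:

```shell
# Fake JDK layout standing in for a real install (paths are placeholders):
ROOT=$(mktemp -d)
mkdir -p "$ROOT/jvm/java-8/bin"
touch "$ROOT/jvm/java-8/bin/java"
ln -s "$ROOT/jvm/java-8/bin/java" "$ROOT/java"   # mimics /usr/bin/java -> real binary
# Resolve the symlink chain, then strip the /bin/java suffix:
JAVA_BIN=$(readlink -f "$ROOT/java")
JAVA_HOME=${JAVA_BIN%/bin/java}
echo "$JAVA_HOME"
```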
EMR version 5.32.0, Hadoop distribution: Amazon 2.10.1, Python 3.7.6
The above worked for me. However, the jupyter notebook now just does not output any values. I started the notebook via ssh and got the following error when trying to run a simple `2+2`:
[E 12:12:00.355 NotebookApp] Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7fc061beb4d0>, <Future finished exception=TimeoutError('Timeout')>)
Traceback (most recent call last):
File "/home/hadoop/miniconda/lib/python3.7/site-packages/tornado/ioloop.py", line 758, in _run_callback
ret = callback()
File "/home/hadoop/miniconda/lib/python3.7/site-packages/tornado/stack_context.py", line 300, in null_wrapper
return fn(*args, **kwargs)
File "/home/hadoop/miniconda/lib/python3.7/site-packages/tornado/websocket.py", line 553, in <lambda>
self.stream.io_loop.add_future(result, lambda f: f.result())
tornado.util.TimeoutError: Timeout
ERROR:asyncio:Future exception was never retrieved
future: <Future finished exception=TimeoutError('Timeout')>
Traceback (most recent call last):
File "/home/hadoop/miniconda/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
yielded = self.gen.throw(*exc_info)
File "/home/hadoop/miniconda/lib/python3.7/site-packages/tornado/websocket.py", line 757, in _accept_connection
yield open_result
File "/home/hadoop/miniconda/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
value = future.result()
File "/home/hadoop/miniconda/lib/python3.7/site-packages/tornado/ioloop.py", line 758, in _run_callback
ret = callback()
File "/home/hadoop/miniconda/lib/python3.7/site-packages/tornado/stack_context.py", line 300, in null_wrapper
return fn(*args, **kwargs)
File "/home/hadoop/miniconda/lib/python3.7/site-packages/tornado/websocket.py", line 553, in <lambda>
self.stream.io_loop.add_future(result, lambda f: f.result())
tornado.util.TimeoutError: Timeout
@kqshan this is great, thanks.
I didn't find I needed to unalias; after the bootstrap I had proper pointers to the miniconda python/pip. I'm running a newer EMR (emr-6.2.0), so this may be a factor.
I removed the version pin for tornado as well.
The conda pack issue appears to be from this conda issue. I added `--ignore-missing-files` and that resolved it, although I don't know if I'll hit environment synchronization issues with my workers as a result (I haven't gotten that far in testing yet).
Also, the version spec for dask-yarn causes a file named `=0.7.0` to be written to the home folder. Some escaping or quoting is likely necessary to fix this, but I just removed the version specification because conda installed 0.8.1 on its own.
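On the stray `=0.7.0` file: in a shell, an unquoted `>=` is parsed as an output redirection, so `pip install dask-yarn>=0.7.0` runs `pip install dask-yarn` and sends its stdout to a file literally named `=0.7.0`. A minimal demonstration, with `echo` standing in for `pip`:

```shell
cd "$(mktemp -d)"
echo "install dask-yarn>=0.7.0"   # quoted: the version spec reaches the command intact
echo install dask-yarn>=0.7.0     # unquoted: ">" starts a redirection, creating "=0.7.0"
ls                                # shows the stray "=0.7.0" file
```

Quoting the whole requirement spec (or the version pin) avoids the problem.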
I also ran into this issue when trying the bootstrap script. @davegravy did you test your script? Do you have a version that works with emr-6.2.0?
What version of conda-pack is used? I believe 0.6 was released a month ago.
My script works with emr-6.2.0, yes. I've had no issues with dask & EMR, at least not that I can attribute to the bootstrap.
Can you share it?
Sure:
https://gist.github.com/davegravy/61e3abb81176f4490032554b70d28c31
Hello, I have tried installing dask with many versions of the bootstrap script and many EMR versions, but nothing works. If possible, could you share which EMR version and dask bootstrap you used? Thanks.
Hi, I was using EMR 6.2.0.
@davegravy, in your bootstrap, line 125 ("Downloading pyquis step") is censored.
This is a private python library that my bootstrap script installs. It shouldn't have any bearing on the bootstrap's ability to succeed.
The EMR bootstrap script currently fails with the following error (found via stderr logs):