dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

Dashboard missing when starting in a Docker container #1875

Open jakirkham opened 6 years ago

jakirkham commented 6 years ago

For some reason, starting distributed in a Docker container leaves the Bokeh-based dashboard unavailable. This happens even though the correct port (8787) is forwarded from the container, and I have verified that the Bokeh dashboard is running on that port inside the container. Strangely, the same issue does not occur when using dask-drmaa or when using distributed outside of a Docker container.

Example:

```python
>>> import distributed
>>> cluster = distributed.LocalCluster()
>>> client = distributed.Client(cluster)
```


Environment:

```yaml
name: test
channels:
  - conda-forge
  - defaults
dependencies:
  - blas=1.1=openblas
  - bokeh=0.12.14=py36_1
  - ca-certificates=2018.1.18=0
  - certifi=2018.1.18=py36_0
  - click=6.7=py_1
  - cloudpickle=0.5.2=py_0
  - cytoolz=0.9.0.1=py36_0
  - dask-core=0.17.2=py_0
  - distributed=1.21.4=py36_0
  - heapdict=1.0.0=py36_0
  - jinja2=2.10=py36_0
  - markupsafe=1.0=py36_0
  - msgpack-python=0.5.6=py36_0
  - ncurses=5.9=10
  - numpy=1.14.2=py36_blas_openblas_200
  - openblas=0.2.20=7
  - openssl=1.0.2n=0
  - packaging=17.1=py_0
  - pip=9.0.1=py36_1
  - psutil=5.4.3=py36_0
  - pyparsing=2.2.0=py36_0
  - python=3.6.5=0
  - python-dateutil=2.7.2=py_0
  - pyyaml=3.12=py36_1
  - readline=7.0=0
  - setuptools=39.0.1=py36_0
  - six=1.11.0=py36_1
  - sortedcontainers=1.5.9=py36_0
  - sqlite=3.20.1=2
  - tblib=1.3.2=py36_0
  - tk=8.6.7=0
  - toolz=0.9.0=py_0
  - tornado=5.0.1=py36_1
  - wheel=0.30.0=py36_2
  - xz=5.2.3=0
  - yaml=0.1.7=0
  - zict=0.1.3=py_0
  - zlib=1.2.11=0
  - libgfortran=3.0.0=1
```


Using macOS 10.11 with Docker for Mac 18.03.0-ce-mac58 (23607) from the Edge channel. Also used the condaforge/linux-anvil image as a base image, with digest sha256:e4dbddbf5c1d1e5143b003608c2a29e6437cdbdb6c4b748cd09fee35f63ab8b3. That said, I expect any minimal Docker image with a copy of Miniconda to do the install would work as well.

mrocklin commented 6 years ago

In what way is it unavailable? Page doesn't load? Page loads but nothing shows up?

jakirkham commented 6 years ago

Page doesn't load.

mrocklin commented 6 years ago

404?

Do the scheduler logs show anything? Perhaps Bokeh doesn't recognize the address the remote machine is using to reach it.

jakirkham commented 6 years ago

Right, 404.

Will look and report back.

Accessing it on localhost. Would hope that is allowed. :)

Are you able to reproduce it?

mrocklin commented 6 years ago

I haven't taken the time to set up an appropriate Docker container. You might also try `--network host`.

jakirkham commented 6 years ago

No worries.

Ran client.get_scheduler_logs() and it looks ok. Confirms Bokeh should be up and running on 8787. Are there more detailed logs I can dive through? Is there a way to get logs from Bokeh?

mrocklin commented 6 years ago

get_scheduler_logs will only get the logs from dask.distributed, not bokeh. Presumably Docker has some mechanism to get the logs for a running container.

jakirkham commented 6 years ago

So Docker only shows what gets printed to stdout/stderr, which in the MRE shows nothing. If I run using the same startup mechanism, but with a few more bells and whistles (i.e. Jupyter Notebook) then the logging is just from the Jupyter Notebook.

Is there some way to access the Bokeh Client running the dashboard from the Distributed Client?

mrocklin commented 6 years ago

Use `LocalCluster(silence_logs=False)`.

mrocklin commented 6 years ago

```python
In [3]: client.cluster.scheduler.services['bokeh']
Out[3]: <distributed.bokeh.scheduler.BokehScheduler at 0x7f05d3ce9828>

In [4]: client.cluster.scheduler.services['bokeh'].server
Out[4]: <bokeh.server.server.Server at 0x7f05d3ce9c50>
```

jakirkham commented 6 years ago

Just updating what I have found thus far.

There was a new release of bokeh, 0.12.15, which came out yesterday. Upgrading did not solve the issue.

When inspected inside the container, the Bokeh server looks OK. Also, using Requests, I am able to load the page inside the container but not outside it. This suggests a networking-related issue.

If it weren't for the fact that the Jupyter Notebook loads fine outside of the container and dask-drmaa avoids this issue, would blame Docker for it, but that doesn't look like the case.

Edit: Also played with forwarding the Bokeh port to different ports outside the container and had no luck with that either.
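The inside-versus-outside check above can be repeated with a small probe run once in the container and once on the host. This is a minimal sketch that swaps Requests for the standard library's `urllib.request`; the helper name `dashboard_reachable` and the default `/status` path are illustrative assumptions, not part of Distributed. Returning the HTTP status separately from connection errors helps tell a 404 (some server answered) from a refused or reset connection (nothing answered on the forwarded port).

```python
import urllib.error
import urllib.request


def dashboard_reachable(url="http://localhost:8787/status", timeout=5.0):
    """Hypothetical helper: return (ok, detail) for one GET request to ``url``.

    ``ok`` is True when the final response has a success status.
    ``detail`` is the HTTP status code, or the error text when no HTTP
    response came back at all.
    """
    # Bypass any proxy configured in the environment: the target is a local
    # or forwarded port, which a proxy should not be handling.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return True, response.status
    except urllib.error.HTTPError as exc:
        # A server answered, but with an error status such as 404.
        return False, exc.code
    except (urllib.error.URLError, OSError) as exc:
        # No HTTP answer: connection refused, reset, or timed out.
        return False, str(exc)


if __name__ == "__main__":
    ok, detail = dashboard_reachable()
    print("reachable" if ok else "unreachable", detail)
```

An integer `detail` outside the container would mean something is answering on the forwarded port; a string means the connection itself failed.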

mrocklin commented 6 years ago

Does Bokeh emit anything in its logs when you connect and don't get a response back?

You might want to set the logging level of bokeh to info in ~/.dask/config.yaml. I think that by default Dask silences some bokeh warnings.

jakirkham commented 6 years ago

Only seeing a Bokeh warning saying that connections from all hosts are permitted. Given the issue encountered, this is a good thing to have affirmed, but doesn't really explain why it isn't working unfortunately.

mrocklin commented 6 years ago

Indeed, my thoughts as well.

My next approach would be to try the same thing with http.server, to rule out a Docker networking issue.

jakirkham commented 6 years ago

Good idea. Ran python -m http.server 8787 and that loaded fine outside the container. Showed a file listing of the directory the server started in within the container.

Edit: Should add this was run when Distributed and the Bokeh server were not running.
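The same control experiment can be run from Python rather than the command line, which makes it easy to start and stop inside an existing session before launching Dask on the same port. This is a minimal sketch using the standard library's `http.server`; the function name `serve_directory_listing` is hypothetical. Binding to `""` listens on all interfaces, as `python -m http.server 8787` does by default, which is why a published Docker port can reach it.

```python
import functools
import http.server
import threading


def serve_directory_listing(port=8787, bind="", directory="."):
    """Hypothetical helper: serve ``directory`` on ``bind:port`` in the background.

    ``bind=""`` listens on every interface. Returns the server so the caller
    can ``shutdown()`` and ``server_close()`` it to free the port for Dask.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.HTTPServer((bind, port), handler)
    # Daemon thread: it won't keep the interpreter alive on exit.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Usage: `srv = serve_directory_listing()`, load `http://localhost:8787/` from the host, then `srv.shutdown(); srv.server_close()` before starting the cluster.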

jakirkham commented 6 years ago

Also tried serving up this simple Bokeh example using bokeh serve --port 8787 --show myapp.py and that worked fine as well.

dhirschfeld commented 6 years ago

Is the bokeh dashboard itself trying to communicate (pull data) from outside the container?

jakirkham commented 6 years ago

After PR ( https://github.com/dask/distributed/pull/1934 ), it avoids downloading external resources (e.g. logos and the like). Was this the kind of data that you had in mind or were you thinking of something else?

dhirschfeld commented 6 years ago

That and possibly communicating directly with workers. Just grasping at straws really :|

I'm actually going to be spinning up distributed in containers later this week so I'll at least be able to provide independent verification.

jakirkham commented 6 years ago

Would be eager to hear what you find. :)

jakirkham commented 6 years ago

Alright, have narrowed it down. Here is the difference.

Fail:

```python
import distributed
c = distributed.LocalCluster(ip=None)  # default `ip` argument currently
```

Success:

```python
import distributed
c = distributed.LocalCluster(ip="")
```

This makes sense, as the IP address outside of the container doesn't match the one inside it.

Not sure that we should change the behavior, but maybe documenting this particular case would be worthwhile (assuming there are not already docs I missed on this :).
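The difference between the two cases can be illustrated without Dask. With Docker's default bridge networking, a published port delivers traffic to the container's own network interface, not to its loopback address, so a server bound to `127.0.0.1` inside the container is unreachable from the host, while one bound to `""` (all interfaces) accepts it. This minimal standard-library sketch shows which address each choice actually binds; the helper `bound_address` is hypothetical, and it makes no claim about which address Distributed picks for `ip=None`.

```python
import socket


def bound_address(host, port=0):
    """Hypothetical helper: bind a TCP socket to (host, port), report the result.

    ``port=0`` lets the OS choose a free port, so this never collides with 8787.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, port))
        return sock.getsockname()


# Loopback only: reachable from inside the container, not via a published port.
print(bound_address("127.0.0.1"))
# "" is the IPv4 wildcard 0.0.0.0: also reachable via a published port.
print(bound_address(""))
```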