If you start the scheduler and accidentally open the TCP comm port in a browser instead of the dashboard port, you get a confusing pickle message in the browser and an exception in the scheduler.
$ dask scheduler
2024-10-24 12:36:30,908 - distributed.scheduler - INFO - -----------------------------------------------
2024-10-24 12:36:31,259 - distributed.scheduler - INFO - State start
2024-10-24 12:36:31,262 - distributed.scheduler - INFO - -----------------------------------------------
2024-10-24 12:36:31,263 - distributed.scheduler - INFO - Scheduler at: tcp://10.51.100.43:8786
2024-10-24 12:36:31,263 - distributed.scheduler - INFO - dashboard at: http://10.51.100.43:8787/status
2024-10-24 12:36:31,263 - distributed.scheduler - INFO - Registering Worker plugin shuffle
2024-10-24 12:36:34,608 - tornado.application - ERROR - Exception in callback functools.partial(<function TCPServer._handle_connection.<locals>.<lambda> at 0x7f26e4194ae0>, <Task finished name='Task-254' coro=<BaseTCPListener._handle_stream() done, defined at /home/jtomlinson/Projects/dask/distributed/distributed/comm/tcp.py:654> exception=MemoryError((6073139484287059271,), dtype('uint8'))>)
Traceback (most recent call last):
File "/home/jtomlinson/miniconda3/envs/dask/lib/python3.11/site-packages/tornado/ioloop.py", line 750, in _run_callback
ret = callback()
^^^^^^^^^^
File "/home/jtomlinson/miniconda3/envs/dask/lib/python3.11/site-packages/tornado/tcpserver.py", line 387, in <lambda>
gen.convert_yielded(future), lambda f: f.result()
^^^^^^^^^^
File "/home/jtomlinson/Projects/dask/distributed/distributed/comm/tcp.py", line 666, in _handle_stream
await self.on_connection(comm)
File "/home/jtomlinson/Projects/dask/distributed/distributed/comm/core.py", line 288, in on_connection
return await super().on_connection(comm, handshake_overrides)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jtomlinson/Projects/dask/distributed/distributed/comm/core.py", line 267, in on_connection
handshake = await comm.read()
^^^^^^^^^^^^^^^^^
File "/home/jtomlinson/Projects/dask/distributed/distributed/comm/tcp.py", line 227, in read
frames_nosplit = await read_bytes_rw(stream, frames_nosplit_nbytes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jtomlinson/Projects/dask/distributed/distributed/comm/tcp.py", line 359, in read_bytes_rw
buf = host_array(n)
^^^^^^^^^^^^^
File "/home/jtomlinson/Projects/dask/distributed/distributed/protocol/utils.py", line 29, in host_array
return numpy.empty((n,), dtype="u1").data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 5.27 EiB for an array with shape (6073139484287059271,) and data type uint8
2024-10-24 12:36:34,649 - tornado.application - ERROR - Exception in callback functools.partial(<function TCPServer._handle_connection.<locals>.<lambda> at 0x7f26e4194cc0>, <Task finished name='Task-257' coro=<BaseTCPListener._handle_stream() done, defined at /home/jtomlinson/Projects/dask/distributed/distributed/comm/tcp.py:654> exception=MemoryError((8530211521808319815,), dtype('uint8'))>)
Traceback (most recent call last):
File "/home/jtomlinson/miniconda3/envs/dask/lib/python3.11/site-packages/tornado/ioloop.py", line 750, in _run_callback
ret = callback()
^^^^^^^^^^
File "/home/jtomlinson/miniconda3/envs/dask/lib/python3.11/site-packages/tornado/tcpserver.py", line 387, in <lambda>
gen.convert_yielded(future), lambda f: f.result()
^^^^^^^^^^
File "/home/jtomlinson/Projects/dask/distributed/distributed/comm/tcp.py", line 666, in _handle_stream
await self.on_connection(comm)
File "/home/jtomlinson/Projects/dask/distributed/distributed/comm/core.py", line 288, in on_connection
return await super().on_connection(comm, handshake_overrides)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jtomlinson/Projects/dask/distributed/distributed/comm/core.py", line 267, in on_connection
handshake = await comm.read()
^^^^^^^^^^^^^^^^^
File "/home/jtomlinson/Projects/dask/distributed/distributed/comm/tcp.py", line 227, in read
frames_nosplit = await read_bytes_rw(stream, frames_nosplit_nbytes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jtomlinson/Projects/dask/distributed/distributed/comm/tcp.py", line 359, in read_bytes_rw
buf = host_array(n)
^^^^^^^^^^^^^
File "/home/jtomlinson/Projects/dask/distributed/distributed/protocol/utils.py", line 29, in host_array
return numpy.empty((n,), dtype="u1").data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 7.40 EiB for an array with shape (8530211521808319815,) and data type uint8
This is sort of expected, because you shouldn't open that port in a browser, but it would be nice if things failed more gracefully.
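The huge allocation sizes in the traceback look like the start of the browser's HTTP request being read as a length prefix. Here is a quick check, assuming the comm reads the first 8 bytes of the stream as a little-endian unsigned 64-bit integer (an assumption inferred from the numbers in the log, not confirmed against the comm code):

```python
import struct

# Array shapes reported by the two MemoryErrors in the log.
sizes = [6073139484287059271, 8530211521808319815]

# Re-pack each size as a little-endian uint64 to recover the raw bytes
# that would have produced it.
decoded = [struct.pack("<Q", n) for n in sizes]

for n, raw in zip(sizes, decoded):
    print(n, "->", raw)
# 6073139484287059271 -> b'GET / HT'
# 8530211521808319815 -> b'GET /fav'
```

If that assumption holds, the two exceptions correspond to the browser requesting `/` and then something starting with `/fav` (presumably the favicon).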
It would be interesting to see if we can detect an HTTP connection and behave differently. For example, we could try to show a better error in the browser and avoid the code path that raises the exception in the scheduler.
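A rough sketch of what that detection could look like: check whether the first bytes on the connection start with an HTTP method, and if so build a plain-text HTTP reply instead of treating the bytes as a frame length. The names `looks_like_http` and `plain_text_reply` are hypothetical helpers for illustration, not existing `distributed` functions, and where they would hook into the comm code is left open.

```python
# Request-line prefixes for the standard HTTP methods. Each fits within
# the first 8 bytes of the stream.
HTTP_METHODS = (
    b"GET ", b"POST ", b"PUT ", b"HEAD ", b"DELETE ",
    b"OPTIONS ", b"PATCH ", b"CONNECT ", b"TRACE ",
)


def looks_like_http(first_bytes: bytes) -> bool:
    """Return True if the leading bytes look like an HTTP request line."""
    return first_bytes.startswith(HTTP_METHODS)


def plain_text_reply(message: str) -> bytes:
    """Build a minimal HTTP/1.1 400 response carrying a readable message."""
    body = message.encode("utf-8")
    head = (
        "HTTP/1.1 400 Bad Request\r\n"
        "Content-Type: text/plain; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


if __name__ == "__main__":
    prefix = b"GET / HT"  # first 8 bytes a browser would send
    if looks_like_http(prefix):
        reply = plain_text_reply(
            "This is the scheduler comm port, not the dashboard.\n"
        )
        print(reply.decode("utf-8"))
```

Checking the prefix before interpreting it as a length would let the listener write the reply, close the stream, and log a single warning rather than a traceback. A valid frame length could in principle collide with one of these prefixes, but only at the exabyte-scale sizes seen in the log above.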