ReviewNB / treon

Easy to use test framework for Jupyter Notebooks
https://reviewnb.com
MIT License
305 stars 29 forks

Treon stops running with multiple threads #15

Closed asteppke closed 4 years ago

asteppke commented 5 years ago

When running treon on Windows 10 with multiple threads it sometimes stops running because of issues with the underlying jupyter client.

To some extent this is an issue with the jupyter client and/or nbconvert, but treon triggers it by calling nbconvert from multiple threads.

The error message and discussion of the jupyter client is at this issue: https://github.com/jupyter/jupyter_client/issues/466

For treon, though, a workaround would be to use multiple processes instead of threads. IPython does not seem to be thread-safe as of now, but this is being worked on (https://github.com/jupyter/nbconvert/issues/936).
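A minimal sketch of the process-based workaround, not treon's actual implementation: each notebook is executed in its own Python process, so no kernel startup shares interpreter state with another. The helper names `nbconvert_command` and `run_isolated` are hypothetical, and the sketch assumes that nbconvert is installed and that `python -m jupyter nbconvert --to notebook --execute --stdout` behaves as it does on the command line.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def nbconvert_command(notebook_path):
    """Build a command that executes one notebook in a fresh interpreter.

    Assumption: jupyter and nbconvert are installed for sys.executable.
    """
    return [
        sys.executable, "-m", "jupyter", "nbconvert",
        "--to", "notebook", "--execute", "--stdout", notebook_path,
    ]


def run_isolated(commands, max_workers=4):
    """Run each command in its own child process.

    The threads here only wait on child processes; the notebook
    execution itself happens in separate interpreters.
    Returns a list of (is_successful, console_output) tuples in input order.
    """
    def run_one(cmd):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        return proc.returncode == 0, proc.stdout + proc.stderr

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))
```

Usage would look like `run_isolated([nbconvert_command(p) for p in notebook_paths])`. The cost is one interpreter start per notebook, which is usually small compared to kernel startup.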

amit1rrr commented 5 years ago

@asteppke Thanks for reporting. From this comment it looks like the issue is resolved in nbconvert 5.6.0. Shouldn't our problem be solved by upgrading to that version?

Also, what's the exact failure behaviour you are seeing? Is it reproducible? If yes, please share the steps so we can try it out.

asteppke commented 5 years ago

When starting the current version of treon on a system with the following recent jupyter environment (Windows 10):

jupyter core     : 4.5.0
jupyter-notebook : 6.0.0
qtconsole        : 4.5.4
ipython          : 7.8.0
ipykernel        : 5.1.2
jupyter client   : 5.3.1
jupyter lab      : 1.0.2
nbconvert        : 5.5.0
ipywidgets       : 7.5.1
nbformat         : 4.4.0
traitlets        : 4.3.2

Then I receive the following error messages:

ERROR in testing F:\notebooks\test1.ipynb
            Traceback (most recent call last):
  File "c:\users\alexander\anaconda3\lib\site-packages\treon\task.py", line 23, in run_tests
    self.is_successful, console_output = execute_notebook(self.file_path)
  File "c:\users\alexander\anaconda3\lib\site-packages\treon\test_execution.py", line 11, in execute_notebook
    ep.preprocess(notebook, {'metadata': {'path': '.'}})
  File "c:\users\alexander\anaconda3\lib\site-packages\nbconvert\preprocessors\execute.py", line 379, in preprocess
    with self.setup_preprocessor(nb, resources, km=km):
  File "c:\users\alexander\anaconda3\lib\contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "c:\users\alexander\anaconda3\lib\site-packages\nbconvert\preprocessors\execute.py", line 324, in setup_preprocessor
    self.km, self.kc = self.start_new_kernel(cwd=path)
  File "c:\users\alexander\anaconda3\lib\site-packages\nbconvert\preprocessors\execute.py", line 271, in start_new_kernel
    km.start_kernel(extra_arguments=self.extra_arguments, **kwargs)
  File "c:\users\alexander\anaconda3\lib\site-packages\jupyter_client\manager.py", line 236, in start_kernel
    "Currently valid addresses are: %s" % (self.ip, local_ips())
RuntimeError: Can only launch a kernel on a local interface. This one is not: 127.0.0.1.Make sure that the '*_address' attributes are configured properly. Currently valid addresses are: ['192.168.56.1', '192.168.1.108', '0.0.0.0', '']

The directory where treon is run contains several notebooks. Under the same conditions with the --threads=1 option the error message disappears.

This is related to the jupyter client; it looks like a race condition in the module that determines the local IP addresses.

It is reproducible on several computers. I assume the key ingredients are a Windows system, at least one network interface besides localhost, and several notebooks which treon wants to run in parallel.

amit1rrr commented 4 years ago

RuntimeError: Can only launch a kernel on a local interface. This one is not: 127.0.0.1.Make sure that the '*_address' attributes are configured properly. Currently valid addresses are: ['192.168.56.1', '192.168.1.108', '0.0.0.0', '']

What are your local_hostnames and allow_remote_access config values? Can you try setting allow_remote_access to false and adding 127.0.0.1 to the local_hostnames list?

Reference: https://jupyter-notebook.readthedocs.io/en/stable/config.html

asteppke commented 4 years ago

Generating a configuration, setting allow_remote_access to False, and adding 127.0.0.1 to local_hostnames does not change the outcome. The error message remains the same:

  File "c:\miniconda3\envs\vti\lib\site-packages\jupyter_client\manager.py", line 236, in start_kernel
    "Currently valid addresses are: %s" % (self.ip, local_ips())

RuntimeError: Can only launch a kernel on a local interface. This one is not: 127.0.0.1. 
Make sure that the '*_address' attributes are configured properly. 
Currently valid addresses are: ['192.168.0.110', '192.168.183.225', '0.0.0.0', '']

It seems that the nbconvert mechanism is ignoring this configuration variable.

The underlying issue seems to be the following chain: treon uses the nbconvert library, which uses manager.py from jupyter_client, which in turn uses localinterfaces.py. On Windows, localinterfaces.py runs the ipconfig command to fill a singleton-like data structure. This last step is not thread-safe: the first treon thread receives the correct output, including the 127.0.0.1 address, but in the second thread the lookup fails, and we get the error message above.

So I think a workaround would be to call nbconvert in a different process or get an upstream patch into jupyter_client to make the local ip lookup thread-safe.
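To illustrate what a thread-safe lookup could look like, here is a simplified model, not jupyter_client's actual code. The class `LocalAddressCache` and its `loader` argument are hypothetical; `loader` stands in for the slow step (such as parsing ipconfig output). The lock ensures that a second thread waits for the first lookup to finish instead of reading a half-filled list.

```python
import threading


class LocalAddressCache:
    """Lazily filled address list that is safe to read from many threads."""

    def __init__(self, loader):
        self._loader = loader          # callable returning an iterable of IPs
        self._lock = threading.Lock()
        self._addresses = None         # None means "not loaded yet"

    def get(self):
        # Fast path: the list is published only after it is complete.
        if self._addresses is None:
            with self._lock:
                # Re-check inside the lock: another thread may have
                # finished loading while this one was waiting.
                if self._addresses is None:
                    self._addresses = list(self._loader())
        return self._addresses
```

The key point is that the cache is marked as filled only after the loader has returned. A design that sets an "already called" flag before the lookup completes lets a concurrent caller see an empty or partial list, which would match the symptom of 127.0.0.1 missing from the valid addresses.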

amit1rrr commented 4 years ago

This PR in jupyter_client fixes this issue.