computationalmodelling / nbval

A py.test plugin to validate Jupyter notebooks

nbval and ipyparallel #119

Open FabioLuporini opened 5 years ago

FabioLuporini commented 5 years ago

Hi, I wonder whether it's possible to use nbval for a jupyter notebook that exploits ipyparallel in combination with MPI (mpi4py).

This is the notebook I'm talking about. It's nothing special -- you can stop reading at cell 2

  1. I'm seeing failures when running with nbval. I'm not sure yet whether the fault is mine; I still need to investigate properly (the error trace is here, starting at around line 300), so you can probably ignore this for now.
  2. How can/should I use markers like #NBVAL_IGNORE_OUTPUT in combination with ipyparallel's %%px magic? Both are supposed to appear at the very top of a cell.

Thanks!

takluyver commented 5 years ago

I'm not so sure on the parallel stuff, but the marker comments for nbval can be anywhere in the cell. You can also use cell tags instead of comments: https://nbviewer.jupyter.org/github/computationalmodelling/nbval/blob/0.9.1/docs/source/index.ipynb#Using-tags-instead-of-comments
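Since the marker can go anywhere in the cell, the `# NBVAL_IGNORE_OUTPUT` comment can sit on the line after `%%px`, which keeps the magic on the first line. For the tag route, here is a minimal sketch that adds the tag to every `%%px` cell by editing the notebook JSON directly. It assumes the tag spelling `nbval-ignore-output` and the usual nbformat 4 layout (`cells`, `metadata.tags`); the helper name `tag_px_cells` and the two-cell notebook are made up for illustration.

```python
import json

# Tag form of the NBVAL_IGNORE_OUTPUT comment marker (assumed spelling).
IGNORE_TAG = "nbval-ignore-output"


def tag_px_cells(notebook):
    """Add IGNORE_TAG to every code cell that starts with the %%px magic.

    Returns the number of cells that were newly tagged.
    """
    tagged = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # the source may be stored as a list of lines
            source = "".join(source)
        if not source.lstrip().startswith("%%px"):
            continue
        tags = cell.setdefault("metadata", {}).setdefault("tags", [])
        if IGNORE_TAG not in tags:
            tags.append(IGNORE_TAG)
            tagged += 1
    return tagged


# Hypothetical two-cell notebook, parsed from JSON text for illustration.
notebook = json.loads("""
{
  "nbformat": 4, "nbformat_minor": 2, "metadata": {},
  "cells": [
    {"cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [],
     "source": ["%%px --block --group-outputs=engine\\n", "u.data"]},
    {"cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [],
     "source": ["x = 1"]}
  ]
}
""")
newly_tagged = tag_px_cells(notebook)
print(newly_tagged, notebook["cells"][0]["metadata"]["tags"])
```

For a real notebook you would load the .ipynb file with `json.load`, call the helper, and write the result back with `json.dump`. Tagging every `%%px` cell is a blunt choice; tagging only the cells with unstable output keeps more of the validation.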

FabioLuporini commented 5 years ago

thanks. I'll try this and will keep digging. Gimme another couple of days before closing the issue alright? Maybe I can report more

FabioLuporini commented 5 years ago

I'm closing this for now. Thanks!

FabioLuporini commented 5 years ago

Sorry, I feel like I have to reopen this issue because I don't really know how to fix it

I keep seeing this kind of error from random cells:

Input:
%%px --block --group-outputs=engine
u.data[0, 1:-1, 1:-1] = 1.
u.data

Traceback:
Unexpected output fields from running code: {'stdout'}

Sometimes our CI is green, sometimes it's red due to one random cell failing as shown above. The traceback is always the same. This happens even in cells that are not supposed to print anything to stdout (e.g., cells that only change entries in a dictionary).

When exactly does nbval check the output of a cell? Is it possible that nbval performs the output check when one process has returned while the others have not yet, or something along these lines? I'm really at a loss. At this point, any sort of information would be greatly appreciated.

takluyver commented 5 years ago

nbval checks the output when the cell has finished running. This usually means that the execute_reply message has been sent on the shell channel and an idle status message has been sent on the iopub channel. It doesn't know anything specific about ipyparallel - it sends that cell to the kernel, where ipyparallel processes the %%px cell magic and does whatever it needs to do with that.
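To make that concrete, here is a rough sketch of the idea, not nbval's actual code: gather the iopub messages that belong to one execute request until the idle status arrives, then compare the output fields with those stored in the notebook. The message dicts are hand-written stand-ins that follow the Jupyter messaging layout (`msg_type`, `parent_header`, `content`); the function names and the `"req-1"` ID are made up.

```python
def collect_outputs(iopub_messages, msg_id):
    """Return output messages for one request, stopping at its idle status."""
    outputs = []
    for msg in iopub_messages:
        if msg["parent_header"].get("msg_id") != msg_id:
            continue  # belongs to a different request
        msg_type = msg["msg_type"]
        if msg_type == "status" and msg["content"]["execution_state"] == "idle":
            break  # the kernel reports that the cell has finished
        if msg_type in ("stream", "execute_result", "display_data", "error"):
            outputs.append(msg)
    return outputs


def output_fields(outputs):
    """Reduce output messages to field names such as 'stdout' or 'text/plain'."""
    fields = set()
    for msg in outputs:
        if msg["msg_type"] == "stream":
            fields.add(msg["content"]["name"])  # 'stdout' or 'stderr'
        elif msg["msg_type"] == "error":
            fields.add("traceback")
        else:
            fields.update(msg["content"]["data"])  # MIME types
    return fields


def fake(msg_type, content, parent="req-1"):
    """Build a stand-in iopub message (illustration only)."""
    return {"msg_type": msg_type, "parent_header": {"msg_id": parent}, "content": content}


messages = [
    fake("status", {"execution_state": "busy"}),
    fake("stream", {"name": "stdout", "text": "noise"}, parent="other"),
    fake("stream", {"name": "stdout", "text": "something unexpected\n"}),
    fake("execute_result", {"data": {"text/plain": "Data([...])"}}),
    fake("status", {"execution_state": "idle"}),
]

reference_fields = {"text/plain"}  # what the stored notebook output contains
actual_fields = output_fields(collect_outputs(messages, "req-1"))
unexpected = actual_fields - reference_fields
print(unexpected)
```

In this sketch, any stream message that arrives for the request before the idle status is counted as output of the cell, wherever in the kernel process it was printed.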

I can't see any obvious reason why that cell would behave randomly. But I'm not super familiar with ipyparallel.

FabioLuporini commented 5 years ago

I'm still investigating the issue.

After forking ipyparallel and nbval, I found out that the (randomly) failing cell is getting an unexpected message of type stream from the IPython kernel.

These are the messages received on iopub while processing the failing cell; the third one is the "unexpected" one.

I have no idea why sometimes this bug appears and sometimes not.

I should add that it always seems to be the same cells that cause the failure. What they have in common is that a custom __setitem__ is executed; see the second message in the link above (note that u.data is not a NumPy array, but rather an instance of a custom subclass).

Also, I can't reproduce this on my local machine (which makes debugging horribly painful); it only appears on our CI system (Azure Pipelines). I don't know if there's a timing issue somehow.

EDIT: I wonder whether this might be relevant...

takluyver commented 5 years ago

That issue does look potentially relevant. "got unknown result" is a message from ipyparallel when it gets a reply to a message ID which is not in self.outstanding:

https://github.com/ipython/ipyparallel/blob/6.2.4/ipyparallel/client/client.py#L766
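A simplified sketch of that kind of check, written from the description above rather than copied from ipyparallel (the class and method names are made up; only the `outstanding` set and the message text come from the client):

```python
import contextlib
import io


class ReplyTracker:
    """Toy stand-in for a client that tracks outstanding message IDs."""

    def __init__(self):
        self.outstanding = set()

    def submit(self, msg_id):
        # Remember that a reply is expected for this request.
        self.outstanding.add(msg_id)

    def handle_reply(self, msg_id):
        # A reply for an ID we are not waiting on is reported with print().
        if msg_id not in self.outstanding:
            print(f"got unknown result: {msg_id}")
            return False
        self.outstanding.remove(msg_id)
        return True


tracker = ReplyTracker()
tracker.submit("msg-1")

captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    first = tracker.handle_reply("msg-1")   # expected reply
    second = tracker.handle_reply("msg-1")  # same ID again, no longer outstanding
printed = captured.getvalue()
```

The warning here is an ordinary `print`, so it goes to stdout. If something similar runs inside the kernel while a cell is executing, it could plausibly surface as a stream output for that cell, but whether that is what happens in this case has not been confirmed.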

FabioLuporini commented 5 years ago

yes I saw that. Just can't figure out why it sometimes appears, and sometimes not