fastai / nbdev

Create delightful software with Jupyter Notebooks
https://nbdev.fast.ai/
Apache License 2.0

notebook2script saves based on data on filesystem, not jupyter content #51

Closed · andaag closed this 4 years ago

andaag commented 4 years ago

So this issue is a bit hard to describe, but very logical when you look into it.

To demonstrate, create the following new notebook:

```python
# Cell
#export
print("Test")

# Cell
from nbdev.export import notebook2script
notebook2script()
```

If you execute this from top to bottom, it won't export what you'd expect. The reason lies in how Jupyter saves notebooks to disk: notebook2script works on the .ipynb file on disk, which by default is only autosaved once every 120 seconds.
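To see the mechanism concretely, here's a minimal sketch (the notebook filename is a placeholder) that reads the on-disk JSON the way the exporter does; whatever it prints is the last saved source, not what the kernel is currently running:

```python
# A sketch, not nbdev's actual code: read the .ipynb JSON straight off disk.
# "00_core.ipynb" is a placeholder filename for this illustration.
import json
from pathlib import Path

nb = json.loads(Path("00_core.ipynb").read_text())
for cell in nb["cells"]:
    src = "".join(cell["source"])
    if src.startswith("#export"):
        # This is whatever was last written to disk, which may lag
        # behind the live notebook by up to an autosave interval.
        print(src)
```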

This makes a lot of sense once you think about it, but it isn't intuitive when you're working in a code window. There you normally don't have to care which parts of your file have been flushed to disk and which haven't.

I had a few issues with this, where I was looking at the .py files on disk to figure out why my code was failing, even though I was sure I had fixed the bug it was failing on and had run the export (in many cases using nbdev_build). Then I wrote https://github.com/fastai/nbdev/pull/50 so I wouldn't forget, and realized it didn't fix the issue.

Some ideas on how to solve this:

  1. Modify the autosave interval. With classic notebooks you can do that, but it's not yet supported in JupyterLab: https://github.com/jupyterlab/jupyterlab/issues/7083 (triggering the same JavaScript in JupyterLab does not work...)
  2. Trigger the save function through JavaScript (see the sketch after this list). This works, but it's hacky. And to make it considerably more hacky, triggering JavaScript like this is asynchronous, which means we have to wait until the disk is updated before actually running notebook2script... It might also need different methods for Notebook vs. JupyterLab.
  3. Make a Jupyter plugin instead of a %magic command, which can trigger on post_save. Then offer an installer like nbdev_install_git_hooks and tell people we auto-export after Ctrl+S. I'm not sure whether this is trivial to do for classic notebooks as well as JupyterLab.
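For reference, a minimal sketch of idea #2 for the classic Notebook frontend (the IPython.notebook JavaScript object is not exposed by JupyterLab, and the sleep is a crude stand-in for waiting on the asynchronous save):

```python
# Classic Jupyter Notebook only: ask the frontend to save, then export.
# The save call is fire-and-forget, so the sleep below merely hopes the
# .ipynb has reached disk in time; this is exactly the hackiness noted above.
import time
from IPython.display import Javascript, display
from nbdev.export import notebook2script

display(Javascript("IPython.notebook.save_notebook()"))
time.sleep(2)  # crude: no reliable way to know when the write has finished
notebook2script()
```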

These are the only viable ideas I've found so far... I like #3 the best, but I think a warning in the documentation / somewhere is needed to avoid this issue... Hell, maybe nbdev_build should check the .ipynb_checkpoints dir to see if there are temp files in there, to warn the user that they are about to export something that isn't saved to disk first. It doesn't feel like a clean/elegant solution though...
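For what it's worth, a rough sketch of that warning idea. The staleness heuristic here is an assumption (Jupyter writes `<name>-checkpoint.ipynb` on a manual save, so drift between the two files hints at the state of the last save), and no filesystem check can see edits that only live in the browser:

```python
# A rough sketch of the proposed warning; the mtime heuristic is an
# assumption, not something nbdev does today.
from pathlib import Path

def warn_if_maybe_unsaved(nb_path):
    nb = Path(nb_path)
    ckpt = nb.parent / ".ipynb_checkpoints" / f"{nb.stem}-checkpoint{nb.suffix}"
    # Jupyter writes the checkpoint on a manual save; if the notebook file
    # and its checkpoint have drifted apart, the last manual save may be old.
    if ckpt.exists() and abs(ckpt.stat().st_mtime - nb.stat().st_mtime) > 1:
        print(f"Warning: {nb.name} may have changes not yet saved to disk; "
              "save the notebook before exporting.")
```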

sgugger commented 4 years ago

I'm not sure this is an issue. Yes, you have to remember to save to disk before exporting, the same way you have to remember to save your module before you can use your library. Maybe we can document it better and add a big flag in the template's export cell, but it's not unnatural and you quickly get the hang of it.

Auto-triggering the export on a save would be dangerous, since you may not want to export a bad state while your work is in progress, yet still want to save that work.

If there were a simple way to do this, I would love to have something that magically works, but the solutions you propose are very hacky or a bit dangerous IMO. Issuing a warning after checking .ipynb_checkpoints seems like the best way to me.

andaag commented 4 years ago

> I'm not sure this is an issue. Yes, you have to remember to save to disk before exporting, the same way you have to remember to save your module before you can use your library. Maybe we can document it better and add a big flag in the template's export cell, but it's not unnatural and you quickly get the hang of it.

I would argue it is unnatural. As a developer, once you think about it, it's extremely understandable, but in the context of a notebook, saving is not something you need to think about. So it makes sense when using nbdev_build, but not when using notebook2script().

> Auto-triggering the export on a save would be dangerous, since you may not want to export a bad state while your work is in progress, yet still want to save that work.

Is that really the case? I think this depends a bit on how you view the generated files... The way I look at it, the notebook and the attached .py module are coupled. If my notebook is broken, I'd expect the associated .py file to be broken too. I don't see many cases where I'd be working in a broken notebook but, in parallel, still using the .py file. For that I'd use branches.

> If there were a simple way to do this, I would love to have something that magically works, but the solutions you propose are very hacky or a bit dangerous IMO. Issuing a warning after checking .ipynb_checkpoints seems like the best way to me.

Issuing a warning if .ipynb_checkpoints has data seems like a fairly safe thing to do (depending on how standardized that directory is and how easy it is to parse; I haven't looked into that).

I also think approach #3 above is fairly straightforward. However, it does mean that saving the notebook also writes the .py file. To me this is fairly logical, but it depends on how you work and what your expectations are.
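As a sketch of how approach #3 might look: post_save_hook is a real Jupyter FileContentsManager setting, but wiring it to notebook2script is this thread's proposal, not shipped nbdev behaviour.

```python
# In ~/.jupyter/jupyter_notebook_config.py; `c` is the config object Jupyter
# provides in that file. Every save of a notebook then re-exports it.
from nbdev.export import notebook2script

def nbdev_post_save(model, os_path, contents_manager):
    # post_save_hook fires after the .ipynb has been written to disk,
    # so the exporter always sees the freshly saved content.
    if model["type"] == "notebook":
        notebook2script(os_path)

c.FileContentsManager.post_save_hook = nbdev_post_save
```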

jph00 commented 4 years ago

Personally I wouldn't expect exporting to work without saving a notebook. All our tools as developers work on saved files; I wouldn't want nbdev to have some magic that changes this. I also wouldn't want exporting to happen automatically when I save - I save frequently, and export rarely.

The answer is just to press s before you run that cell. :)

durapensa commented 3 years ago

This unexpected behaviour bit me too; I went looking for an answer and found this thread. At the very least, issuing a warning if .ipynb_checkpoints has data would save a lot of hassle, even if it's just for the case of a forgotten ^S.