executablebooks / MyST-NB

Parse and execute ipynb files in Sphinx
https://myst-nb.readthedocs.io
BSD 3-Clause "New" or "Revised" License
205 stars 83 forks source link

Support for code-cell to read code from a file #274

Open mmcky opened 3 years ago

mmcky commented 3 years ago

Is your feature request related to a problem? Please describe.

Within a large lecture series it can be a pretty common pattern to re-use code in different documents. In the past we have used literalinclude to bring in code from a file. As sphinx parses code eventually in the literal_block method it works pretty well.

The issue with myst_nb/jupyter-book is this no longer works when building in this context as the literalinclude is not brought in until sphinx is called. This means the code won't get executed at the jupyter_cache level as the literalinclude is contained in the markdown block when converting using jupytext. The code will be visible in the end HTML but not executed prior to building with sphinx.

Describe the solution you'd like

A solution to this would be to add :file: option to code-cell directives for including code that is meant to be executed.

```{code-cell} python3
:file: _code/code.py

Typically `file` type attributes are usually arguments in `sphinx` but I think it would be better in this case to retain the simplicity of optional `language` argument. 

The more typical sphinx approach would be

````md
```{code-cell} _code/code.py


**Describe alternatives you've considered**

None -- changing the behaviour of `literalinclude` to be parsed as code is not a viable solution given out `jupytext` works.

**Additional context**

This is currently an issue for migrating the [python.quantecon.org](https://python.quantecon.org) series as `literalincludes` are used pretty extensively in 6 lectures. We could copy the code across these lectures but would prefer not to do that to assist with maintaining the code and consistency across lectures that use the same code. 
mmcky commented 3 years ago

One issue with the proposal of adding :file: as an option (rather than an argument) is that it becomes harder to support the usual options associated with, for example selecting lines etc. from the file, as the relationship is a bit less clear

https://www.sphinx-doc.org/en/1.5/markup/code.html#directive-literalinclude

choldgraf commented 3 years ago

I think a feature like this would certainly be useful - though I suspect it might get some pushback due to the fact that it would break the "notebook <--> source file" parity that we have thus far enforced fairly heavily. So I think it's something we'd need to discuss w/ the team

mmcky commented 3 years ago

Yeah -- not sure how to fully solve the "notebook <--> source file" issue.

One possibility is to keep that in tact (which I think is very important to keep two way roundtrips) and add filters/transform concepts to jupytext so jupytext could:

  1. Convert notebook myst <--> ipynb
  2. Apply filters / transforms to support things like or parsing code-cell with a file reference to bring in files

Alternatively perhaps it would be better to add pre-transforms to myst_nb so jupytext remains the same as a two-way conversion tool and we apply filters and transforms prior to calling jupytext.

mmcky commented 3 years ago

Another alternative would be to enable myst_parser to interpret code-cell with a file reference (or introduce code-file directive) and read the contents of the file directly on parsing.

mmcky commented 3 years ago

@chrisjsewell @choldgraf what do you think of a myst_nb loopback / update option that could update source notebooks once the myst nodes have been parsed by Sphinx and the context is understood for issues like this (and the exercise execution issue).

Perhaps it could work like:

  1. Set option mystnb_update_jupytercache = True
  2. delay usual jupyter-cache run to execute source files
  3. Add a transform to generate a set of source notebooks to unpack any nested code-cell contexts (literalinclude, code-cell read from file feature, exercise execution). Updated notebooks wouldn't update the original sources but some intermediate temp object (such as <sourcedir>/.jupyter-cache/sphinx).
  4. run jupyter-cache -> sphinx -> output pipeline.

That initial run could be pretty useful and would be a very flexible approach. It would also seem relatively easy to debug independently of the usual build pipeline and wouldn't interfere with jupytext.

Note: the loop would only focus on execution. We wouldn't build building notebooks from sphinx.ast

chrisjsewell commented 3 years ago

To put it simply; if the file can not be (fully) run by vanilla Jupyter then it is not a Jupyter Notebook and should not have the .ipynb extension. We already have functionality to "pre-convert" custom files extensions to notebooks (https://myst-nb.readthedocs.io/en/latest/examples/custom-formats.html#custom-notebook-formats), this new format should use that

mmcky commented 3 years ago

To put it simply; if the file can not be (fully) run by vanilla Jupyter then it is not a Jupyter Notebook and should not have the .ipynb extension.

Perhaps we can bring in code from files using %load in a code-cell and leverage an ipython magic.

```{code-cell} ipython
%load <path>/filename.py


I tried this but it doesn't seem that the `notebooks` are executed in the same `<path>` as the content. 
chrisjsewell commented 3 years ago

I tried this but it doesn't seem that the notebooks are executed in the same as the content.

It definitely is, execution_in_temp which is set to False by default was added ages ago (https://github.com/executablebooks/MyST-NB/commit/6346d39babd30aa75277f567d5dd7e20ad6d23fc)

The CWD is the same as the folder of the notebook

mmcky commented 3 years ago

@chrisjsewell what happens though when you're using md as the source. Don't they get converted and saved in _build/jupyter_execute?

Ah I think I know what is happening. The %load will alter the input code cell (not the output of the code-cell) -- and I suspect jupyter-cache isn't setup for that kind of change right? The cell also needs to run twice (1 x to run magic and 1 x to execute code)

mmcky commented 3 years ago

Maybe we can leverage nbconvert preprocessors/filters to enable cell level transformations before executing the notebook -- through jupyter-cache. If I understand the stack correctly -- myst_nb handles the conversion to notebook via jupytext before calling jupyter-cache to execute. Is that right @chrisjsewell ?

If that is true -- then perhaps we can add a preprocessor level -- using nbconvert in myst_nb to achieve this.

choldgraf commented 3 years ago

hmmm - I don't believe that we depend on nbconvert currently, do we? I feel like that'd be a heavy-ish dependency to add given that it also duplicates much of the functionality of jupytext, though I could be wrong there.

@mmcky as a start couldn't you add a step to your build process that would:

  1. Grab some text in a .py file (or whatever)
  2. Search all detected files for cells with a particular tag or other piece of metadata
  3. Update the content of those cells with your shared text
  4. Then move forward with building the site

Then updating the shared content would be a matter of:

  1. Updating the shared file
  2. Running a build
  3. Committing the updated content files to the repo
akhmerov commented 3 years ago

hmmm - I don't believe that we depend on nbconvert currently, do we?

jupyter-sphinx drags it in. There it is used to write the scripts corresponding to the notebooks (determining extension from the kernel, comment syntax, etc). I don't think it's something you want to implement on your own. And also neither does jupytext.

mmcky commented 3 years ago

hmmm - I don't believe that we depend on nbconvert currently, do we?

Oops sorry I meant nbclient -- I think that is now used for execution in jupyter_cache so was thinking if the pre-execution filters concept from nbconvert may work to apply some cell level transformations (i.e. bringing in the contents of a file) prior to calling jupyter-cache but after jupytext conversion to notebook format:

  1. build ipynb notebook with jupytext
  2. use nbconvert to apply some cell level transformations to add support for things like (read code from file, ipython magics such as %load)
  3. trigger remaining build using jupyter-cache for execution etc ....

@choldgraf are you suggesting I use a custom script to implement a copy/paste type of thing to update the intermediate notebooks? I may do that in the short term -- the complicating factor is that the projects are building from md sources so the ipynb layer is generated by myst_nb :-)

choldgraf commented 3 years ago

@mmcky yeah that's what I was suggesting - perhaps it could help inform future development on the MyST-NB side? It doesn't seem like it would be too much work to me but I think it'd benefit from experimentation and prototyping to crystallize what potential solutions would look like

mmcky commented 3 years ago

@choldgraf I looked at how this could be supported during the process of converting md files to ipynb and submitted #286 as a prototype for discussion on how to best address this feature.

schmoelder commented 3 years ago

Hi, I saw a PR was merged so I don't know if this issue is still open for discussion.

If so, I'd be interested to know whether it's also possible to only include certain lines from a file like in Sphinx?

mmcky commented 3 years ago

@schmoelder not at this stage -- as the intention is to include code that executes (via code-cell) so you usually need to bring the whole file in (unless your thinking of brining in a function from a file etc. -- but wouldn't be recommended). If you are just displaying code then you can of course still use the sphinx directives in jupyter-book such as:

```{code-block} ruby
:lineno-start: 10

Some more Ruby code, with line numbering starting at 10.
schmoelder commented 3 years ago

Thanks for the quick reply, that's unfortunate. Actually we really were thinking of bringing functionality from other files in since we would prefer not to maintain many copies of the same code required for setting up simulations in our online tutorials/workshops. But we'll figure something out! :)

Just out of curiosity, why do you consider it a bad idea? And what would be missing to implement it?

Best wishes and thanks for MyST-NB in general!

mmcky commented 3 years ago

thanks @schmoelder my main concern is that code-cell is for bringing in code that needs to be executed from a file. Therefore if you bring in lines from a file -- and then you edit the file -- there is no explicit link between the line numbers and the code that resides in files which will then almost certainly break execution of your book. Also if you edit the code you will need to then find all instances of the specific lines that you have specified in your book

we would prefer not to maintain many copies of the same code required for setting up simulations

This was our use case as well for this feature.

Would either of these approaches work you:

  1. refactor the shared code into a single file that you import? [this is the approach we adopted]
  2. setup a python package (or do you want the setup code to be visible in the book)?

If you have a good use case more than happy to review a PR. This is the current implementation

sparsecoder commented 1 week ago

The current implementation is great for my setup with jupyter-book. Please do not remove like it says might happen in the docs.

Some of my code snippets take a long time to run and it is a pain to compile the whole markdown for each change. Copying the finished product by hand is also not a good option, since it means maintaining two copies or copying the code in the book out every time maintenance is needed. I though about splitting sections into their own md files, which I will do anyway, but with the code outside it can be run and tested with CI tools, which is what I'd like to do. With :load: I can do both.

See also jupyter-book#2200

gibberblot commented 1 week ago

I concur that this functionality is a must-have. I mean, to the point where I would switch to another solution if it disappeared -- just too hard to maintain my code based AND the book pages.