fastai / nbdev

Create delightful software with Jupyter Notebooks
https://nbdev.fast.ai/
Apache License 2.0

Error building _modidx.py #1153

Open civvic opened 1 year ago

civvic commented 1 year ago

If the library folder contains modules that were not generated by nbdev, building _modidx.py can fail. In particular, it fails when parsing files generated by VS Code's export to Python script (@command:jupyter.exportAsPythonScript), and probably those generated by Jupytext.

Explanation

Looking at the code for building _modidx.py, it seems the intent is to only parse nbdev controlled modules:

def _iter_py_cells(p):
    "Yield cells from an exported Python file."
    p = Path(p)
    cells = p.read_text().split("\n# %% ")
    for cell in cells[1:]:
        top,code = cell.split('\n', 1)
        nb,idx = top.split()
        nb_path = None if nb=='auto' else (p.parent/nb).resolve()  # NB paths are stored relative to .py file
        if code.endswith('\n'): code=code[:-1]
        yield AttrDict(nb=nb, idx=int(idx), code=code, nb_path=nb_path, py_path=p.resolve())

But parsing the cell header fails as soon as it encounters a cell marker not generated by nbdev, e.g. # %% [markdown]: the header then contains a single token, so nb,idx = top.split() cannot unpack two values, as seen below.
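A more tolerant parser could look like the sketch below. It is not nbdev's code: the function name iter_nbdev_cells and the pattern _HEADER_RE are hypothetical, and the pattern assumes an nbdev cell header is exactly "<notebook path or auto> <integer index>". Cells with any other header are skipped instead of raising.

```python
import re

# Assumption: an nbdev-generated cell header is "<nb-path-or-auto> <integer index>".
_HEADER_RE = re.compile(r'^(\S+)\s+(\d+)$')

def iter_nbdev_cells(text):
    "Yield (nb, idx, code) for cells with an nbdev-style header; skip all others."
    for cell in text.split("\n# %% ")[1:]:
        top, _, code = cell.partition("\n")
        m = _HEADER_RE.match(top.strip())
        if m is None:
            continue  # e.g. "[markdown]" written by VS Code or Jupytext
        nb, idx = m.groups()
        if code.endswith("\n"):
            code = code[:-1]
        yield nb, int(idx), code
```

Skipping silently is a design choice; logging a warning for each ignored header may be preferable so that malformed nbdev cells do not go unnoticed.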

Details

Recreate the issue

❯ nbdev_export
Traceback (most recent call last):
  File "/Users/vic/mambaforge/envs/TBOs/bin/nbdev_export", line 33, in <module>
    sys.exit(load_entry_point('nbdev', 'console_scripts', 'nbdev_export')())
  File "/Users/vic/dev/repo/source/fastai/fastcore/fastcore/script.py", line 119, in _f
    return tfunc(**merge(args, args_from_prog(func, xtra)))
  File "/Users/vic/dev/repo/source/fastai/nbdev/nbdev/doclinks.py", line 137, in nbdev_export
    _build_modidx()
  File "/Users/vic/dev/repo/source/fastai/nbdev/nbdev/doclinks.py", line 99, in _build_modidx
    res['syms'].update(_get_modidx((dest.parent/file).resolve(), code_root, nbs_path=nbs_path))
  File "/Users/vic/dev/repo/source/fastai/nbdev/nbdev/doclinks.py", line 70, in _get_modidx
    for cell in _iter_py_cells(py_path):
  File "/Users/vic/dev/repo/source/fastai/nbdev/nbdev/doclinks.py", line 53, in _iter_py_cells
    nb,idx = top.split()
ValueError: not enough values to unpack (expected 2, got 1)

Or simply add a file with the following content to the lib folder:

# %%
import sys

# %% [markdown]
# `Stage` corresponds to one node.

# %%
print(sys.path)
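The failure can also be reproduced in isolation, without running nbdev_export. The sketch below applies the same split and unpack steps as the quoted _iter_py_cells to the file content above; the helper name header_tokens is hypothetical and only exists for this illustration.

```python
# Content of the example file, as one string.
sample = (
    "# %%\n"
    "import sys\n"
    "\n"
    "# %% [markdown]\n"
    "# `Stage` corresponds to one node.\n"
    "\n"
    "# %%\n"
    "print(sys.path)\n"
)

def header_tokens(text):
    "Return the whitespace-split header of each cell found by the '\\n# %% ' split."
    return [cell.split("\n", 1)[0].split() for cell in text.split("\n# %% ")[1:]]

tokens = header_tokens(sample)  # only the "[markdown]" marker is followed by a space

try:
    nb, idx = tokens[0]  # same unpacking as in _iter_py_cells
    error = None
except ValueError as e:
    error = str(e)
```

Note that the bare # %% markers are not matched at all, because the split string ends with a space; only the # %% [markdown] marker produces a cell, and its one-token header triggers the error.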

Workaround

I've quickly patched maker.ipynb/py and doclinks.ipynb/py to filter out files without headers generated by nbdev (# AUTOGENERATED! DO NOT EDIT! File to edit:...).
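The filter could be sketched roughly as follows. This is not the actual patch: the names NBDEV_MARKER, has_nbdev_header and nbdev_modules are hypothetical, and the sketch assumes the marker appears on the first non-blank line of every nbdev-generated module.

```python
from pathlib import Path

# Assumption: nbdev-generated modules start with this comment.
NBDEV_MARKER = "# AUTOGENERATED! DO NOT EDIT!"

def has_nbdev_header(text):
    "True if the first non-blank line of `text` starts with the nbdev marker."
    for line in text.splitlines():
        if line.strip():
            return line.startswith(NBDEV_MARKER)
    return False

def nbdev_modules(lib_dir):
    "Return the .py files under `lib_dir` that carry the nbdev header."
    return [p for p in sorted(Path(lib_dir).rglob("*.py"))
            if has_nbdev_header(p.read_text(encoding="utf-8"))]
```

With a filter like this, only the files returned by nbdev_modules would be passed on to the index builder, so scripts exported by other tools are never parsed.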

My understanding is that _modidx.py is mainly used for doc generation, but I'm not sure. It would probably be a good idea to strengthen the parser so it can extract symbols from modules other than those controlled by nbdev. In the meantime, I want to continue exploring the new nbdev (I've used v1 a lot in mixed environments with very few problems).

dleen commented 1 year ago

I made a change (unmerged) that uses a more specific regex to handle a similar issue: https://github.com/fastai/nbdev/pull/928#discussion_r962361512. It may be useful for handling more of these edge cases.