NiklasRosenstein / python-docspec

Docspec is a JSON object specification for representing API documentation of programming languages.
https://niklasrosenstein.github.io/python-docspec/

Cannot parse code with "match / case" #79

Closed rabernat closed 1 year ago

rabernat commented 1 year ago

Thanks for maintaining this fantastic project! :pray: We are using it to integrate our python API docs with a Docusaurus site.

Describe the bug Python 3.10 introduced structural pattern matching with match / case syntax. I have found that pydoc-markdown cannot parse code with this syntax. I am filing the bug report here rather than in pydoc-markdown because the stack trace indicates that the error comes from docspec_python.

To Reproduce Steps to reproduce the behavior:

Create the following python module

def function_with_match():
    """A function that can't be parsed with pydoc-markdown."""

    foo = "a"
    match foo:
        case "a":
            pass

Create a pydoc-markdown configuration to parse it. Mine looks like this

loaders:
  - type: python
    search_path: [../pydoc-markdown-bug]
processors:
  - type: filter
    skip_empty_modules: true
  - type: smart
  - type: crossref
renderer:
  type: docusaurus
  docs_base_path: docs
  relative_output_path: reference
  relative_sidebar_path: sidebar.json
  sidebar_top_level_label: 'Reference'

Then run pydoc-markdown. My stack trace looks like this

Traceback (most recent call last):
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/docspec_python/parser.py", line 88, in parse_to_ast
    return RefactoringTool([], options).refactor_string(code + '\n', filename)
  File "/Users/rabernat/mambaforge/lib/python3.10/lib2to3/refactor.py", line 364, in refactor_string
    self.log_error("Can't parse %s: %s: %s",
  File "/Users/rabernat/mambaforge/lib/python3.10/lib2to3/refactor.py", line 362, in refactor_string
    tree = self.driver.parse_string(data)
  File "/Users/rabernat/mambaforge/lib/python3.10/lib2to3/pgen2/driver.py", line 103, in parse_string
    return self.parse_tokens(tokens, debug)
  File "/Users/rabernat/mambaforge/lib/python3.10/lib2to3/pgen2/driver.py", line 71, in parse_tokens
    if p.addtoken(type, value, (prefix, start)):
  File "/Users/rabernat/mambaforge/lib/python3.10/lib2to3/pgen2/parse.py", line 162, in addtoken
    raise ParseError("bad input", type, value, context)
lib2to3.pgen2.parse.ParseError: bad input: type=1, value='foo', context=(' ', (5, 10))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/bin/pydoc-markdown", line 8, in <module>
    sys.exit(cli())
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/pydoc_markdown/main.py", line 344, in cli
    session.render(pydocmd)
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/pydoc_markdown/main.py", line 136, in render
    modules = config.load_modules()
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/pydoc_markdown/__init__.py", line 154, in load_modules
    modules.extend(loader.load())
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/docspec_python/__init__.py", line 87, in load_python_modules
    yield parse_python_module(filename, module_name=module_name, options=options, encoding=encoding)
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/docspec_python/__init__.py", line 128, in parse_python_module
    return parse_python_module(fpobj, fp, module_name, options, encoding)
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/docspec_python/__init__.py", line 132, in parse_python_module
    ast = parser.parse_to_ast(fp.read(), filename)
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/docspec_python/parser.py", line 90, in parse_to_ast
    raise ParseError(exc.msg, exc.type, exc.value, tuple(exc.context) + (filename,))
lib2to3.pgen2.parse.ParseError: bad input: type=1, value='foo', context=(' ', (5, 10), '/Users/rabernat/gh/earth-mover/pydoc-markdown-bug/match_bug.py')
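
The `(5, 10)` in the error context reads as a source position: line 5, column 10. As a rough sketch, the standard-library `tokenize` module (not the lib2to3 tokenizer that docspec_python actually calls) can show which token sits at that position in the reproduction module. The helper name `token_at` is made up for this illustration.

```python
import io
import tokenize

# The reproduction module from the report, verbatim.
SOURCE = '''def function_with_match():
    """A function that can't be parsed with pydoc-markdown."""

    foo = "a"
    match foo:
        case "a":
            pass
'''


def token_at(source, position):
    """Return the first token that starts at (row, col), or None.

    Hypothetical helper for illustration; rows are 1-based and
    columns are 0-based, matching the positions tokenize reports.
    """
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.start == position:
            return tok
    return None


tok = token_at(SOURCE, (5, 10))
print(tokenize.tok_name[tok.type], repr(tok.string))  # prints: NAME 'foo'
```

The token at that position is the name `foo`, directly after `match`. That is consistent with the parser reading `match` as an ordinary identifier and then rejecting the name that follows it.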

Expected behavior Given that docspec-python supports Python >=3.7, I would expect it to be able to parse all valid Python 3.10 syntax.

Versions
pydoc-markdown, version 4.6.3
docspec-python, version 2.0.2

NiklasRosenstein commented 1 year ago

Hi @rabernat, thanks for raising this issue! Unfortunately I won't have time to look into this in the short term, but I'd be happy to accept a PR if you're up for it.

NiklasRosenstein commented 1 year ago

Side note: I'm not sure if lib2to3 was updated to support match/case.

nrser commented 1 year ago

Looks like lib2to3 is incapable of parsing match (due to its parser being LL(1)), and it is deprecated and scheduled to be removed from the standard library:

https://docs.python.org/3.11/library/2to3.html#module-lib2to3

These packages are recommended as alternatives:

  1. LibCST
  2. parso

I'm taking a look at it now as I just ran into this issue and getting rid of match isn't really an option. No promises I'll get anywhere with it but if I do I'll share the code.
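
For comparison, the interpreter's own parser handles `match` on Python 3.10 and later. The sketch below uses the standard-library `ast` module to pull the docstring out of the reproduction module. It is not one of the alternatives listed above and not what docspec-python does; `ast` discards comments, so it would not be a drop-in replacement for a parser that needs them. The function name `docstrings` is made up for this illustration.

```python
import ast
import sys

# The reproduction module from the report, verbatim.
SOURCE = '''def function_with_match():
    """A function that can't be parsed with pydoc-markdown."""

    foo = "a"
    match foo:
        case "a":
            pass
'''


def docstrings(source):
    """Map each function name in `source` to its docstring.

    Hypothetical helper for illustration. ast.parse raises
    SyntaxError on match/case when run on Python older than 3.10.
    """
    tree = ast.parse(source)
    return {
        node.name: ast.get_docstring(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


if sys.version_info >= (3, 10):
    print(docstrings(SOURCE))
```

On Python 3.10+ this parses without error, which shows the failure is specific to the lib2to3 grammar rather than to the module being documented.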

nrser commented 1 year ago

@rabernat @NiklasRosenstein This passes the docspec-python tests, including a new one for the match statement:

https://github.com/nrser/docspec/tree/blib2to3

It's a single commit:

https://github.com/nrser/docspec/commit/1a08d2a94aae13253f3c5b0abed8829472c1c49d

It seems the black folks have their own fork/extension of lib2to3 called blib2to3 that is bundled with the black package. They managed to get it to parse at least some amount of match forms.

I added black as a dependency and swapped blib2to3 in. This is totally a "quick fix", and I have no idea how well it will work, but I wanted to share it now in case I never end up getting any further with it.

nrser commented 1 year ago

Just a heads up, tried that code on source from an actual project and there are a bunch of issues. Looks like relatively minor stuff involving the AST being slightly different, but it's gonna take some time to grind through.

NiklasRosenstein commented 1 year ago

Merged #80, thanks @nrser!