cython / cython


[BUG] Using const in templated expression #4180

Open isVoid opened 3 years ago

isVoid commented 3 years ago

Describe the bug
Currently, using const in a templated cdef statement works, but the parser complains when const is used within a templated expression. Related issue and PR: #3885, #3886

To Reproduce
Code to reproduce the behaviour:

# Mock up templated class V
cdef cppclass V[T]:
    V()
cdef V[const int] v
v = V[const int]()
Error compiling Cython file:
------------------------------------------------------------
...
cdef cppclass V[T]:
    V()
cdef V[const int] v
v = V[const int]()
           ^
------------------------------------------------------------

/raid/wangm/dev/rapids/.cache/ipython/cython/_cython_magic_57140422ec6affad49b6866a3bff4f72.pyx:4:12: Expected ']', found 'int'

Expected behavior
This should compile into

// Some mocked up templated class V
V<const int> v;
v = V<const int>();

Environment (please complete the following information):
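As an aside, one possible workaround (not part of the report, untested here, and whether it is accepted may depend on the Cython version) is to alias the const-qualified type with a ctypedef, so the template argument becomes a single identifier that the current parser already handles:

# Untested workaround sketch: alias the const-qualified type so the
# template argument is a single name rather than a multi-token type.
ctypedef const int const_int

cdef cppclass V[T]:
    V()

cdef V[const_int] v
v = V[const_int]()   # parses, since const_int is one token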

vyasr commented 11 months ago

@da-woods @scoder I've done a bit of tracing here and have narrowed down the problem, but I could use some help determining the appropriate fix. The error comes from https://github.com/cython/cython/blob/master/Cython/Compiler/Parsing.py#L619, because the parser treats the const part of the statement as something that should stand on its own within an indexing statement [...]. The p_subscript_list call 13 lines above that returns const as the only token, whereas in this case we need multiple tokens. However, I'm not sure where the right place to fix this is. p_subscript_list and the subsequent p_subscript/p_slice_element calls seem pretty specifically tailored to Python slicing syntax. Does it make sense to have those be context-aware enough to know when they're inside a C++ template declaration and to allow spaces within the "slice" in order to support multi-token declarations?
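As a rough illustration of the failure described above (the exact call chain is my reading of the trace, not confirmed against the code), the subscript parser accepts const on its own as an ordinary name, so the following expect(']') trips over int:

# Token stream for:  v = V[const int]()
#   roughly: NAME 'v', '=', NAME 'V', '[', NAME 'const', NAME 'int', ']', '(', ')'
#
# Approximate parse path (names taken from the comment above; exact flow assumed):
#   p_subscript_list -> p_subscript -> ... -> p_test
#     p_test consumes NAME 'const' as an ordinary name expression
#   back in the caller, s.expect(']') runs next
#     the current token is NAME 'int'  ->  "Expected ']', found 'int'"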

da-woods commented 11 months ago

Does it make sense to have those be context-aware enough to know when they're inside a C++ template declaration and to allow spaces within the "slice" in order to support multi-token declarations?

I don't realistically think they can be context-aware enough.

Taking the example above:

v = V[const int]()

this line is treated by the parser as an isolated expression, and so it really can't have any information that distinguishes it from indexing a Python variable and then calling the result. The type of V won't be known until the type-analysis stage, which runs significantly later.
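To make the ambiguity concrete (my own illustration, not part of the original comment): at parse time these two lines have exactly the same shape, and only the later type analysis can tell them apart.

lookup = mapping[key]()    # ordinary Python: subscript a variable, then call the result
v = V[const int]()         # C++ template instantiation; V's nature is unknown to the parser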

I think realistically we'd have to significantly loosen what the parser accepts (probably only in Cython mode and not in pure-Python mode), and then reject invalid uses later, probably in analyse_expressions. I suspect rejecting invalid uses would actually already happen so probably doesn't need doing (but might need testing).

vyasr commented 11 months ago

That makes sense; I had some misgivings on that front too. I'm not sure I see how the relaxation you envision would work, though. Would it be a matter of changing the s.expect call I linked so that it becomes conditional in a way that allows for some alternatives? Or are you envisioning a completely different solution elsewhere in the parser, since, as you say, this function may not be context-aware enough? If so, I suspect I don't know enough about Cython's internals myself to address this and would probably need substantial extra guidance to put together a solution.

da-woods commented 11 months ago

There are (at least) a couple of options:

  1. Put it in p_subscript. That involves replacing p_test (which is essentially "a Python expression") with something that also handles a C type. That'd be the "LL"-style solution that Python's parser was based on in the past. The idea here is that by the time you get to your expect(']') you really do need to have parsed the full subscript list, so that's definitely too high up to handle it.
  2. Attempt a more PEG-style solution (i.e. how the more modern CPython parser works). In this version you first try parsing the index with a subscript list. If that fails you rewind and try parsing a "type list" instead. For this you use with tentatively_scan, which automatically rewinds if it encounters an error, so you can try the next possibility. In this case I'd put the with tentatively_scan in p_index, I think.

The advantage of option 2 is that it lets you leave the existing index parsing code quite clean (i.e. just Python code), but add a special-case fallback that it tries after.

The disadvantages are that rewinding isn't that cheap, and you can end up with "catastrophic backtracking" type situations if you have to reparse the same code multiple times to try different options. It also isn't the style that most of the Cython parser is written in, but it's something I've been using to cover the newer syntax that uses the PEG parser in CPython.

If I were doing it I'd probably use option 2. But that doesn't mean it's the right option :)
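For what it's worth, a very rough sketch of what option 2 could look like, based only on the description above: it assumes tentatively_scan is a context manager that collects errors and rewinds the scanner if the block fails, and p_template_argument_list plus the two node constructors are hypothetical placeholders; the real names and signatures in Parsing.py would need checking.

def p_index(s, base):
    # First, tentatively try the existing Python-style subscript parsing.
    with tentatively_scan(s) as errors:        # exact API assumed, see lead-in above
        subscripts = p_subscript_list(s)       # existing slicing/subscript parser
    if not errors:
        s.expect(']')
        return make_index_node(base, subscripts)         # hypothetical node constructor

    # The subscript parse failed and the scanner has been rewound:
    # fall back to a comma-separated list of C types, which covers
    # template arguments such as "const int".
    types = p_template_argument_list(s)        # hypothetical fallback parser
    s.expect(']')
    return make_template_index_node(base, types)         # hypothetical node constructor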