Closed GoogleCodeExporter closed 9 years ago
Line aware source doesn't support lines(!) -
http://groups.google.com/group/lepl/browse_thread/thread/a5a813d10f979e14
This looks like a no-brainer - the data until the next EOL should be passed to
the Python regexp.
Original comment by acooke....@gmail.com
on 23 Nov 2009 at 11:48
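As a rough illustration of "pass only the data until the next EOL to the regexp", here is a minimal sketch using Python's standard re module. The helper name match_in_line is hypothetical and is not part of lepl; it only shows how endpos can stop a pattern from seeing past the current line.

```python
import re

def match_in_line(pattern, text, pos=0):
    """Match `pattern` at `pos`, letting it see only the data up to the next EOL."""
    end = text.find("\n", pos)
    if end == -1:
        end = len(text)  # no newline left: the line runs to the end of the input
    # endpos makes the regexp behave as if the string ended at `end`
    return re.compile(pattern).match(text, pos, end)

text = "abc def\nghi"
unrestricted = re.match(r"[\w\s]+", text)       # free to run across the newline
first = match_in_line(r"[\w\s]+", text)         # confined to the first line
second = match_in_line(r"[\w\s]+", text, pos=8) # confined to the second line
```

Without the restriction the pattern consumes both lines, because \s matches the newline; with it, each match stays inside its own line.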
User-defined regexps for line-aware parsing should automatically exclude
[^$\n\r] from "." (and from implicit open ranges via [^....]).
Original comment by acooke....@gmail.com
on 23 Nov 2009 at 11:50
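The point about open ranges can be seen with Python's own re module, which is used here only as an analogy (lepl's regexp engine and its ^ and $ markers are not modelled). In Python, "." already stops at a newline by default, but a negated class such as [^=] does not, so the line terminators have to be added to the class explicitly.

```python
import re

text = "no equals\nsecond line"

# "." does not match "\n" unless re.DOTALL is given
dot = re.match(r".*", text).group(0)

# A negated class is an open range: it happily matches "\n"
open_range = re.match(r"[^=]*", text).group(0)

# Adding the line terminators to the class keeps the match on one line
closed_range = re.match(r"[^=\n\r]*", text).group(0)
```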
Better syntax for ^ and $. These are similar, but not quite identical, to the
standard regexp notation. I think it's best to dump them and go with
lepl-specific syntax. A possibility is (*....) which is similar to Python's
(?....). This could also be used for labelling, and could be parsed, making
the str methods for regexp classes self-consistent.
Original comment by acooke....@gmail.com
on 23 Nov 2009 at 11:52
Eos (Eof) should be considered EOL - a line should end at the end of the
file/input, even if there's no newline (or whatever is currently used).
Original comment by acooke....@gmail.com
on 23 Nov 2009 at 11:53
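A minimal sketch of the "end of input counts as end of line" rule, in plain Python. The function split_lines is hypothetical and assumes "\n" as the line terminator; it is not how lepl's line-aware source is implemented.

```python
def split_lines(text):
    """Split text into lines, treating end of input as an implicit end of line."""
    lines = []
    start = 0
    while start < len(text):
        end = text.find("\n", start)
        if end == -1:
            end = len(text)  # no trailing newline: EOS closes the last line
        lines.append(text[start:end])
        start = end + 1  # skip over the terminator (or step past the end)
    return lines
```

With this rule the final line is returned whether or not the input ends in a newline, so "a\nb" and "a\nb\n" produce the same two lines.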
Possible bug - seems to be something odd about "*" in this post -
http://groups.google.com/group/lepl/msg/15b78d0191d5f5b5?dmode=source
Original comment by acooke....@gmail.com
on 23 Nov 2009 at 11:54
I am starting to think this may be quite difficult (it seems to amount to
emulating a broken legacy implementation of regexps!), but it would be nice if
we supported Perl/Python's non-greedy alternatives in regexps.
For example (a|ac)c applied to "acc" should match "ac".
Need to be careful here - I assume this means some matches with nested
alternatives will fail. Check exactly how Python/Perl behave.
Original comment by acooke....@gmail.com
on 23 Nov 2009 at 11:57
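The behaviour described above can be checked directly against Python's re module, which tries alternatives left to right and keeps the first one that lets the rest of the pattern succeed. The "longest" calculation below is only a hand-rolled comparison for this one pattern, not a general longest-match engine.

```python
import re

# Ordered alternation: "a" is tried first, the following "c" matches,
# so the overall match stops after two characters.
m = re.match(r"(a|ac)c", "acc")
ordered = m.group(0)
chosen_alternative = m.group(1)

# Longest-match comparison: try each alternative separately, keep the longest.
candidates = [re.match(alt + "c", "acc") for alt in ("a", "ac")]
longest = max((c.group(0) for c in candidates if c), key=len)

# Backtracking still reaches the second alternative when it is forced to.
full = re.fullmatch(r"(a|ac)c", "acc")
```

So the ordered result is shorter than the longest possible match, which is the difference an automaton-based engine would have to emulate.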
Support for non-token line-aware parsing. There's an example in the docs, but
it won't work with Extend (ie across lines). This may not be reasonably
possible, in which case look for alternative support (eg matching line break
explicitly).
Original comment by acooke....@gmail.com
on 23 Nov 2009 at 9:26
Line-aware parsing + Empty() bug:
-------------------------------------------------------
from lepl import *
introduce = ~Token(':')
word = Token(Word(Lower()))
statement = Delayed()
simple = BLine(word[:])
empty = BLine(Empty())
block = BLine(word[:] & introduce) & Block(statement[:])
statement += (simple | empty | block) > list
parser = statement[:].string_parser(LineAwareConfiguration(block_policy=2))
result = parser('worda\nwordb:\n wordc:\n wordd')
as expected, we got [[u'worda'], [u'wordb', [u'wordc', [u'wordd']]]]
but
result = parser('worda\nwordb:\n\n wordc:\n wordd')
returns unexpected [[u'worda'], [u'wordb'], []]
Original comment by aachu...@gmail.com
on 27 Nov 2009 at 3:06
More info on the above (Empty()).
This is actually normal behaviour. What's happening is that blocks do not
continue over empty lines. So the input data do not match the grammar. If the
lines after the blank line had no space to the left, then they should match
(as a new, zero indented block).
However, we do clearly need some way to include blank lines in blocks - this
was also raised on the mailing list. In fact we probably want to be able to
support three different cases:
- An empty line means you must start again at the left (as now)
- An empty line means that you continue with the current indent (in this case,
  how do you end a block?)
- Both the above, depending on context (ie choose whichever fits the indent of
  the line after the blank)
And related to this, what about comment blocks that might have an arbitrary
indent?
Original comment by acooke....@gmail.com
on 27 Nov 2009 at 3:26
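One way to picture the three cases listed above is as a rule for which indent a blank line is treated as having. The sketch below is plain Python; the function effective_indents and the policy names "reset", "continue" and "lookahead" are hypothetical and are not lepl options. It also assumes indentation is made of spaces only.

```python
def effective_indents(lines, policy):
    """Return one indent per line; `policy` decides what a blank line gets.

    "reset"     - a blank line returns to the left margin (indent 0)
    "continue"  - a blank line keeps the previous line's indent
    "lookahead" - a blank line takes the indent of the next non-blank line
    """
    indents = []
    for i, line in enumerate(lines):
        if line.strip():
            # Ordinary line: count its leading spaces
            indents.append(len(line) - len(line.lstrip(" ")))
        elif policy == "reset":
            indents.append(0)
        elif policy == "continue":
            indents.append(indents[-1] if indents else 0)
        elif policy == "lookahead":
            following = [l for l in lines[i + 1:] if l.strip()]
            nxt = following[0] if following else ""
            indents.append(len(nxt) - len(nxt.lstrip(" ")))
        else:
            raise ValueError("unknown policy: %r" % policy)
    return indents

lines = ["a:", "  b", "", "  c", "", "d"]
reset = effective_indents(lines, "reset")
cont = effective_indents(lines, "continue")
look = effective_indents(lines, "lookahead")
```

Under "reset" the first blank line closes the block, which is the behaviour reported above. Under "continue" the second blank line still sits at indent 2, so only the following non-blank line can end the block. Under "lookahead" each blank line simply joins whatever comes next.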
More on the above - it may simply be a case of documenting how to use Line()
rather than BLine() (see Andrey's email around 28 Nov).
Original comment by acooke....@gmail.com
on 28 Nov 2009 at 1:30
OK, I've fixed the majority of these in 3.3.3.
What I haven't done is (1) emulate the non-greedy choice in regexp or (2)
provide a better way to do offside parsing without tokens (I do now warn more
clearly in the manual that tokens are necessary).
Original comment by acooke....@gmail.com
on 10 Dec 2009 at 12:17
Original issue reported on code.google.com by
acooke....@gmail.com
on 23 Nov 2009 at 11:46