amcgregor opened 10 years ago
Thanks for contributing! Is the primary motivation here performance?
I'm not sure it's possible to make this compatible with the RPython version, so that will require some thinking before this can land.
I've been investigating the translation failure; it was my understanding that container types (lists, tuples, etc.) must be type-homogeneous internally (groups are always tuples of strings) and that None was an allowed exception to this (as there may be no groups, None is a possible value instead of a tuple). I may be missing something, though, as I'm very new to RPython.
The primary motivator is de-duplication of work. The regex during the tokenization step will already be capturing the groups, but the tokenizer just throws that information away. In the string parsing example the key elements needed (Python-style string flags and the contents of the quoted string) would need to be re-extracted in the parser.
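As an illustration of the duplication (the pattern and group names below are hypothetical, not rply's actual rules), a tokenizer rule for a Python-style quoted string has already isolated the flag prefix and the quoted contents in named groups by the time it matches; discarding the match object means the parser must repeat that extraction later:

```python
import re

# Hypothetical rule for a Python-style quoted string: an optional flag
# prefix (r, b, u, ...) followed by single- or double-quoted text.
QUOTED = re.compile(r"""(?P<flags>[rRbBuU]*)(?P<quote>['"])(?P<content>.*?)(?P=quote)""")

# The match made during tokenization already carries the pieces the
# parser will need; throwing it away forces a second extraction.
m = QUOTED.match('r"hello world"')
print(m.group('flags'))    # 'r'
print(m.group('content'))  # 'hello world'
```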
Lists need to be homogeneous internally; tuples are allowed to be heterogeneous, but must always be the same length (and have the same types at the same positions). If they're all strings, maybe using lists here makes sense?
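In plain-Python terms (the constraints below are enforced by the RPython annotator at translation time, not at runtime), the distinction reads roughly as:

```python
# Accepted by the RPython annotator: every list element has the same type.
groups = ["flags", "content", "tail"]

# Accepted: a tuple has a fixed length, and each position has a fixed
# type, so mixed types are fine as long as positions stay consistent.
pair = ("column", 42)

# Rejected at translation time: a list mixing str and int elements.
# mixed = ["flags", 42]
```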
Indeed, lists would make more sense now that I know tuples are even weirder than I expected. ;) Let me patch and see if this fixes the test failure locally.
So, I've converted the regex group storage to use a list; however, this has not corrected the somewhat mystifying translation error I'm getting:
    E   AnnotatorError:
    E
    E   signature mismatch: __init__() takes exactly 4 arguments (3 given)
    E
    E   Occurred processing the following simple_call:
    E   (AttributeError getting at the binding!)
    E   v3 = simple_call(v0, v1, v2)
    E
    E   In <FunctionGraph of (rply.lexer:34)LexerStream.next at 0x10a3cb088>:
    E   Happened at file /Users/amcgregor/Documents/Clueless/tmp/rply/rply/lexer.py line 43
    E
    E   ==> match = rule.matches(self.s, self.idx)
    E       if match:
    E
    E   Known variable annotations:
    E   v0 = SomeBuiltin(analyser=<rpython.tool.descriptor.InstanceMethod object at 0x000000010a4b44f0>, methodname='matches', s_self=SomeRule())
    E   v1 = SomeString(no_nul=True)
    E   v2 = SomeInteger(const=0, knowntype=int, nonneg=True, unsigned=False)
`rule.matches()` isn't an `__init__()` call. :/
I think the answer is that the code in the `if rpython` section at the bottom of `lexergenerator.py` needs to be expanded to add the additional details on `Matches`. I haven't investigated what exactly needs adding, though (I'm travelling at the moment; I'll be more available from Wednesday on).
As certain token constructs represent wrapped elements, such as text enclosed in quotes, the parser step would need to pre-process the token to remove the quotes and identify the flags (in the case of Python-style prefixed strings, anyway). Why do the work twice?
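As a hypothetical sketch of that duplicated work (the helper name and token shape below are assumptions for illustration, not rply code), the parser-side re-extraction would amount to something like:

```python
# Hypothetical helper showing the work the parser would otherwise repeat:
# splitting a quoted-string token's value back into flag prefix and contents.
def split_quoted(value):
    # Scan past the flag characters to the opening quote.
    i = 0
    while value[i] not in ('"', "'"):
        i += 1
    flags, quote = value[:i], value[i]
    assert value[-1] == quote, "unterminated quoted string"
    # Drop the enclosing quotes, keeping the flags separately.
    return flags, value[i + 1:-1]

print(split_quoted('rb"data"'))  # ('rb', 'data')
print(split_quoted("'hi'"))      # ('', 'hi')
```

All of this information is already present in the match object produced during tokenization, which is the duplication this change avoids.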
The attached changes add slots and update `__repr__` implementations where needed, and include a test for the "quoted string" case, demonstrating its use. Documentation is also updated to clearly demonstrate the "quoted string" use case and to update the presented object `repr` output.