emustudio / edigen

Emulator Disassembler Generator
GNU General Public License v3.0

Generate return-style code instead of default branches #10

Closed by sulir 3 years ago

sulir commented 12 years ago

There is another possible style of generated decoder code: it uses a return statement as soon as a variant is successfully recognized and throws an exception at the end of each rule. Default branches are not used at all.

This style should allow less strict ambiguity detection, and performance could improve. However, it would introduce new issues that must be addressed, e.g. recognizing the instruction length would be more difficult.

vbmacher commented 3 years ago

@sulir if you remember this ticket, what did you mean by "this style should allow less strict ambiguity detection"? Thanks

sulir commented 3 years ago

I cannot remember exactly, but probably it is related to cases such as this one:

rule = "a": 0000 |
       "b": 00 subrule(2);

Recognition would be performed in the order defined in the file, so once a variant matches, the generated return statement is executed and the remaining variants are not tested.

But the approach may have some shortcomings, and I may be confusing this with some other idea. It was a long time ago.
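A minimal sketch of what return-style decoding could look like for the rule above. This is not edigen's generated code (edigen emits Java): the names `decode_rule`, `decode_subrule`, and `DecodeError` are hypothetical, bits are passed as a string for readability, and `subrule` is assumed to accept any 2-bit value.

```python
class DecodeError(Exception):
    """Raised when no variant of a rule matches (hypothetical name)."""


def decode_subrule(bits: str) -> str:
    # Assumption: the subrule accepts any 2-bit value and just reports it.
    if len(bits) == 2:
        return f"subrule={bits}"
    raise DecodeError(f"subrule: cannot decode {bits!r}")


def decode_rule(bits: str) -> str:
    # Variant "a" is listed first in the grammar, so it is tested first.
    if bits == "0000":
        return "a"
    # Variant "b": the first two bits are 00, the rest goes to subrule(2).
    # The input 0000 would also fit here, but it never reaches this test.
    if len(bits) == 4 and bits[:2] == "00":
        return "b " + decode_subrule(bits[2:])
    # No default branch: falling off the end of the rule is an error.
    raise DecodeError(f"rule: cannot decode {bits!r}")
```

With this ordering, `decode_rule("0000")` yields `"a"` and `decode_rule("0010")` yields `"b subrule=10"`, even though the two variants overlap on the input `0000`.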

vbmacher commented 3 years ago

Hm, this actually reminds me of the problem I explained in #34. Take the following example:

instruction =  "JMP": line(5)     ignore8(8) 000 ignore16(16) |
               "JPR": line(5)     ignore8(8) 100 ignore16(16) |
               "DATA": data(32);

This fails with an ambiguity error, even though I do not want it to fail. My expectation (and the feature proposal in #34) is that the decoder would try the instructions first, and only if none of them match would the last "branch", "DATA", be matched.

One solution, as explained in #34, is to support multiple root rules, which would allow a kind of mix of the current style with the "return-style" at the level of root rules:
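To see why this grammar is reported as ambiguous, here is a minimal sketch of a pairwise overlap check on mask/value patterns. It is an illustration only and not necessarily how edigen detects ambiguity. It assumes 32-bit words laid out most significant bit first, so that `line(5)` and `ignore8(8)` take the top 13 bits, the 3-bit constant follows, and `ignore16(16)` fills the low 16 bits. All names are hypothetical.

```python
from itertools import combinations

# Position of the 3-bit constant under the MSB-first assumption above.
CONST_SHIFT = 16
CONST_MASK = 0b111 << CONST_SHIFT

# (name, mask of fixed bits, required value of those bits)
VARIANTS = [
    ("JMP", CONST_MASK, 0b000 << CONST_SHIFT),
    ("JPR", CONST_MASK, 0b100 << CONST_SHIFT),
    ("DATA", 0, 0),  # data(32) fixes no bits at all
]


def overlaps(a, b):
    """Return True if at least one word matches both patterns."""
    _, mask_a, value_a = a
    _, mask_b, value_b = b
    common = mask_a & mask_b
    # The patterns are disjoint only if they disagree on a bit both fix.
    return ((value_a ^ value_b) & common) == 0


def ambiguous_pairs(variants):
    """List the names of all variant pairs that can match the same word."""
    return [(a[0], b[0]) for a, b in combinations(variants, 2) if overlaps(a, b)]
```

Under these assumptions, `ambiguous_pairs(VARIANTS)` reports `("JMP", "DATA")` and `("JPR", "DATA")`: "JMP" and "JPR" differ in the constant, but "DATA" has no fixed bits, so every word that matches an instruction also matches "DATA".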

root instruction, data;

instruction =  "JMP": line(5)     ignore8(8) 000 ignore16(16) |
               "JPR": line(5)     ignore8(8) 100 ignore16(16);

data = "DATA": data(32);

but on the other hand it is a grammar change. I actually like that the current style detects the ambiguity, because I think in most cases it is the programmer's mistake (I've run into many such issues and it was always my error). So I think it would be a shame if we lost this capability by adopting the return-style in general.

I guess the proposal in #34 solves this best as far as I can see.
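A minimal sketch of how multiple root rules might behave at decode time, assuming roots are tried in the order they are listed after `root` and that variants within one root remain mutually exclusive. This is an assumption about the proposal, not edigen's implementation. It uses the same hypothetical MSB-first 32-bit layout, with the 3-bit constant at bits 16 to 18.

```python
class DecodeError(Exception):
    """Raised when no root rule matches (hypothetical name)."""


CONST_MASK = 0b111 << 16

# Each root is a list of (name, mask, value) variants. Within one root
# the variants are disjoint, so their order does not matter.
INSTRUCTION = [
    ("JMP", CONST_MASK, 0b000 << 16),
    ("JPR", CONST_MASK, 0b100 << 16),
]
DATA = [("DATA", 0, 0)]

# Order matters only here: "instruction" is tried before "data".
ROOTS = [INSTRUCTION, DATA]


def decode(word, roots=ROOTS):
    for root in roots:
        for name, mask, value in root:
            if (word & mask) == value:
                return name  # return-style: stop at the first matching root
    raise DecodeError(f"cannot decode {word:#010x}")
```

Here `decode(0x00040000)` yields `"JPR"`, while a word whose constant is neither 000 nor 100, such as `0x00010000`, falls through to `"DATA"`. Overlap is tolerated only between roots, so a mistake inside `instruction` could still be reported as an ambiguity.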

vbmacher commented 3 years ago

Therefore I propose to close this ticket...

sulir commented 3 years ago

OK.