delph-in / pydelphin

Python libraries for DELPH-IN
https://pydelphin.readthedocs.io/
MIT License

Minor issues with REPP #308

Closed goodmami closed 2 years ago

goodmami commented 4 years ago

There are some parts of the REPP spec (spread across the ReppTop wiki and various publications) that are not completely clear. PyDelphin currently makes guesses about the proper behavior, but it would be good to confirm them with others.

  1. Must iterative groups be called within the module that defines them? (currently yes)
  2. Can iterative group calls occur before the group's definition? (currently no, but maybe it should be allowed)
  3. Must the tokenizer pattern appear in the top-level module (or included file)? Can it be defined in an external module instead? (currently no)
  4. Is it an error to have tokenization patterns or meta-info declarations inside iterative groups? (currently no, probably should be yes)
oepen commented 4 years ago

thanks for the clarification request, mike! i have tried to answer your questions in what i consider the official specification:

http://moin.delph-in.net/ReppTop?action=diff&rev2=44&rev1=43

do those changes seem sufficient to clarify your points?

goodmami commented 4 years ago

Yes, my questions are answered and I like the answers. Thanks!

goodmami commented 4 years ago

@oepen I'm working to make my code consistent with the updated spec, and it occurs to me there's another set of issues regarding iterative groups (which I'll call "internal" groups to be consistent with the wiki text). Extending my numbering from above...

  5. Internal group definitions (considered atomically) are non-sequential; if the contents of internal groups must be sequential, then nesting of internal groups should not be allowed. By what logic should we allow nesting while disallowing meta-info/tokenizer definitions? (Note that I'm referring to the definitions, not the calls.)
  6. Assuming we allow nesting, does the module-wide identifier space for internal groups persist inside internal groups, too? That is, are both of these illegal?

    #1
    ; ...
    #
    #1
    ; ...
    #

    and

    #1
    #1
    ; ...
    #
    #

  7. If the answer to 6 is "yes", then can a nested internal group be called outside its parent group? E.g.:

    #1
    #2
    ; ...
    #
    #
    >2

I think the cleanest way to resolve these issues is to disallow nested internal group definitions while allowing nested calls. The problem is a potential break in backward compatibility.

Finally, I want to resolve some ambiguity around my use of "must" in (1) above. I think it is fine to define an internal group that is never called (like an inactive external group), but any internal group call must resolve to a group defined within the same module. Your text on the wiki is less ambiguous here.
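
The reading of "must" above can be sketched as a small check. This is only an illustration, not PyDelphin's implementation: the function name `check_internal_calls` and the line patterns are hypothetical, and the sketch assumes that a call may appear before the definition it refers to (question 2), which this thread does not settle on its own.

```python
import re

# Hypothetical line patterns for this sketch only.
DEF_RE = re.compile(r"^#(\d+)$")      # opens an internal group, e.g. "#1"
CALL_RE = re.compile(r"^>\s*(\d+)$")  # calls an internal group, e.g. ">1"


def check_internal_calls(module_text):
    """Return (unresolved, unused) sets of internal group ids for one module.

    unresolved: ids that are called but never defined in this module
                (an error under the reading above).
    unused:     ids that are defined but never called (allowed).
    """
    defined, called = set(), set()
    for raw in module_text.splitlines():
        line = raw.strip()
        m = DEF_RE.match(line)
        if m:
            defined.add(int(m.group(1)))
            continue
        m = CALL_RE.match(line)
        if m:
            called.add(int(m.group(1)))
    # Order of appearance is ignored (assumption: forward calls are fine).
    return called - defined, defined - called


SAMPLE = """\
#1
; ...
#
#2
; ...
#
>1
>3
"""

unresolved, unused = check_internal_calls(SAMPLE)
```

Here group 2 is defined but never called, which is fine, while the call to group 3 has nothing to resolve to within the module.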

oepen commented 4 years ago

hi @goodmami, and thanks for pushing further! i think i have answered all of your follow-up questions by adding to the end of the section on internal groups on the REPP wiki page. in sum, i fail to see why you lean towards outlawing nested internal groups? that, to me (just now, at least), would seem like an unnecessary constraint.

goodmami commented 4 years ago

Thanks! I see that (6) and (7) have answers, but not (5), although the question in (5) is somewhat more philosophical than practical.

"i fail to see why you lean towards outlawing nested internal groups?"

I'm not trying to outlaw nested internal groups. I'm just noticing that our specification would be more consistent if we outlawed nested internal group definitions. The definitions are non-sequential while the internal group calls are sequential, so disallowing nested definitions would be consistent with this passage:

"Owing to their non-sequential status, the tokenizer (:) and version (@) operators cannot occur inside a numbered internal group."

Disallowing them also removes the question about whether there's a separate namespace inside internal groups (meaning the clarification text you added about the global namespace would become unnecessary).

Regarding the wiki text here:

"In principle, it is possible to have an internal group nested inside another one (which could be useful, for example, to allow calling into either the outer group as a whole, or just its inner sub-group);"

I'm not seeing why the nested internal group definition is useful. As things currently stand, it seems like these are equivalent:

    #1
    ; ...
    #

    #2
    ; ...
    > 1
    #

    >1
    >2

and

    #2
    ; ...
    #1
    ; ...
    #
    > 1
    #

    >1
    >2
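
To make the claimed equivalence concrete, here is a rough sketch of a reader that treats internal group definitions as non-sequential entries in one module-wide table. Everything in it is hypothetical (the function `parse_groups`, the tuple layout, the placeholder rule lines `!a b` and `!c d`); it is not how PyDelphin parses REPP. It assumes a single module-wide identifier space, so a repeated id is rejected wherever it appears.

```python
def parse_groups(text):
    """Return (groups, top_level) for one module.

    groups maps an internal group id to the list of items in its body;
    a nested definition is registered in the same table and contributes
    nothing to the body of the group that encloses it.
    """
    groups = {}
    top_level = []
    stack = [top_level]  # bodies currently being filled, innermost last
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "#":  # closes the innermost open group
            if len(stack) == 1:
                raise ValueError("group close without an open group")
            stack.pop()
        elif line.startswith("#"):  # opens internal group N
            gid = int(line[1:])
            if gid in groups:
                raise ValueError(f"duplicate internal group id: {gid}")
            groups[gid] = []
            stack.append(groups[gid])
        elif line.startswith(">"):  # internal group call
            stack[-1].append(("call", int(line[1:].strip())))
        else:  # any other line is kept as an opaque body line
            stack[-1].append(("line", line))
    if len(stack) != 1:
        raise ValueError("unclosed internal group")
    return groups, top_level


SEPARATE = """\
#1
!a b
#

#2
!c d
> 1
#

>1
>2
"""

NESTED = """\
#2
!c d
#1
!a b
#
> 1
#

>1
>2
"""

same = parse_groups(SEPARATE) == parse_groups(NESTED)
```

Under these assumptions both layouts produce the same group table and the same top-level calls, and either of the duplicate `#1` layouts from question 6 raises `ValueError`.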

I thought nested definitions weren't actually used in practice, but I see that there is one in the ERG's wiki.rpp:

    #1
    #2
    !\[\[(?:[^[|\]]+\|)?([^[|\]]+)\]\]                      \1
    #
    >2
    !\[(?:http|ftp)://(?:[^[\] ]+ )?([^[\]]+)\]             \1
    #
    >1
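
For readers less used to REPP, the effect of this snippet can be approximated in plain Python. This is a sketch under assumptions, not PyDelphin's `delphin.repp` code: it assumes an iterative group reapplies its contents until the text stops changing, it ignores the character-offset tracking that REPP performs, and the helper names (`apply_until_stable`, `rule`, `group1`, `group2`) are made up. The two patterns are transcribed from the rules above, with `[` escaped inside the character classes for Python's `re`.

```python
import re

# Patterns transcribed from the two rules in the wiki.rpp excerpt.
WIKI_LINK = (re.compile(r"\[\[(?:[^\[|\]]+\|)?([^\[|\]]+)\]\]"), r"\1")
EXT_LINK = (re.compile(r"\[(?:http|ftp)://(?:[^\[\] ]+ )?([^\[\]]+)\]"), r"\1")


def rule(pattern, replacement):
    """Wrap one rewrite rule as a text -> text function."""
    return lambda text: pattern.sub(replacement, text)


def apply_until_stable(text, steps):
    """Run the steps in order, repeating until a full pass changes nothing."""
    while True:
        new = text
        for step in steps:
            new = step(new)
        if new == text:
            return text
        text = new


def group2(text):
    # Body of "#2": the wiki-link rule only.
    return apply_until_stable(text, [rule(*WIKI_LINK)])


def group1(text):
    # Body of "#1": the call ">2" followed by the external-link rule.
    # The nested definition of "#2" itself does nothing sequentially.
    return apply_until_stable(text, [group2, rule(*EXT_LINK)])


result = group1("see [[Page|label]] and [http://example.org the site]")
```

Each substitution shortens the text, so the loops terminate; with the sample input, the wiki link is reduced to `label` and the external link to `the site`.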

So the backward-compatibility-break would be more consequential than I originally thought.

goodmami commented 3 years ago

Coming back to this, here are things to do based on the wiki updates by Stephan:

edit: PyDelphin allows for a default tokenizer and I may have been reading the spec too strictly, so I changed the requirement about tokenization patterns in non-top-level modules.