API to override prototype parsing

API to override prototype parsing #13400

Open p5pRT opened 11 years ago

p5pRT commented 11 years ago

Migrated from rt.perl.org#120458 (status was 'open')

Searchable as RT120458$

p5pRT commented 11 years ago

From PeterCMartini@GMail.com

As part of the effort to get signatures into core, I'd like to add a mechanism to core to conditionally hand off lexing/parsing to another function when the parenthesized block after a sub is encountered.

Specifically, $^H{prototype_parser} = \&parser_function would register a function so that when the parser hits the first '(' in a sub declaration, that function becomes responsible for consuming the text until it reaches the matching ')' and leaving the parser state at the next character after it. The function called takes no arguments (everything it needs to touch is in PL_parser or other global state), and returns either an SV to use as a traditional prototype, or undef if no prototype needs to be set on the CV.

The reason for suggesting a parser hook rather than just taking the raw prototype is twofold: first, with the current quoting behavior a closing paren inside the parenthesized text has to be escaped, which is an ugly gotcha; and second, the most useful behaviors for a sub signature need to occur before the text of the sub is parsed, so that any lexical variables, etc., are properly marked.

I have https://github.com/PeterMartini/perl/tree/parsehook with a first run at this. All the code is in the most recent commit (as of this writing anyway), and I haven't yet committed the tests or doc updates.

A note about the implementation: prototypes are currently consumed by the tokenizer when the sub name is discovered, but not applied until the complete sub, including the body, is turned into an op tree. Because perly.y doesn't do its startsub magic until after the name is discovered (and by extension, the prototype), the hook can't be called when that first '(' is discovered, and because the PROTO is applied after the body is fully parsed, it can't be done there, either. So instead it's being done, conditionally, inside the PROTO block itself in perly.y, which seems like the right place for it anyway.
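To make the escaping gotcha concrete, here is a minimal toy sketch in plain C. It is not perl's tokenizer code: `toy_scan_raw_proto` is a hypothetical function that models a scanner which balances parens and honors backslashes but knows nothing about the syntax inside, so a ')' inside a string literal ends the scan early unless it is escaped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model only (hypothetical, not perl's toke.c): copies the text between
 * a leading '(' and its matching ')' without understanding what is inside.
 * Nested parens are balanced and a backslash protects the next character.
 * Returns the number of source characters consumed (including both parens),
 * or 0 if there is no closing paren or `out` is too small.
 */
size_t toy_scan_raw_proto(const char *src, char *out, size_t outsz)
{
    size_t i = 1, n = 0;
    int depth = 1;

    assert(src != NULL && out != NULL && outsz > 0);
    if (src[0] != '(')
        return 0;

    for (; src[i] != '\0'; i++) {
        char c = src[i];
        if (c == '\\' && src[i + 1] != '\0') {
            c = src[++i];           /* escaped: take the next char literally */
        } else if (c == '(') {
            depth++;
        } else if (c == ')' && --depth == 0) {
            out[n] = '\0';
            return i + 1;           /* position just past the closing ')' */
        }
        if (n + 1 >= outsz)
            return 0;               /* no room for this char plus the NUL */
        out[n++] = c;
    }
    return 0;                       /* ran off the end: unterminated */
}
```

With this model, scanning `($sep = ")") {` stops at the paren inside the string and yields `$sep = "`, while `($sep = "\)") {` yields the intended `$sep = ")"`. A hook that actually parses the text would not need the backslash.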

Comments on the code and approach appreciated.

p5pRT commented 11 years ago

From zefram@fysh.org

Peter Martini wrote:

As part of the effort to get signatures into core, I'd like to add a mechanism to core to conditionally hand off lexing/parsing to another function when the parenthesized block after a sub is encountered.

Is there a particular reason why you want the hook there rather than hooking the "sub" keyword? The latter is already possible, via the keyword plugin mechanism, or via the call-parser hook of Devel::CallParser (which I'm looking to get into core in 5.22).

The function called takes no arguments (everything it needs to touch is in PL_parser or other global state),

This sounds like a messy interface that we probably don't want to bless as a supported API.

-zefram

p5pRT commented 11 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 11 years ago

From PeterCMartini@GMail.com

On Mon, Nov 4, 2013 at 10:20 AM, Peter Martini petercmartini@gmail.com wrote:

On Mon, Nov 4, 2013 at 10:15 AM, Zefram zefram@fysh.org wrote:

Peter Martini wrote:

As part of the effort to get signatures into core, I'd like to add a mechanism to core to conditionally hand off lexing/parsing to another function when the parenthesized block after a sub is encountered.

Is there a particular reason why you want the hook there rather than hooking the "sub" keyword? The latter is already possible, via the keyword plugin mechanism, or via the call-parser hook of Devel::CallParser (which I'm looking to get into core in 5.22).

The function called takes no arguments (everything it needs to touch is in PL_parser or other global state),

This sounds like a messy interface that we probably don't want to bless as a supported API.

It's messy, but I don't see how it's messier than globally replacing the sub keyword. And the point is to redefine the body of the sub while it's being defined; I thought Devel::CallParser works when the sub is being called? And only if it's being called by a resolvable name?

-zefram

Gah, resending, and CC'ing p5p this time.

p5pRT commented 11 years ago

From zefram@fysh.org

Peter Martini wrote:

It's messy, but I don't see how it's messier than globally replacing the sub keyword.

The keyword can be redefined in whatever scope is desired.

and the point is to redefine the body of the sub while its being defined, I thought Devel::CallParser works when the sub is being called?

D:CP would be applied to an &sub, which can be imported into whatever scope is required. A call parser operates when a call to the sub is being parsed; in the case of one applied to &sub that would be whenever the parser sees the word "sub" in a suitable context.

-zefram

p5pRT commented 11 years ago

From PeterCMartini@GMail.com

On Mon, Nov 4, 2013 at 10:28 AM, Zefram via RT perlbug-followup@perl.org wrote:

Peter Martini wrote:

It's messy, but I don't see how it's messier than globally replacing the sub keyword.

The keyword can be redefined in whatever scope is desired.

My understanding of the way this works is I'd have to permanently register a handler for the sub keyword, and that handler would then have to check the hints hash for some magic value to decide whether it wants to act on the sub keyword or not.

It would also mean that it would still have to handle tokenizing/parsing, but it would have to do it for the entire body of the sub, rather than only for the duration of what it thinks the signature is, and then handing control back to perl to parse the rest of the body.

and the point is to redefine the body of the sub while its being defined, I thought Devel::CallParser works when the sub is being called?

D:CP would be applied to an &sub, which can be imported into whatever scope is required. A call parser operates when a call to the sub is being parsed; in the case of one applied to &sub that would be whenever the parser sees the word "sub" in a suitable context.

Ah, I see what you mean: using Devel::CallParser as another way to replace the sub keyword, not as a way to handle foo. Still, duplicating the entire sub keyword to change the parse of a small section seems like a lot of duplication, and a lot more room for bugs, since it now has to do the same parsing tricks it would have to do for just the prototype, but *also* properly handle the rest of the sub definition.

-zefram

p5pRT commented 11 years ago

From PeterCMartini@GMail.com

It's also a matter of principle, in some form. I'm trying to get sub signatures into core, and it seems like a copout to do so by redefining the sub keyword.

p5pRT commented 11 years ago

From zefram@fysh.org

Peter Martini wrote:

My understanding of the way this works is I'd have to permanently register a handler for the sub keyword, and that handler would then have to check the hints hash

That's how the keyword plugin API works, yes. It turned out that that's inconvenient, hence the concept of call-parser magic attached to a sub which D:CP supplies. D:CP implements call-parser magic via the keyword plugin API; if we put call-parser magic into the core then the core would implement it directly.

Either way, hooking the "sub" keyword involves an existing hook mechanism, so I'm dubious about adding a new hook to do one job that can already be done.

It would also mean that it would still have to handle tokenizing/parsing, but it would have to do it for the entire body of the sub,

You get to call parse_block() for the body. That's been in the core since 5.14.

Call-parser magic on &sub has to parse the top-level syntax for subroutines: name, prototype, attributes, and body. As you want to change prototype syntax this seems an appropriate level to work.

Actually, what you're trying to do isn't *really* alternate prototype parsing: you want to add parameter list syntax, which doesn't exist in the standard subroutine syntax. You've linked it to prototypes only because of the coincidence that your new parameter list syntax and the existing prototype syntax both use parentheses. A subroutine signature really implies some mutation of the sub's body, to declare and initialise the parameter variables. So you *really* do want to wrap the parsing and compilation of the sub body, so in fact call-parser magic is *exactly* what you want to hook.

duplicating the entire sub keyword to change the parse of a small section seems like a lot of duplication, and a lot more room for bugs

The only part of the syntax that you're retaining unmodified and is at all tricky as things stand is attributes. We could possibly help you out there by adding a parse_ function that performs standard attribute parsing.

-zefram

p5pRT commented 11 years ago

From zefram@fysh.org

Peter Martini wrote:

It's also a matter of principle, in some form. I'm trying to get sub signatures into core,

If they're entirely in the core then you don't need any hooks at all, you can just modify the core parser. But there are degrees of coring; it is not silly that something in the more loosely coupled parts of the core distro could use hooks that are provided for third-party use.

Another principle to consider is that things shouldn't be in the core if they don't have to be. If sub signatures are going to be enabled by an explicit pragma, then they can be implemented by a non-core module using the existing hooks. Anything that can be implemented this way should be at least prototyped in CPAN form.

-zefram

p5pRT commented 11 years ago

From @bulk88

On Mon Nov 04 06:30:05 2013, pcm wrote:

I have https://github.com/PeterMartini/perl/tree/parsehook with a first run at this. All the code is in the most recent commit (as of this writing anyway), and I haven't yet committed the tests or doc updates. ................. Comments on the code and approach appreciated.

 proto : /* NULL */
             { $$ = (OP*)NULL; }
       | THING
+            {
+                if (cSVOPx_sv($$) == &PL_sv_undef) {
+                    SV **svp = hv_fetchs(GvHV(PL_hintgv), "prototype_parser", FALSE);
+                    SV *rv;
+                    SV *cv;
+                    if (svp && (rv = *svp) && SvROK(rv) && ((cv = SvRV(rv)) != NULL)
+                        && SvTYPE(cv) == SVt_PVCV) {

Why the NULL check on SvRV? I thought if ROK then it's safe to deref the sv_u.

+                        dSP; // redundant but a memory read will optimize away
+                        SV *sv; // you sure you don't need a PUSHMARK?
+                        Perl_call_sv(aTHX_ cv, G_SCALAR|G_NOARGS);
+                        SPAGAIN;
+                        sv = POPs; // PUTBACK here, more efficient so some vars aren't saved across the newSVOP
+                        if (sv == &PL_sv_undef)
+                            $$ = (OP*)NULL;
+                        else
+                            $$ = newSVOP(OP_CONST, 0, SvREFCNT_inc_NN(sv));
+                        PUTBACK;

move the PUTBACK earlier

+                        /* The call_sv may have added lexicals.
+                           intro_my will make them visible before the first statement */
+                        intro_my();

-- bulk88 ~ bulk88 at hotmail.com

p5pRT commented 11 years ago

From PeterCMartini@GMail.com

On Mon, Nov 4, 2013 at 11:14 AM, Zefram zefram@fysh.org wrote:

Peter Martini wrote:

It's also a matter of principle, in some form. I'm trying to get sub signatures into core,

If they're entirely in the core then you don't need any hooks at all, you can just modify the core parser. But there are degrees of coring; it is not silly that something in the more loosely coupled parts of the core distro could use hooks that are provided for third-party use.

I know, and that's the basis of my approach: adding an API that other modules could use to provide custom signature handling, and that the core would also use. Eating our own dog food and all that.

Another principle to consider is that things shouldn't be in the core if they don't have to be. If sub signatures are going to be enabled by an explicit pragma, then they can be implemented by a non-core module using the existing hooks. Anything that can be implemented this way should be at least prototyped in CPAN form.

-zefram

signatures is a CPAN module right now, but it has some non-core dependencies, which shouldn't be made core.

And really, the longer term goal is to have it enabled without an explicit pragma - it can coexist just fine with the prototype system, since that was an explicit design goal behind prototypes.

I think proper sub signatures are something that needs to be in core, not on a technical level, but on a language design level. It's a gaping hole that's been acknowledged by our own documentation since 1994.

p5pRT commented 11 years ago

From PeterCMartini@GMail.com

On Mon, Nov 4, 2013 at 11:18 AM, bulk88 via RT perlbug-followup@perl.org wrote:

On Mon Nov 04 06:30:05 2013, pcm wrote:

I have https://github.com/PeterMartini/perl/tree/parsehook with a first run at this. All the code is in the most recent commit (as of this writing anyway), and I haven't yet committed the tests or doc updates. ................. Comments on the code and approach appreciated.

 proto : /* NULL */
             { $$ = (OP*)NULL; }
       | THING
+            {
+                if (cSVOPx_sv($$) == &PL_sv_undef) {
+                    SV **svp = hv_fetchs(GvHV(PL_hintgv), "prototype_parser", FALSE);
+                    SV *rv;
+                    SV *cv;
+                    if (svp && (rv = *svp) && SvROK(rv) && ((cv = SvRV(rv)) != NULL)
+                        && SvTYPE(cv) == SVt_PVCV) {

Why the NULL check on SvRV? I thought if ROK then it's safe to deref the sv_u.

I used the code for charnames as a prototype, and that snuck in. I see your point and agree, which would suggest that charnames could avoid it as well, right?

+                        dSP; // redundant but a memory read will optimize away
+                        SV *sv; // you sure you don't need a PUSHMARK?

Not sure, actually. I'll chew on that one.

+                        Perl_call_sv(aTHX_ cv, G_SCALAR|G_NOARGS);
+                        SPAGAIN;
+                        sv = POPs; // PUTBACK here, more efficient so some vars aren't saved across the newSVOP

Fair enough. Does that have any practical effect? I mean, I didn't think the newSVOP would care about that stack. But it certainly reads cleaner, so fine by me.

+                        if (sv == &PL_sv_undef)
+                            $$ = (OP*)NULL;
+                        else
+                            $$ = newSVOP(OP_CONST, 0, SvREFCNT_inc_NN(sv));
+                        PUTBACK;

move the PUTBACK earlier

+                        /* The call_sv may have added lexicals.
+                           intro_my will make them visible before the first statement */
+                        intro_my();

-- bulk88 ~ bulk88 at hotmail.com

Thanks for the feedback.

p5pRT commented 11 years ago

From zefram@fysh.org

Peter Martini wrote:

The core has a couple of built in attributes, and a callback mechanism to handle the rest.

Oh yes, it would be nasty to expose that difference. We should certainly have a core API function to *apply* a named attribute; that would be the only place that needs to know which attributes are implemented which way.

It also means this has to have its own logic for handling lexical subs ... speaking of, could Devel::CallParser do that?

I was wondering about that. I'm still not really up to date on the lexical sub stuff; that all came in while I was out of action. "my sub" is a sequence of two keywords, not a natural thing to hook with D:CP. D:CP can certainly let you define a "my_sub" keyword that behaves similarly to the built-in "my sub". I think, if call-parser magic is taken to be the way forward generally for syntax plugins, I'd want to make the core have "my sub" invoke call-parser magic on a sub with a magic name, such as &{"my sub"}. That'd just be to make "my sub" hookable in the same way that "sub" is hookable.

There's an outstanding problem with attaching call-parser magic to lexically-defined subs, but that's a different and unrelated side of D:CP. (Also outstanding simply because I'm not up to date yet.)

The point of this API, for my uses, is to be able to add items to the PAD, and to attach some sub initialization magic (which, yes, will mean putting a new op type as the first OP in the body rather than using magic,

This implies that, if you don't use any existing hook that wraps sub body compilation, you'll need some additional hook to modify the body. As it's so closely related to your desire to override `prototype' parsing, I think we should consider these two requirements together.

its going out of its way to look and feel like a core part of the language, rather than a macro for generating code at the start of the sub).

I think we should be blurring, as far as possible, this distinction between "macro" (syntax plugin) and "core part of the language". Lisp wins big from the power of macros and their near equivalence to primitively-defined syntax.

Isn't that kind of inverted though, to expose all the parts of parsing a sub under different functions

It's a great enabling technology. Exposing parser functions for standard bits of grammar allows them to be reused in novel combinations from syntax plugins. It's not only to allow overriding "sub". Even among pure and nearly-pure extensions of "sub", I think your view of what grammar people are likely to want to change is unrealistically narrow.

A hook for parsing of parens following "sub" has very limited application, and doesn't even achieve all that you need in this area. There is at least one good argument for adding such a hook, namely to help independent "sub" extensions work cooperatively when invoked together, but you haven't made that argument. I'd prefer that what hooks we add to the core are powerful and generic, so that we get the maximum extensibility for the minimum API complication.

-zefram

p5pRT commented 11 years ago

From PeterCMartini@GMail.com

On Mon, Nov 4, 2013 at 12:45 PM, Zefram zefram@fysh.org wrote:

Peter Martini wrote:

The core has a couple of built in attributes, and a callback mechanism to handle the rest.

Oh yes, it would be nasty to expose that difference. We should certainly have a core API function to *apply* a named attribute; that would be the only place that needs to know which attributes are implemented which way.

It also means this has to have its own logic for handling lexical subs ... speaking of, could Devel::CallParser do that?

I was wondering about that. I'm still not really up to date on the lexical sub stuff; that all came in while I was out of action. "my sub" is a sequence of two keywords, not a natural thing to hook with D:CP. D:CP can certainly let you define a "my_sub" keyword that behaves similarly to the built-in "my sub". I think, if call-parser magic is taken to be the way forward generally for syntax plugins, I'd want to make the core have "my sub" invoke call-parser magic on a sub with a magic name, such as &{"my sub"}. That'd just be to make "my sub" hookable in the same way that "sub" is hookable.

There's an outstanding problem with attaching call-parser magic to lexically-defined subs, but that's a different and unrelated side of D:CP. (Also outstanding simply because I'm not up to date yet.)

The point of this API, for my uses, is to be able to add items to the PAD, and to attach some sub initialization magic (which, yes, will mean putting a new op type as the first OP in the body rather than using magic,

This implies that, if you don't use any existing hook that wraps sub body compilation, you'll need some additional hook to modify the body. As it's so closely related to your desire to override `prototype' parsing, I think we should consider these two requirements together.

That was the point of 120284 (Enhancement request: Add CV Magic to be executed immediately before sub execution), which we can withdraw, by the way, since that one's definitely not going anywhere. What I'd like to replace it with is a custom OP, let's call it initsub, without a fixed ppaddr (so whatever module happens to want to add a C-level routine to do initialization could do it).

its going out of its way to look and feel like a core part of the language, rather than a macro for generating code at the start of the sub).

I think we should be blurring, as far as possible, this distinction between "macro" (syntax plugin) and "core part of the language". Lisp wins big from the power of macros and their near equivalence to primitively-defined syntax.

Isn't that kind of inverted though, to expose all the parts of parsing a sub under different functions

It's a great enabling technology. Exposing parser functions for standard bits of grammar allows them to be reused in novel combinations from syntax plugins. It's not only to allow overriding "sub". Even among pure and nearly-pure extensions of "sub", I think your view of what grammar people are likely to want to change is unrealistically narrow.

A hook for parsing of parens following "sub" has very limited application, and doesn't even achieve all that you need in this area. There is at least one good argument for adding such a hook, namely to help independent "sub" extensions work cooperatively when invoked together, but you haven't made that argument. I'd prefer that what hooks we add to the core are powerful and generic, so that we get the maximum extensibility for the minimum API complication.

I suppose that's the core of our disagreement then. My goal is explicitly to create an API just wide enough to allow solving a common problem, adding parameter lists / signatures to subs, and no wider.

I wouldn't complain about adding a parse_attr function though :-)

p5pRT commented 11 years ago

From @bulk88

On Mon Nov 04 09:17:12 2013, pcm wrote:

On Mon, Nov 4, 2013 at 11:18 AM, bulk88 via RT

 proto : /* NULL */
             { $$ = (OP*)NULL; }
       | THING
+            {
+                if (cSVOPx_sv($$) == &PL_sv_undef) {
+                    SV **svp = hv_fetchs(GvHV(PL_hintgv), "prototype_parser", FALSE);
+                    SV *rv;
+                    SV *cv;
+                    if (svp && (rv = *svp) && SvROK(rv) && ((cv = SvRV(rv)) != NULL)
+                        && SvTYPE(cv) == SVt_PVCV) {

Why the NULL check on SvRV? I thought if ROK then it's safe to deref the sv_u.

I used the code for charnames as a prototype, and that snuck in. I see your point and agree, which would suggest that charnames could avoid it as well, right?

IDK, ask KHW http://perl5.git.perl.org/perl.git/blobdiff/225fb84f3eb1da83cbc8c79add24882deac79906..0c415a7950ced3bdd13d9361e7154695c677851b:/toke.c .

+                        dSP; // redundant but a memory read will optimize away
+                        SV *sv; // you sure you don't need a PUSHMARK?

Not sure, actually. I'll chew on that one.

Try using a DEBUGGING perl. It might catch the markstack underflowing (or other random stack inconsistencies) that a non-DEBUGGING build won't.

+                        Perl_call_sv(aTHX_ cv, G_SCALAR|G_NOARGS);
+                        SPAGAIN;
+                        sv = POPs; // PUTBACK here, more efficient so some vars aren't saved across the newSVOP

Fair enough. Does that have any practical effect? I mean, I didn't think the newSVOP would care about that stack. But it certainly reads cleaner, so fine by me.

It is a C/machine code optimization. In asm code, keeping a variable live across a function call has a cost: either the variable is copied to the C stack, making the stack frame one pointer-sized slot bigger, or the variable lives in a non-volatile register. Using that non-volatile register means saving and restoring it at the start and end of the function, which also takes one C stack slot. Keep a variable's liveness/uses close together if possible (note: if possible). If you deref the same struct multiple times (AKA use one of Perl's gazillion macros) in between two function calls, in machine code it is only "read" once. If you deref the struct before a function call and then again after it, the C compiler has to read the struct twice: the compiler doesn't know what that function call did and didn't change, and the call could have zeroed the entire VM space of the process as far as the compiler knows. (C autos that never have & taken on them are NOT reread between function calls if enough spare non-volatile registers exist at that moment in the caller function.) There is one C compiler optimization called pure, http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html#index-g_t_0040code_007bpure_007d-function-attribute-2822 which perl uses in a !!few rare cases!!. Pure basically says this function and its children will not modify memory that the caller can observe.
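The "move the PUTBACK earlier" suggestion can be illustrated with a small self-contained C sketch. Everything here is a hypothetical stand-in (`toy_interp`, `toy_alloc`, and the two `pop_then_alloc_*` functions), not perl's real stack macros; the comments only mark which step loosely corresponds to dSP, POPs, PUTBACK and newSVOP. Both variants compute the same result; the difference is whether the local stack pointer has to stay live across the call.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for interpreter state; not perl's real argument stack. */
struct toy_interp {
    int *stack_sp;      /* points at the top element of the toy stack */
    int  allocations;   /* counts calls to toy_alloc() */
};

/* Stands in for a call like newSVOP(): in real code the compiler cannot
 * see what such a call does to memory. */
int toy_alloc(struct toy_interp *in, int v)
{
    in->allocations++;
    return v * 2;
}

/* Late write-back: the local `sp` must survive the call to toy_alloc(),
 * so it needs a stack slot or a callee-saved register. */
int pop_then_alloc_late(struct toy_interp *in)
{
    int *sp, v, r;

    assert(in != NULL);
    sp = in->stack_sp;          /* roughly dSP / SPAGAIN */
    v = *sp--;                  /* roughly POPs */
    r = toy_alloc(in, v);
    in->stack_sp = sp;          /* roughly PUTBACK, after the call */
    return r;
}

/* Early write-back: `sp` is dead before the call, so nothing about it
 * has to be preserved across toy_alloc(). */
int pop_then_alloc_early(struct toy_interp *in)
{
    int *sp, v;

    assert(in != NULL);
    sp = in->stack_sp;
    v = *sp--;
    in->stack_sp = sp;          /* roughly PUTBACK, before the call */
    return toy_alloc(in, v);
}
```

Whether the early form actually produces smaller code depends on the compiler, the target ABI and the optimization level, so treat this as an illustration of the reasoning rather than a measured result.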

-- bulk88 ~ bulk88 at hotmail.com

p5pRT commented 11 years ago

From @cpansprout

On Mon Nov 04 09:46:33 2013, zefram@fysh.org wrote:

I was wondering about that. I'm still not really up to date on the lexical sub stuff; that all came in while I was out of action. "my sub" is a sequence of two keywords, not a natural thing to hook with D:CP. D:CP can certainly let you define a "my_sub" keyword that behaves similarly to the built-in "my sub". I think, if call-parser magic is taken to be the way forward generally for syntax plugins, I'd want to make the core have "my sub" invoke call-parser magic on a sub with a magic name, such as &{"my sub"}. That'd just be to make "my sub" hookable in the same way that "sub" is hookable.

I don’t see how ‘my sub’ is fundamentally different from if/else, which is also two keywords.

Father Chrysostomos

p5pRT commented 11 years ago

From zefram@fysh.org

Father Chrysostomos via RT wrote:

I don't see how 'my sub' is fundamentally different from if/else, which is also two keywords.

It's fundamentally different in that the "if"/"else" construct is a variation of the "if" construct, unambiguously introduced by the single keyword "if". The "else" keyword is only meaningful as part of that construct, and we don't need to distinguish up front (when the first keyword is seen) between "if"/"else", "if" without "else", "if"/"elsif"/"else", and the other grammatical chains that can follow "if".

"my sub" is a variant of "sub" more than it is a variant of "my". We want to treat "my sub" in a manner that's substantially different from the other constructs that can start with "my".

-zefram

p5pRT commented 11 years ago

From @doy

On Mon, Nov 04, 2013 at 09:46:33AM -0800, Zefram via RT wrote:

A hook for parsing of parens following "sub" has very limited application, and doesn't even achieve all that you need in this area. There is at least one good argument for adding such a hook, namely to help independent "sub" extensions work cooperatively when invoked together, but you haven't made that argument. I'd prefer that what hooks we add to the core are powerful and generic, so that we get the maximum extensibility for the minimum API complication.

I agree. For the p5mop project, for instance, we're going to be introducing a 'method' keyword that will also want signature parsing. Doing it through an extra hook like this means that we'll need to stop parsing our method keyword, figure out a way to hand control of the parser back to perl in order to trigger the parser hook (or trigger it manually, if that's possible, but things like this historically have a tendency to require a bunch of setup that gets either inlined directly or wrapped up in an inaccessible static function), and then read the results out of some global data structure before we restart parsing. It would really be a lot easier if "parse a subroutine signature" were just an API function that we could call at whatever point in our existing custom parser we wanted to, just like we can already call parse_block and things like that.
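As a rough illustration of the "just an API function" shape, here is a toy sketch in plain C. None of these names exist in perl: `toy_parse_signature` and `toy_parse_method` are hypothetical, the grammar is reduced to a comma-separated list of `$name` parameters, and the point is only that the parser takes its cursor explicitly and returns its result to the caller instead of leaving it in global state.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_PARAMS 8
#define TOY_MAX_NAME   32

/* Result handed back to the caller rather than stored in a global. */
struct toy_signature {
    size_t count;
    char   names[TOY_MAX_PARAMS][TOY_MAX_NAME];
};

static const char *skip_space(const char *p)
{
    while (isspace((unsigned char)*p))
        p++;
    return p;
}

/* Hypothetical API shape: parse "($a, $b, ...)" starting at *cursor.
 * On success fills `sig`, advances *cursor past the ')' and returns 1.
 * On failure returns 0 and leaves *cursor untouched (sig is unspecified). */
int toy_parse_signature(const char **cursor, struct toy_signature *sig)
{
    const char *p;

    assert(cursor != NULL && *cursor != NULL && sig != NULL);
    p = skip_space(*cursor);
    sig->count = 0;
    if (*p++ != '(')
        return 0;
    p = skip_space(p);
    while (*p != ')') {
        size_t len = 0;
        if (*p++ != '$' || sig->count == TOY_MAX_PARAMS)
            return 0;
        while (isalnum((unsigned char)p[len]) || p[len] == '_')
            len++;
        if (len == 0 || len >= TOY_MAX_NAME)
            return 0;
        memcpy(sig->names[sig->count], p, len);
        sig->names[sig->count++][len] = '\0';
        p = skip_space(p + len);
        if (*p == ',')
            p = skip_space(p + 1);
        else if (*p != ')')
            return 0;
    }
    *cursor = p + 1;
    return 1;
}

/* A custom keyword parser ("method NAME (...)") reusing the same function
 * at the point it chooses, with no hand-off to a hook. */
int toy_parse_method(const char **cursor, char *name, size_t namesz,
                     struct toy_signature *sig)
{
    const char *p = skip_space(*cursor);
    size_t len = 0;

    if (strncmp(p, "method", 6) != 0 || !isspace((unsigned char)p[6]))
        return 0;
    p = skip_space(p + 6);
    while (isalnum((unsigned char)p[len]) || p[len] == '_')
        len++;
    if (len == 0 || len >= namesz)
        return 0;
    memcpy(name, p, len);
    name[len] = '\0';
    p += len;
    if (!toy_parse_signature(&p, sig))
        return 0;
    *cursor = p;                /* caller resumes here, e.g. at the body */
    return 1;
}
```

Given the input `method greet ($self, $name) { ... }`, `toy_parse_method` returns the name `greet`, two parameters, and a cursor positioned just before the body, which the caller could then hand to whatever parses blocks.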

-doy

p5pRT commented 11 years ago

From PeterCMartini@GMail.com

On Wed, Nov 6, 2013 at 11:53 AM, Jesse Luehrs doy@tozt.net wrote:

On Mon\, Nov 04\, 2013 at 09:46:33AM -0800\, Zefram via RT wrote:
Peter Martini wrote:

The core has a couple of built in attributes\, and a callback mechanism to handle the rest.

Oh yes\, it would be nasty to expose that difference. We should certainly have a core API function to *apply* a named attribute; that would be the only place that needs to know which attributes are implemented which way.
                                     It also means this has to
have its own logic for handling lexical subs ... speaking of\, could Devel::CallParser do that?
I was wondering about that. I'm still not really up to date on the lexical sub stuff; that all came in while I was out of action. "my sub" is a sequence of two keywords\, not a natural thing to hook with D:CP. D:CP can certainly let you define a "my_sub" keyword that behaves similarly to the built-in "my sub". I think\, if call-parser magic is taken to be the way forward generally for syntax plugins\, I'd want to make the core have "my sub" invoke call-parser magic on a sub with a magic name\, such as &{"my sub"}. That'd just be to make "my sub" hookable in the same way that "sub" is hookable.

There's an outstanding problem with attaching call-parser magic to lexically-defined subs\, but that's a different and unrelated side of D:CP. (Also outstanding simply because I'm not up to date yet.)

The point of this API\, for my uses\, is to be able to add items to the PAD\, and to attach some sub initialization magic (which\, yes\, will mean putting a new op type as the first OP in the body rather than using magic\,

This implies that\, if you don't use any existing hook that wraps sub body compilation\, you'll need some additional hook to modify the body. As it's so closely related to your desire to override `prototype' parsing\, I think we should consider these two requirements together.
it's going out of its way to look and feel like a core part of the language, rather than a macro for generating code at the start of the sub).
I think we should be blurring\, as far as possible\, this distinction between "macro" (syntax plugin) and "core part of the language". Lisp wins big from the power of macros and their near equivalence to primitively-defined syntax.

Isn't that kind of inverted though\, to expose all the parts of parsing a sub under different functions

It's a great enabling technology. Exposing parser functions for standard bits of grammar allows them to be reused in novel combinations from syntax plugins. It's not only to allow overriding "sub". Even among pure and nearly-pure extensions of "sub"\, I think your view of what grammar people are likely to want to change is unrealistically narrow.

A hook for parsing of parens following "sub" has very limited application\, and doesn't even achieve all that you need in this area. There is at least one good argument for adding such a hook\, namely to help independent "sub" extensions work cooperatively when invoked together\, but you haven't made that argument. I'd prefer that what hooks we add to the core are powerful and generic\, so that we get the maximum extensibility for the minimum API complication.
I agree. For the p5mop project\, for instance\, we're going to be introducing a 'method' keyword that will also want signature parsing. Doing it through an extra hook like this means that we'll need to stop parsing our method keyword\, figure out a way to hand control of the parser back to perl in order to trigger the parser hook (or trigger it manually\, if that's possible\, but things like this historically have a tendency to require a bunch of setup that gets either inlined directly or wrapped up in an inaccessible static function)\, and then read the results out of some global data structure before we restart parsing. It would really be a lot easier if "parse a subroutine signature" was just an API function that we could call at whatever point in our existing custom parser we wanted to\, just like we can already call parse_block and things like that.

-doy

But the point is Perl doesn't need to provide a "parse a subroutine signature" API\, it needs to call out to such an API. parse_block is used from your XS code to tell Perl to pick up and continue parsing; to add true signatures\, we need Perl to stop parsing and hand off to XS (or for the entire processing of the sub to be XS\, in which case it wouldn't have anything at all to do with the Perl core). Anyway\, for the core to use a callback\, it needs to be registered somewhere\, and the hints hash seems like as good a place as any.

Also\, while this does neatly solve the issue of handling sub\, my sub\, our sub\, and state sub\, the method keyword would be different enough that it wouldn't make sense for it to call out to the signature parser anyway. The signature for a 'sub' and the signature for a 'method' could and would mean entirely different things\, and I think it would be inappropriate for one to use a callback meant for the other.

Side note: The big picture is that I want to be able to register a callback, so that when parsing hits the opening paren indicating a 'proto' in the current grammar, the callback returns one of: NULL, an OP_CONST indicating a traditional prototype, or another OP to be embedded at the head of the OP tree generated for the body of the sub. That 'other OP' could presumably be generated by parse_block.

I'd also like to have a new type of OP, let's call it INITSUB, which would not have any code associated with it in the core; it would just be a placeholder to tie a C/XS routine into the body of a sub as its first action, as an alternative to tying in an OP tree generated by parse_block.

p5pRT commented 11 years ago

From PeterCMartini@GMail.com

So\, recap\, I hope:

I'm looking to be able to register a callback, such that when perl hits the '(' following either the 'sub' token or 'sub' NAME, C/XS code can take over and handle parsing until it determines it has hit the matching ')'. Once it's done, it can return either null, a value to indicate a traditional prototype to use for this sub, or an OP (a custom OP with custom code to run, an OP tree generated through some other parse_* calls, whatever). The core functions that generate new subs right now, newMYSUB and newATTRSUB, take an OP * representing the proto. If the callback returned any OP other than OP_CONST, it would be presumed to be a tree meant to be prepended onto block, the OP * representing the body of the sub.
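The three return cases in this recap can be modelled in a few lines of plain C. This is only a toy sketch under stated assumptions: `ModelOp`, `MODEL_CONST`, `ModelSub` and `model_attach` are hypothetical names standing in for the real core structures, and nothing here is actual perl API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for op types; every name here is hypothetical. */
typedef enum { MODEL_CONST, MODEL_OTHER } ModelOpType;

typedef struct ModelOp {
    ModelOpType type;
    const char *label;       /* prototype text, or a description of the op */
    struct ModelOp *next;    /* next op in execution order */
} ModelOp;

typedef struct {
    const char *proto;       /* traditional prototype, or NULL */
    ModelOp *body;           /* first op of the sub body */
} ModelSub;

/* Apply the hook's return value as described in the recap:
 *   NULL        -> no prototype, body untouched
 *   MODEL_CONST -> used as a traditional prototype
 *   otherwise   -> treated as an op chain prepended onto the body */
static void model_attach(ModelSub *sub, ModelOp *hook_result, ModelOp *body)
{
    assert(sub != NULL);
    sub->proto = NULL;
    sub->body = body;
    if (hook_result == NULL)
        return;
    if (hook_result->type == MODEL_CONST) {
        sub->proto = hook_result->label;
        return;
    }
    /* Walk to the end of the returned chain and link the body after it. */
    ModelOp *tail = hook_result;
    while (tail->next != NULL)
        tail = tail->next;
    tail->next = body;
    sub->body = hook_result;
}
```

The point of the model is only that a single return slot is overloaded three ways, which is the part of the interface the rest of the thread argues about.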

The alternative being discussed here\, and please correct me if I'm wrong\, is to instead have a callback for when the 'sub' keyword is detected\, which would use the various parse_* routines to hand control back to the perl core to parse everything other than the slot where the prototype currently lives in the grammar.

Does that sum it up?

p5pRT commented 11 years ago

From zefram@fysh.org

Peter Martini wrote:

If the callback returned any OP other than OP_CONST, it would be presumed to be a tree meant to be prepended onto block, the OP * representing the body of the sub.

That's a rather messy interface\, with very limited power. Prepending an optree to the sub body is too specific to your particular problem.

-zefram

p5pRT commented 11 years ago

From zefram@fysh.org

Peter Martini wrote:

the core to use a callback\, it needs to be registered somewhere\, and the hints hash seems like as good a place as any.

The hint hash isn't a great place to register a C function callback. We don't have a standard way to reference such a function from an SV. It *can* be done\, but it loses C's type checking\, and introduces another way in which pure Perl code can crash the interpreter.

That 'other OP' could presumably be generated by parse_block.

You want to give the sub two bodies? I can see uses for blocks (parsed by parse_block()) embedded in a sub signature\, most obviously default value generation\, but you seem to be suggesting here that *all* the extra code from the sub signature be generated by a single parse_block(). I don't see what use case you're imagining here.

I'd also like to have a new type of OP\, let's call it INITSUB\, which would not have any code associated with it in the core\,

This is not what op types are for. If you want to inject arbitrary code at the beginning of the sub\, you can build whatever ops you want\, that have whatever effect you want. There's no gain from labelling all this varied code with a single op type; conversely\, there's no reason to eschew the use of standard op types\, if they have the effect that you want. An op type reflects what the op *does*\, not where it's used.

be a placeholder to tie a C/XS routine into the body of a sub as its first action, as an alternative to tying in an OP tree generated by parse_block.

You don't mention any way to control what code gets attached to the initsub op for a particular sub. You need a data path that would communicate something\, at least a pp func address\, and if it can communicate that then it can just as easily communicate a complete op tree with arbitrary op types. You haven't solved anything here.

-zefram

p5pRT commented 11 years ago

From PeterCMartini@GMail.com

On Thu, Nov 7, 2013 at 6:12 AM, Zefram <zefram@fysh.org> wrote:

Peter Martini wrote:

the core to use a callback\, it needs to be registered somewhere\, and the hints hash seems like as good a place as any.

The hint hash isn't a great place to register a C function callback. We don't have a standard way to reference such a function from an SV. It *can* be done\, but it loses C's type checking\, and introduces another way in which pure Perl code can crash the interpreter.

It's not a C function callback, it's an XS or pure perl function. Although a pure perl function would be pretty useless and would immediately generate a syntax error, unless it's calling something else to do the parser work.

That 'other OP' could presumably be generated by parse_block.
You want to give the sub two bodies? I can see uses for blocks (parsed by parse_block()) embedded in a sub signature\, most obviously default value generation\, but you seem to be suggesting here that *all* the extra code from the sub signature be generated by a single parse_block(). I don't see what use case you're imagining here.

Default value generation\, bounds checking\, loading @_ into the proper lexical variables. I'm suggesting the callback map the signature into an equivalent block of perl code\, and use parse_block\, or some equivalent\, to convert that to an OP tree. I don't like the idea of having something outside of the Perl core generating OPs directly.

I'd also like to have a new type of OP\, let's call it INITSUB\, which would not have any code associated with it in the core\,

This is not what op types are for. If you want to inject arbitrary code at the beginning of the sub\, you can build whatever ops you want\, that have whatever effect you want. There's no gain from labelling all this varied code with a single op type; conversely\, there's no reason to eschew the use of standard op types\, if they have the effect that you want. An op type reflects what the op *does*\, not where it's used.

What it does is initialize a sub when it's called. And the reason for a distinct OP is to be able to do the initialization in XS\, not Perl OPs\, although as I said\, there's nothing stopping you from putting a chain of ordinary OPs in instead.

be a placeholder to tie a C/XS routine into the body of a sub as its first action, as an alternative to tying in an OP tree generated by parse_block.

You don't mention any way to control what code gets attached to the initsub op for a particular sub. You need a data path that would communicate something\, at least a pp func address\, and if it can communicate that then it can just as easily communicate a complete op tree with arbitrary op types. You haven't solved anything here.

-zefram

Initializing the initsub OP would be a distinct function which would set the pp func address as needed.

p5pRT commented 11 years ago

From PeterCMartini@GMail.com

On Thu, Nov 7, 2013 at 5:58 AM, Zefram <zefram@fysh.org> wrote:

Peter Martini wrote:
If the callback returned any OP other than OP_CONST, it would be presumed to be a tree meant to be prepended onto block, the OP * representing the body of the sub.
That's a rather messy interface\, with very limited power. Prepending an optree to the sub body is too specific to your particular problem.

-zefram

The limited power and specific to my problem are a feature\, not a bug. We already have an API for declaring new keywords\, which solves the category of problems you're interested in. The problem I'm trying to fix is the sub keyword we ship has been promising a feature that it doesn't deliver\, and I specifically want to fix the built in keyword\, not replace it with a brand new one that happens to go by the same name.

p5pRT commented 11 years ago

From PeterCMartini@GMail.com

On Thu, Nov 7, 2013 at 6:12 AM, Zefram <zefram@fysh.org> wrote:

Peter Martini wrote:

the core to use a callback\, it needs to be registered somewhere\, and the hints hash seems like as good a place as any.

The hint hash isn't a great place to register a C function callback. We don't have a standard way to reference such a function from an SV. It *can* be done\, but it loses C's type checking\, and introduces another way in which pure Perl code can crash the interpreter.

The callback in the code I'd posted for discussion here\, which I'm actually liking more as this discussion wears on\, doesn't introduce any new risks for ways in which pure Perl code can crash the interpreter.

If $^H{prototype_parser} is a reference to a CV, perl will call that CV with no arguments. What that function returns is either undef, in which case the parser has opted not to use the prototype system, or an SV, in which case that will be used as the prototype for the sub. If prototype_parser is set to a pure perl function, or to something which doesn't use PL_parser, perl will print out a syntax error, pointing to the opening paren. If the parsing function misbehaves, it'll die in the exact same way it would die if it were going to interpret the sub keyword directly: either a syntax error or a crash, depending on the same considerations as would be present in any other XS using a parse_* function or the PL_parser struct directly.

Now\, the API I want to use within that prototype parser is\, roughly:

Perl_set_cv_init(pTHX_ CV * cv, SV * description, FUNC * initfunc);

Where description would be used for Deparse to print out how the signature was declared\, and FUNC is a C level callback to be run after the state setup part of entersub has run\, but before the first true OP in the body runs. Where that function pointer is stored\, and how it runs\, are immaterial; it could be magic on the CV\, it could be a custom OP prepended onto the body\, whatever.

...

Is your fundamental objection that C/XS is fine for the *parsing* stage, but gets too risky in terms of ability to crash the interpreter when code is actually being run? In other words, that for runtime operations, everything should be at the OP level?

Of course running OPs directly is safer\, but we already rely heavily on XS and C level callbacks; this is adding one more straw to the camel\, and my own opinion is that this isn't the one that breaks its back.

p5pRT commented 11 years ago

From zefram@fysh.org

Peter Martini wrote:

It's not a C function callback, it's an XS or pure perl function.

I see. We've never managed parser-related hooks that way before. There's a certain amount of conflict between Perl parse-time and runtime data structures\, for example in their clashing ideas of "current pad" which is very relevant to your use case. In the long run I certainly intend to be able to use Perl subs in parser callbacks\, but it needs a B:: shim layer to make it work.

Also\, if you use Perl CV calling as the hook mechanism you'll run into the rather large overhead of calling a CV. Those who are not using actual Perl code in the callbacks probably shouldn't be subjected to the costs. Though it may turn out to be insignificant for something only called once per subroutine compilation.

Presumably\, where you want the callback function to return an op tree\, it would return it as a B::OP (the usual way to handle OPs as SVs). That's more expense\, and a rather heavy module to load for something that doesn't otherwise use B::.

I'm suggesting the callback map the signature into an equivalent block of perl code, and use parse_block, or some equivalent, to convert that to an OP tree.

Eww. Generating textual code in such circumstances is a dicey business. Code is generated under some assumptions about how it'll be parsed\, that may not be correct for the current parser state under the influence of unknown pragmata. Anywhere that the signature has inline code\, such as a default value expression\, that you want to embed in the generated code\, you need either a general deparser or a guess about the extent of the expression from something that doesn't actually parse it\, and neither of those really works. You're importing the problems of source filters.

I don't like the idea of having something outside of the Perl core generating OPs directly.

Why? We do it all the time\, with clean results. Generating ops directly means that there's no vulnerability to unknown parser pragmata. When the extended syntax includes an embedded expression\, you can use the real parser to turn the expression into an optree\, and then the optree can be cleanly incorporated into the larger optree being built. Generating ops lets you work with the semantics of code directly\, which is what you really want.

What it does is initialize a sub when it's called.

The nature of the initialisation is\, in your system\, different per sub. You seem to be back to your older idea of calling magic before running the main sub body\, introducing a false dichotomy between "initialisation" and "body".

And the reason for a distinct OP is to be able to do the initialization in XS, not Perl OPs,

An op can already use arbitrary C code. We don't need a new op type for that. Indeed\, we already have an op type that specifically signals that the op is non-standard\, OP_CUSTOM.
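As a rough illustration of that point, here is a toy model in plain C of ops that each carry their own function pointer. In perl itself the corresponding field is `op_ppaddr`; everything else below (`ToyOp`, `toy_run`, `pp_init_args`, `pp_add_one`) is a hypothetical stand-in and not core code.

```c
#include <assert.h>
#include <stddef.h>

struct ToyOp;

/* Each op's function does its work and returns the next op to run. */
typedef struct ToyOp *(*ToyPP)(struct ToyOp *self, int *state);

typedef struct ToyOp {
    ToyPP ppaddr;            /* the C code this op runs */
    struct ToyOp *next;      /* successor in execution order */
} ToyOp;

/* A "custom" op: arbitrary C that runs at the start of the sub. */
static ToyOp *pp_init_args(ToyOp *self, int *state)
{
    *state = 10;             /* stands in for argument setup */
    return self->next;
}

/* A "standard" op belonging to the sub body. */
static ToyOp *pp_add_one(ToyOp *self, int *state)
{
    *state += 1;
    return self->next;
}

/* Minimal run loop: follow the chain until an op returns NULL. */
static int toy_run(ToyOp *start)
{
    int state = 0;
    ToyOp *op = start;
    assert(start != NULL);
    while (op != NULL)
        op = op->ppaddr(op, &state);
    return state;
}
```

In this model the run loop never needs to know which op is "initialisation" and which is "body"; the initialising op is just the first link in the chain.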

Initializing the initsub OP would be a distinct function which would set the pp func address as needed.

You should elaborate on this\, as it's an essential part of the hook system that you're describing. Do you imagine a separate callback hook for initialisation of this op\, with the signature parser hooking this alongside the parser? How would you pass the necessary data from the parser callback to the initsub-initialising callback?

-zefram

p5pRT commented 11 years ago

From zefram@fysh.org

Peter Martini wrote:

Perl_set_cv_init(pTHX_ CV * cv, SV * description, FUNC * initfunc);

Right\, that's your old idea with the false distinction between "init magic" and "true body". You're supposing that the parser callback would call this on PL_compcv?

Where that function pointer is stored, and how it runs, are immaterial; it could be magic on the CV, it could be a custom OP prepended onto the body, whatever.

It can't be purely an op\, if you're calling set_cv_init() from the signature parser\, because when the signature parser runs the sub doesn't have a body yet. It would have to be stored as magic or as something that amounts to magic\, though it could then be wrapped up as an op when the rest of the body comes along.
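The ordering constraint described here can be sketched in plain C: the function pointer is stashed while no body exists, then wrapped in an op once the body arrives. All names (`PendingSub`, `BodyOp`, `stash_init`, `finish_sub`, `example_init`) are hypothetical and model only the sequencing, not perl's magic or op structures.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*InitFn)(int *args);

typedef struct BodyOp {
    InitFn fn;               /* non-NULL only for the wrapper op */
    struct BodyOp *next;
} BodyOp;

typedef struct {
    InitFn pending_init;     /* stashed while no body exists (the "magic") */
    BodyOp *body;            /* NULL until the body has been parsed */
} PendingSub;

/* Example initialiser, used only to exercise the sketch. */
static void example_init(int *args)
{
    *args = 3;
}

/* Signature-parse time: there is no body yet, so only stash the pointer. */
static void stash_init(PendingSub *sub, InitFn fn)
{
    assert(sub != NULL);
    sub->pending_init = fn;
}

/* Body-complete time: wrap the stashed function in an op ahead of the body. */
static void finish_sub(PendingSub *sub, BodyOp *wrapper, BodyOp *body)
{
    assert(sub != NULL);
    if (sub->pending_init != NULL) {
        assert(wrapper != NULL);
        wrapper->fn = sub->pending_init;
        wrapper->next = body;
        sub->body = wrapper;
        sub->pending_init = NULL;
    } else {
        sub->body = body;
    }
}
```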

Is your fundamental objection that C/XS is fine for the *parsing* stage, but gets too risky in terms of ability to crash the interpreter when code is actually being run?

No. My objection is to the conceptual complication of adding another piece to the structure of a CV when that complication doesn't actually add any new ability. The op system provides a very general way of arranging for things to happen as part of a Perl subroutine's behaviour. I'm inclined to try to fit the behaviour that you want to add into the existing framework\, rather than add new mechanism\, and in this case it fits the existing framework very well.

-zefram