lizmat commented 2 months ago

See current definition and the original thoughts.

This issue is written out of frustration with trying to implement a more sensible way of dealing with declarator docs in RakuDoc v2, and with trying to implement a sensible "safe" renderer.

Regardless of the implementation effort that has gone into this, and what still needs to go into it, I wonder how many developers really like this feature, and what they would think if it were removed by default in 6.e.

jubilatious1 commented 2 months ago

Could @antononcube weigh in? Literate programming seems to be an interest (RMarkdown, Jupyter notebooks).

Also, StackOverflow user suman is interested in (and asks questions about) literate programming. Does anyone know if suman is on GitHub?

https://stackoverflow.com/questions/57127263/capture-and-execute-multiline-code-and-incorporate-result-in-raku
https://www.reddit.com/user/sumanstats/

ab5tract commented 2 months ago

It occurs to me that one possible solution to preserving both declarator blocks and @lizmat's sanity (;-) would be to rethink what declarator blocks actually are.

(...)

The trick would be to redefine the rule for comments so that the '#' introducer token becomes '#' <!before '|' | '='>, which would mean that leading and trailing decks are no longer skipped as being whitespace, and can only appear in the places where the Raku grammar explicitly allows them.
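To make the proposed introducer rule concrete, here is a toy grammar. It is hypothetical and far simpler than Rakudo's own grammar: ordinary # comments are consumed as comments, while #| and #= survive as separate deck nodes that later processing could attach to declarations. The grammar name ToyDecks and its token names are invented for illustration.

```raku
# Toy grammar (hypothetical; far simpler than Rakudo's real grammar).
# It applies the proposed rule: '#' not followed by '|' or '=' starts an
# ordinary comment, while '#|' and '#=' are matched as separate deck nodes
# instead of being skipped as whitespace.
grammar ToyDecks {
    token TOP     { <item>* }
    token item    { <deck> || <comment> || <code> || \n }

    # Ordinary comment: the proposed introducer '#' <!before '|' | '='>
    token comment { '#' <!before '|' | '='> \N* }

    # Deck: '#|' (leading) or '#=' (trailing), with its text captured
    token deck    { '#' $<kind>=['|' | '='] \h* $<text>=[\N*] }

    # Everything else on a line, up to a '#' or the end of the line
    token code    { <-[\#\n]>+ }
}

my $source = q:to/END/;
#| Adds two numbers
sub add($a, $b) { $a + $b } # ordinary comment, not a deck
my $total;                  #= Running total
END

my $match = ToyDecks.parse($source) // die 'parse failed';

# Keep only the items that matched as decks, formatted as "<kind> <text>"
my @decks = $match<item>.grep({ .<deck> }).map({ .<deck><kind> ~ ' ' ~ .<deck><text> });
.say for @decks;
```

With this split, the grammar decides where a deck node may appear, instead of every whitespace position silently accepting one.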

Thank you for expressing this so much more eloquently than I have managed so far! This is exactly what I meant in some previous comments, such as:

What I'm proposing for the grammar (and which I appreciate might not be possible) is to not treat these as whitespace. They would specifically be optional captures for routines (#|) and parameters (#=).

My eyes have been opened to decks[^1] being useful in even further contexts, such as variables and (keys of?) Pair objects. But it's important that they don't end up getting attached to the blocks of where clauses when they are meant to be attached to the parameter the where clauses belong to, etc.

[^1]: Again, thanks to @raiph++ for the excellent vocabulary suggestion!

patrickbkr commented 2 months ago

Assuming the RakuAST implementation that now works and is safe, how should the Rakudoc spec be clarified? What is not possible?

I'm not sure I understand the questions correctly. I'll try to explain to hopefully increase understanding on all sides.

RakuAST (or the Raku Grammar in general) and RakuDoc are rather separate things.

Safety:

RakuDoc - being a documentation language - does not support arbitrary code execution and as such is unproblematic with respect to software security.
Raku - being a programming language - executes code and does so already while parsing. It's inherently unsafe. We can't do anything about that.
Declarator Docs are the one functionality where RakuDoc and Raku can mix. Thus processing them is - just as processing Raku - inherently unsafe.
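For readers less familiar with the feature, here is a small sketch of how declarator blocks work today, adapted from the example in the Raku documentation. A leading #| and a trailing #= are attached to a declaration and can be read back with .WHY. In this approach, reading them back means compiling and running the Raku code, which is why processing decks inherits Raku's safety issues. The class and sub names are only illustrative.

```raku
#| Base class for magicians
class Magician {
    has Int $.level;
    has Str @.spells;
}

#| Fight mechanics
sub duel(Magician $a, Magician $b) {
}
#= Magicians only, no mortals.

# The decks are attached to the code objects and read back via .WHY
say Magician.WHY;
say &duel.WHY.leading;
say &duel.WHY.trailing;
```

Note that the trailing #= sits on the line after the sub's closing brace and still attaches to duel. That is the kind of placement flexibility this issue proposes to restrict.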

What is not possible?

From what I understand, lizmat started to work on some tooling to process Declarator Docs and started to hit corner cases and inconsistencies that made the job difficult. Given that Declarator Docs have always been difficult / problematic from an implementation point of view, she started this issue.

patrickbkr commented 2 months ago

Let's start a wish list of elements that should support Decks.

package / module / class / grammar / monitor
proto / method / submethod / sub / regex / token / rule
signature parameters
has / my / our / constant / state / anon / ...
Pairs
enum / enum values
subset

Questions:

Is monitor challenging given it lives in module space?
It's possible to have standalone Signatures and Pairs (i.e. not part of a routine / hash variable). If one adds a Deck to such a thing, it's not obvious where to put such Decks when creating a hierarchy of all the Decks in some module.

It occurs to me that one possible solution to preserving both declarator blocks and @lizmat's sanity (;-) would be to rethink what declarator blocks actually are.

What if #| and #= were not comments at all? What if they were actually an optional component (a "Deck"?) of various declarative constructs?

I like this idea.

finanalyst commented 2 months ago

@patrickbkr Here's my take on your responses. @lizmat will probably correct me.

When I wrote RakuAST I was referring to the major rewrite of Rakudo that is currently underway. It is not complete yet because Rakudo.E does not yet pass all tests. RakuAST creates an AST of the program. The legacy version of Rakudo (there must be a better way of naming these two things) does not produce an AST.
Raku is a language, but it is distinguished from the compiler. We now have Rakudo.D and Rakudo.E (which is what I was referring to as RakuAST), which are compilers.
Declarator Docs (Decks) are formally a part of RakuDoc - though perhaps they should not be
The safety issue comes from the execution of byte code. Raku requires byte code to be executed during compilation.
RakuDoc is specified as part of Raku, and the intention was to make documentation as closely related to coding as possible. The intent was IMHO an effort to bring into Raku some of the concepts of literate programming. But this intent also brings with it the ability to write malicious code that can enter the documentation. I do not think this was appreciated when RakuDoc (aka POD6) was designed. (@thoughtstream forgive me if I am wrong about this)
@lizmat has now changed the situation with the implementation of RakuAST. It is now possible to generate the AST of a Raku program without the generation of byte code. This means that the RakuDoc components of a Raku program (and a RakuDoc source is a Raku program) can be extracted and manipulated by a renderer without ever generating byte code, and therefore be considered safe.

We need to develop some terms to make these distinctions better. I am fairly certain I have not been as clear as I should have been.

patrickbkr commented 2 months ago

When I wrote RakuAST I was referring to the major rewrite of Rakudo that is currently underway. It is not complete yet because Rakudo.E does not yet pass all tests. RakuAST creates an AST of the program. The legacy version of Rakudo (there must be a better way of naming these two things) does not produce an AST.

Agreed.

Raku is a language, but it is distinguished from the compiler. We now have Rakudo.D and Rakudo.E (which is what I was referring to as RakuAST), which are compilers.

Agreed.

Declarator Docs (Decks) are formally a part of RakuDoc - though perhaps they should not be

Agreed.

The safety issue comes from the execution of byte code. Raku requires byte code to be executed during compilation.

Agreed.

RakuDoc is specified as part of Raku, and the intention was to make documentation as closely related to coding as possible. The intent was IMHO an effort to bring into Raku some of the concepts of literate programming. But this intent also brings with it the ability to write malicious code that can enter the documentation. I do not think this was appreciated when RakuDoc (aka POD6) was designed. (@thoughtstream forgive me if I am wrong about this)

Agreed.

@lizmat has now changed the situation with the implementation of RakuAST. It is now possible to generate the AST of a Raku program without the generation of byte code. This means that the RakuDoc components of a Raku program (and a RakuDoc source is a Raku program) can be extracted and manipulated by a renderer without ever generating byte code, and therefore be considered safe.

(Quick shout out to jnthn, nine, ab5tract and probably others that all have poured in work on RakuAST.)

This is sadly not true. Even in the new RakuAST compiler, bytecode generation and execution can partly happen already during the parse / AST generation phase. The classic example is BEGIN blocks, which are compiled and executed by the compiler as soon as the parser sees the end of the block. At that point in time the parser hasn't even looked at the text of the input file following that BEGIN block yet. This might be a little oversimplified, but it is right in principle.
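A minimal sketch of the BEGIN point (the file contents and variable names are only illustrative). The BEGIN line is printed by the compiler itself, and the value computed by BEGIN now is fixed at compile time, before the mainline's own now is evaluated.

```raku
# Runs as soon as the parser reaches the end of this statement,
# while the rest of the file is still unparsed.
BEGIN say 'BEGIN: executed by the compiler during the parse';

# As a prefix, BEGIN also yields the value it computed at compile time.
my $compiled-at = BEGIN now;

say 'mainline: executed only after the whole file was compiled';

# The compile-time Instant cannot be later than the run-time one.
my $ran-at = now;
say $compiled-at <= $ran-at;
```

Running this with raku -c, which only compiles and syntax-checks the file, still prints the BEGIN line before "Syntax OK". A documentation tool that has to parse the file triggers the same parse-time execution.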

@lizmat Agreed?

lizmat commented 2 months ago

Agreed.

niner commented 2 months ago

This is sadly not true. Even in the new RakuAST compiler bytecode generation and execution can in part already happen during the parse / AST generation phase. The classic example being BEGIN blocks which are compiled and executed by the compiler as soon as the parser sees the end of the block.

They can, but they don't have to, and they do so much less often than with the old frontend. The reason is that RakuAST includes infrastructure for interpreting ASTs directly. We use this to avoid the costly bytecode generation and loading for trivial expressions. The most notable exception here is role bodies, because they generate a lexical context. But maybe we can find an alternative to that.

finanalyst commented 2 months ago

[Update after @niner's comment below] I was wondering about this and wrote a small program with a BEGIN block and a rakudoc block to see how getting an AST from it would differ from using EVAL on it. Unfortunately, it hits an error that I'm not sure how to deal with. Test program contents (in file begin_ast.raku):

use experimental :rakuast;

my $prog = q:to/PROG/;
    BEGIN { say 'in BEGIN phase: here lie dragons' }
    =begin rakudoc
    In a Rakudoc block, dragons lie here sleeping
    =end rakudoc
    PROG

say 'Before Eval';
use MONKEY-SEE-NO-EVAL;
EVAL $prog;
no MONKEY-SEE-NO-EVAL;

say 'Before AST evaluation';
say $prog.AST.rakudoc;
say 'ending test';

Output in terminal:

$ raku tmp/begin_ast.raku 
Before Eval
in BEGIN phase: here lie dragons
Before AST evaluation
===SORRY!===
Unknown compilation input 'qast'
$ raku -v
Welcome to Rakudo™ v2024.08-59-gb6fa27a22.
Implementing the Raku® Programming Language v6.d.
Built on MoarVM version 2024.08-6-gac82e446f.

Commenting out the BEGIN expression (which defeats the purpose, but shows the expected behaviour from say'ing the AST) yields the following:

$ raku tmp/begin_ast.raku 
Before Eval
Before AST evaluation
(RakuAST::Doc::Block.new(
  type       => "rakudoc",
  paragraphs => (
    "In a Rakudoc block, dragons lie here sleeping\n",
  )
))
ending test

Update

Here is the output as suggested below by @niner:

$ RAKUDO_RAKUAST=1 raku tmp/begin_ast.raku 
Before Eval
in BEGIN phase: here lie dragons
Before AST evaluation
in BEGIN phase: here lie dragons
(RakuAST::Doc::Block.new(
  type       => "rakudoc",
  paragraphs => (
    "In a Rakudoc block, dragons lie here sleeping\n",
  )
))
ending test
$ raku -v
Welcome to Rakudo™ v2024.08-66-gc3fbe0c3c.
Implementing the Raku® Programming Language v6.d.
Built on MoarVM version 2024.08-10-gb7750ec26.

Commentary: This means that at present BEGIN expressions do make things unsafe even with the RakuAST compiler and front-end.

lizmat commented 2 months ago

@finanalyst There's currently a bug in the handling of BEGIN phasers that only occurs when creating an AST with .AST.

finanalyst commented 2 months ago

@lizmat That's a relief. I thought I got the code wrong.

@patrickbkr When the RakuAST parser/compiler bug is fixed, and assuming that the say inside the BEGIN is not executed, would this mean the safety issue is fixed in RakuAST when processing documentation?

niner commented 2 months ago

It's not really a bug. It's just that the RakuAST frontend does not support BEGIN-time compilation when running under the old frontend. If you run your example with RAKUDO_RAKUAST=1, you've got a decent chance of it doing what you intend.

finanalyst commented 2 months ago

@niner You were right - I've amended my comment above.

I hadn't quite understood the difference between

use experimental :rakuast in a program abc.raku and then running raku abc.raku
not including use experimental :rakuast in abc.raku and then running RAKUDO_RAKUAST=1 raku abc.raku

lizmat commented 2 months ago

What if #| and #= were not comments at all? What if they were actually an optional component (a "Deck"?) of various declarative constructs?

Then we could add them to the syntax for those constructs in the restricted locations that @lizmat is hoping for.

The trick would be to redefine the rule for comments so that the '#' introducer token becomes '#' <!before '|' | '='>, which would mean that leading and trailing decks are no longer skipped as being whitespace, and can only appear in the places where the Raku grammar explicitly allows them.

I like this idea a lot!

This will, however, potentially be non-trivial to implement. But it is still better than the current situation.

And it will probably break quite a few spectests that depend on the "decks are whitespace" semantics.

Should I put this in a pull request and get a vote on that?

lizmat commented 2 months ago

Any way to have it both ways via some switch?

@tbrowder You mean "as whitespace" vs "only at specific locations"?

patrickbkr commented 2 months ago

@patrickbkr When the RakuAST parser/compiler bug is fixed, and assuming that the say inside the BEGIN is not executed, would this mean the safety issue is fixed in RakuAST when processing documentation?

Just to clarify: the fact that Raku runs BEGIN blocks during the parse is deliberate and necessary, because code run at BEGIN time is allowed to modify the parser state. A good example is the OO::Monitors module, which adds a new keyword, monitor. The parser needs to run that module's code to recognize the monitor keyword. If it didn't, the parser could not successfully parse any code using that keyword.
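A smaller, self-contained illustration of the same principle: declaring an operator extends the grammar at compile time, so the parser only accepts the new syntax after it has processed the declaration. This is not the mechanism OO::Monitors uses to add monitor; the ⊕ operator is purely hypothetical.

```raku
# Before this declaration, '7 ⊕ 5' would be a syntax error.
# Compiling the declaration extends the grammar for the rest of this scope.
sub infix:<⊕>($a, $b) { ($a + $b) % 10 }

say 7 ⊕ 5;   # 2
```

A parser that refused to process declarations like this one could not parse any later line that uses the operator.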

tbrowder commented 2 months ago

Any way to have it both ways via some switch?

@tbrowder You mean "as whitespace" vs "only at specific locations"?

I think I misunderstood. So the result would be the decl blocks would stay as the user defined them?

finanalyst commented 2 months ago

@patrickbkr A good example is the OO::Monitors module, which is adding a new keyword monitor

Suppose it is possible to prevent the creation of bytecode, and a renderer is given an AST with only rakudoc blocks. Then the generation of output (including HTML) depends only on the trusted code that turns blocks into output. This trusted code is not affected by the source program.

Does this have a safety concern?

patrickbkr commented 2 months ago

Consider:

use OO::Monitors;

#|Doc
my $var;

monitor M { }

#|Doc2
my $var2;

Without running the code in OO::Monitors, the parser can't understand the monitor M { } line. The parse will fail and it's not possible to produce the AST. You can't skip running code during the parse, because running that code is often necessary for the parser to be able to continue parsing. So in the above code the parser, working top to bottom, doesn't even reach the #|Doc2 comment on line 8, as it already fails on line 6.


thoughtstream commented 1 month ago

I’ve been thinking about how we can make the proposed non-whitespace #| and #= more syntactically “regular”...and hopefully much easier to implement as well.

In looking at the current specification, it seems obvious that the main problem is with the trailing #= documentor. Because, even if we restrict where it can appear, people are still going to want to place that construct in syntactically inconsistent places in the grammar:

    class Base {            #= Example 1
        ...
    }

    class Der               #= Example 2
    is Base
    {
        method action (     #= Example 3
            $argie,         #= Example 4
            $bargie         #= Example 5
        )
        {...}
    }

    my Int $answer = 42;    #= Example 6

Conceptually, what developers want is to be able to document a construct “on the left”.

But, syntactically, that sometimes means “immediately after the declarand itself” (as in examples 2 and 5), while at other times it means “before the following component” (as in example 4), or “inside the following component” (as in examples 1 and 3), or even “after the entire statement” (as in example 6).

If we want to support all those locations with the new non-whitespace #= documentor, that is going to significantly complicate the entire Raku grammar, and especially the AST construction process.

In contrast, it would be relatively easy to handle the leading #| documentor. Specifically, we could define that a #| is only permitted immediately before a keyword_opt / type_opt / declarand_req sequence:

    #| Example 7
    class Base {
        ...
    }

    #| Example 8
    class Der is Base
    {
        #| Example 9
        method action (

            #| Example 10
            $argie,

            #| Example 11
            $bargie
        )
        {...}
    }

    #| Example 12
    my Int $answer = 42;

That seems to already allow virtually all of the current (sane) uses of #|.

So, in order to make #= equally implementable from a syntactic point of view, and more predictable and teachable for end-users, perhaps we could specify that a #= can only appear immediately after the same keyword_opt / type_opt / declarand_req sequence.

Which would give us:

    class Base          #= Example 13
    {
        ...
    }

    class Der           #= Example 14
    is Base
    {
        method action   #= Example 15
        (
            $argie      #= Example 16
            ,
            $bargie     #= Example 17
        )
        {...}
    }

    my Int $answer      #= Example 18
        = 42;

That’s not ideal perhaps (example 16 is particularly unappealing, and examples 14 and 18 aren’t ideal either) but it would be a great deal more achievable and predictable. And those suboptimal cases could be made a little less awkward assuming we also provide a bracketed #= form:

    class Der  #=[ Example 19 ]  is Base
    {
        method action (
            $argie      #=< Example 20 >,
            $bargie     #=< Example 21 >
        )
        {...}
    }

    my Int $answer  #=「Example 22」  = 42;

Of course, we probably also need to support multiple consecutive documentors in those locations:

    #| This is the base class
    #| for everything in the system

    class Base   #= I<It really needs
                 #=   a much better name
                 #=   of course>
    {...}

The only other element we’ve discussed which this approach doesn’t handle is the ability to document non-declarative components, such as the keys of a hash.

Personally, I think we should defer adding that until we see how the proposed changes to documentors for declarands shake out, but if we did want to tackle it now, I’d suggest that we simply specify that a documentor that doesn’t appear immediately before/after a declarand can only appear immediately before/after a compile-time literal value, to which it is then attached. For example:

    my %preconfig =
        #| Max size of entry
        size => 42,

        #| Apply limiting
        limit => True,

        #| Randomizes lookup
        rand => True,

        #| Et cetera
        etc => 'et cetera';

    my %postconfig =
        'size'  #=[ Max size of entry ]  =>  42,
        'limit' #=[ Apply limiting    ]  =>  True,
        'rand'  #=[ Randomizes lookup ]  =>  True,
        'etc'   #=[ Et cetera         ]  =>  'et cetera';

Note that in the case of a trailing documentor for hash keys, the key has to be explicitly quoted, because it’s now syntactically separated from the subsequent => and hence no longer autoquoted.

patrickbkr commented 1 month ago

Specifically, we could define that a #| is only permitted immediately before a keyword_opt / type_opt / declarand_req sequence ... So, in order to make #= equally implementable from a syntactic point of view, and more predictable and teachable for end-users, perhaps we could specify that a #= can only appear immediately after the same keyword_opt / type_opt / declarand_req sequence.

I do like the idea of limiting the positions Decks can appear in.

But I think we are stretching the #| / #= syntax too far. They look like comments, but actually share little behavior given they can appear only in very specific places, always in direct relation to a bordering element.

Comparing this latest proposal to the earlier idea to utilize a docs trait, the trailing places they can be put in are (almost?) identical. So why not just go with the trait approach then? I don't think we have leading traits yet. Would that be doable? If yes, we could do:

docs 'Base class for magicians' 
class Magician {
    has Int $.level;
    has Str @.spells;
}

docs 'Fight mechanics ' ~
     'Magicians only, no mortals'
sub duel (
    Magician $a  docs 'The first magician in the duel',
    Magician $b  docs 'The second magician in the duel',
) {
    ...
}

my $mage  docs 'A magician of level 2 or above';

I don't like the

docs 'Fight mechanics ' ~
     'Magicians only, no mortals'

bit. But maybe we can come up with a nicer approach to have multiline strings there.
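Leading traits do not exist in Raku today, so the proposed leading docs syntax cannot be tried directly. The trailing form can, however, be approximated with the existing trait mechanism. The sketch below assumes a hypothetical is docs(...) trait (rather than a bare docs keyword) and a hypothetical Documented role. It covers routines only and does not integrate with .WHY or with any renderer.

```raku
# Hypothetical role carrying documentation text.
role Documented {
    has $.docs is rw;
}

# Hypothetical 'is docs(...)' trait: at compile time, mix Documented
# into the routine and store the text on it.
multi sub trait_mod:<is>(Routine:D $r, :$docs!) {
    $r does Documented;
    $r.docs = ~$docs;
}

class Magician { }

sub duel(Magician $a, Magician $b) is docs('Fight mechanics: magicians only, no mortals') {
    ...
}

say &duel.docs;
```

With a trait, the documentation becomes an ordinary compile-time value attached to a specific declaration instead of a comment, which is the distinction drawn above.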

jubilatious1 commented 1 month ago

Docks?

Disagree with using/implementing this terminology. Sounds too much like Docs.

patrickbkr commented 1 month ago

Docks?

Disagree with using/implementing this terminology. Sounds too much like Docs.

Just a typo on my side. Corrected.

jubilatious1 commented 1 month ago

https://github.com/softmoth/raku-Pod-To-Markdown/issues/18#issue-498423231

bbkr commented 1 month ago

I like and frequently use declarator blocks in my everyday code. They are a great way of communicating with developers. Compared to regular POD:

=begin pod           <- line noise

=head3 foo           <- easy to miss when renaming method, requires informal headX standard to be used across company

Blabla               <- actual, useful content

=end pod             <- line noise

method foo { ... }

BTW: I never liked POD, both in Perl and Raku. When I first encountered Rust approach with

    // - Regular comment
    /// - Generate library docs for the following item.
    //! - Generate library docs for the enclosing item.

I was amazed how easy and consistent documentation can become across an entire ecosystem. Here is an example of my module using this concept. No line noise, no special documentation syntax clutter, just pure information.

So for me class/method declarator blocks are the way to go and POD can be removed entirely from core and/or moved to some external library.

tbrowder commented 1 month ago

I appreciate your point of view, but I love Raku pod (Rakudoc) and find it easy to use (as opposed to Perl pod).

I speak as a regular Perl user since 1993 and Raku user since 2015. (Note RakuAST is making Rakudoc even better.)

And we have tools to easily convert Rakudoc to other file formats like Markdown, HTML, and PDF (and other types I don't currently use). Conversely, we also have tools to convert Rakudoc to Markdown, and there are many other non-Raku tools to convert Markdown to some other document form.

Finally, Rakudoc has a much richer, and extensible, syntax which enables almost unlimited enhancement and variety of PDF and HTML output.

doomvox commented 6 days ago

I'm late to the party, but I wanted to say that in general I like things like declarator docs quite a bit (unlike lizmat and thoughtstream, I'm definitely in favor of embedded documentation). But it does seem that this is an area where an extreme commitment to backwards compatibility probably isn't necessary: changing some corner cases to make the parsing problem saner is probably fine.

Myself, I actually haven't really used the declarator doc features very much, but I think that's largely because they don't really seem like a big part of Raku's programming culture.

In comparison, in the Emacs Lisp world, a typical function definition might look like:

(defun ourproject-do-something (argument)
  "Do something useful with ARGUMENT for ourproject."
  (message "doing stuff with: %s" argument))

The Raku equivalent would be something like:

#| Do something useful with 'argument' for ourproject
sub do-something ($argument) {
  say "doing stuff with $argument";
}

The elisp docstrings, while technically optional, are required by the programming culture. Emacs has a "help" system that displays these docstrings (similar to the case of the Comma IDE that finanalyst was talking about). An elisp code example wouldn't be complete without the docstring, whereas in Raku they're uncommon.

It's probably significant that in the case of elisp, the syntax makes it seem like the docstring is part of the routine, where the Raku version makes it seem like it's external to it: I think something like the "docs" feature thoughtstream proposed seems very interesting, it would make the docstrings seem like they're internal to the routines, and might actually encourage their use.

But that said, while the "docs" syntax might be a good "in addition to" my own feeling is it wouldn't work so well as an "instead of". For good or for ill, we should probably stick with the magic comments in some form.

Raku / problem-solving

Declarator Docs should be limited in scope #438
