lastcoolnameleft commented 6 years ago

Currently, prerequisites/results/validations are parsed from previous header blocks in the markdown. (e.g. '# Prerequisite', '# Results', etc.)

This has a some disadvantages:

separating into different lines quite difficult; whereas putting them into a code block preserves the line breaks (depending on the implementation)
Headers have syntactic meaning for subsequent sections

The proposal is to migrate all SimDem relevant data into Code blocks.

This would change how we define a prerequisite from:

Prerequisite

Prereq-1.md Prereq-2.md

To:

Prereq-1.md
Prereq-2.md

And changing how we define a result from:

Result

{ ‘foo’: ‘bar’ }

To:

Prereq-1.md
Prereq-2.md

This has the following advantages:

heading text now no longer carries syntactic meaning—that is, SimDem no longer needs to look at heading text to figure out what to do with the ensuing section
all syntactic meaning is present in the “language” of the code block, which means that prerequisites or commands can be anywhere in the document, and can still be correctly parsed. This also makes the renderer implementation much cleaner.
Code complexity and SLOC is significantly reduced, especially when paired with a markdown AST parser like Mistletoe
- E.g. Parsing of Markdown can be simplified down to: https://gist.github.com/miyuchina/a06bd90d91b70be0906266760547da62

P.S. Thanks to @miyuchina for guidance and wording on this.

SorraTheOrc commented 6 years ago

The goal of this proposal is good, however the implementation is problematic.

SimDem is designed to make documentation executable. These changes break this design in two ways:

The acceptable one first, requiring that code blocks have "flags" in the backticks line (e.g. "```results" as shown in the example above is something document authors will not know to do. It is not standard Markdown. This causes two problems...

1) document authors are used to providing a language flag at this position, in fact many markdown editors provide helpers for these flags. This makes it possible to nicely render the content, e.g.

echo "this code block is marked as 'bash'"

By replacing the language string with a SimDem specific keyword we break this important documentation feature.

2) Currently it is possible to mark up results sections with an "expected similarity" to cater for tests that do not produce identical results, which is the vast majority of real world tests. Ironically this implemented using the string value in the backticks and so brings with it problem 1. If we choose to use this approach then I'm hoping we can continue to use this approach (i.e. encode the expected similarity in the same string - is this possible with the proposal. See the test suite for an example.

3) Using a code block to define prerequisites prevents the author from providing textual descriptions of what the pre-req is for and makes the documentation unreadable to a human. See the documnetationa and test suite for examples of why this is important.

All that said, I've always felt that pattern matching on headers, as well as the limited test results specification, is brittle. In addition it requires authors to know the right syntax. Therefore, I could live with the reuse of the language string as long as we found a way to represent the "expected similarity" in results sections.

However, the breaking of documentation features in the prerequisites section is a deal breaker for me. The proposed approach for pre-requisites needs rethinking. Unfortunately, I don't have an alternative to offer right now.

lastcoolnameleft commented 6 years ago

TL;DR:
I agree with #1, but am uncertain about impact.
I believe #2 is a reason for keeping everything in code block type, but expand the syntax to support other use cases.
I don’t think #3 should be an issue with the right implementation.

You brought up a few great points, that I've had in the back of my head, but this is forcing me to flesh them out better.

Answering your points:

Markdown rendering - You’re correct, it would impact the rendering visualization and you wouldn’t get bash formatted syntax. I’m color blind, so I never even notice them. If that’s an issue for others, then I’ll concede on that point.
“expected_similarity” - Using the code block type caught my eye for doing the rest that way. To support more complicated use cases, what if we allowed something like a function signature? e.g. ‘’’results(expected_similarity:.85,format:json). This would allow us to do more flexible implementations.
IMO, this allows the prerequisites to be even more readable. e.g.

Do these first

You want to make sure you have the Azure CLI installed (prerequisite)

../azure_cli.md

Then make sure you have an AKS cluster: (prerequisite)

../aks.md

Verify your cluster is up. NOT a prerequisite

kubectl get nodes

Install Istio. (prerequisite)

#Do Istio install

Ultimately, we need to decide upon the contract of a SimDem document. Whether we implement according to this PR (e.g. using code block type) or something more semantic.

Here’s what we need to consider when authoring SimDem documents:

Audience - I believe we both agree that it should be writable by non-technical people. Learning Markdown is pretty trivial. They will likely need to learn 3 tick's is how you run code anyway, and IMO, adding the code block flag would be a minor addition.
Syntax - We will need to write clear and concise documentation on how to create a SimDem document from scratch. IMO, easier to understand documentation leads to better adoption. If we go the route of this PR, then our documentation would be something akin to: “We only care about code blocks. Each code block needs a code block type. Here’s a table of each accepted code block type and it’s functionality” Instead of listing each rule for parsing. (E.g. A perquisite must have a header with the word “# Presquisite” and then the lists not be in any other block, etc. Validations must be after X and be in a code block, etc.)
Future planning - My biggest reason for wanting to put everything into code blocks is to future-proof it. If we use context clues (e.g. H1 heading with string “prerequisite” then next block should be files which are prerequisites), then adding new functionality gets complex as we start adding new primitives. If parsing is only done on the code block type, then adding new functionality is easy and stays easy as it grows.

P.S. The unfortunate part is that code blocks won’t render HTML links, so you lose the ability to follow links in a browser. Not sure if that’s a deal breaker too.

miyuchina commented 6 years ago

Author of mistletoe here. To be sure, parsing prerequisites strictly in the original way is certainly doable (in a non-brittle way), albeit slightly more complicated. I could help write a solution so that backwards compatibility is preserved, if that would be easier to see the pros and cons.

Addressing @rgardler's points:

If we decide to go with the original approach, would we just be looking for hyperlinks to documents in the parsing process? Or any filename in the form something.md would be matched? My main qualm with the original approach (correct me if I'm wrong) is that it is not immediately apparent which portions of the prerequisites section are parsed as prerequisite documents (which is why I proposed the code fence approach.) If there's a clear definition of what count as documents, it shouldn't be hard to write up some code for such a definition.
expected_similarity is preserved in token.language. It might just be a weird attribute name, but the information can definitely be preserved.
I actually quite like the natural-language feel to the original approach, so I think it might be worth the complexity to parse the prerequisite section, without resorting to code blocks.

Regarding the third point @lastcoolnameleft brought up, I think we can implement it such a way as to make it extensible in the future. I wouldn't want to burden the user with an implementation detail.

SorraTheOrc commented 6 years ago

I agree with the observations regarding brittleness of the current solution. This needs to be improved. I'm just not sure the code block approach improves it in a way that retains the documentation goal. The prerequisites are not executable code unless we consider Simdem a programming language (which I think we can agree it is not). Therefore, IMHO, it's a misuse of that markdown syntax and is not what documentation authors are likely to do. A list of files in a code block just looks strange in documentation. Add to that the loss of pretty rendering (for the non-colour blind) and I think this is a no-go.

@miaojiang Currently we parse for any http(s):// link and append README.md if no filename is provided. This, of course, means that we can't have intro text with links to other items.

@lastcoolnameleft your documentation argument and future proofing is a good one. However, this is not the only SimDem specific content with special processing rules. The other is the "Validation" section. This section is executed before any of the code in the main document, if all the tests pass in this section then the main document need not be run. This was implement to:

a) make prerequisites run only once (some are long running) b) provide faster execution when tutorial code is reused c) allow a demo to be prepped by running long setup steps ahead of time (there is a "prep" command that will run all prereqs in a doc but stop execution at the main doc).

The Validation section is treated the same as all other sections (just code blocks) but is handled differently during execution. This would need to be documented too, so we have a breaking case in documentation anyway.

Regardless, in my opinion, a documentation writer will default to writing "first you must do x as described in foo", so all they need to learn is that if they put this in a special section it will be automatically run. In my opinion this is more natural than documentation that says "if you put a list of URIs in a code block in this section then they will be executed as prerequisites".

I'm very happy with the proposals for my objection number 2 (especially as it allows us to add more flexible result matching in the future).

miaojiang commented 6 years ago

@rgardler you might have cc'd the wrong miaojiang :)

lastcoolnameleft commented 6 years ago

Thank you @rgardler and @miyuchina for the input.

This discussion has helped me see that while I might be down the wrong path for the SimDem Markdown structure, I'm on the right path for the greater SimDem design. More specifically, the ability to provide multiple (potentially custom) markdown parsers that output a common SimDem Execution Object will allow for differing implementations to work. This will allow us to experiment with new designs and proposals.

I've spent some time on documentation which will hopefully help explain my design and how to develop and extend SimDem2: https://github.com/Azure/simdem/tree/simdem2/docs

I've implemented a basic SimDem v1 style parser here: https://github.com/Azure/simdem/blob/simdem2/simdem/parser/context.py

And the CodeBlock Language style (my proposal above) here: https://github.com/Azure/simdem/blob/simdem2/simdem/parser/codeblock.py

My request is for someone (or some team) to solidify the syntax for the "Default SimDem Syntax" and update the these 2 files to reflect:

NOTE: Personally, I would prefer to continue working on the SimDem engine and not the specific Markdown implementation. So, I'm happy to let someone who is more vested in the semantics propose that.

If this seems like a reasonable proposal, should we close this issue and create a new one to address the request above?

SorraTheOrc commented 6 years ago

Most of this stuff is already fully documented, though it's not in a very obvious place. See the demo-scripts/simdem directory. The reason it is in here is that it forms the default content for Simdem (run simdem with no command line options and you are taken through an interactive set of documentation).

SimDem is designed to be self documenting through this content. I think you'll find it is very extensive. I've aimed to cover 100% of the features, though I make no promises to be successful ;-)

SimDem is also designed to be self testing. See demo-scripts/test for that content.

I've expanded your initial document above and added links in PR #87. However, in your branch we should move that content into the more obvious 'docs' directory and avoid duplicating content. For this reason I have not added any content to the syntax example, as we already have that in the above files (e.g. my existing file that most closely relates to your Syntax example is https://github.com/Azure/simdem/tree/master/demo_scripts/simdem/syntax)

I note that your current approach is to experiment with implementing both the proposed code-block method and the existing contextual method. That's a good idea, lets' see which sticks. However, I think that it's important that we keep the documentation separate so as not to confuse.

If you are good with the above documentation patch we can certainly close this issue.

lastcoolnameleft commented 6 years ago

@miyuchina Thanks for the offer. I implemented it here; however, I'm sure there's a more optimal solution, if you want to take a shot at it.

Code: https://github.com/Azure/simdem/blob/simdem2/simdem/parser/context.py

Test: https://github.com/Azure/simdem/blob/simdem2/tests/test_parser_context.py

lastcoolnameleft commented 6 years ago

@rgardler I'm in the process of migrating the documentation over to the docs directory and trying to make it more clear.

It might be a bit before the "Next Steps/Choose your own adventure mode" feature is implemented, but that's forthcoming.

Azure / simdem

SimDem2 Design Consideration: Modifying Markdown structure #85

Prerequisite

Result

Do these first