[Proposal] Simplify snippet inclusion from external files

supernova-eng commented 5 years ago

Today, when writing an article on docs.microsoft.com and needing to embed some code snippets that are coming from another repository, the writer needs to go through a convoluted process of updating the repository configuration and then using relative references to the code sample (see example of current dependent repository chain).

This is a time-consuming process that is error-prone (the user can accidentally mis-configure the repo) and has the potential of affecting other content (if the configuration was incorrectly updated). To reduce the friction in adding new code samples, we are proposing a new convention, by which embedding code is much more efficient and intuitive.

Baseline Experience Goals

The experience for referencing code from an external repository should be:

Simple. The writer does not need to mess with any configuration settings or relative paths. If they want to reference code from an existing repository, they can do that just by knowing where the code resides.
Reliable. The new convention should account for the fact that in certain scenarios, the writers might reference it in an incorrect way, or the code will change without the content owner ever realizing it. Validation steps should be put in place to ensure that the included code is properly showing in the documentation.
Flexible. Code samples should support a level of flexibility in improving the inner content based on the context in which the user is in, with the help of tokens that are contextually modified. This is extremely useful for scenarios where pieces of code need to be substituted for an entity that comes from the user context (e.g. API key, URL to blob).

For each code snippet that is inserted into docs.microsoft.com, we need to account for several additional capabilities:

Link to source. The user that visits the documentation page needs to make sure they can view the source code for the full sample (in cases where it’s imported from an external resource).
Language switcher. Some samples are made available in more than one programming language. For example, a code snippet can be implemented in both Visual Basic .NET and C#, and the consuming user should be able to switch between the two, when necessary.
Interactive capability. We should be able to give users the capability to try out the specific fragment referenced in docs from an interactive shell, where available (.NET and Cloud Shell at the time of this writing).
Easy copy. Allow the user to copy the code snippet they are looking at.

Authoring Experience

Referencing code should be done through a new extension, designed to match the experience we’ve put together for Markdown extensions on docs.microsoft.com – using the triple-colon demarcation (:::). This is the convention defined in the Markdig documentation for custom containers.

The extension name should be code.

The user can include plain-text code within the extension, without having to import code from a file.

Parameter	Required	Description
`lang`	⛔ No	Determines the programming language short-code that will be used to colorize the code sample. If no language is specified, no colorization will be applied to the imported code snippet. If no language is specified, and the src property is pointing to a code file, the language colorizer will be inferred from the extension of the code file. If language is specified, and the src property is specified, the language defined in lang overrides the inferred language logic. Example: `lang=”csharp”`
`src`	⛔ No	Determines the location from which the code is imported. This property can be a relative path to the code (in case it lives within the repository, or is coming from a dependent repository) or a fully-qualified URL to a code file outside docs.microsoft.com (e.g. on GitHub). The URL location will be restricted to GitHub and Azure Repos – if a different domain is used, the build should fail. Example: `src=” https://raw.githubusercontent.com/MicrosoftDocs/DenDeli-Test/master/dendelinet/docfx.json”`
`region`	⛔ No	Determines the region from within the code file, that needs to be brought into the documentation page. The name of the region follows the convention outlined in the DocFX documentation[1]. Only one region can be specified per single code include. If the range property is specified, the region property is ignored. Example: `region=”myAwesomeCodeSample”`
`highlight`	⛔ No	Lines [BW1] that need to be highlighted in the rendered documentation. Can be a single line, range or a set of ranges. Example: highlight=”1-5,9,12-4”
`range`	⛔ No	Determines the range of lines that needs to be brought into the documentation from the referenced file (defined in src). This attribute cannot be used without the src specified. If multiple ranges are present, the content will be concatenated when rendered on docs.microsoft.com. Example: `range=”1-5,9,12-4”`
`disableLinkToSource`	⛔ No	In certain scenarios (e.g. when code is imported from a dependent repository), there is no direct way we can use to generate the full link to source code. This option would act as an override to the default behavior of always linking to source code when the code snippet is included from an external file. Example: `disableLinkToSource=”true”`
`interactive`	⛔ No	Determines whether interactivity is enabled for this sample. This is dependent on the docs.microsoft.com platform capabilities. A C# sample or a PowerShell sample can be enabled with Try.NET and Azure Cloud Shell respectively, however a Python sample does not support interactivity. When a sample that does not support interactivity is enabled with this attribute, a build warning will be generated, and no interactivity will be added. Example: `interactive=”false”`
`title`	⛔ No	Title that is shown on the page along with the example. Example: `title="Something awesome going on"

Following the conventions above, a fully defined code entry in a Markdown file might look like the following:

# My awesome document

Here is a document that outlines how to do something.

::: code src=”https://raw.githubusercontent.com/MicrosoftDocs/DenDeli-Test/master/dendelinet/docfx.json” range=”1-5” highlight=”1” :::

And this example is a prime case of why you need to do things this way and not any other way.

To present two code snippets that belong to different languages, and are interchangeable, the code blocks need to be wrapped in a codeset element, as such:

# My awesome document

Here is a document that outlines how to do something.

::: codeset
    ::: code src=”https://raw.githubusercontent.com/MicrosoftDocs/DenDeli-Test/master/dendelinet/docfx.json” range=”1-5” highlight=”1” :::
    ::: code src=”https://raw.githubusercontent.com/MicrosoftDocs/DenDeli-Test/master/dendelinet/docfx.xml” range=”1-5” highlight=”1” :::
::: codeset-end

And this example is a prime case of why you need to do things this way and not any other way.

This will ensure that both code snippets are treated as the same, just implemented in a different language, rather than showing two separate code samples.

A codeset element does not have any attributes that the user can set. If a codeset element only has one code snippet presented, the language switcher should not be displayed, and the code snippet is treated as a standalone code entity.

Status

[x] Review initial proposal.
[ ] Determine whether we need to use triple back-tick or the Markdig extension.
[ ] Determine path for Gist.
[ ] Determine the need to support lines, given their fragility.

Duncanma commented 5 years ago

Have we given much thought to the impact of this on the build pipeline (@herohua )? The system is flying along, building markdown files and then boom it hits one of these and has to go fetch/download/sync an external repo. Or is the thought that it would just go fetch that one file from GitHub vs. pulling down the repo?

supernova-eng commented 5 years ago

@DuncanmaMSFT the goal is to avoid repo clones altogether - we just need that one raw file. Granted, there likely would be a perf impact if we are pulling a 100MB file, but that would be a huge deviation from the goal of the code snippet feature.

Duncanma commented 5 years ago

Cool, and are we assuming that file is available publically without auth, or is that external to the scope of the feature?

supernova-eng commented 5 years ago

@DuncanmaMSFT starting with public code with no auth. Protected snippets, to our scenario, are an edge case - we aim to ensure that all sample code integrated in docs is public.

mmacy commented 5 years ago

With the current requirement of the config file edit, we have some governance over whether to enable pulling in code, and from where. This feature reduces the friction by ditching the config file edit requirement, but might be considered riskier security-wise as anyone can pull any code in from wherever and have it published.

If I'm Evil Actor and I want Trusting Public to copy my nefarious code and run it, I could slip it in to a doc from my own repository and have it published.

michael-hawker commented 5 years ago

The @mmacy the other concern to is what happens if the repo code changes later after the reference is applied and the doc is republished and just takes the new changes. It should probably have to reference a specific commit to maintain consistency so the document has to be updated to match changes to the source still?

michael-hawker commented 5 years ago

We should also make sure the custom syntax doesn't cause common markdown parsers to fail so that documentation can still be read in a markdown reader/editor. (E.g. Visual Studio Code, GitHub, Windows Community Toolkit Markdown Control, etc...)

mmacy commented 5 years ago

@michael-hawker Specifying a commit would be a nice-to-have, but one of the primary benefits of remote code inclusion using regions (as opposed to line numbers or a specific commit) is so that the code can be updated in an out-of-band fashion (such as to accommodate to a new library version) without also having to update the article.

ahmadawais commented 5 years ago

Is there any specific reason for not treating ::: similar to ``` i.e. the ending ::: also come at a new line? That's the only thing that bugs me about this.

fbartho commented 5 years ago

How about an alternative:

```docfx-remote-snippet
src="https://raw.githubusercontent.com/MicrosoftDocs/DenDeli-Test/master/dendelinet/docfx.json"
lang="json"
range="1-5" highlight="1"



Benefits:
- This would work regardless of if it's on one line or not
- Simplified parsing, you could even make the contents of the DocFX-remote-snippet tag be "valid JSON" if you want to.
- More tools could treat it similarly?

yufeih commented 5 years ago

@DuncanmaMSFT , it does have some impact on the build pipeline, downloading just the url makes the impact manageable. For authentication, it can go through the same pipeline we use to clone git and download other files.

yufeih commented 5 years ago

Thanks to @fbartho and @KalleOlaviNiemitalo, force every property on a new line, replace = with :, boom, it becomes a valid yaml, codeset can be separated by ---

```yaml-code-snippet
src: "https://raw.githubusercontent.com/MicrosoftDocs/DenDeli-Test/master/dendelinet/docfx.json"
lang: json
range: "1-5"
highlight: 1
---
src="https://raw.githubusercontent.com/MicrosoftDocs/DenDeli-Test/master/dendelinet/docfx.yml"
lang: yml
range: "1-5"
highlight: 1

Now the problem left is how to group them into codeset, this is similar to tabbed content, we use link as the key, the following example groups a and b into tab 1 and c into tab 2, maybe the same method can be applied to group code snippet using a codeset attribute. @dendeli-msft , thoughts?

fbartho commented 5 years ago

An array of repeated hashes for code snippets embedded in triple-backticks would work for a code sets too!

KalleOlaviNiemitalo commented 5 years ago

it becomes a valid yaml

If you replace the equals signs with colons, too.

KalleOlaviNiemitalo commented 5 years ago

If you put triple tildes or quadruple backticks around the Markdown example in the comment, you don't have to break the triple backtick with a space. https://spec.commonmark.org/0.28/#example-92

superyyrrzz commented 5 years ago

dotnet/try uses following syntax:

# My Interactive Document:

```cs --source-file ./myApp/Program.cs --project ./myApp/myApp.csproj



However, this syntax renders some wield empty blocks under normal Markdown preview. See: https://github.com/dotnet/try/blob/master/docs/test_example/lesson.md

I'd vote for the more informative YAML syntax.

mvelosop commented 5 years ago

I love the idea to include regions or line ranges! better yet if something like .../... or whatever is included "automatically" between discontinuous ranges.

I'd like to be able to include a commit in the reference, to get alerted/notified when the base code has changed, so you could verify if the related text is still relevant for the code (or viceversa).

This is useful for guides like the Microservices e-book

supernova-eng commented 5 years ago

Lots of great feedback, folks - thank you! I will address each point separately.

@mmacy regarding your comment:

This feature reduces the friction by ditching the config file edit requirement, but might be considered riskier security-wise as anyone can pull any code in from wherever and have it published.

If I'm Evil Actor and I want Trusting Public to copy my nefarious code and run it, I could slip it in to a doc from my own repository and have it published.

It's a fair assertion, and it will be another guideline that we will need to have in place to import code. For clarity's sake, I should mention that on docs.microsoft.com, we should have an allow-list of domains from which the code can be imported. github.com is fair game. contoso-logistics.com is not. This is probably not applicable to the DocFX tooling as a whole, in its open incarnation, but it will be needed on docs.

@michael-hawker yes, referencing by commit should be the behavior if you want to preserve the specific point-in-time snapshot of your code. For example, one can just use a link similar to the following:

https://raw.githubusercontent.com/MicrosoftDocs/azure-iot-docs-sdk-typescript/d441ca05735d6e2f6879e5ae6cb934a09a4c4d69/LICENSE-CODE

Fair point about editability and previews. Because this is a Markdown extension, this is, by-design, non-standard Markdown that is going to be hard to render in "vanilla" scenarios, such as GitHub or VSCode. That said, this is something for us to consider to integrate in the Docs Authoring Pack.

@ahmadawais good point - it should just work on a new line (see containers spec). I am using that for brevity.

@fbartho @KalleOlaviNiemitalo interesting idea, my hesitation there being that we would have a third way of doing things in DocFX, which is generally not recommended. Specifically, we already have a number of Markdown extensions implemented that use the ::: syntax that does not rely on parsing of entities between triple-ticks. What are your thoughts here?

@mvelosop that's a good one - indication of "space" between ranges might be useful for customers. I like that. Can you elaborate more on the "commit in the reference" point?

fbartho commented 5 years ago

I have roughly 0 experience in this repository / documentation system, I've only consumed the docs in a browser. So please take my suggestion with a grain of salt! I joined this thread because of a random twitter discussion that I saw that linked here.

However, @dendeli-msft, I do have experience with using several markdown previews simultaneously on a same project.

For example: seeing the preview in my editor, seeing the preview in one of several dedicated markdown previewers, and seeing the preview in GitHub-markdown (literally have been dealing with this all week for several different projects).

Without knowing about that prior-art for ::: syntax, I suggested the triplebackticks+language feature as an easy way to provide "reasonable" fallback behavior for markdown renderers that don't support this syntax. It's common that one of them won't support it, and since we're chatting here, currently GitHub won't support this proposed syntax.

My personal objection would be withdrawn if GitHub ended up handling these cleaner & extended the Markdown spec to support that flavor.

GitHub already has a few other extensions
GitHub is part of Microsoft, so that's less unlikely than if any others proposed this
Many markdown editors maintain a "Github Flavored Markdown" mode, so if this got added, I could foresee other previewers/renderers supporting it.

jonsequitur commented 5 years ago

@superyyrrzz

dotnet/try uses following syntax:

# My Interactive Document:

```cs --source-file ./myApp/Program.cs --project ./myApp/myApp.csproj


However, this syntax renders some wield empty blocks under normal Markdown preview. See: https://github.com/dotnet/try/blob/master/docs/test_example/lesson.md

This empty code fences are for the local workshop / content authoring mode, where the code is pulled in from a backing project while the dotnet try tool is running locally. We're working on publishing support for other formats such as self-contained Markdown, interactive HTML, and WebAssembly, to support serving content on the web: https://github.com/dotnet/try/issues/214

supernova-eng commented 5 years ago

@fbartho this is great feedback, we really appreciate it.

What if the extension would effectively follow a mix of what @superyyrrzz provided, along with the current convention?

Something like:

    ```csharp src="https://foo.bar/test.cs" range="1-2,6-53"



@jonsequitur - what do you think of the above?

fbartho commented 5 years ago

That seems fine to me! I don't specifically have any strong opinions about how the configuration flags are passed. If you support multiline mode, then you could get a semi-documented fallback:

```csharp
src="https://foo.bar/test.cs" range="1-2,6-53"
```

Looks like:

src="https://foo.bar/test.cs" range="1-2,6-53"

In vanilla renderers, which might be a good fallback!

supernova-eng commented 5 years ago

@fbartho the only drawback I can think of with multi-line is that it is easier to confuse that with an actual code snippet. For example, because GitHub does not support the custom functionality that we have, it rendered it as the snippet you've shown above. Me, a reader that has no idea what's going on behind the scenes, might think that you are providing me a literal code piece.

jonsequitur commented 5 years ago

We considered line ranges in Try .NET and decided against it because they're not stable.

In C#, the #region keyword is supported by Roslyn so has first-class support. But we're looking at different conventions for other languages. (F# work is in progress and there too, we're using delimiters within the source code.)

supernova-eng commented 5 years ago

@jonsequitur for DocFX, we got the region problem solved in-house: Tag Name Representation in Code Snippet Source File.

Fair concern with line numbers. That's why the guideline for us would be that if anyone uses line numbers and doesn't want the content to change if the code changes, they can either rely on regions or reference the file from a specific Git commit.

jonsequitur commented 5 years ago

I would worry authors might not immediately understand the risk of using the line ranges and maybe the capability carries an inherent risk of fragility that's better avoided from the outset.

supernova-eng commented 5 years ago

@jonsequitur that's fair feedback - have you encountered cases where scenarios that required lines were not well-addressed with regions?

jonsequitur commented 5 years ago

Regions work great for C#, because they're part of the compiler's semantic model. Going through the compiler enables subtleties like the indentation being adjusted when code is rendered into the editor, and diagnostic squiggles are then positioned correctly within the reformatted code. This is not easy to do if you're treating the code as simple text. Semantics also mean verification can be smarter. Try .NET knows the difference, for example, between an error that the user typed versus an error that the content author typed in the hidden portion of the code.

Comments should also work for most languages. Another convention we've considered is to reference code snippets by type and method name. This is more limiting. Both of these favor semantics over coordinates.

On a separate note, in most cases, interactivity needs additional parameters, and the parameters needed will differ from language to language. Some can be supplied inline with the snippet but others, to avoid having to maintain duplicate copies of the content differing by only a parameter or two, should be able to be supplied at a different level than the content. Examples might include runtimes, language versions, and package dependency versions.

mvelosop commented 5 years ago

@mvelosop that's a good one - indication of "space" between ranges might be useful for customers. I like that. Can you elaborate more on the "commit in the reference" point?

Hi @dendeli-msft. it looks like there are at least two use cases regarding region/line ranges:

It's OK to track the region if the code changes and
The code must look stable in the documentation

Most of what I do in docs is for number 2, because there's some text explanation related to the code shown (for context, related to eShopOnContainers and the Microservices Architecture Guide).

That can be solved by referencing the code by commit, as you point out above. So far so good.

However...

At some point, someone else updates the code (e.g. upgrade from .NET Core 2.2 to 3.0) and even though the guide would still show the original code, it'd not be "current" and the related text might also need to be updated.

So, it'd be great to have some sort of warning that the referenced code file has been changed, so someone can take a look and decide whether to update the commit reference, because no change is needed or rewrite a whole section because whatever doesn't hold true any more.

Since this is the "day-dreaming" phase 😉 it'd also be great to have:

Some sort of "referenced code changed" report a la "security warning"
A "light bulb alert" in the web interface
A "squiggle" in the VS Code extension

Thanks!

dotnet / docfx