dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Libraries APIs are fully documented using efficient workflows #44969

Open carlossanlop opened 3 years ago

carlossanlop commented 3 years ago

Today, some of our source code repos, such as Runtime, WPF, WinForms, and WCF, consider the dotnet-api-docs repo the source of truth for their documentation. This poses some challenges:

We would like to propose a series of changes in our documentation process that will simplify the developer's role and automate some of the steps. During .NET 6, we piloted this new documentation process with a subset of the .NET Libraries. That pilot produced the following outputs:

We will continue this plan in .NET 8.


Bring documentation from Docs to triple slash

Substitute all the triple slash comments in source code with the language-reviewed documentation that exists in dotnet-api-docs. We will do this on an assembly by assembly basis, and will enable the MSBuild property <GenerateDocumentationFile> to ensure new public APIs cause a build warning when they don't have documentation.
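As a sketch of the build change (the property and warning code are real MSBuild/compiler knobs; exact placement in our project files may differ), enabling the documentation file makes the compiler flag undocumented public APIs:

```xml
<!-- In the library's .csproj: ask the compiler to emit the XML documentation
     file, so any public API without triple slash comments produces warning
     CS1591 ("Missing XML comment for publicly visible type or member"). -->
<PropertyGroup>
  <GenerateDocumentationFile>true</GenerateDocumentationFile>
  <!-- Optionally escalate the missing-docs warning to a build error: -->
  <!-- <WarningsAsErrors>$(WarningsAsErrors);CS1591</WarningsAsErrors> -->
</PropertyGroup>
```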

We will be using the dotnet/api-docs-sync PortToTripleSlash tool for this effort, for which I added the feature to port dotnet-api-docs to triple slash comments: https://github.com/dotnet/api-docs-sync/tree/main/src/PortToTripleSlash

Remarks

We won't backport remarks for the following reasons:

.NET Framework-only APIs

APIs that only exist in .NET Framework will continue having dotnet-api-docs as their source of truth.

APIs that are shared by both .NET Core and .NET Framework will have their source of truth in triple slash comments in .NET Core, making sure we preserve the differences in behavior between versions.
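For readers unfamiliar with the target format, the ported documentation takes the usual triple slash shape; the type and member below are hypothetical, not an actual ported API:

```csharp
namespace System.Example
{
    public static class Widget
    {
        /// <summary>Combines two values into a single result.</summary>
        /// <param name="left">The first value to combine.</param>
        /// <param name="right">The second value to combine.</param>
        /// <returns>The sum of <paramref name="left"/> and <paramref name="right"/>.</returns>
        /// <exception cref="System.OverflowException">The result overflows.</exception>
        public static int Combine(int left, int right) => checked(left + right);
    }
}
```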

Tasks

Here we will list the assemblies that got their documentation backported.

To do - Add one item per assembly and link to PRs as they are created.

- [x] System.IO.Compression.Brotli https://github.com/dotnet/runtime/pull/46716 - Need to revisit
- [x] System.Numerics.Vectors https://github.com/dotnet/runtime/pull/47725 - Need to revisit
- [x] System.Formats.Cbor - Need to revisit

Merge blocking label and docs reviewers

We already have a bot task that automatically adds the `new-api-needs-documentation` label to PRs that are introducing new public APIs, but we want to make sure it also becomes a merge blocker, like the `* NO MERGE *` label does.

Once the PR has been reviewed by a maintainer, and they confirmed the new APIs have proper documentation, the label can be manually removed to unblock merging.

We also want the bot to automatically add the @dotnet/docs members as PR reviewers for language review.

Tasks

- [ ] Make the `new-api-needs-documentation` label mandatory.
- [ ] Automatically add language reviewers to PRs adding new APIs.
- [ ] Update our readmes to describe the purpose of the label and what to expect from a PR review.

Automatic Docs build

Note: We can only begin this work once we have finished backporting the documentation for all assemblies.

Currently, whenever new APIs are added to the source code repos, we send the updated ref assemblies to the Docs team so they feed them to the Docs build system, which causes the regeneration of the dotnet-api-docs xml files, showing the new APIs. After this point, we can then manually port the documentation from triple slash, using https://github.com/dotnet/api-docs-sync/tree/main/src/PortToTripleSlash

From now on, we want to automate the process by automatically merging the ref assembly drop (it's just a commit in an internal repo). This drop will also contain the build-generated IntelliSense xmls, which would now contain the documentation source of truth, removing the step of manual porting.
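For context, the build-generated IntelliSense xml that the compiler produces (and that would now travel with the drop) has roughly this shape; the member shown is illustrative, not the real file content:

```xml
<doc>
  <assembly>
    <name>System.IO.Compression.Brotli</name>
  </assembly>
  <members>
    <!-- One <member> element per documented API, keyed by its documentation ID. -->
    <member name="T:System.IO.Compression.BrotliStream">
      <summary>Illustrative summary text; the shipped file carries the reviewed docs.</summary>
    </member>
  </members>
</doc>
```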

Tasks

- [ ] Automatically ship our generated intellisense packages to customers, instead of the ones we normally would bring from the dotnet-api-docs internal feed.
- [ ] Exclude assemblies with backported documentation from the ref assemblies drop. Instead, just include the intellisense xml.

Debt prevention and docs fixes

At the time of writing this document, we have 900+ issues open in the dotnet-api-docs repo. We would like to consider these as part of the regular work planning for our dev teams. We also want to make it easier to filter issues by area by having a bot automatically add area labels, and to notify area owners (on a subscription basis, like in runtime).

Contributors will still be able to report documentation issues in dotnet-api-docs, but fixes will now be done directly in triple slash comments in source. PRs will be disabled in dotnet-api-docs except for maintainers.

Documentation for APIs that only exist in .NET Framework will continue to be done directly in dotnet-api-docs (that will remain their source of truth).

Tasks

- [ ] Finish documenting APIs introduced in 1.x, 2.x, 3.x, 5.x, 6.x and 7.x ([dotnet/runtime/projects/60](https://github.com/dotnet/runtime/projects/60))
- [x] Add bot task to automatically add area labels to dotnet-api-docs issues.
- [x] Update the fabric bot that auto-mentions people, onboarding it for dotnet-api-docs, so that only subscribed users get mentioned in comments on new docs issues.
- [ ] Consider Docs for our sprint planning and triaging. cc @jeffhandley
- [ ] Update readme with new guidance on debt prevention and docs fixes.

Low pri / Nice to have

The following are tasks that are out of scope for this effort, but we would like to consider in the near future:


Questions and suggestions are welcome.

jkotas commented 3 years ago

In which source is the documentation going to live for APIs that have separate implementations per OS, per runtime, or per architecture?

safern commented 3 years ago

In which source is the documentation going to live for APIs that have separate implementations per OS, per runtime, or per architecture?

@carlossanlop and I are still discussing this. I will write up a proposal design for that and discuss it with other people to make sure we have a good convention and the right features for people to specify the source of truth that makes its way to the final ref pack. Once I have that I can loop you into the conversations as well, or just include it in this issue, but it feels like the description is quite large already.

safern commented 3 years ago

We also need to come up with a plan for APIs that live in Private assemblies (System.Private.CoreLib, System.Private.Uri, etc.): how are we going to have their documentation land in the .xml for the ref assembly that exposes them? For example, String APIs should be under System.Runtime.xml.

carlossanlop commented 3 years ago

Does it mean contributors will have to open PRs to both the runtime repo and the docs repo

@SingleAccretion

Edit: Seems the original comment I'm replying to got deleted?

carlossanlop commented 3 years ago

@carlossanlop and I are still discussing this.

One proposal we are considering (and it would be nice to hear some opinions on this) is to add documentation to one file, based on a pre-defined priority. The first file we would try to find is the AnyOS file. If not found, we try to find others, like Windows, then Win32, then Unix, etc. Again, we would have to define a priority.

Then, once documentation is added to one file, the rest of them should also have triple slash comments to prevent the build warning, but the text would just be a boilerplate message indicating that is not the main documentation file.
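A sketch of what that boilerplate might look like in a non-primary file (the wording and the type are ours, not an agreed convention):

```csharp
// Widget.Unix.cs - not the primary documentation file under the proposed
// priority; the summary exists only to satisfy the missing-docs warning.
public partial class Widget
{
    /// <summary>This member is documented in the primary documentation file for this type.</summary>
    public void Frob()
    {
        // Unix-specific implementation.
    }
}
```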

jkotas commented 3 years ago

If not found, we try to find others, like Windows, then Win32, then Unix, etc. Again, we would have to define a priority.

It does not feel right to define priority ordering of OSes. Also, it would lead to odd situations like Unix specific behaviors having to be documented in .Windows.cs file, etc.

Maybe we should have a dummy "AnyOS" or similar file for these cases that would have a throwing implementation and contain the documentation?

carlossanlop commented 3 years ago

it would lead to odd situations like Unix specific behaviors having to be documented in .Windows.cs

Sure, just keep in mind that's how we currently have our documentation in the dotnet-api-docs xmls: We do not have multiple <summary> sections for each OS/architecture/version. We put all our documentation in one place.

We need to keep in mind that we can only send to the Docs build system one build-generated intellisense xml file per ref assembly, from which all the documentation (for all OS/architectures/versions) will be copied and pasted into the ECMAxml files in dotnet-api-docs.

@safern also suggested we could indicate in the csproj an MSBuild property that would indicate which file is the source of truth.

But your idea @jkotas of always having an AnyOS file makes sense if we want to avoid confusion and we want to avoid the MSBuild property.

safern commented 3 years ago

But your idea @jkotas of always having an AnyOS file makes sense if we want to avoid confusion and we want to avoid the MSBuild property.

Wouldn't this be the equivalent of having a super long file with the APIs? Something like a ref assembly?

jkotas commented 3 years ago

I meant that we would follow the existing factoring - one file per type. For example, we would have Path.cs, Path.Unix.cs, Path.Windows.cs and Path.AnyOS.cs. Path.AnyOS.cs would have the docs.
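Sketching that factoring (the member and bodies are illustrative; the AnyOS file would only be compiled in an OS-agnostic configuration):

```csharp
// Path.AnyOS.cs - compiled only in the OS-agnostic configuration; this file
// carries the triple slash docs over a throwing body.
public static partial class Path
{
    /// <summary>Returns the directory information for the specified path.</summary>
    /// <param name="path">The path of a file or directory.</param>
    public static string GetDirectoryName(string path)
        => throw new PlatformNotSupportedException();
}

// Path.Windows.cs and Path.Unix.cs keep the real per-OS implementations and
// need no documentation comments of their own.
```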

krwq commented 3 years ago

How many such OS/arch-specific APIs are there? Would it be possible to handle them manually? Also, shouldn't APIs have a generic description without implementation details? If there are many of them, perhaps we could combine the summaries into a single one:

On Windows:
<Windows summary>

On Unix:
<Unix summary>
...

Maybe that wouldn't be perfect, but it would temporarily solve the problem. The list of such APIs could be gathered into an issue and those could later be improved by hand.

(And I'm not sure how that would work for arguments, but I'm still curious about the numbers first; possibly this problem is very small.)

safern commented 3 years ago

I meant that we would follow the existing factoring - one file per type. For example, we would have Path.cs, Path.Unix.cs, Path.Windows.cs and Path.AnyOS.cs. Path.AnyOS.cs would have the docs.

I see. There are 2 things that make me wonder if that's the best approach:

1) We have various projects (OOB packages) that cross-compile with AnyOS, and for the AnyOS configuration it generates the PNSE platform (the PNSE generator does filter per API if there are APIs already included in the Compile item, so that wouldn't be a problem). But that would mean that for libraries that don't have an AnyOS configuration, we would need to add one (potentially impacting build times); we would also need to add one to their project references, and we would need to exclude that asset from the product/package. So it would be a "doc"-only build.

2) People who create an OS-specific file and want to add docs to APIs that are split between those OS files will need to remember to add an OS-agnostic build configuration to their project, and an AnyOS file for those types, manually including all APIs of that type there and adding the docs to those APIs.

I still need to do an analysis of how many OS-specific files we have and the patterns we use in the csproj to think of a better solution, but so far this seems like the "most" reasonable approach yet.

cc: @ericstj any thoughts?

safern commented 3 years ago

Another idea that comes to mind would be injecting a tfm-docs build that is done in every vertical for every project. Then for APIs that are OS-specific, people would add a .Docs.cs file that throws PNSE, and we grab the .xml output from that vertical as the source of truth.

carlossanlop commented 3 years ago

@krwq I haven't yet seen a <summary> that describes different behavior between operating systems. This kind of split is usually seen in the exceptions an API throws, and even those are not that common.

Having a <Unix summary> wouldn't work because we still need a <summary> tag, otherwise the build warning will show up when the assembly gets the mandatory documentation MSBuild property enabled, because that required tag would be missing.

Bringing documentation from dotnet-api-docs into triple slash would mean bringing a single text description for each tag (<summary>, <returns>, etc.) that already applies to all OSes. If we were to follow your suggested approach, we would have to manually split each text per OS, even though in the majority of cases the behavior doesn't really change. That is not the issue being discussed here; the issue is deciding into which file we want to paste those backported triple slash comments.

GSPP commented 3 years ago

Could it be made even easier for the general public to submit documentation improvements? Contributing via the GitHub repos is already quite a time investment. People need an account, and maybe they have never done anything like this. Nobody will casually do this.

I sometimes read the documentation on the Microsoft website and think "this really should be explained differently". If there were a button I could use to submit a documentation fix right there, and it would make it into the documentation eventually, I think a lot of people would do that. This might even become a bit of an addiction for power users, in the way contributing to Stack Overflow is addictive.

huoyaoyuan commented 3 years ago

Some complain:

For certain simple APIs (namely Math.Max, Math.Sin, and so on), the documentation is totally meaningless and can be much longer than the implementation itself. It's even more meaningless for non-English speakers. Putting it into triple slash docs in code will degrade the code reading experience.

carlossanlop commented 3 years ago

the documentation is totally meaningless and can be much longer than the implementation itself. It's even more meaningless for non-English speakers.

@huoyaoyuan Whether the documentation is meaningless or not is not the issue we are trying to solve. This issue is about making it easier for contributors to add documentation. People are more familiar with triple slash comments in C# than with the ECMAxmls in dotnet-api-docs.

Putting them into triple-slash doc in code will downgrade the code reading experience.

Why? You can always press a key combination to collapse the comment sections. For example, in Visual Studio you can press Ctrl+M, Ctrl+O. In VS Code you can press Ctrl+K Ctrl+0.

huoyaoyuan commented 3 years ago

Why? You can always press a key combination to collapse the comment sections. For example, in Visual Studio you can press Ctrl+M, Ctrl+O. In VS Code you can press Ctrl+K Ctrl+0.

Thanks for pointing this out. But sometimes I need to read the code on GitHub or source.dot.net.

krwq commented 3 years ago

@huoyaoyuan you should file a separate issue on that on https://github.com/dotnet/dotnet-api-docs or https://github.com/dotnet/docs (not sure which is the correct one but they will likely redirect you).

danmoseley commented 3 years ago

@carlossanlop @jeffhandley I wonder whether it would be helpful to clarify the .NET 6 goal by changing the title? For example, perhaps the goal in the .NET 6 timeframe is to pilot and prove a mechanism for generating docs from sources, uncovering and resolving any unknowns (e.g. by applying it to Brotli and/or a few others). If so, we could mark this story as P1. Presumably the goal of converting all our docs is not P1 (P1 = "very important to the release").

jeffhandley commented 3 years ago

@danmosemsft @carlossanlop I've updated the title, added outputs to the description, and updated the priority. Thanks for the suggestion, @danmosemsft.

carlossanlop commented 3 years ago

We are ready to ask each area pod to help backport the documentation for each one of the assemblies they own. We will let each pod decide if they want to address the backporting of their documentation in 6.0 or in Future: If your pod cannot get this done by 6.0 for one of your assemblies, MS Docs will continue being the source of truth for that assembly (and it won’t block shipping).

We wrote a tool that automates the process of copying the MS Docs documentation of a whole assembly to your source code repo: You just run the tool targeting an assembly, then submit a PR with the changes.

Detailed instructions for the backporting process and how to use the tool can be found here.

The expectation for each area pod is to submit one PR per assembly, but some assemblies may also require a manual PR for dotnet-api-docs to keep lengthy remarks there as *.md files (to avoid saturating our source code files).

Here are a couple of examples:

sharwell commented 3 years ago

We wrote a tool that automates the process of copying the MS Docs documentation of a whole assembly to your source code repo: You just run the tool targeting an assembly, then submit a PR with the changes.

Let me know if there are any formatting or other discrepancies between the actual and desired output from this tool. I've written a bunch of code fixes for manipulating XML documentation comments in source code.

eiriktsarpalis commented 3 years ago

Contributors will still be able to report documentation issues in dotnet-api-docs, but fixes will now be done directly in triple slash comments in source. PRs will be disabled in dotnet-api-docs except for maintainers.

Presumably though there are aspects of documentation that cannot be backported to triple-slash comments? I'm thinking about examples and code snippets but surely there are others too.

carlossanlop commented 3 years ago

@eiriktsarpalis they could potentially be ported as markdown, but we are trying to encourage people to avoid that. For those cases, the suggestion is to move the whole remarks section to *.md files in dotnet-api-docs and link them as INCLUDE files in the remarks. That way, we don't get code examples backported to triple slash.

Instructions: https://github.com/carlossanlop/DocsPortingTool/blob/master/BackportInstructions.md
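As a sketch of the include pattern (the file path and member below are illustrative; the linked instructions are authoritative), the triple slash remarks point at a markdown file that stays in dotnet-api-docs:

```csharp
/// <remarks>
/// <format type="text/markdown"><![CDATA[
/// [!INCLUDE[widget-frob-remarks](~/includes/remarks/System.Example/Widget/Frob.md)]
/// ]]></format>
/// </remarks>
public void Frob() { }
```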

carlossanlop commented 3 years ago

For APIs that have multiple OS/platform implementations, this is what we decided to do:

cc @jkotas

ViktorHofer commented 3 years ago

@carlossanlop I'm really happy about this change of direction and proposal. Just a few things that I want to mention that should be watched out for when implementing the underlying infrastructure:

sharwell commented 3 years ago

Generating xml docs as part of the compiler invocation which happens in the inner build (i.e. per OS) will likely degrade performance

Omitting documentation comments (and the subsequent reduction of language accuracy) is not a scenario I would expect users to follow in order to reach performance targets. This would be equivalent to disabling the use of Nullable Reference Types because it impacts build performance. Alternatives include:

  1. Disable documentation comments only for a select subset of builds where the output artifacts will not be used. The same optimization applies for nullable reference types.
  2. Optimize the compiler itself to improve output performance for these files.

The first option is the one we're using in dotnet/roslyn, and it's also the one we're looking to use for improving inner loop iteration times for builds invoked directly by the Run Tests command inside the IDE.

RussKie commented 3 years ago

Seems I'm a little late to the party, didn't know this discussion existed. Thank you for doing this 👍

Files with code snippets will remain in the dotnet-api-docs repo, untouched for now. When we backport remarks containing links to those code snippets, the links will be relative to the dotnet-api-docs repo.

This concerns me a little, in the target state I'd like code samples to live with the code as well.

AFAIK back in the days docs were often placed in external files and then linked by xml-doc like so, and perhaps we should continue using this approach for samples.

<include file='doc\Control.uex' path='docs/doc[@for="Control.GetScaledBounds"]/*' />

Perhaps we can build samples as part of build workflow on a special agent 🤔

danmoseley commented 3 years ago

is not a scenario I would expect users to follow in order to reach performance targets.

Even for the F5 scenario? As I recall, the F5 build already takes some shortcuts to be faster. It has to be a system of "rings". We would not argue, e.g., that Prefast should run on F5. The question is just which check goes in which ring.

safern commented 3 years ago

Comparing the generated doc files and binplacing the right one is file I/O, which per library won't be measurable but in total might add to the build times. Consider comparing the files with a decent algorithm (might require an extra MSBuild task).

Yeah, we were thinking of an extra MSBuild task to compare the files. This would only be a validation step to make sure that we are producing the right .xml file to ship side by side with the ref assembly regardless of the OS compilation. This step, of course, will only run as part of the all-configurations build, which is where we build for all OSes.

Consider that in future we will collapse reference and source projects and if possible account for that in the infrastructure.

I don't expect that to change this infrastructure significantly, but it's a good point to consider.

gewarren commented 3 years ago

I didn't know about the <include> tag. @carlossanlop we could use that for lengthy remarks and keep the included XML files in the dotnet/runtime repo?

ViktorHofer commented 3 years ago

Yeah, we were thinking of an extra MSBuild task to compare the files. This would only be validation to make sure that we are producing the right .xml file that will ship side by side to the ref regardless of the OS compilation.

Sounds good.

This step, of course, will only run as part of the all-configurations build, which is where we build for all OSes.

It would also need to run as part of an individual library's build which builds all configurations by default and on which you can do a dotnet pack to generate the package.

safern commented 3 years ago

It would also need to run as part of an individual library's build which builds all configurations by default and on which you can do a dotnet pack to generate the package.

Yeah, good point.

carlossanlop commented 3 years ago

in the target state I'd like code samples to live with the code as well. Perhaps we can build samples as part of build workflow on a special agent

@RussKie We have discussed the possibility of bringing the code samples to the source code repo, and validate them by building them alongside the unit tests, but the problem is that sending them to MS Docs becomes quite complex, so we decided to move this feature to the Future.

Meanwhile, area owners are free to decide if they want to keep lengthy remarks, or even whole code snippets (instead of includes), in their triple slash comments. They also have the alternative of moving them to *.md files in dotnet-api-docs (I described that here).

<include file='doc\Control.uex' path='docs/doc[@for="Control.GetScaledBounds"]/*' />

Thanks for the suggestion, @RussKie!

I didn't know about the <include> tag.

Me neither, @gewarren. Let me do some tests locally to see whether the intellisense xml files will publish those <include> elements. If they do, then we can open a thread with the Docs dev team to ask whether mdoc is capable of handling them, and if not, file a feature request to handle them.

iSazonov commented 3 years ago

From my experience in the PowerShell repo, processing docs in the dev workflow is annoying and slows down the dev process. Obviously the .NET team will have to add more CI checks for docs (to check syntax, grammar, cross links, external links, and so on). This will lead to many negative consequences. If only a line of code has been changed in a PR, then it makes no sense to rebuild all the documentation. And vice versa: if one typo in the documentation has been fixed, there is no point in compiling the entire code. It is already clear that additional files are required to host the examples. Other additional files may be needed to support new features.

Currently we have an async doc process in PowerShell. If we create a PR in the PowerShell repo and need to update documentation, we add a link to a new issue/PR in the PowerShell-Docs repo. After the PR in the PowerShell repo is merged, we can continue in the PowerShell-Docs repo. This gives freedom to both developers and doc writers and simplifies the process.

The MSFT PowerShell team has great plans to improve the process this year. The .NET team could look at that experience. They plan to enhance the PlatyPS converter to support a new doc schema and markdig, so that source md files can be converted to the traditional xml format. After updating all the infrastructure, they want to exclude the old xml files altogether. While this is different from .NET, it has a lot in common. The approach of the MSFT PowerShell team opens up opportunities for introducing new features in the documentation. I would seriously consider making the .NET process asynchronous, placing the documentation in separate files (.csdoc?) that would allow converting it to any format (traditional XML, HTML, Jupyter notebook, etc.). If the .NET team has decided to do this laborious work, then why not convert the documentation into something more convenient and extensible?

KalleOlaviNiemitalo commented 3 years ago

Let me do some tests locally to see if the intellisense xml files will publish those <include> elements.

The XML documentation files output by the C# compiler will contain elements from the included files, rather than <include> elements, so the included files will not be needed in further processing.

Except perhaps if there is a dangling xref: link and you want the docs build to post a GitHub Checks report back to dotnet/runtime, with a warning that references the original line in the included file… but that would be difficult in any case, as long as you're building the HTML documentation from the compiler-generated XML, which does not refer back to the source lines. I imagine the compiler could be changed to generate some XML processing instructions akin to #line if it were important.
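To make that concrete, given an external file like the sketch below (file name and shape illustrative), the compiler copies the matched elements into the output .xml, so no <include> element survives into the IntelliSense file:

```xml
<!-- docs/Widget.xml, referenced from source as:
     /// <include file='docs/Widget.xml' path='docs/members[@name="widget"]/Frob/*' />
     The compiler inlines the matched children into the generated .xml output. -->
<docs>
  <members name="widget">
    <Frob>
      <summary>Performs the frobbing operation.</summary>
    </Frob>
  </members>
</docs>
```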

RussKie commented 3 years ago

Thank you @iSazonov for the feedback.

This is certainly a complex issue: if we place xml-docs near the code then we'd have to run 2 build pipelines, but if we don't, we likely degrade the developer experience and hence are less likely to get quality docs.

We can probably be smart about the pipelines and, with some heuristics, decide whether to run one or the other, or both, by checking the git-diff for lines that start with a triple slash. The same mechanism can be used to apply the necessary labels on PRs as well.

I can see an issue with running docs builds outside PRs as well. What if the markup is broken? How long does it take to detect that? Who's going to fix it?

MSDN-WhiteKnight commented 3 years ago

But if documentation is in triple slash comments, could the docs build be separated from the code build? To my understanding, if we enable the GenerateDocumentationFile property, the compiler builds both the assembly and the documentation, validating that crefs in triple slash comments actually bind to symbols available in the current scope. An additional build task could be created to validate markdown in remarks, but the C# compilation would still have to run on every change to summary, returns, etc., because only the compiler can produce the list of valid symbols in a given context.
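For example (hypothetical type), a cref target is resolved against the current compilation, so a misspelled reference fails at build time rather than at docs-publish time:

```csharp
public class Widget
{
    public void Frob() { }

    /// <summary>Like <see cref="Frob"/>, but asynchronous.</summary>
    // If the cref above were misspelled (e.g. cref="Frbo"), the compiler would
    // emit warning CS1574 ("XML comment has cref attribute that could not be
    // resolved") when GenerateDocumentationFile is enabled.
    public void FrobAsync() { }
}
```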

RussKie commented 3 years ago

I was more alluding to runnable samples that require additional verifications, e.g. like https://github.com/dotnet/docs-desktop/pull/173.

iSazonov commented 3 years ago

Thank you @iSazonov for the feedback.

This is a certainly complex issue - if we place xml-docs near the code than we'd have run 2 build pipelines, but if we don't we likely degrade the developer experience, and hence less likely get quality docs.

I already have an open PR where an MSFT reviewer asked me to update the XML docs. The problem is that English is not my native language, and it took three(!) iterations with three(!) people before we got an acceptable result. (I'm not even sure that every native English speaker is capable of writing good documentation that fully complies with the repo standards.)

The only way to get good docs is to delegate updating the documentation to a professional writer and do it asynchronously.

carlossanlop commented 3 years ago

Update:

We want to emphasize how important it is for us to ensure this effort does not become a ship blocker at any point. One of the main concerns was having a fully working "source of truth hybrid mode": having intellisense.xml files generated only for some assemblies and used as the source of truth, while other assemblies keep using dotnet-api-docs xmls as their source of truth.

Yesterday, we tested this and we are happy to confirm the hybrid mode works:

@safern created a ref assemblies drop with only two intellisense.xml files: System.IO.Compression.Brotli and System.Numerics.Vectors. @gewarren created a new test branch in the Docs build system and fed it with the assemblies drop. The output showed that all assemblies kept using their existing docs from dotnet-api-docs xmls, while only those two assemblies updated their docs using the contents from the passed intellisense.xml files.

cc @jeffhandley @ericstj @danmoseley

safern commented 3 years ago

Another thing to keep in mind for the hybrid mode (if we end up using it for the release, to avoid blocking the release when not all docs are ported) is ensuring that the intellisense xml files we ship with the product are correct and up to date. For example:

We have System.Numerics.Vectors using triple slash comments, which means its intellisense xml file is produced from its build; but we have others, like System.Runtime, whose source of truth is the docs repo. If we update the docs for an API in System.Runtime.xml, then we need to produce a new intellisense package with the updated System.Runtime.xml, consume that in dotnet/runtime, and ship the updated System.Runtime.xml as part of the ref pack.

So we need to be careful in this scenario that System.Numerics.Vectors.xml in the ref pack is the one produced from its build, while others that haven't been ported, like System.Runtime.xml in my example, ship the xml file coming from the docs repo and have the latest content.

jeffhandley commented 2 years ago

Update: I've moved this to Future and we are going to close all of the issues for backporting api docs to triple-slash comments for now. As we concluded early in the .NET 7.0 release cycle, we need to invest more into the DocsPortingTool to set this effort up for success. When we're able to revisit this, we will open new issues per area again.