dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License

Multilingual XML Document Comment #3371

Open h82258652 opened 9 years ago

h82258652 commented 9 years ago

If we enable the XML documentation file output option in the Visual Studio project, we get an XML file. We can package it with the assembly into a NuGet package and publish it for everyone. Lots of people from all over the world use .NET to build their applications, so I think this feature is very necessary.

How about adding a lang attribute to every XML documentation comment element? Like this:

```csharp
/// <summary>
/// Say Hello to someone. 《-----------output to ${AssemblyName}.xml file.
/// </summary>
/// <summary lang="en-US">
/// Say Hello to somebody. 《-----------output to ${AssemblyName}.en-US.xml file.
/// </summary>
/// <summary lang="zh-CN">
/// 你好.(This is Chinese.) 《----------output to ${AssemblyName}.zh-CN.xml file.
/// </summary>
/// <summary lang="ja-JP">
/// こんにちは(This is Japanese) 《------------output to ${AssemblyName}.ja-JP.xml file.
/// </summary>
/// <param name="name">Person's name.</param> 《---------------output to ${AssemblyName}.xml file.
public static void Hello(string name)
{
    Console.WriteLine($"Hello {name}!");
}
```

After compiling, we would get ${AssemblyName}.dll (or .exe), ${AssemblyName}.xml, ${AssemblyName}.en-US.xml, ${AssemblyName}.zh-CN.xml, and ${AssemblyName}.ja-JP.xml.

Visual Studio IntelliSense could then use the XML documentation file that best matches the user's language.
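
A rough sketch of the splitting step this proposal implies is below. It assumes a post-processing tool (not the compiler itself) that receives one member's documentation as a string; the function name `split_by_lang` is hypothetical. Following the arrows above, elements without a `lang` attribute go only to the default `${AssemblyName}.xml` output, represented here by the key `""`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def split_by_lang(member_xml: str) -> dict:
    """Group one member's doc elements by the proposed `lang` attribute.

    The key "" stands for the default ${AssemblyName}.xml output; every other
    key is a culture name such as "zh-CN".
    """
    # Documentation comments have no single root element, so wrap them first.
    root = ET.fromstring(f"<member>{member_xml}</member>")
    groups = defaultdict(list)
    for element in root:
        # Remove the attribute so it does not leak into the emitted XML.
        lang = element.attrib.pop("lang", "")
        # tostring() also emits the element's tail text (whitespace here), so strip it.
        groups[lang].append(ET.tostring(element, encoding="unicode").strip())
    return dict(groups)


doc = """
<summary>Say Hello to someone.</summary>
<summary lang="en-US">Say Hello to somebody.</summary>
<summary lang="zh-CN">你好</summary>
<param name="name">Person's name.</param>
"""

for lang, parts in split_by_lang(doc).items():
    print(lang or "(default)", parts)
```

The sketch follows the arrows literally, so the culture files receive no `<param>`; whether a culture file should fall back to untranslated elements is a design decision the proposal leaves open.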

devkanro commented 9 years ago

That's great. I think we need this feature, so that we can add multilingual comments for everyone.

ufcpp commented 9 years ago

:+1:

HaloFour commented 9 years ago

Similar CodePlex discussion

I think that it would be a good thing to have internationalization support for XML documentation. I have two concerns.

First, given that XML documentation lacks a root node, there isn't really a good way to group elements of the same language together. Perhaps a new root node could be added that would carry the language attribute?

Second, this can add a lot of extra text within the source files. That issue could be resolved through IDE settings/extensions that automatically collapse or reformat the XML documentation.

h82258652 commented 9 years ago

@HaloFour Yes, especially the first one. But if we add a root element like this:

```xml
<comment>  《----------------------- output to ${AssemblyName}.xml
    <summary>
    </summary>
</comment>
<comment lang="en-US"> 《-------------output to ${AssemblyName}.en-US.xml
    <summary>
    </summary>
</comment>
```

It would not be compatible with the current format, so I don't think adding a root element is a good approach.

About the second: sometimes the comments may be longer than the source code, even when they are collapsed. That's what we don't want. Resolving this through the IDE is a good approach; I agree with you.

HaloFour commented 9 years ago

@h82258652

Probably true, although having multiple sets of tags might pose a similar problem. I'm sure that violates the existing schema, and it is questionable whether existing tools support it.

Maybe add new nodes that would only apply to the translations?

```xml
<summary>What is this method about?</summary> 《--------------- output to ${AssemblyName}.xml
<returns>Some value</returns>
<translation xml:lang="en-CA">  《----------------------- output to ${AssemblyName}.en-CA.xml
    <summary>What's this method aboot?</summary>
    <returns>Some value, eh?</returns>
</translation>
```

The schema would need to be amended (either way) but existing tooling can probably just skip those unrecognized nodes.
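
To illustrate why existing tools can keep working, here is a minimal sketch of a tool that understands the proposed `<translation xml:lang="...">` node, while anything that ignores that node still sees the reference text. The function names are hypothetical, and keying elements by tag name is a simplification (several `<param>` elements would need their `name` attribute in the key). The `en-CA` translation below deliberately omits `<returns>` to show a fallback to the reference text, which is an assumed policy rather than part of the proposal.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the predefined xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def extract_translations(member_xml: str):
    """Split one member's docs into reference text and per-culture translations."""
    root = ET.fromstring(f"<member>{member_xml}</member>")
    default, translations = {}, {}
    for element in root:
        if element.tag == "translation":
            texts = {child.tag: (child.text or "").strip() for child in element}
            translations[element.get(XML_LANG)] = texts
        else:
            default[element.tag] = (element.text or "").strip()
    return default, translations


def view_for(lang: str, default: dict, translations: dict) -> dict:
    """Content of a culture-specific file: translated text where present,
    otherwise the reference text."""
    return {**default, **translations.get(lang, {})}


doc = """
<summary>What is this method about?</summary>
<returns>Some value</returns>
<translation xml:lang="en-CA">
    <summary>What's this method aboot?</summary>
</translation>
"""

default, translations = extract_translations(doc)
print(view_for("en-CA", default, translations))
```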

h82258652 commented 9 years ago

@HaloFour Wonderful solution. It should be compatible with existing tools. I tried your solution in VS2015 RC. Although the translation node is written to the XML file, we get no errors and no warnings, and everything works as before. I love it.

I also came up with another solution: add a link node.

```xml
<summary><link href="resource.resx" key="resourceKey" /></summary>
```

At compile time, the link node would look up the current project's resource files and be replaced by the corresponding string. This effectively solves the second problem, but it requires too much work for existing projects.
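
The link-node idea can be sketched as a substitution pass over a `<summary>`. This assumes the caller has already picked and loaded the right culture's `.resx` file (the sketch ignores `href`) and treats an unknown key as an error; `load_resx` and `resolve_links` are hypothetical names. A `.resx` file stores each string as `<data name="..."><value>...</value></data>`.

```python
import xml.etree.ElementTree as ET


def load_resx(resx_text: str) -> dict:
    """Read name/value string pairs from .resx content."""
    root = ET.fromstring(resx_text)
    return {data.get("name"): data.findtext("value", default="") for data in root.iter("data")}


def resolve_links(summary_xml: str, resources: dict) -> str:
    """Replace each <link key="..."/> in a summary with its resource string."""
    summary = ET.fromstring(summary_xml)
    parts = [summary.text or ""]
    for child in summary:
        if child.tag == "link":
            key = child.get("key")
            if key not in resources:
                raise KeyError(f"resource key not found: {key}")
            parts.append(resources[key])
            parts.append(child.tail or "")
        else:
            # Keep other inline tags such as <c> verbatim; tostring() already
            # includes the element's tail text.
            parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts).strip()


resx = """<root>
  <data name="resourceKey" xml:space="preserve"><value>Say Hello to someone.</value></data>
</root>"""

resources = load_resx(resx)
print(resolve_links('<summary><link href="resource.resx" key="resourceKey" /></summary>', resources))
```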

I think your solution is the best one we can get for now. I hope the .NET team can make it happen.

gafter commented 9 years ago

@billchi-ms You might be interested in this.

whoisj commented 9 years ago

:+1:

matthiasburger commented 7 years ago

The idea of @HaloFour is nice, but implementing support for the translation node in existing tools is a lot of work. I think the solution with

```xml
<summary>
</summary>
<summary lang="zh-CN">
</summary>
```

is less work: multiple summary nodes are supported in most libraries. A root node for summary would also be a lot of effort for existing tools. IMHO, both the idea of @h82258652 and support for a <link> tag would be great. It would be very luxurious to have both of them.

iam3yal commented 7 years ago

Another option is to do what we always did and not change anything at the source level; we write the comments as usual, like so:

    /// <summary>Prints Hello World.</summary>
    public static void Hello(string name)
    {
        Console.WriteLine($"Hello {name}!");
    }

Now, the idea here is to use resource files for localization and the text of the comments themselves as the key for the resource files.

Finally, we need some attribute that specifies the resource file for each language, something like this:

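```csharp
// Hypothetical assembly-level attribute proposed here; it does not exist today.
[assembly: AssemblyXmlCommentsLocalizationFile("resource.zh-CN.resx")]
[assembly: AssemblyXmlCommentsLocalizationFile("resource.en-GB.resx")]
```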

The only disadvantage I see with this approach is that when comments change, the keys inside the resource files need to change too, which might introduce some complexity.
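
Using the comment text itself as the key could look roughly like this. The sketch assumes the compiled documentation has already been reduced to a mapping from member ID to summary text, and that one culture's resource file has been loaded into a dictionary keyed by the reference text; `localize` and the member `Demo.Greeter.Hello` are hypothetical. It also shows the disadvantage just described: once the reference comment changes, the old key no longer matches, and the member silently falls back to the reference text unless the tool reports it.

```python
def localize(docs: dict, table: dict):
    """Translate summaries whose reference text is a key in `table`.

    docs:  member ID -> reference-language summary text
    table: reference-language text -> translated text (one culture)
    Returns the localized docs plus warnings for keys that were not found.
    """
    localized, warnings = {}, []
    for member, text in docs.items():
        if text in table:
            localized[member] = table[text]
        else:
            # Fall back to the reference text and report the stale/missing key.
            localized[member] = text
            warnings.append(f"{member}: key not found: {text!r}")
    return localized, warnings


table = {"Prints Hello World.": "打印 Hello World。"}
member = "M:Demo.Greeter.Hello(System.String)"

print(localize({member: "Prints Hello World."}, table))
# After someone edits the comment, the old key no longer matches:
print(localize({member: "Prints a greeting."}, table))
```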

pr-yemibedu commented 7 years ago

Hello, I think that using a key into the resource file has better merits: 1) if you run a tool against the code, you can get a "key not found" warning; 2) it simplifies the original source, which encourages updating comments, whereas two or three summaries can drift out of sync on their own with nothing telling you how they differ; 3) it can encourage other tooling to work on code comments in the same way.

Having a summary link element key and having a resource key are push and pull perspectives. Depending on the tool, you can get "no resource found with key" or "no source file found containing key" warnings. Neither is bad, and neither is bulletproof. I like the latter because, in source control, it leaves you free to add translations that don't affect code. That is a sensible depiction of how I would want my commits to be organized. Thank you. Good day.

ZSkycat commented 7 years ago

Multilingual XML documentation comments are a good idea, but writing many comments in a code file is not a good thing, because it greatly hurts the readability of the code. I have a better idea: allow multilingual XML documentation comments to be written in a separate file.


like this:

File: Hello.cs

```csharp
using System;

namespace HelloSpace
{
    class HelloClass
    {
        /// <summary>
        /// Say Hello to someone.
        /// </summary>
        public static void Hello(string name)
        {
            Console.WriteLine($"Hello {name}!");
        }
    }
}
```

File: Hello.csdoc

```
/// <multilingual lang="en-US">
/// <summary>
/// Say Hello to somebody.
/// </summary>
/// </multilingual>
///
/// <multilingual lang="zh-CN">
/// <summary>
/// 你好.
/// </summary>
/// </multilingual>
///
/// <multilingual lang="ja-JP">
/// <summary>
/// こんにちは
/// </summary>
/// </multilingual>
HelloSpace.HelloClass.Hello(string name)
```

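
How a tool might read the proposed .csdoc file is sketched below. It assumes each run of `///` lines documents the member signature on the next non-comment line, and that every `<multilingual>` block holds one `<summary>`; `parse_csdoc` is a hypothetical name, and a real tool would still need to map the signature to the compiler's documentation ID.

```python
import xml.etree.ElementTree as ET


def parse_csdoc(text: str) -> dict:
    """Map each member signature to {culture: summary text}."""
    result, buffer = {}, []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("///"):
            buffer.append(stripped[3:])  # drop the /// prefix
        elif stripped:
            # A non-comment line is the signature the preceding block documents.
            root = ET.fromstring("<doc>" + "\n".join(buffer) + "</doc>")
            result[stripped] = {
                block.get("lang"): (block.findtext("summary") or "").strip()
                for block in root.iter("multilingual")
            }
            buffer = []
    return result


csdoc = """\
/// <multilingual lang="en-US">
/// <summary>
/// Say Hello to somebody.
/// </summary>
/// </multilingual>
///
/// <multilingual lang="zh-CN">
/// <summary>
/// 你好.
/// </summary>
/// </multilingual>
HelloSpace.HelloClass.Hello(string name)
"""

print(parse_csdoc(csdoc))
```
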
iam3yal commented 7 years ago

@ZSkycat It's nice and all in theory, but the issue with it is that you will have to duplicate code all over the place.

sharwell commented 7 years ago

I believe this is a bad idea primarily because it doesn't provide assistance to manage "translation rot". This problem is worse than the more commonly seen "comment rot" because most users will fundamentally not be able to understand the problem in most cases where it occurs.

A better solution would have the following characteristics:

  1. Comments in source are written in a reference language.
  2. Translations are provided separately, where each translation provides the original source text for the translation and the translated text.
  3. During the build, several things occur:
    • Valid translations are used (these are translations which exist and the source text has not changed since the translation occurred)
    • A build output is made available which contains strings that need to be translated for each language. This contains the most recent source/translation for each item (where available), as well as the new source.

In the past, I wrote MSBuild tooling which produced the output for sending to translators in *.resx form. For a file Resources.resx, the build would produce obj/Debug/MissingTranslations/fr-FR/Resources.2017-09-07.en-US.resx. This file was added to source control as OriginalTranslations/fr-FR/Resources.2017-09-07.en-US.resx on the day/build we sent the file for translation. When the translated result was returned, we added it as OriginalTranslations/fr-FR/Resources.2017-09-07.fr-FR.resx. The build automatically picked up pairs of files in date order to find valid translations for all current strings. Manual pruning/merging of files for translation was completely eliminated, and the entire process decoupled individual languages from the date(s) when translations were sent and received.
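
The core rule described above (a stored translation is valid only while the reference text it was made from is unchanged, and file pairs are applied in date order) can be sketched as follows. This is not the proprietary tool: each sent/returned .resx pair is abstracted into a hypothetical `Batch` of two dictionaries, and `resolve` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    date: str         # ISO date such as "2017-09-07", so string order is date order
    source: dict      # key -> reference text as it was sent for translation
    translated: dict  # key -> translated text as it came back


def resolve(current: dict, batches: list):
    """Return (valid translations, items that still need translation)."""
    valid, latest = {}, {}
    for batch in sorted(batches, key=lambda b: b.date):
        for key, sent_text in batch.source.items():
            if key not in batch.translated:
                continue
            # Remember the most recent pair as context for the translators.
            latest[key] = (sent_text, batch.translated[key])
            # Valid only if the reference text has not changed since it was sent.
            if current.get(key) == sent_text:
                valid[key] = batch.translated[key]
    missing = {
        key: {"source": text, "previous": latest.get(key)}
        for key, text in current.items()
        if key not in valid
    }
    return valid, missing


current = {"Greeting": "Say hello.", "Farewell": "Say goodbye!", "Thanks": "Say thanks."}
batches = [
    Batch("2017-09-07",
          {"Greeting": "Say hello.", "Farewell": "Say goodbye."},
          {"Greeting": "Dis bonjour.", "Farewell": "Dis au revoir."}),
]
valid, missing = resolve(current, batches)
print(valid)
print(missing)
```

Here `missing` plays the role of the build output for translators: the new reference text plus the most recent earlier source/translation pair, where one exists.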

pr-yemibedu commented 7 years ago

Hello, @sharwell Your last paragraph is confusing for those not familiar with your tooling (like me). Is it something the public can access? What do you mean by "translators" you were providing resx files to consume?

What was the outcome of this being used in Visual Studio or another editor? If a developer hovers over a method, do they only see the ref language or does something enable it to pull some or all of the translated comments?

Did any of this give a warning or error for missing source points or for unused translations?

Just trying to figure out how well this plays with the existing practices of developers and tooling. One concern is whether there are other new cases that can lead to unmaintained "rot" of documentation. If you have real or contrived examples to demonstrate how source code will look, that would be cool. Thank you. Good day.

sharwell commented 7 years ago

> Is it something the public can access?

Right now, unfortunately no. I have a meeting today to discuss this but I have no real control over their decision to release it.

> What do you mean by "translators" you were providing resx files to consume?

It was a 3rd-party translation service. We sent files off site for translation, and they returned translated results.

> What was the outcome of this being used in Visual Studio or another editor? If a developer hovers over a method, do they only see the ref language or does something enable it to pull some or all of the translated comments?

Inside Visual Studio, developers working on the library saw comments in the reference language. Our goals were different from those of this bug report (we were translating .resx and .vsct files); I would expect that if/when the feature is modified to handle documentation comments, consumers of the library would see comments in their own language (if supported/provided).

> Did any of this give a warning or error for missing source points or for unused translations?

No, but it is quite easy to support both during the build and as an analyzer in the IDE.

We didn't see unused translations as a problem.

> If you have real or contrived examples to demonstrate how source code will look, that would be cool.

For developers working on the project, it would look as though comments were written in the reference language alone. The files related to translations lived fully outside the main code base.

For end users, everything looked like we carefully maintained translations as part of the development process.

For us (and here I'm speaking for a former employer, not Microsoft), it was a great balance because the maintenance overhead of "dealing with translations" dropped to nearly zero. Past solutions required scripts to prepare files before sending them to the translation service and to merge the results returned to us. The scripts didn't perfectly handle cases where the original source string changed between sending the files and having them returned.

pr-yemibedu commented 7 years ago

Hello, so I guess that, at least for now, an actual code example (rather than an explanation of it) is not possible. It sounds like a nice production process for users of a library. Many who use Git would agree that storage space is not the worry. The worry with leftover cruft is how often someone does something where a "{@gray}" vs. "{@grey}" reference-key translation gets dropped or duplicated.

I was initially under the impression that the OP was focused on code comments that peers would be changing and reading, rather than on the comments for a library that users would be consuming. Now I think the latter is the right reading. I'm not a fan of third-party handling, so hopefully there are ways to put an in-house system in place with similar results. Still, these are good concepts, and maybe putting certain comments directly in code was never the right thing to do in development. Thank you. Good day.

filippobottega commented 6 years ago

Hello, what do you think about this solution?

http://www.surviveplus.net/en/archives/39 https://www.nuget.org/packages/Surviveplus.XmlCommentLocalization/

Regards, Filippo.

whoisj commented 6 years ago

@filippobottega can the xml:lang="{lang}" content be sourced from a separate file?

Having the localization contained in non-source-code files is highly desirable. The reason is that localization specialists are very often not software developers (and vice versa), and ought not to have to pass through the same barriers that source code changes do.

For example: waiting for review from development, completing a CI task, etc. before submitting.

filippobottega commented 6 years ago

I know, but the problem is that, from a developer's point of view, the documentation needs to be as close to the code as possible. When I document a class, a property or a method, I need to write the code and the comments at the same time. I need to be able to insert Italian and English translations for two reasons: first, to remember what I'm developing (Italian), and second, to publish my comments to other developers (English, because I don't know whether my library will be used in other countries). In this scenario I don't need to send code comments to a localization service. Anyway, if you need to use a localization service, I think the compiled XML, and not the source code, is the right file to send. In fact, I don't like the idea of more than two languages embedded in the source code comments.

pr-yemibedu commented 6 years ago

Hello, from a developer's point of view, I develop in my native language and write the primary comments in that language too. That is because I am writing small internal applications where my whole team can read and understand what I have provided. For a solution handled by a larger (international) group or made available in source to the public, having reference points to external documentation sources is very valuable. I rarely edit my source code in Notepad or nano; most of the time I have VSCode, Emacs, NetBeans or Eclipse, whose tooling can pull in whatever external references are needed for intelligent development. If I had a second window or tab with all my comment points and only a small redirect marker in my actual source, I would probably be happier anyway. It is the same reason we want our editors to support collapsing comment regions: out of sight but available for lookup. The point by @whoisj is probably the kind of simple thing that would help divide up the time a developer spends on all that non-development typing up front. Thank you. Good day.

whoisj commented 6 years ago

Documentation comments are very different from regular code comments. Documentation comments are used to generate API documentation metadata, which is shipped alongside the resulting assembly. While they can be used as inline comments, that is not their reason for being. Supporting the widest range of consumers of a given assembly is generally a good thing; I cannot imagine why anyone would want to limit the number of languages present, except for the obvious: "we don't have anyone to translate to {language}, so we cannot localize for it".

sharwell commented 6 years ago

Multiple languages within the same source file would be absolutely impossible to maintain, at least under the currently-available tooling. No one would be able to look at the code and determine if the comments are correct, and no one would be able to create new comments or edit existing comments.

I'm not happy with the XLF tooling added to this repository, but at least it has the ability to track state for the naturally-asynchronous localization process.

sharwell commented 6 years ago

@nguerrera Is there existing (public) tooling in place for localizing API documentation from comments in code?

nguerrera commented 6 years ago

@sharwell I'm not aware of what's available for that. @mairaw, do you know?

It would not be too hard to add it to xliff-tasks. However, I don't think we would use it because localization of docs.microsoft.com is handled completely separately from the process that localizes product strings.

mairaw commented 6 years ago

I don't think there is. For .NET, since IntelliSense is not generated from the code, the localization happens on the content side. For products like ASP.NET Core, which are completely auto-generated from code comments, I still think the comments go through a docs pipeline and are then localized. So there is no tool for localizing in code.

Adding @dend (our PM for API reference experience on docs) and @nokura (our localization content PM) here to see if they have something to add.

GF-Huang commented 4 years ago

So what's the progress?

sharwell commented 4 years ago

@GF-Huang I am not aware of anyone actively working in this space. All teams that I know of that perform API localization do so by creating the en-US XML files during the build and then submitting those XML files for localization.
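
For teams following that workflow, the step after translation is producing a culture-specific copy of the compiled XML documentation file. The sketch below assumes translations come back keyed by the member IDs the compiler writes into `<member name="...">`, with ElementTree path expressions selecting the element to replace; `localize_doc_file` and the `Demo` assembly are hypothetical, and the sketch only handles plain text (inline tags such as `<see/>` inside a replaced element are dropped).

```python
import xml.etree.ElementTree as ET


def localize_doc_file(doc_xml: str, translations: dict) -> str:
    """Return a copy of a compiled XML documentation file with translated text.

    translations: member ID -> {element path: translated text}, e.g.
    {"M:Demo.Greeter.Hello(System.String)": {"summary": "..."}}.
    Members without an entry keep the reference text.
    """
    root = ET.fromstring(doc_xml)
    for member in root.iter("member"):
        for path, text in translations.get(member.get("name"), {}).items():
            element = member.find(path)
            if element is None:
                continue  # the element no longer exists in the reference docs
            for child in list(element):
                element.remove(child)  # simplification: drop inline markup
            element.text = text
    return ET.tostring(root, encoding="unicode")


doc = """<doc>
  <assembly><name>Demo</name></assembly>
  <members>
    <member name="M:Demo.Greeter.Hello(System.String)">
      <summary>Say Hello to someone.</summary>
      <param name="name">Person's name.</param>
    </member>
  </members>
</doc>"""

zh = localize_doc_file(doc, {
    "M:Demo.Greeter.Hello(System.String)": {
        "summary": "向某人问好。",
        "param[@name='name']": "人名。",
    }
})
print(zh)
```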

GF-Huang commented 4 years ago


Could you tell me the detailed steps? Thank you.

sharwell commented 4 years ago

@GF-Huang there isn't a single approach for everyone. It's something each team develops and/or coordinates with the company or team responsible for the translations themselves. My personal favorite approach is the one I referenced above (https://github.com/dotnet/roslyn/issues/3371#issuecomment-327782676), but unfortunately it is proprietary and the company decided to not make it public. Most of the teams I participate in today use XLIFF-based translation services with dotnet/xliff-tasks, but it doesn't appear to support XML documentation files directly yet. If you are interested in using XLIFF for this, you could file a feature request on that repository.

GF-Huang commented 4 years ago


OK, thanks anyway for your detailed reply.

newbe36524 commented 3 years ago

I am searching for a solution too. I want to publish my NuGet package with localized XML documentation files in multiple languages, which would help developers use my library better with IDE IntelliSense support. Alternatively, is there any other way to make IntelliSense work better with localized languages?

cristianosuzuki77 commented 3 years ago

Tagging @JasonCard and myself @cristianosuzuki77 in case there is a chance/need to discuss what the loc tooling would need to accommodate for this feature.

newbe36524 commented 3 years ago


Hi, I'm back to share my solution for my case. I hope it helps you.

I am working on my open source project named "Newbe.ObjectVisitor". I needed to localize its XML documentation to help my consumers use it. I got it working with the steps below:

  1. Change the XML documentation output location to a git-tracked directory, as in my project:

```xml
<PropertyGroup>
    <DocumentationFile>$(SolutionDir)/Newbe.ObjectVisitor/Newbe.ObjectVisitor.XmlDocuments/Newbe.ObjectVisitor.xml</DocumentationFile>
</PropertyGroup>
```
  2. Since it is an open source project, I can get a free translation service from crowdin.com, which integrates well with GitHub, so I translated my XML documentation in a few steps on crowdin.com. You can do the same with another service or just translate the files yourself.

My Crowdin project is listed here:

https://crowdin.com/project/newbeobjectvisitor

  3. Once the translation is done, the localized XML documentation files are also stored in the git-tracked directory, with the structure shown here: https://github.com/newbe36524/Newbe.ObjectVisitor/tree/develop/src/Newbe.ObjectVisitor/Newbe.ObjectVisitor.XmlDocuments

  4. Change the csproj to pack all XML documentation files into the NuGet package. Since my open source project has multiple target frameworks, I wrote a small PowerShell script to generate LocalizationXml.props for all target frameworks and localization languages, then imported LocalizationXml.props into the csproj file of my project, as below:

```xml
<Project Sdk="Microsoft.NET.Sdk">

  <Import Project="LocalizationXml.props"/>
  <PropertyGroup>
    <DocumentationFile>$(SolutionDir)/Newbe.ObjectVisitor/Newbe.ObjectVisitor.XmlDocuments/Newbe.ObjectVisitor.xml</DocumentationFile>
  </PropertyGroup>

  <!-- ... the rest of the project file ... -->
</Project>
```


The structure of LocalizationXml.props is shown below:

```xml
<Project>
    <ItemGroup>
        <Content Include="$(SolutionDir)/Newbe.ObjectVisitor/Newbe.ObjectVisitor.XmlDocuments/zh-CN/Newbe.ObjectVisitor.xml" Link="Localization/net461/zh-CN/Newbe.ObjectVisitor.xml" Pack="true" PackagePath="lib/net461/zh-CN/Newbe.ObjectVisitor.xml">
            <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
        </Content>
        <!-- ... more Content items like the one above ... -->
    </ItemGroup>
</Project>
```

You can check out the script in my repository.

  5. Run dotnet pack and publish the package as you like.

  6. If a consumer of your package changes the display language of the IDE, they will get the localized documentation in IntelliSense.

That's all.
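
The author generated LocalizationXml.props with a PowerShell script that is not reproduced here. As an illustration only, the same item layout can be produced with a short Python function; `build_props` is a hypothetical name, the paths mirror the example above, the framework and culture lists are example values, and values are not XML-escaped, so paths containing `&` or `<` would need escaping.

```python
def build_props(doc_dir: str, assembly: str, frameworks: list, cultures: list) -> str:
    """Emit one packed Content item per (target framework, culture) pair."""
    lines = ["<Project>", "    <ItemGroup>"]
    for tfm in frameworks:
        for culture in cultures:
            source = f"{doc_dir}/{culture}/{assembly}.xml"
            target = f"{tfm}/{culture}/{assembly}.xml"
            lines.append(
                f'        <Content Include="{source}" Link="Localization/{target}" '
                f'Pack="true" PackagePath="lib/{target}">'
            )
            lines.append("            <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>")
            lines.append("        </Content>")
    lines += ["    </ItemGroup>", "</Project>"]
    return "\n".join(lines) + "\n"


props = build_props(
    "$(SolutionDir)/Newbe.ObjectVisitor/Newbe.ObjectVisitor.XmlDocuments",
    "Newbe.ObjectVisitor",
    frameworks=["net461", "netstandard2.0"],
    cultures=["zh-CN"],
)
print(props)
```

Writing the result to LocalizationXml.props and importing it from the csproj, as shown above, packs each culture file under lib/<tfm>/<culture>/ in the package.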

Links about my investigation:

https://docs.microsoft.com/en-us/nuget/create-packages/creating-localized-packages

Many thanks to @EventHorizon1024 for the help.