There’s an existing Go package I’d like to import into my program, and (unsurprisingly) its documentation is written in English. Although I can work with English documentation without great difficulty, some members of my team can’t and need a translation. So what can I do?
One way is to write my own documentation separately, guiding people through the package. But this kind of documentation lacks the interactive, pkgsite-style catalog, as well as IDE support in the form of hover hints.
To get these functionalities, one would need to edit the code comments in the local cached version of the package source, which sounds like a nasty thing to do.
There’s golang-china/golangdoc, which modifies the x/tools/godoc package to provide a switch to a Chinese version of the documentation.
However, this only applies to the modified godoc HTTP frontend, and the workflow for producing translations for golangdoc is poorly automated.
As a result, the repo (as well as the repo containing the translations) has not received a commit in 4 years.
Well, the key point is that I really hate the design of golangdoc: it uses a modified copy of the package source code as an alternative source for godoc.
This is only slightly different from “editing the code comments in the local cached version of the package source”; the only difference is making a copy before editing, and then making godoc’s source feed switchable.
While it may suit the need of hosting a mirror of godoc.org with translations, the design is counterproductive: it defines the translation workflow as modifying the original package codebase and committing the modified version to the “golangdoc” repo, not to the original package’s own repo.
By doing this, golangdoc does not remove language barriers to the co-development of open-source projects. On the contrary, it creates copies of repos that aren’t properly linked to the originals, isolating the “Chinese (or whatever language) mirror” from the mainstream.
1. It should be backward compatible, so no implementation that places all language versions of the documentation in the comment section, tagged, for the go doc command to switch between.
2. Alternative-language versions of the docs (lang packs), in the form of separate files, should initially be auto-generated by a command and contain placeholders waiting to be filled in. There should be some basic version-control logic to mark translations as “should be updated” when the original comment changes.
3. Lang packs should live in their respective packages, not in a separate feed (though the latter could also be supported). Since most Go packages are open source, contributors can easily find their way to the translation job, and their work becomes available to all users of the package instead of just the users of a mirror site hosting translated modifications.
4. The new language option shouldn’t break things to the extent of frequently returning “no available doc for your language option”. Instead, go doc should fall back to the current procedure of fetching the original comments from the package source code when no text matches the requested language.
5. As a result of 4, a lang pack doesn’t need 100% coverage to work, nor does it need to be fully up to date. go fmt should be able to notify developers that some doc translation (or updating) work remains, and visualizers such as pkgsite should show the original-language version of the doc when a translation is absent. If a translation is outdated, the existing outdated translation and the version of the original text it corresponds to could be shown alongside the current documentation in its original language.
6. When command-line tools are used to generate lang pack templates (with placeholders and the original text for reference), it should be possible to use a secondary language as the translation source. For example, if I wrote go doc comments in Chinese and then created an English lang pack before distribution, a Russian developer (who presumably doesn’t understand Chinese) wanting to localize the documentation would gain nothing from auto-generated files that use the Chinese version as the reference text.
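To make the lang-pack template and the “should be updated” marking described above more concrete, here is a minimal sketch of what a template generator could do, assuming a lang pack is simply a list of entries keyed by symbol name. The `Entry` type, the `TODO: translate` placeholder and the hash-based staleness check are all hypothetical; only `go/parser`, `go/ast` and `crypto/sha256` are real standard-library packages.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// Entry is one record of a hypothetical lang pack.
type Entry struct {
	Symbol      string // documented identifier ("package <name>" for the package doc)
	Original    string // original comment text, kept for translator reference
	SourceHash  string // hash of Original; changes whenever the comment changes
	Translation string // placeholder until a translator fills it in
}

const placeholder = "TODO: translate"

// hashText returns a short content hash used only for change detection.
func hashText(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:8])
}

// Template extracts the package doc plus top-level func and type doc
// comments from one source file. Methods are keyed by name only here.
func Template(filename string, src []byte) ([]Entry, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	var entries []Entry
	add := func(symbol string, doc *ast.CommentGroup) {
		if doc == nil {
			return
		}
		text := doc.Text()
		entries = append(entries, Entry{
			Symbol: symbol, Original: text,
			SourceHash: hashText(text), Translation: placeholder,
		})
	}
	add("package "+f.Name.Name, f.Doc)
	for _, decl := range f.Decls {
		switch d := decl.(type) {
		case *ast.FuncDecl:
			add(d.Name.Name, d.Doc)
		case *ast.GenDecl:
			for _, spec := range d.Specs {
				ts, ok := spec.(*ast.TypeSpec)
				if !ok {
					continue
				}
				doc := ts.Doc
				if doc == nil && !d.Lparen.IsValid() {
					doc = d.Doc // a comment above an ungrouped "type T" attaches to the GenDecl
				}
				add(ts.Name.Name, doc)
			}
		}
	}
	return entries, nil
}

// Stale reports whether a translation was made against an older original.
func Stale(e Entry, currentOriginal string) bool {
	return e.SourceHash != hashText(currentOriginal)
}

func main() {
	src := []byte("package demo\n\n// Tile is a mahjong tile.\ntype Tile int\n")
	entries, err := Template("demo.go", src)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("%s [%s]: %q -> %q\n", e.Symbol, e.SourceHash, e.Original, e.Translation)
	}
}
```

Keeping `Original` in each entry is what would let a viewer show an outdated translation next to the text it was made from, while comparing hashes is enough to decide that it is outdated in the first place.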
The above functionality should be enough for localizing go doc for imported packages, given that users seldom read documentation from code comments and instead browse rendered web pages and/or read IDE pop-up hints; moreover, the package source code shouldn’t be modified, and users don’t want to change the documentation of an imported package. For developing the package itself, though, things can be different. It would be nice to be able to “swap lang packs in and out” of the code comment sections in a read-write way (writing edits back to the lang pack when switching the language option back).
It's not clear this is a dupe. #21666 refers to x/website documentation. This appears to be more fundamental: enabling go doc to support multiple languages for package documentation.
This would benefit from a concrete proposal addressing the various challenges.
When I've worked with professional translators to localize software in the past, their typical workflow has required providing them the source text to translate in some specific existing file format like the Gettext .po format, which they can then load into tools that allow them to be most productive. For example, these tools allow more easily handling incremental updates in future revisions of the software without revisiting everything, and grouping related messages together to help use consistent terms when translating the same concept in multiple places.
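If the .po format were chosen as the exchange format, each doc comment could map to one Gettext entry. The sketch below is only an illustration: it uses real PO keywords (`#:` source references, `msgctxt`, `msgid`, `msgstr`), but using the Go symbol as `msgctxt` is my own assumption, and it writes each text as a single escaped string rather than the multi-line layout gettext tools usually produce.

```go
package main

import (
	"fmt"
	"strings"
)

// poEscaper applies C-style escapes in a single pass, so the backslashes it
// inserts are never escaped a second time.
var poEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`, "\t", `\t`)

// poQuote turns s into a quoted PO string literal.
func poQuote(s string) string { return `"` + poEscaper.Replace(s) + `"` }

// POEntry renders one doc comment as an untranslated PO entry. The symbol
// goes into msgctxt so identical sentences on different symbols stay
// separate; the "#:" reference points translators back to the source.
func POEntry(file string, line int, symbol, doc string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "#: %s:%d\n", file, line)
	fmt.Fprintf(&b, "msgctxt %s\n", poQuote(symbol))
	fmt.Fprintf(&b, "msgid %s\n", poQuote(doc))
	b.WriteString("msgstr \"\"\n")
	return b.String()
}

func main() {
	fmt.Print(POEntry("tile.go", 12, "Tile", "Tile is a mahjong tile.\n"))
}
```

With entries like these, existing gettext tooling such as msgmerge can merge a regenerated template into old translations and flag changed entries as fuzzy, which is close to the “should be updated” marking proposed in the original post.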
However, that has tended to be UI text rather than documentation. I'm not sure if a translator who specializes in prose-style technical writing rather than UI text would have a similar requirement. Do you have experiences with that, @Sunshine40?
I mention this only because it might help decide what is a good format to use either as the final storage format for translated text or at least for a separate translation-workflow helper program to be able to produce and consume.
Separately: it sounds like the initial focus here would be on supporting localized docs in godoc and the pkg.go.dev site for general reference, and in gopls to allow delivering localized docs to code tips in text editors.
Is there a need (or even just a desire) for any other tooling to be able to consume localized docs?
I assume that each additional thing which needs to be able to consume localized docs will add some more constraints on possible solutions, such as which component(s) would be responsible for finding, loading, and parsing separate localized docs files.
However, that has tended to be UI text rather than documentation. I'm not sure if a translator who specializes in prose-style technical writing rather than UI text would have a similar requirement. Do you have experiences with that, @Sunshine40?
@apparentlymart The answer to the last question is no, I don't have that exact experience, but having worked on projects with i18n support does give me a hint.
When I've worked with professional translators to localize software in the past, their typical workflow has required providing them the source text to translate in some specific existing file format like the Gettext .po format, which they can then load into tools that allow them to be most productive.
What we did was even more primitive. The dev team defined where the translated text fields should be located, created spreadsheets listing those text fields, and provided screenshots of the original UI alongside the sheets. The professional translators then filled in the sheets, and the dev team manually inserted the translated text back into an equivalent of the Gettext .po file.
Technically our dev team was doing more “baby-sitting” than in your description, but I don’t see a real difference in the principles of a reasonable workflow, namely:
For the 1st point, there might be an addition: we arranged the translations into different languages for a single UI in the same sheet, which proved helpful since the original text was in Chinese. Under this arrangement, the first translator translated the Chinese text into English, and the other translators then worked mostly from the English text (consulting the Chinese version or the other translations as they wished).
Regarding the 2nd point, using screenshots actually gives a more vivid picture than simply grouping related messages together; the only downside is that it hinders automation of the workflow. (It’s typically not practical for a non-developer to build the source code just to see what the UI looks like, unless a version has actually been released with the UI implemented but untranslated.)
As for the difference between document and UI translation, I think the requirements would mostly be the same. The key point is that the translator will in most cases not be the original author, and would need the above-mentioned information to do the job.
Is there a need (or even just a desire) for any other tooling to be able to consume localized docs?
If I read it correctly, this is a question about the API, and, well, I mostly want localized docs to be consumed in a way compatible with how existing docs are consumed, with an optional language flag/parameter.
In addition to the language option itself, other important things for the doc consumer API would be:
1. Reporting the actual language the returned text is in, in case of a fallback (e.g., returning the zh-CN version for lang=zh-TW, returning the en version for lang=fr/ru, and returning the original text when none of the requested language, a default fallback option, or a preferred fallback option like en is available).
2. Telling whether a piece of doc localization might be outdated, with optional fields returning the source-language text that the available localization was made from, as well as the current version of that source text for reference.
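A rough sketch of these two consumer-API requirements, assuming translations are keyed by language tag and each one records the source text (and its hash) it was made from. The types, field names and the explicit fallback table are hypothetical; a real implementation would more likely use `golang.org/x/text/language` matching than a hand-written table.

```go
package main

import "fmt"

// Source is the doc text as written in the package source.
type Source struct {
	Lang, Text, Hash string
}

// Localization is one translated doc plus the source it was translated from.
type Localization struct {
	Text       string
	SourceHash string // hash of the source text at translation time
	SourceText string // that source text, kept for reference
}

// Doc is what a hypothetical localized-doc API might return.
type Doc struct {
	Text     string
	Lang     string // language Text is actually in; may differ from the request
	Outdated bool   // translation was made from an older version of the source
	BasedOn  string // source text the translation follows
	Current  string // current source text
}

// Lookup tries the requested language, then its configured fallbacks, then
// the preferred fallback, and finally returns the original comment.
func Lookup(req string, src Source, packs map[string]Localization,
	fallbacks map[string][]string, preferred string) Doc {
	candidates := append([]string{req}, fallbacks[req]...)
	candidates = append(candidates, preferred)
	for _, lang := range candidates {
		if lang == src.Lang {
			break // the original itself is the best remaining match
		}
		if loc, ok := packs[lang]; ok {
			return Doc{Text: loc.Text, Lang: lang, Outdated: loc.SourceHash != src.Hash,
				BasedOn: loc.SourceText, Current: src.Text}
		}
	}
	return Doc{Text: src.Text, Lang: src.Lang, BasedOn: src.Text, Current: src.Text}
}

func main() {
	src := Source{Lang: "zh-CN", Text: "Tile is one mahjong tile (zh).", Hash: "v2"}
	packs := map[string]Localization{
		"en": {Text: "Tile is a mahjong tile.", SourceHash: "v1", SourceText: "Tile is a tile (zh, old)."},
	}
	fmt.Printf("%+v\n", Lookup("fr", src, packs, nil, "en"))
}
```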
Now that I’ve come to think of it, the quickest way to get go doc localization working, in the form of a third-party plugin, might focus on:
1. Enabling a developer to swap doc localizations defined in specially formatted separate files into the .go source files, as if the go doc comments had originally been written in that language.
2. When the developer swaps a localization out by swapping in another one (e.g., the original), the edits made to the comments should be reflected in the localization file, so that they are not lost.
3. A version tracker that recognizes localization version changes caused by such edits and manages the version differences between localizations in different languages. The workflow should fit into the git workflow (I’m not familiar with how other VCSs are used in open-source projects).
4. An option to place a go doc comment localized in another language alongside it for reference, during the initial translation as well as when making the incremental updates mentioned in the 2nd point of the doc consumer API topic.
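The core of the swap operation could be a byte-level splice around a function’s doc comment, located with `go/parser`. This is a minimal sketch assuming only top-level functions and `//`-style comments; `SwapFuncDoc` is a hypothetical name, and a real tool would also need to handle types, methods, `/* */` comments and lang-pack file I/O. Returning the removed text is what makes the swap read-write: the caller stores it back into the lang pack it belongs to.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// SwapFuncDoc replaces the doc comment of the top-level function name with
// newDoc (plain text, one "//" line per text line) and returns the new
// source along with the text of the comment that was swapped out.
func SwapFuncDoc(src []byte, name, newDoc string) ([]byte, string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "", src, parser.ParseComments)
	if err != nil {
		return nil, "", err
	}
	for _, decl := range f.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Recv != nil || fn.Name.Name != name || fn.Doc == nil {
			continue
		}
		start := fset.Position(fn.Doc.Pos()).Offset
		end := fset.Position(fn.Doc.End()).Offset // end of the last comment line, before its newline
		var lines []string
		for _, l := range strings.Split(strings.TrimRight(newDoc, "\n"), "\n") {
			if l == "" {
				lines = append(lines, "//")
			} else {
				lines = append(lines, "// "+l)
			}
		}
		out := append([]byte{}, src[:start]...)
		out = append(out, strings.Join(lines, "\n")...)
		out = append(out, src[end:]...)
		return out, fn.Doc.Text(), nil
	}
	return nil, "", fmt.Errorf("no documented top-level func %q", name)
}

func main() {
	src := []byte("package demo\n\n// Add adds two ints.\nfunc Add(a, b int) int { return a + b }\n")
	out, old, err := SwapFuncDoc(src, "Add", "Add 返回两个整数之和。\n")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\nswapped out: %q\n", out, old)
}
```

Swapping the returned text back in restores the original file, which is the round trip the write-back requirement above depends on.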
Such an implementation would certainly have no effect on go doc / gopls behavior until a developer manually swaps a doc localization into the source code, and the online pkg.go.dev would only show the doc localization corresponding to the codebase at its tagged version.
Nevertheless, it would be enough if you’re OK with viewing localized docs through a local pkgsite instance, and with viewing/editing localized docs in code comments and getting localized IDE hints after running a manual swap-doc-language command on the repository.
The merit of this implementation is that it makes absolutely no change to the godoc / gopls / pkgsite modules.
It would be easier for a developer to start working on it without needing to understand in detail how those modules work.
This would benefit from a concrete proposal addressing the various challenges.
@mpx I've listed some of my thoughts above. Any advice or further questions are appreciated.
However, I still hope language localization support will be integrated into the godoc / gopls / pkgsite modules, enabling native support for switching the doc language on pkg.go.dev and IDE hint support without altering the imported package’s codebase.
Ideally all doc localizations should be treated the same way, with no difference between the “original localization” and the “translated localizations” (one might set a preferred language option for reference though, and a localization file might specify another language as a default reference base / fallback option).
After all, when it comes to a worldwide-contributed open-source project, it’s not reasonable to force a single language as the “official language” for documentation. This idea is the initial inspiration for this proposal: it’s not about “asking for an option to make Chinese (or another language) the official language for the go doc of a package”.
(Warning: this comment is more or less complaining about the status quo, you can skip it if you're not interested in the non-tech details)
For example, I’m a native Chinese speaker. I might hesitate between Chinese and English as the doc language, but Japanese? I shouldn’t care whether a package has Japanese documentation, as it isn’t MY preferred language, nor is it the default language (English) for exchanging ideas internationally.
A Japanese developer would think the same way I do, just swapping “Chinese” for “Japanese”.
The same goes for doc translations of existing packages. When googling documentation for Go packages, I found a portal hosting a Chinese translation of the Go standard library documentation. From my point of view, I get the impression that “Chinese Go developers matter” and that “packages ought to come with Chinese doc translations to be widely used by Chinese developers”, but what about everyone else?
If someone published a package with Chinese doc only, then hardly anyone who doesn’t speak Chinese would bother to import that package.
I, as a matter of fact, would never import a package with only Japanese docs published on pkg.go.dev, unless someone explained to me what the package can do and why no substitutes are available, which is not likely to happen.
Truth is, if you don’t provide English documentation, users of your package would be filtered by the doc language you choose, with rare exceptions. And the repository would get limited feedback/contribution as a result.
There are situations where English as the only doc language option is simply not suitable.
I’m currently working on a strategy (card/board) game simulator, one implementation of which is the famous game mahjong (in its Japanese variant).
Given that I have already been struggling with terminology translation catalogs to keep the names of types, methods and variables from being flooded with pinyin/romaji, what would happen if I restricted myself to English in the “official” documentation?
Presumably, the first users of the package would be Chinese speakers (my friends and mahjong enthusiasts in China whom I can advertise the package to); they would prefer Chinese > English = Japanese as the doc language.
Secondly, professional mahjong players in Japan, who actively seek tools for researching the game, might find the package (if I add a sentence to the doc overview briefly introducing it in both English and Japanese). They’d prefer Japanese > English = Chinese.
Finally, there are other mahjong enthusiasts who don’t speak Chinese or Japanese. Of course, Chinese/Japanese docs are useless to them, but would English documentation provided by me be adequate?
Mahjong is a game that originated in China and became popular in Japan. Lots of people write and translate books on the topic in both countries, and as a result the Chinese/Japanese terminology has become widely accepted.
The English-speaking world, on the other hand, hasn’t standardized mahjong terminology. I often come across a concept that has 2 or 3 English translations, as well as people referring to it directly in romaji. So how am I supposed to know which expression is most widely accepted?
So much for complaining, but the point is clear. In this specific case, providing English docs instead of Chinese ones would inconvenience or confuse most users (who speak Chinese), cause me more trouble by interrupting code development to find proper English expressions, do nothing for potential Japanese users, and give the English-speaking community a low-quality translation. And mahjong is probably not the only case facing this problem.
This is where community contribution should kick in, IMO, with me providing Chinese docs as the first choice, explaining everything I can, and English docs as an alternative option awaiting refinement by contributors.
But in the current Go package publishing workflow, the package probably wouldn’t even reach the English-speaking community without providing English docs via pkg.go.dev in the first place; people are just used to being separated by language barriers.
There's not an obvious technical answer to this. Supporting docs in multiple languages in a systematic way is significant complexity for really quite small benefit in general.
In this specific case, where you expect to have many readers of different languages without a shared language, it seems fine to write both in the comment:
// English comment.
//
// Chinese comment.
If that became widespread then it might make sense to do something more systematic, but it seems like if that was going to happen, it would have happened already.
Also, if LLMs are good for anything at all, it should be language translation. I wonder whether it makes sense to have a third-party tool that is like 'go doc' but translates the output to a new language.
This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group
Based on the discussion above, this proposal seems like a likely decline. — rsc for the proposal review group
How is this a likely decline? There has been no discussion at all! I am disappointed by the attitude displayed here.
In some countries there are multiple official languages, sometimes even three or four, and some projects need written documentation in all of them, usually plus English as well. Just putting the documentation in all languages together makes it very hard to read.
The simplest solution for this problem would be to support something like //go:lang:[language code] tags in go doc comments, and then filter and display documentation based on the user's LANG environment setting.
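To illustrate how that tag-based filtering might behave, here is a minimal sketch over raw comment lines. The semantics are assumptions of mine: lines before the first `//go:lang:` tag are shared, each tag opens a section that runs until the next tag, and the locale from `LANG` (e.g. `zh_CN.UTF-8`) is matched exactly first and then by its base language. As far as I can tell, `go/ast`’s `CommentGroup.Text` already drops lines of the `//word:word` directive form, so today’s go doc would print every section’s text, just without the tags.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

const langTag = "//go:lang:" // hypothetical tag syntax from the comment above

// localeLang turns a POSIX locale such as "zh_CN.UTF-8" into "zh_CN".
func localeLang(v string) string {
	if i := strings.IndexAny(v, ".@"); i >= 0 {
		v = v[:i]
	}
	return v
}

// FilterLang keeps the shared (untagged) lines plus the one section that best
// matches lang: an exact tag first, then the base language ("zh" for "zh_CN").
func FilterLang(lines []string, lang string) []string {
	present := map[string]bool{}
	for _, l := range lines {
		if strings.HasPrefix(l, langTag) {
			present[strings.TrimPrefix(l, langTag)] = true
		}
	}
	base, _, _ := strings.Cut(lang, "_")
	want := "" // "" selects only the shared lines
	switch {
	case lang != "" && present[lang]:
		want = lang
	case base != "" && present[base]:
		want = base
	}
	var out []string
	section := "" // "" means shared, i.e. before any tag
	for _, l := range lines {
		if strings.HasPrefix(l, langTag) {
			section = strings.TrimPrefix(l, langTag)
			continue
		}
		if section == "" || section == want {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	doc := []string{
		"// Tile is a mahjong tile.",
		"//go:lang:zh",
		"// Tile 是一张麻将牌。",
		"//go:lang:ja",
		"// Tile は麻雀の牌です。",
	}
	for _, l := range FilterLang(doc, localeLang(os.Getenv("LANG"))) {
		fmt.Println(l)
	}
}
```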
Another simple solution would be to allow, for every foo.go file, a foo.[language code].godoc file containing the doc translations and signature-only functions.
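The per-file variant implies a lookup order. A minimal sketch, assuming the `foo.<lang>.godoc` naming from the comment above, a language value already stripped of its encoding suffix (e.g. `zh_CN`), and trying the full locale before the base language. `DocFile` is a hypothetical name, `exists` is injected (in practice a wrapper around `os.Stat`) so the logic can be tested without files, and parsing the signature-only declarations inside a .godoc file is left out.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// DocFile returns the file whose doc comments should be shown for goFile:
// foo.<lang>.godoc, then foo.<base>.godoc, then foo.go itself.
func DocFile(goFile, lang string, exists func(string) bool) string {
	stem := strings.TrimSuffix(goFile, ".go")
	var candidates []string
	if lang != "" {
		candidates = append(candidates, stem+"."+lang+".godoc")
		if base, _, found := strings.Cut(lang, "_"); found {
			candidates = append(candidates, stem+"."+base+".godoc")
		}
	}
	for _, c := range candidates {
		if exists(c) {
			return c
		}
	}
	return goFile // no translation: fall back to the original comments
}

// fileExists reports whether path can be stat'ed.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	fmt.Println(DocFile("tile.go", "zh_CN", fileExists))
}
```

Falling back to foo.go when no translation file exists keeps the behaviour the original post asks for: a missing translation never hides the documentation.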
Please consider more seriously the language related problems many Go users are facing.
The simplest solution for this problem would be to support something like //go:lang:[language code] tags in go doc documents, and then filter and display documentation based on the user's language env LANG settings.
An other simple solution would be to allow for every foo.go file a foo.[language code].godoc file with the doc translations and signature only functions.
This sounds simple but then we have to maintain all that infrastructure, think about which languages to translate our doc comments into, and so on, for the lifetime of the Go project. It adds significant complexity, and Go is maintained by a relatively small team.
So the problem is a lack of resources on the Go team, then? That's too bad, but somewhat understandable. People in my or the OP's situation will have to deal with it ourselves and build our own tools.
Think of this scenario:
I’m working on a Go package that is intended to be open source, but the original dev team consists only of native Chinese speakers. Meanwhile, I hope the package can be used by developers around the globe and receive feedback and contributions from them. In what language should I write the comments (which the go doc command documents automatically), then?
You see, the goals of development efficiency and of documentation for potential users who speak different languages contradict each other when it comes to choosing a single language for go doc comments.
Using English in go doc comments from the beginning of a project is an option chosen by a number of developers who are not native English speakers. But it would break the internal use of comments, especially when some members of the dev team are not fluent in English. Making “go doc comments must be in English” part of the coding style guide may lead most developers to leave the comment sections untouched, resulting in outdated or missing documentation.