StatCan / aaw-kubeflow-containers

Containers built to be used with Kubeflow for Data Science

Add i18n to RStudio (initial elaboration and work) #149

Closed blairdrummond closed 3 years ago

blairdrummond commented 3 years ago

Look into bilingualism options for RStudio.

wg102 commented 3 years ago

The i18n of RStudio can only be partial. The menus and UI text appear to stay in English, but the content inside (the text and the error messages) can be in French if the locale environment variable is changed to French.

This is the code to add the French Locale and set it.

RUN echo "fr_CA.UTF-8 UTF-8" > /etc/locale.gen && \
    locale-gen
# Configure environment
ENV CONDA_DIR=/opt/conda \
    LC_ALL=fr_CA.UTF-8 \
    LANG=fr_CA.UTF-8 \
    LANGUAGE=fr_CA.UTF-8

The next step is to find out how to detect the language and set the environment variable at runtime. The current idea is to create a 'transparent layer', similar to the remote-desktop dashboard but without a UI, that checks the browser language before the application opens for the user.
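
As a rough sketch of the detection step (not an actual implementation), such a layer could read the browser's Accept-Language header and map it to one of the locales installed in the image. The function name, the mapping table and the en_US.UTF-8 fallback are assumptions made for illustration only:

```python
# Hypothetical mapping from primary language tags to locales installed in the image.
SUPPORTED_LOCALES = {"en": "en_US.UTF-8", "fr": "fr_CA.UTF-8"}
DEFAULT_LOCALE = "en_US.UTF-8"  # assumed fallback, not taken from the thread


def locale_from_accept_language(header: str) -> str:
    """Pick a supported locale from an HTTP Accept-Language header value."""
    candidates = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without an explicit q-value has quality 1
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value.strip())
                except ValueError:
                    quality = 0.0
        # Sort by descending quality, then by original order in the header.
        candidates.append((-quality, position, tag))
    for neg_quality, _, tag in sorted(candidates):
        if neg_quality == 0:  # q=0 means "not acceptable"
            continue
        primary = tag.split("-")[0]
        if primary in SUPPORTED_LOCALES:
            return SUPPORTED_LOCALES[primary]
    return DEFAULT_LOCALE
```

The value returned here is what would end up in LANG when the container starts.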

wg102 commented 3 years ago

To have RStudio in 'French', LANG needs to be set to the correct locale.
The approach that was decided on is to “take the active language in the KF UI and automatically submit it as part of the "New Server" payload, and make the controller pass that locale as an env var (as you suggest) to all notebooks it launches. Then any container can find locale information at a known location and do with it as it pleases.” In other words, the locale is sent when a new notebook is created. This is equivalent to testing with docker run -e LANG=fr_CA.UTF-8 imageTag, which overrides whatever value of that environment variable is set in the Dockerfile.
The changes therefore need to be applied in multiple places, and some things need to be verified:
- Kubeflow-container: add the French locale (TODO: decide where to add the locale: in the base file or in the r-studio file).
- Jupyter-api: add the language detection (with a controllable UI).
Whatever the setting in those will inject:

RStudio (image) only needs LANG. RStudio in the remote desktop needs both LANG and LANGUAGE. Other applications might be affected when the locales change (to be investigated).
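
To make the difference between the two images concrete, here is a small sketch of how the injected variables could be chosen and how the docker run test mentioned above could be assembled. The image kind names ("rstudio", "remote-desktop") and both helper functions are hypothetical:

```python
def locale_env(image_kind: str, lang: str) -> dict:
    """Return the environment variables to inject for a given image kind."""
    env = {"LANG": lang}
    # Per the note above, RStudio inside the remote desktop also needs LANGUAGE.
    if image_kind == "remote-desktop":
        env["LANGUAGE"] = lang
    return env


def docker_run_args(image_tag: str, env: dict) -> list:
    """Build the argument list for `docker run -e NAME=value ... imageTag`."""
    args = ["docker", "run"]
    for name, value in env.items():
        args += ["-e", f"{name}={value}"]
    args.append(image_tag)
    return args
```

For example, docker_run_args("imageTag", locale_env("rstudio", "fr_CA.UTF-8")) produces the same command line as the manual test with a single -e LANG=fr_CA.UTF-8 flag.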

wg102 commented 3 years ago

From what I gathered, the way to have environment variables would be through the PodDefault (see https://www.kubeflow.org/docs/notebooks/setup/ step 12)
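
For reference, a PodDefault that sets LANG could look roughly like the manifest built below. This is only a sketch: the resource name, the label and the namespace are placeholders, and the manifest is emitted as JSON, which kubectl accepts as well as YAML.

```python
import json


def locale_pod_default(namespace: str, lang: str) -> dict:
    """Build a PodDefault manifest that injects LANG into matching pods."""
    return {
        "apiVersion": "kubeflow.org/v1alpha1",
        "kind": "PodDefault",
        # Name and label are placeholders chosen for this example.
        "metadata": {"name": "set-locale", "namespace": namespace},
        "spec": {
            # Pods carrying this label get the env entries below.
            "selector": {"matchLabels": {"set-locale": "true"}},
            "desc": "Set the notebook locale",
            "env": [{"name": "LANG", "value": lang}],
        },
    }


manifest = locale_pod_default("my-namespace", "fr_CA.UTF-8")
print(json.dumps(manifest, indent=2))
```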

wg102 commented 3 years ago

The short answer for this issue is to set the environment variable LANG to the desired language. For this to work, the locale for that language also needs to be available in the image (e.g. fr_CA.UTF-8).
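
A quick way to check the second condition from inside a container is to try to activate the locale. The helper below is a sketch using Python's standard locale module; the function name is made up for this example:

```python
import locale


def locale_is_available(name: str) -> bool:
    """Return True if the C library can activate the named locale."""
    # Calling setlocale without a value only queries the current setting.
    previous = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return False
    finally:
        # Always restore whatever was active before the check.
        locale.setlocale(locale.LC_CTYPE, previous)
    return True


print("fr_CA.UTF-8 available:", locale_is_available("fr_CA.UTF-8"))
```

If this prints False, setting LANG=fr_CA.UTF-8 will have no visible effect until the locale has been generated in the image.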

wg102 commented 3 years ago

This issue is split into two parts:

Note: The locales are now added as part of the Dockerfile, see https://github.com/StatCan/kubeflow-containers/blob/d2b7863936af5e42ae2d4f342d1524887c1703db/docker-bits/0_Spark.Dockerfile#L8

Some other components in kubeflow-container may need to do similar things. i18n might be related to the LANG, LANGUAGE and LC_ALL environment variables.

ca-scribner commented 3 years ago

Started work on internationalizing the menus and commands. Command names/labels are defined by XML in src/org/rstudio/studio/client/workbench/commands/Commands.cmd.xml, which is then used by GWT in src/org/rstudio/core/Core.gwt.xml to generate Java classes at compile(?) time. Not sure where these generated classes go yet, but should be able to modify this code to make the text getters use internationalization.

ca-scribner commented 3 years ago

The commands defined in the Commands.cmd.xml file are used via deferred binding to create the Java classes that actually use the commands in menus (for example, the menu dropdown lists). Part of one of these XML files looks like this:

<commands>
    <!-- id: not internationalizable (command's id name; never shown in the UI, only used to access the command) -->
    <!-- menuLabel: internationalizable -->
    <!-- rebindable: not internationalizable -->
    <cmd id="newPythonDoc"
        menuLabel="_Python File"
        desc="Create a new Python file"
        rebindable="false"/>
...
</commands>

The generation of code for these classes is called for in ./src/gwt/src/org/rstudio/core/Core.gwt.xml via:

<generate-with class="org.rstudio.core.rebind.command.CommandBundleGenerator" >
  <when-type-assignable class="org.rstudio.core.client.command.CommandBundle"/>
</generate-with>

This process invokes CommandBundleGenerator.generate(), which scans the XML and creates Java classes for everything defined in it.
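
To illustrate the kind of scan involved (this is not the actual generator, which is Java code run by GWT), a standalone script could collect the user-visible attributes like this. Treating desc as translatable is an assumption; only menuLabel is marked that way above, and the key naming scheme is invented for the example:

```python
import xml.etree.ElementTree as ET

# Attributes assumed to carry user-visible text (desc is an assumption).
TRANSLATABLE_ATTRIBUTES = ("menuLabel", "desc")

SAMPLE = """
<commands>
    <cmd id="newPythonDoc"
        menuLabel="_Python File"
        desc="Create a new Python file"
        rebindable="false"/>
</commands>
"""


def extract_translatable(xml_text: str) -> dict:
    """Map '<command id>_<attribute>' keys to the English text found in the XML."""
    root = ET.fromstring(xml_text)
    strings = {}
    for cmd in root.iter("cmd"):
        for attr in TRANSLATABLE_ATTRIBUTES:
            if attr in cmd.attrib:
                strings[f"{cmd.attrib['id']}_{attr}"] = cmd.attrib[attr]
    return strings


print(extract_translatable(SAMPLE))
```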

To internationalize this, we must:

I have successfully modified the generators to use i18n with a hard-coded interface file, but I'm not sure yet how best to automatically generate the constants interface or the properties files. Big questions are how to invoke GWT's generation mechanism properly and where they're placed once they're generated.

(note: this describes commands, but I think other things are similarly generated using this file (shortcuts, others))
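
One possible shape for the generation step, sketched as a script outside of GWT: given a mapping of keys to English text, write out a GWT Constants interface (using the @DefaultStringValue annotation) and a matching properties file. The function names, the interface name and the key format are hypothetical, and properties escaping is deliberately left out:

```python
def java_string(text: str) -> str:
    """Escape backslashes and double quotes for use in a Java string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def render_constants_interface(name: str, strings: dict) -> str:
    """Render a GWT Constants interface with the English text as defaults."""
    lines = [
        "import com.google.gwt.i18n.client.Constants;",
        "",
        f"public interface {name} extends Constants {{",
    ]
    for key, text in sorted(strings.items()):
        lines.append(f'    @DefaultStringValue("{java_string(text)}")')
        lines.append(f"    String {key}();")
    lines.append("}")
    return "\n".join(lines)


def render_properties(strings: dict) -> str:
    """Render 'key = text' lines (no escaping of special characters here)."""
    return "\n".join(f"{key} = {text}" for key, text in sorted(strings.items()))


strings = {"newPythonDoc_menuLabel": "_Python File"}
print(render_constants_interface("CmdConstants", strings))
print(render_properties(strings))
```

The English properties output would then be copied to a per-language file (for example one with an _fr suffix) and translated there.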

ca-scribner commented 3 years ago

As discussed here, JSON user prefs and state are built from a JSON schema file. This schema and the files generated from it are used in various locations in the UI (e.g. the Options dialog, the Command Palette) and need to be translated as well.

The developer flow around changing these is to:

Maybe I should update this workflow to output files for multiple languages(?). We could build translatable files from the default language version, and maybe use a git diff or similar to identify which keys are modified and need changing (so we don't completely delete the translated files every time). This could also be done in the cmd.xml workflow.
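
A sketch of the "don't delete the translated files every time" idea, comparing keys directly instead of going through git diff. All names here are hypothetical. The function keeps an existing translation when the English source text is unchanged, and otherwise falls back to the new English text and flags the key for review:

```python
def merge_translations(defaults: dict, previous_defaults: dict, translated: dict):
    """Carry existing translations forward and list keys that need (re)translation.

    defaults:          current English text per key
    previous_defaults: English text the existing translations were based on
    translated:        existing translations per key
    """
    merged = {}
    needs_review = []
    for key, english in defaults.items():
        unchanged = previous_defaults.get(key) == english
        if key in translated and unchanged:
            merged[key] = translated[key]
        else:
            merged[key] = english  # seed with English until it is translated
            needs_review.append(key)
    return merged, needs_review


merged, needs_review = merge_translations(
    defaults={"open": "Open", "save": "Save file", "close": "Close"},
    previous_defaults={"open": "Open", "save": "Save"},
    translated={"open": "Ouvrir", "save": "Enregistrer"},
)
print(merged, needs_review)
```

Keys removed from the defaults drop out of the merged result, so stale translations do not accumulate.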

Need to look into what the .cpp/.hpp files contain. No idea how to internationalize those if I have to... But if only the Java classes need it, that could be handled via a resource bundle setting the name of the right XML file(?).

ca-scribner commented 3 years ago

The easiest way forward appears to be to build the interface/property files for any translated metadata file (e.g. XML/JSON files) the same way the project currently uses the JSON file to build the actual user properties. We will add scripts that translate the XML/JSON files into interface/property files, using the English text in the XML/JSON files to seed the default text in the interface and the English version of the property files. The property files can then be translated as needed. The typical development flow would then be:

ca-scribner commented 3 years ago

Update of progress/general work summary documented here: https://github.com/ca-scribner/rstudio/pull/1#issuecomment-819840878

11000 new lines and counting in the PR haha. Although a lot of that is automatically generated through scripts

ca-scribner commented 3 years ago

Brief summary of progress:

Next steps:

Big outstanding items:


ca-scribner commented 3 years ago

Refactoring this into an epic tracked by Statcan/daaas/510. Closing this issue to claim the work already completed (fleshing out the task, doing some of the updates, etc.). Future work will be tracked in separate issues.