apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
63.09k stars 13.97k forks source link

[SIP-60] Proposal to extend i18n support to charts #13442

Closed thelightbird closed 1 year ago

thelightbird commented 3 years ago

[SIP] Proposal to extend i18n support to charts

Motivation

We would like to adopt Superset to replace our in-house BI tool. However, it turns out that a key feature is partly missing for us: internationalization. Indeed, even though it is already partially implemented, we would like to push Superset’s internationalization even further.

We see internationalization as follows in Superset:

  1. Superset UI Translation: text displayed in the frontend that is not entered by the user.
    ✅ Already implemented and it is possible to add languages, after activation of the feature in the settings file. Given that translations are packed into the frontend, a re-deployment is required for any change.

  2. Translation of the text that the user can fill in the charts (e.g. chart title, axis names, and metric names).
    ❌ Not implemented.

  3. Localization (l10n): Format the data according to the selected language: number, date, calendar, currency, ...
    Example: the days of the week and month on the time axes of the charts.
    ❌ Not implemented.

In this SIP, we aim to address part 2. about chart text translation.

We would like to be able to register multiple translations (one per language) for each text element that a chart may include. This way, when a user would switch the language in the frontend, the change would not be limited to Superset’s UI only. Instead, the newly selected language would be applied to the whole dashboard, then including for each chart: the chart title, the chart axis labels, and the metric labels.

Proposed Change

In this SIP, the internationalization implementation that we offer relies on the language selection feature, which is already available in Superset. The proposed change consists of replacing each text with a JSON containing the translations, as described in the section New or Changed Public Interfaces below. When there is one or no language defined, the behavior would remain the same as it is right now. But once that 2 or more languages would be defined, the multi-language behavior would be enabled.

Specifically, in the frontend, the change involves replacing all text input components with a new React component that would let the user input one translation per language via an edit button opening a modal (cf screenshot 3). When the modal is not opened, this component would only display the text matching the currently selected language.

Screenshot Before - Chart edition page

1 before

Screenshot After - Chart edition page

2 after 3 edit translation

New or Changed Public Interfaces

For each text field on the chart page, another i18n form field containing a JSON with all the translations would be added. This way, it would still be backward compatible if the user would decide later to deactivate the i18n feature from the backend configuration file.

First, we offer changes specifically for the translation of the title of a chart, named identically slice_name as a field of the Slice database model and as a form field validated by a schema.

The entire set of translations would be returned each time a GET request would be performed for a chart. To save the new i18n fields, the backend would have to be modified. Depending on the number of languages defined in the config file, the frontend would have to choose from which field the text would need to be displayed (example: il would be slice_name if less than 2 languages, else slice_name_i18n with the key of the currently selected language).

In the frontend, to avoid developing 2 distinct components (one to display, and another to edit translations), we can reuse the same component to display and optionally edit the translations. When less than 2 languages would be defined in the superset config, the component would act as a simple text field as it is right now. But when more than 1 language would be defined, it would display the text matching the currently selected language and have an edit button to change all translations.

Interface of the frontend React component:

TranslatableTextField.propTypes = {
  value: PropTypes.string, // Example: "Title"
  setValue: PropTypes.func,
  i18nValue: PropTypes.string, // Example: { "en": "Title", "fr": "Titre" }
  setI18nValue: PropTypes.func,
  enableEditMode: PropTypes.bool, // Show/hide edit button
}

Example of use:

<TranslatableTextField
  value={chart.slice_name}
  setValue={updateTitle()}
  i18nValue={chart.slice_name_i18n}
  setI18nValue={updateTitleTranslations()} />

The parameter i18nValue would be used only when more than one language would be defined. The parameter value would be used even if more than one language would be defined in case we want to disable multi-language support later.

To display the text, the react component would try to display the text associated with the currently selected language. If it is null or undefined, a generic message or a dash could be displayed instead.

Example of the changes on the GET chart response with the name of the chart if we would have 2 or more languages defined:

For the other text parts of a chart, like its axis labels or its metric name, which would need to be translated too, and which are currently all stored in the params field of the Slice object database model, it may require to store their matching translated text parts inside this params as well. We have thought about the required changes for those other text parts, but we would like to have the SIP initially reviewed on the idea and feasibility of incorporating the proposed changes for the translation of the chart title first because it is easier to reason about. We could offer more detailed changes about those other chart text fields in this SIP later.

Migration Plan and Compatibility

Required changes

Compatibility enabling and disabling multi-language

In the backend, when a GET request would be performed while more than one language would be defined, the backend would transform the i18n fields to fill each empty language with the untranslated key. Example: In database:

slice_name: "Title",
slice_name_i18n: null

Response of the GET:

slice_name: "Title",
slice_name_i18n: "{ "en": "Title", "es": "Title", "fr": "Title" }"

In the frontend, if more than one language would be defined in the backend configuration file, when creating a chart, the untranslated text field (for eg. slice_name) would be filled by the React component with the translation matching the selected language.

Rejected alternatives

Currently, the only way we have found to have charts in multiple languages is by duplicating charts and dashboards but this does not scale.

Related issue: i18n: translation of dashboards and charts #13030

kamalkeshavani-aiinside commented 3 years ago

Really like this proposal!!

One suggestion for frontend: There is a case when the user just wants to enable UI translations and don't want to write translations for every chart's title, axis, metrics, etc.

To display the text, the react component would try to display the text associated with the currently selected language. If it is null or undefined, a generic message or a dash could be displayed instead.

So there should be an option/control(either in UI or in config), which allows the user to display(re-use) the text associated with default(1st) language for other languages if not defined. Allowing user to control this behavior from UI(instead of config), means user can have different behavior for different charts allowing more control to user. This will also inspire user to try it out in a few charts, and then write more translations later if needed.

Papipo commented 3 years ago

I definitely need this! I don't have any feedback, a basic version should probably suffice.

darekw commented 3 years ago

+1

junlincc commented 3 years ago

@thelightbird @Papipo and @darekw

Thanks for the detailed proposal. The requested enhancements make sense and we would definitely like to make using Superset easier for global users.

Since the inception of the Superset, UI localization efforts have been mainly driven by and came from the community. The work of translating to different languages is also done relatively independently. Translation of the text that the user can fill in the charts is slightly different, since we will need to introduce a few new controls that impact all users in Explore. To keep the impact minimal to non-international users, the feature will be likely behind a Feature Flag. The complexity of adding translation to different chart components like, title, axis, metrics and filter are also different.

To push this forward, we will need your help to better define the minimum project scope, timeline, as well as contributing code collaboratively. We are in the process of redesigning the control panel, adding Drag and Drop, Dynamic Control and some other cool features. I'm not sure how different projects develop in parallel affect each other. But we want to make it work for all users! :)

thelightbird commented 3 years ago

@junlincc This is great news! We share your enthusiasm for the project.

First, according to the SIP Process, it seems that we forgot to post an email on the @dev Superset mailing list announcing this SIP, sorry for that. But actually, maybe we could wait a bit longer to first address the points you mentioned so that the initial message would be clearer from the start for the eventual mailing-list participants. What do you think?

Given that this i18n feature is super important to us, we would like to start working on it as soon as possible. We will probably have 2 developers from our end working on the implementation: someone else on the backend part and myself on the frontend part.

We are in the process of redesigning the control panel, adding Drag and Drop, Dynamic Control and some other cool features. I'm not sure how different projects develop in parallel affect each other.

Indeed, we were aware of the existence of some opened PRs and other SIPs which might have some overlapping areas with the expected implementation of this proposition. However, we are committed to making it work if the complexity of the different moving parts remains at a feasible and acceptable level for both parties.

To push this forward, we will need your help to better define the minimum project scope, timeline, as well as contributing code collaboratively.

About the timeline, we would like to start working on it as soon as we have an understanding of the set of other opened PRs that we would have to take into account.

About the project scope, we tried to define it in the opening message. How could we try to improve it?

Finally, we need to find overlapping work, specifically "the redesigning the control panel, adding Drag and Drop, Dynamic Control". We found some merged PRs on Drag and Drop (or dnd), but none that seems too significantly big to consider. Would you have some SIPs/PRs on this somewhere else? As for "dynamic control", which set of Superset page components (eg. dashboard, settings, charts, filters) would be affected? Are there opened Issues or PRs on that specifically?

Although we could not find ongoing PRs with common overlapping parts to our eventual implementation, here are some SIPs that we think we might need to keep an eye on

Would you like more information from us to answer those past questions?

We are looking forward to our fruitful collaboration.

junlincc commented 3 years ago

@thelightbird Hi Paul, thanks for providing your timeline. You can also find planned projects on our roadmap repo, and 2021-2022 proposed roadmap .

Couple items may be related to your project. [i18n] Clean up translation strings https://github.com/apache-superset/superset-roadmap/issues/153 @ktmud [chart] Dynamic controls UI reacts to data and chart types https://github.com/apache-superset/superset-roadmap/issues/88 @villebro We are in the process of migrating to Echarts, meaning all the legacy charts will be deprecated soon. I'm not sure how that's gonna affect this SIP

Given that you have a tight timeline, let's meet to discuss next step. please feel free to reach out to me in Superset Slack

mistercrunch commented 3 years ago

a new React component that would let the user input one translation per language

Users aren't typically the people translating applications. The way normal i18n works is that you flag strings for translations and different specialized translators use another workflow/application to do the actual translations. It's also generally applied to static strings in the codebase, not use-provided strings / metadata.

I really don't think it's common for applications to let users translate metadata (?). Here we're assuming a multi-language use case where you have teams that are collaborating in a multi-language kind of way on the same charts and dashboards. That seems very uncommon to me. Is the use case related to embedding?

Even if we think the use case is valid (I think we should debate this first), I don't think that making the title editable is the right approach as if we assume a group of specialized translators, you wouldn't want to make them all owners of the charts so that they can translate them...

To be clear, I'm very supportive of 1. and 3., but I don't think that 2. make a lot of sense outside of the embedded use case. Can we point to other [ideally simlar] applications that do this?

mistercrunch commented 3 years ago

Side question about 1. (which seems like a higher priority to me than 2.). A while back, when we moved visualization plugins and the superset-ui package out of the main repo, it made it such that those strings were not getting account for anymore when running the babel commands. Is this still the case? Do we have a way to include the strings in the superset-ui repo and in the core plugins when running babel currently?

junlincc commented 3 years ago

My understanding of this project

@mistercrunch so to some degree, this case is related to Embedded(not embedded Explore page)

From a product standpoint, it may seem like an edge case, but I have received similar request before, so the needs are validated. As long as it's developed behind a feature flag, I don't see much downside of the project, in terms of affecting major developments in Explore or Explore UI. In terms of priority, 1. definitely have a higher impact and needs more work. There are a few proposed roadmap items that can benefit this effort in general, as an OSS project Proposed Functionality Step 1: Clean up all concatenated translation strings Step 1.5 (Optional): Update i18n solution, make it possible to fallback to English translated strings. Step 2: Replace all translation keys with unique, contextualized keys. E.g. _('global.menu.Charts') listed in https://github.com/apache-superset/superset-roadmap/issues/153 @ktmud

@thelightbird Though 1 and 2 may target different use cases and personas, as an OSS project, we would appreciate some contribution towards 1. before or in parallel with 2. to improve localization in general in Superset.

ktmud commented 3 years ago

Note that for viz plugins, it is possible to add all translations within the plugin itself (example), which is then loaded only when the specific viz type is rendered. This should be the recommended way of adding UI-element translations for viz plugins since it makes it easier to migrate to dynamically-loaded plugins.

As to the request in this SIP, I can understand the needs but the proposed solution really sound worrisome because it is not just adding a new feature, but also replacing existing UI elements and changing db schema and API. Even if things are behind a feature flag, there will be substantial maintenance costs. I wonder if there are less intrusive ways to do more or less the same.

For example, Tableau recommends users to create multiple worksheets for different languages then use tabs and links to help users select languages : https://www.tableau.com/about/blog/2016/4/how-create-multilingual-dashboards-52933

Superset users can already do the same, albeit having different charts for different languages makes it more difficult to sync changes between languages.

However, currently you can already override Chart titles in the dashboard editor:

Xnip2021-04-28_18-53-00

which means, the only missing piece for users to re-use the same chart in dashboards of different languages is to add more dashboard-level overrides on the text fields in chart metadata. It could either be a new control on charts in dashboard edit mode, or inline interactions on chart elements, which may require very sophisticated work in the plugin system and I'm not even sure whether it is possible for all chart types.

villebro commented 3 years ago

This is a use case I have personally seen needed multiple times, and one that I could see being beneficial in environments where the same official truth needs to be made available to stakeholders using different languages. To give an example, all main EU regulation is required to be made available in 24 officially supported European languages (check the top right hand corner): https://europa.eu/european-union/law/legal-acts_en#directives . If the EU were ever to add proper visualizations to their reporting (one can always dream..), they would likely need a feature like this to ensure that only one copy of each dashboard/chart needs to be maintained.

One option would be to not change the metadata db schema and just Rison encode i18n strings where needed. For example, if we have a Metric called "Cars" and wanted to add a few optional translations to it. In this case we could replace the Metric name Cars with the string (default:en,en:Cars,fi:Autoja,se:Bilar), which in JSON is equivalent to

{
  "default": "en",
  "en": "Cars",
  "fi": "Autoja",
  "se": "Bilar"
}

When rendering the chart we would then just check if the metric name is a regular string or Rison parseable: 1) if it's a regular string, do nothing (current behavior) 2) if it parses, we'd check the selected language and pick the correct version, and in this case fall back to the English version if the chosen language is not supported.

To give proper native support for this introducing a new React component for the i18n input would be created (like in the original proposal), which would make it as uninvasive as possible to give an untranslated value at first, but then add additional languages where needed. This component could work something like this:

ktmud commented 3 years ago

If we do want to store things in chart metadata and add an additional control, then maybe regular JSON is good enough.

Another benefit of doing this via Dashboard overrides: you can assign translation access to those who only have access to the dashboards, without giving them write access to the charts. This would be a more scalable model where translations are often handed out to external translators with limited access to the system.

villebro commented 3 years ago

The reason I suggested Rison is because in this scenario the metric translation object would need to pass through the SQL queries as column aliases, in which case we would need to remove line changes from the object/string. So while we're at it, we might just as well compactify the object using Rison.

One downside of this approach of course is the need to add translations to all charts independently, probably introducing lots of duplication. So if this is an issue it may be better to centralize the translations in the dashboard metadata.

ktmud commented 3 years ago

If we want compactness, why not just use rison for all formData instead of just one field?

I think the most robust approach is to add a Superset app-level i18n table for user-inputted translatable strings that will have its own edit view (which then can be assigned to translators based on permissions). The charts or dashboard title can then reference the translatable string with some special syntax.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

aehanno commented 2 years ago

Hello, I wanted to know if this proposal is still valid I am currently looking for how to translate the labels(like title and metrics) for the dashboards if users choose different languages The different proposals are very interesting for my use case That's why I'm reopening the discussion, to know if this evol is still possible or if it has been abandoned Thanks in advance

icrc-fdeniger commented 2 years ago

Hello, we are currently looking for this kind of feature. A question: is it possible to add "macro" in charts/Dashboard Title or Column. We can use this approach to propose some translation.

rusackas commented 1 year ago

@thelightbird are you still interested in maintaining/contributing to this proposal? Since it was opened, a lot of strides have been made in the feasibility of the general intent, namely that as a monorepo, it should be easier to accomplish. My personal thought is that we can close this issue, and close out the SIP process since it doesn't seem particularly controversial. Instead, we should simply hope and evangelize for more PRs in this area, like we've seen a few of recently.

rusackas commented 1 year ago

Closing this since it's stale, largely non-controversial, and has not yet been brought up for a vote. If anyone wants to rekindle this effort, we can re-open it!

kathrelion commented 1 year ago

Hi, do you have some updates in this issue? I am currently looking for how to translate the labels in charts. Thanks in advance

bartvanvelden commented 5 months ago

I'm interested in having this feature implemented. As someone without Superset development experience, what would be the best way to proceed? Is there an option to sponsor this feature or support its development in some other way?

Thank you!

rusackas commented 5 months ago

I'm interested in having this feature implemented. As someone without Superset development experience, what would be the best way to proceed? Is there an option to sponsor this feature or support its development in some other way?

Thank you!

We'll be happy to review PRs on the subject, for sure, if you want to start contributing. I don't know how hard this'll be to pull off... the translation files/process are part of the main repo, not the packages within the monorepo. It all seems possible, but will just need to expand the files that pybabel looks at and extracts translatable strings from, mainly. Then the plugins will need to load the translated strings like the main codebase does - and that's the part that may or may not get weird.

I always advise opening a small PR first, just to prove that the flow might work for even one word. From there, one can iterate/build out more. You can also find interested parties in the #contributing channel on slack, and/or send me a DM... perhaps I can point you to a few other resources on Slack.