endlessm / kolibri-explore-plugin

The kolibri plugin to add the custom channel representation
MIT License
1 stars 4 forks source link

Change i18n-extract-messages build step to merge components #846

Open pwithnall opened 11 months ago

pwithnall commented 11 months ago

See https://github.com/endlessm/kolibri-explore-plugin/pull/840#discussion_r1333220955

The problem

Currently, the kolibri-tools i18n-extract-messages build utility is called multiple times by kolibri-explore-plugin, once for each component (template-ui, ek-components, the app). This was implemented in commit d4d678d2 because the utility only supports one --namespace/--searchPath argument pair.

This setup extracts the messages from each component into a CSV file named after the component, e.g. ek-components-messages.csv. This is then translated and committed as ek-components-messages.json.

At runtime, the webpack_asset tag (e.g. {% webpack_asset 'kolibri.core.default_frontend' %} in base.html) generates some inline JavaScript which calls kolibriCoreAppGlobal.registerLanguageAssets() on the translated messages, which are loaded from the JSON file and injected inline into the page in that call.

There is only a webpack asset for kolibri_explore_plugin.app, there is no webpack asset for ek-components or template-ui. This means that none of the runtime loading code actually loads the ek-components/template-ui messages in, and hence they are not translated at runtime.

There is prior art in Kolibri for how to handle translations from a sub-package, in the form of its kolibri-common package, which contains some common UI elements (and translations). It’s hooked in to the generated JSON file by kolibri-tools i18n-extract-messages: when passed an additional --namespace and --searchPath argument (for kolibri-common), it will extract messages from that namespace too, and bung them into the same output file.

This can be seen by doing ack -l searchPlaceholder in kolibri-explore-plugin vs ack -l resourcesUnavailable in Kolibri. The latter appears in kolibri/locale/*/LC_MESSAGES/kolibri.core.default_frontend-messages.json, whereas the former does not (and that means it’s not translated at runtime).

The fix

We need to revert d4d678d2 and instead change kolibri-tools i18n-extract-messages to support multiple --namespace/--searchPath pairs.

A short term workaround would be to manually move all the messages from {ek-components,template-ui}-messages.json into kolibri_explore_plugin.app-messages.json. But that would have to be repeated every time the strings are re-extracted or the translation is updated.

dbnicholson commented 11 months ago

I'm tempted to ditch the kolibri-tools translation flow completely and use formatjs directly everywhere like you did in loading-screen. In a sense this would be like how the backend translation process is just django's and not any custom think kolibri is doing. The benefits would be:

The obvious downside is that we'd have to write all of that tooling.

pwithnall commented 11 months ago

I don’t see what the upside is. We’d essentially have to write and maintain a copy of kolibri-tools.

manuq commented 11 months ago

A short term workaround would be to manually move all the messages from {ek-components,template-ui}-messages.json into kolibri_explore_plugin.app-messages.json. But that would have to be repeated every time the strings are re-extracted or the translation is updated.

I did the short workaround in https://github.com/endlessm/kolibri-explore-plugin/pull/848 . Here is the merge script if you want to put it into git. But is simple enough:

# Example: ./merge.py kolibri_explore_plugin.app-messages.json ek-components-messages.json template-ui-messages.json kolibri_explore_plugin.app-messages.json
import argparse
import os
import json

def _merge_translations(input_paths, output_path):

    all_json_data = {}
    for json_path in input_paths:
        with open(json_path) as json_file:
            json_data = json.load(json_file)
            all_json_data.update(json_data)

    with open(output_path, "w", encoding="utf8") as json_file:
        json.dump(
            all_json_data, json_file, ensure_ascii=False, sort_keys=True, indent=2
        )

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Merge JSON files."
    )
    parser.add_argument(
        "input_paths", help="Paths to JSON files to merge into output.", type=str, nargs="*"
    )
    parser.add_argument(
        "output_path", help="Path to JSON file to write.", type=str
    )

    args = vars(parser.parse_args())
    _merge_translations(**args)
pwithnall commented 11 months ago

Thanks @manuq!

I’m working on extending kolibri-tools to support multiple namespaces. We can go with the workaround for now. When kolibri-tools is ready and released, we’d revert https://github.com/endlessm/kolibri-explore-plugin/commit/d4d678d22c116e3d5b4f9190a189746562c1ee6b and use kolibri-tools i18n-extract-messages in a single pass.

pwithnall commented 11 months ago

I have a WIP branch here, but need to test it before proposing it upstream: https://github.com/endlessm/kolibri/tree/kolibri-tools-more-namespaces

pwithnall commented 11 months ago

There is prior art in Kolibri for how to handle translations from a sub-package, in the form of its kolibri-common package, which contains some common UI elements (and translations). It’s hooked in to the generated JSON file by kolibri-tools i18n-extract-messages: when passed an additional --namespace and --searchPath argument (for kolibri-common), it will extract messages from that namespace too, and bung them into the same output file.

On closer inspection, I was wrong about this. Even for kolibri-common, kolibri-tools i18n-extract-messages doesn’t bung strings from multiple namespaces into one file. I misread the output from ack: ack -l resourcesUnavailable | grep locale/en vs ack -l resourcesUnavailable | grep locale/es shows what’s going on more clearly.

It seems i18n-extract-messages does create a kolibri/locale/en/LC_MESSAGES/kolibri-common-messages.csv in Kolibri, but then their CSV-to-JSON step (via a web translation tool) squishes the kolibri-common messages into kolibri/locale/en/LC_MESSAGES/kolibri.core.default_frontend-messages.json.

So, my proposed changes to kolibri-tools won’t actually fix this issue (though they will speed up string extraction a bit).

Either we formalise the workaround, i.e. always have a step which squashes the translated strings from multiple namespaces into a single JSON file for webpack to load. Or we ask for a --single-output-file option in i18n-extract-messages which would do that for us. But that’s a bigger change to how i18n-extract-messages works than I’ve proposed so far, and I don’t know whether upstream Kolibri would be happy about it (since I guess they’d be unlikely to use it themselves).

In any case, here are the downstream changes needed to make use of https://github.com/learningequality/kolibri/pull/11357: https://github.com/endlessm/kolibri-explore-plugin/tree/846-i18n-extract-messages-more. I’ve not proposed them as a PR because the Kolibri changes need to be approved and land in a release first.

pwithnall commented 11 months ago

Unassigning myself as I don’t have any work time left on this (intermission week and STF work next), and it’s now partially blocked on Kolibri upstream. Mostly, though, it’s blocked on a decision about whether to stick with a manual CSV-to-JSON merging process, or put effort into tooling that. It now seems unlikely that kolibri-tools will be that tooling for us, unless upstream are open to adding a --single-output-file option (I haven’t asked).

rtibbles commented 11 months ago

There is prior art in Kolibri for how to handle translations from a sub-package, in the form of its kolibri-common package, which contains some common UI elements (and translations). It’s hooked in to the generated JSON file by kolibri-tools i18n-extract-messages: when passed an additional --namespace and --searchPath argument (for kolibri-common), it will extract messages from that namespace too, and bung them into the same output file.

One thing that I'd flag here - the generated JSON file for each webpack asset is not the same as the CSV files generated during extraction. There is no 1:1 correspondence here. The CSV generation is focused on creating CSV files for upload to Crowdin (although they could be uploaded elsewhere, I am sure). The JSON generation traverses import paths to include all referenced translations, so if something is bundled in e.g. the kolibri-common package, there is not a separate JSON file for anything in there, but rather it is bundled into JSON files where it is used, for that particular webpack asset.

So, the reason you see messages from kolibri-common landing in the default frontend message file is because some of those messages are used in the default frontend bundle - but that will not always be the case. Similarly, for your explore plugin the message JSON file building should be doing this for you. If it's not, please let me know what's going on because that sounds like a bug.

It now seems unlikely that kolibri-tools will be that tooling for us, unless upstream are open to adding a --single-output-file option (I haven’t asked).

I'm not sure what the purpose of this argument would be - the JSON files are deliberately selectively loaded into the frontend to avoid loading all translations for all possible plugins into every single page on Kolibri.

rtibbles commented 11 months ago

The obvious downside is that we'd have to write all of that tooling.

If you decide to go this route, I would strongly recommend using formatJS directly, and also using all of their tooling for string extraction, which they built in the last few years (this did not exist when we first did the vue-intl port of react-intl in 2016, hence we had to write all of the string extraction ourselves) that would save you a lot of hassle and bother, as I wouldn't recommend trying to recreate this yourselves.