How to Internationalize Your Flask App

At Metamova, we do a lot of localization engineering, and much of my day-to-day work is making sure that whatever the translators produced actually ends up in a human-accessible, readable form. Unfortunately, sometimes our source files are so weird that I can’t really fix anything at the localization engineering stage, especially when it comes to products that need continuous support, like websites and apps. Usually the reason we can’t help is that the stage before localization, internationalization, was done incorrectly or not at all.

Here’s a little project of mine where I internationalized my website as if it were going to be translated into a dozen languages.

I built my company’s website with Flask. Partly because I am loyal to Python, partly because I am so not learning PHP at this point in my life, but mostly because Flask is just plain fun to work with.

Anyway, for translation purposes, Flask uses a library called Babel (through the Flask-Babel extension). I have many problems with Babel and would like to rewrite many parts of it, but its good parts are too good. Babel has a large collection of dat files that contain specifications for dates, currency, number formats, etc. for 735 locales. That is very impressive and very useful.

Babel extracts your translatable content into PO files in which you can manage your translations, then tells Flask to display the strings according to what locale a user’s browser returns.

For Babel to know which content is translatable, you have to add a Babel config file to your project. The config file contains some predefined filters that tell Babel where to look for human text in your HTML template. As a rule, you need to go to your HTML templates and wrap all of your displayable text like this:

{{ _('Some text here') }}

The double curly braces tell the Jinja template engine that this is not HTML but an expression to evaluate. The underscore and parentheses tell Babel that the quoted text inside is a string to be translated.

Now, here’s a crazy idea that will make the internationalization process infinitely easier for you: don’t put the actual human text in the HTML code.

Think about it. For a huge international website, hard-coding the text makes no sense. If your product is the website (i.e. it’s a web store, not just a portfolio webpage), and your product is international, some of the pages on your platform are bound to look different for different locales. You are also bound to have different features and promos for different countries, AND if your product keeps growing and changing, the content will change so rapidly that it makes no sense to keep updating the HTML code, or to keep rewriting it with new content. You actually risk creating such a mess that you can’t even keep track of the different versions of your source strings.

More than that, your source language may change. You may have different source languages, depending on the locale you are translating into, its culture and market. For instance, if you already translated English to German, maybe it would make more sense to have German be the source language for your Scandinavian expansion. Babel can’t have several source languages if the source language is already in the HTML template. It goes against the logic of Babel’s design. So let’s tweak Babel’s design a little. Here’s how.

Instead of natural language strings, it’s better to put some unique identifiers in the HTML, then get your corresponding strings by IDs, per target language. When designing and creating the HTML template, come up with a descriptive ID and put it in place of the actual string, then wrap it in Babel-friendly markup:

Before:

<li class="nav-item">
    <a class="nav-link active" href="#" data-scroll-nav="0">
        {{ _('Home') }}
    </a>
</li>

After:


<li class="nav-item">
    <a class="nav-link active" href="#" data-scroll-nav="0">
        {{ _('nav.item_1') }}
    </a>
</li>

Now Babel will treat the ID as the source and export it into a PO file, where you can add the corresponding string as the translation:

#: app/templates/metamova.html:78
msgid "nav.item_1"
msgstr "Home"
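
Once templates hold only IDs, it can help to check that no natural-language string sneaks back in. The following is a minimal sketch using only the standard library, not part of Babel. The function name find_non_id_strings and the ID convention (lowercase segments joined by dots, as in nav.item_1) are assumptions made for this illustration.

```python
import re

# Matches {{ _('...') }} or {{ _("...") }} and captures the quoted argument.
CALL_RE = re.compile(r"""\{\{\s*_\(\s*(['"])(.*?)\1\s*\)\s*\}\}""")
# Assumed ID convention: lowercase segments joined by dots, e.g. nav.item_1
ID_RE = re.compile(r"^[a-z0-9_]+(\.[a-z0-9_]+)+$")


def find_non_id_strings(template_text):
    """Return the _() arguments that do not look like dotted IDs."""
    return [match.group(2) for match in CALL_RE.finditer(template_text)
            if not ID_RE.match(match.group(2))]


template = """
<a class="nav-link">{{ _('nav.item_1') }}</a>
<a class="nav-link">{{ _('Home') }}</a>
"""
print(find_non_id_strings(template))  # ['Home']
```

Running a check like this over every template before extraction flags strings such as 'Home' that still need an ID.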

I should mention here that I see a lot of problems with the PO format, and its design doesn’t make much sense to me. For instance, by default it treats a natural-language string as a unique identifier (worst idea ever), and it’s hard to add any extra information about the strings in a structured way. The versioning logic is also unnecessarily complicated: it made me overwrite existing translations with empty strings several times before I got the hang of these weird files. However, for all its faults, PO is the only format in the translation industry that, unlike XLIFF, offers some interoperability, which is why I presume Babel uses it. So let’s go through what I think is the coolest way to leverage all the functionality of Babel while making your translation process as efficient as possible.

When you initialize a new locale in Babel, you get a new PO file. By our logic, our IDs are our source strings. So you’ll have PO files for each locale, all containing IDs, waiting to be connected to your translated content.

By using Babel’s built-in PO parser, we can now read all the PO files and merge them into one elegant JSON structure showing all translations for the same ID:

"nav.item_1": {
    "en": "Home",
    "uk": "Головна"
}

Here’s my function for merging the PO files into JSON:

import json
import os

from babel.messages.pofile import read_po

# `translations` (path to the translations folder), `app_path` and `babel`
# (the Flask-Babel instance) are defined elsewhere in the app.


def read_catalog(locale):
    """Parse the messages.po file of one locale into a Babel catalog."""
    path = os.path.join(translations, locale, 'LC_MESSAGES', 'messages.po')
    with open(path, 'r', encoding='utf-8') as po_file:
        return read_po(po_file)


def export_strings(source='en', target=None):
    # Start from the source locale: {id: {source: string}}
    for_tron = {message.id: {source: message.string}
                for message in read_catalog(source) if message.id}

    # Export one target locale, or every locale Babel knows about
    if target:
        locales = [target]
    else:
        locales = [str(locale) for locale in babel.list_translations()]

    for locale in locales:
        if locale == source:
            continue
        for message in read_catalog(locale):
            if message.id and message.id in for_tron:
                for_tron[message.id][locale] = message.string

    out_path = os.path.join(app_path, 'json_strings', 'strings.json')
    with open(out_path, 'w', encoding='utf-8') as outfile:
        json.dump(for_tron, outfile, ensure_ascii=False)

Now, no matter who handles your localization engineering, all they need is this one JSON file. The engineer will extract whatever language you choose to be the source, then insert the translations into the corresponding locale. You can then use the updated JSON to import the strings into your PO files:


import json
import os

from babel.messages.catalog import Catalog
from babel.messages.pofile import read_po, write_po

# `translations`, `app_path` and `babel` are defined elsewhere in the app.


def import_strings(filename=None, source='en', target=None):
    # Default to the JSON file written by export_strings
    if not filename:
        filename = os.path.join(app_path, 'json_strings', 'strings.json')
    with open(filename, 'r', encoding='utf-8') as infile:
        from_tron = json.load(infile)

    # The POT template lists every ID currently found in the templates
    with open('messages.pot', 'r', encoding='utf-8') as pot_file:
        template = read_po(pot_file)

    # Import one target locale, or every locale Babel knows about
    if target:
        locales = [target]
    else:
        locales = [str(locale) for locale in babel.list_translations()]

    for locale in locales:
        new_catalog = Catalog()
        for string_id, strings in from_tron.items():
            if locale in strings:
                new_catalog.add(string_id, strings[locale])
        # Sync with the template: new IDs are added, removed ones go obsolete
        new_catalog.update(template)
        po_path = os.path.join(translations, locale, 'LC_MESSAGES',
                               'messages.po')
        with open(po_path, 'wb') as po_file:
            write_po(po_file, new_catalog)
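
For the step between export and import, where the engineer pulls out the chosen source language and finds what still needs translating, here is a rough sketch. It assumes the flat JSON structure shown earlier; the helper name pending_for_translation is made up for this example.

```python
def pending_for_translation(strings, source, target):
    """Map each ID to its source text where the target is missing or empty."""
    return {string_id: entry[source]
            for string_id, entry in strings.items()
            if entry.get(source) and not entry.get(target)}


strings = {
    "nav.item_1": {"en": "Home", "uk": "Головна"},
    "nav.item_2": {"en": "About", "uk": ""},
}
print(pending_for_translation(strings, "en", "uk"))  # {'nav.item_2': 'About'}
```

Because the source is just another key in the JSON, switching source languages is a matter of passing a different argument: pending_for_translation(strings, "uk", "de") would use the Ukrainian strings as the source for German.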

If you want to do your engineering in a more web 2.0-friendly way, with all of your strings exported into this JSON, you could build a server sending POST requests to any online translation service out there, in a controlled way. You can add a “date” attribute and save all the previous versions of strings that belonged to that particular ID, for any language. Et voila, your JSON file is now a translation memory as well!

"nav.item_1": {
    "en": "Home",
    "uk": [
        {"string": "Головна", "date": ""},
        {"string": "Домашня сторінка", "date": ""}
    ]
}
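
As a minimal sketch of how such a history could be maintained, the function below appends a dated version instead of overwriting the old one. The name record_translation, the ISO date format, and the choice to keep the newest version last are all assumptions for illustration.

```python
from datetime import date


def record_translation(strings, string_id, locale, text, when=None):
    """Append a dated version of a translation, keeping earlier ones."""
    when = when or date.today().isoformat()
    entry = strings.setdefault(string_id, {})
    history = entry.get(locale, [])
    # Upgrade a plain string (the flat format) to a one-item history
    if isinstance(history, str):
        history = [{"string": history, "date": ""}]
    history.append({"string": text, "date": when})
    entry[locale] = history
    return strings


strings = {"nav.item_1": {"en": "Home", "uk": "Головна"}}
record_translation(strings, "nav.item_1", "uk", "Домашня сторінка", "2024-01-15")
print(strings["nav.item_1"]["uk"][-1]["string"])  # Домашня сторінка
```

Note that an import function expecting plain strings would then need to pick one version from the list, for example the last one.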

Pseudotranslation with Babel

But we are not done yet! Before you send your strings for translation, you have to check that you’ve extracted all the text on your webpage. This is done with pseudotranslation: we insert a bunch of random symbols where the translation should be, import it back into the webpage, and look through it in a browser to see whether any rogue source strings didn’t get exported for translation.

To achieve this with Babel, we’ll need to tweak its source code a little bit. We will force Babel to display the strings we want by adding a new fake locale, called “pseudo”, to its list of all possible languages.

First of all, go to the babel folder inside your Python lib folder, then open the locale-data folder. For Babel to load a locale, it requires a corresponding DAT file in the locale-data folder. To get a DAT file for our fake pseudo language, we can copy the en.dat file (if you want to test right-to-left languages like Arabic, you could copy one of the files for Arabic locales) and name the copy pseudo.dat.

cd <your env name>/lib/python<your python version>/site-packages/babel/locale-data/
cp en.dat pseudo.dat
cp en.dat ids.dat

Now, open the file core.py in the babel directory, find the LOCALE_ALIASES dictionary, and add two new fake locales at the end of it: 'pseudo': 'pseudo' and 'ids': 'ids'. (I also added a fake IDs locale, hence the ids.dat copy above, to be able to display our original IDs.)

nano <your env name>/lib/python<your python version>/site-packages/babel/core.py

Initialize pseudo and ids as new locales for translation in your configuration file or as an environment variable on your server.

LANGUAGES = os.environ.get('LANGUAGES') or ['en', 'uk', 'ids', 'pseudo']

Using a simple regex, insert a bunch of Xs in place of the translations in the pseudo locale’s PO file, then run the Babel command to compile the translations.
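
A minimal sketch of that regex step is shown below. It is not necessarily the exact expression used on the site. It assumes single-line msgid and msgstr entries, which holds when the msgids are short IDs, and the function name pseudotranslate and the filler string are made up for this example.

```python
import re

# Single-line entries with a non-empty msgid; the header (msgid "") is skipped.
ENTRY_RE = re.compile(r'^(msgid ".+"\nmsgstr )".*"$', re.MULTILINE)


def pseudotranslate(po_text, filler="XXXXXXXX"):
    """Replace every single-line msgstr with a run of filler characters."""
    return ENTRY_RE.sub(lambda m: f'{m.group(1)}"{filler}"', po_text)


po_text = '''msgid ""
msgstr ""
"Content-Type: text/plain; charset=utf-8\\n"

#: app/templates/metamova.html:78
msgid "nav.item_1"
msgstr ""
'''
print(pseudotranslate(po_text))
```

The header entry is left untouched because its msgid is empty. Entries that wrap across several lines would need a real PO parser such as Babel’s read_po instead.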

Flask pulls up the locale that your browser asks for, based on your location or the preferred language you chose in the browser. To force Flask to display a specific locale for a specific page, you can use the Flask-Babel function force_locale. Here’s what my Flask route for pseudotranslation looks like:

@app.route('/pseudo')
def pseudo():
    with flask_babel.force_locale('pseudo'):
        return render_template('metamova.html')

Now I can just go to https://metamova.com/pseudo to view my website pseudotranslated at any time.

To display the original IDs, we don’t even need to initialize translation. If there’s no translation from Babel, Flask will just display the strings that are in the HTML. So you can do the same route but for IDs:

@app.route('/ids')
def ids():
    with flask_babel.force_locale('ids'):
        return render_template('metamova.html')

Now I can check what ID goes where at any time just by typing https://metamova.com/ids into my browser.
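
The fallback that makes the IDs view work is the usual gettext convention: a lookup with no matching translation returns the msgid itself. The snippet below illustrates this with Python’s standard gettext module rather than Flask-Babel, so treat it as an analogy for the behavior, not as the code path Flask actually runs.

```python
import gettext

# NullTranslations holds no catalog, so every lookup falls back to the msgid.
empty = gettext.NullTranslations()
print(empty.gettext("nav.item_1"))  # nav.item_1
```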

This is very exciting, because with a little bit of JavaScript magic we could create an interactive in-context translation view that inserts whatever the translator types into the right place based on the ID, so the translator could see the position of the string, as well as how short or long it should be, right there on the page.

As someone who used to translate websites, I would have found this a dream. In Ukrainian, one simple string like «Sign Up» could be translated as an infinitive («Зареєструватись»), a call-to-action imperative («Зареєструйтеся»), a descriptive noun («Реєстрація»), or some completely different, much shorter phrase (like «Вхід»), all depending on where the phrase appears in the UI.

These are all very simple tweaks, but can be extremely helpful if done before you launch your website. If you plan to go aggressively international down the road, you’ll have no time to think about an optimal localization process when you are actually in the middle of managing a complicated web app for ten to twenty different countries and languages.
