Closed ctgraham closed 1 year ago
Tagging @Tyl13 and @chryslovelace as potentially interested parties from Pitt ULS.
This is work we'd very much be interested in supporting. We're not React experts by any stretch but we did manage to build this application, so we've got that going for us. Some help in scoping what needs to be internationalized and localized would be very welcome!
I have done some exploratory work and testing for this. From my research, lingui looks like the simplest tool for the job, especially when translating text inside an attribute such as href. A big part of this is thanks to its macros, which are a quick and seemingly tidy way to wrap the parts of the page that need to be translated. An example of that can be seen here and here. The first example does not contain any macros within an HTML element's attribute such as href; the second does. To produce the translations, you need to extract and then compile them. The easiest way is to add the commands to package.json and run yarn extract and yarn compile. With the config file I used, the extracted file for each locale is messages.po, located in that locale's own folder, and the compiled file is messages.js in the same folder. A quick visual example of the same code:
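For anyone trying to reproduce the extract and compile step, a configuration along these lines would give the per-locale messages.po and messages.js layout described above. This is only a sketch: the `src/locales` path, the `fr` locale, and the script names in the comment are assumptions, not necessarily what the linked branch uses.

```javascript
// lingui.config.js (sketch; paths and locales are assumptions)
//
// Assumed package.json scripts so that `yarn extract` / `yarn compile` work:
//   "extract": "lingui extract",
//   "compile": "lingui compile"
module.exports = {
  // Locales to generate catalogs for; "en" is treated as the source language.
  locales: ["en", "fr"],
  sourceLocale: "en",
  catalogs: [
    {
      // {locale} is expanded per locale, e.g. src/locales/en/messages.po
      // after extract and src/locales/en/messages.js after compile.
      path: "<rootDir>/src/locales/{locale}/messages",
      include: ["<rootDir>/src"],
    },
  ],
  // Write the extracted catalogs as gettext .po files.
  format: "po",
};
```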
These images are in the same order as the code linked above. There are a couple of things to note about the first. <Trans ...> ... </Trans> is a macro; note that it will cause text to stick together if the HTML is not properly formatted. t({...}) is also a macro and appears in the second example. It can be placed inside an attribute such as href. The main difference is that the msgstr has to be set explicitly with message: '<string that you want the message to be>'.
Another thing to consider is the best way to handle msgids. The last three <Trans> elements show three different approaches. The first is to type out the id explicitly; it does not have to match the string contained within the element. So for the zip code, the id could be 26003 even though the string is 10591. The second is to use an opaque id, in this example Footer.Fake.Test. This can make it clear what is being affected by the translation. The last is to not set an id explicitly, in which case the string contained within the element is used as the id. So for:
<Trans comment='Footer.Fake.Test.2 This is a test when msgid is not explicitly set'>
This is a test where the msgid is not explicitly set.
</Trans>
This becomes the following within the messages.po file for the en locale.
For this code block, the site will look like this for the en locale:
All three choices have inherent issues with them. I'll explain the ones that I can think of. Additionally, this image should be helpful in showcasing the issues.
As you can see, one thing the two versions have in common is that there is no space between the zip code and the second line, so the strings are squished together. A problem inherent to the explicit string id is that it holds on to the original string: if the address changes at some point and you don't change both the id and the string, the original address still exists in the id. It could also be that the text at that spot changes completely, and it wouldn't make sense to put something like a tiny blurb about Rockefeller Archive Center under the msgid of 10591.
A way around that would be to use a more opaque id like Footer.Fake.Test, which can also be moved around more easily. However, the translator may not know what the original string was without opening the other locale's messages.po file. This can slow down translation of the site, since the translator has to keep switching between files when everything could be contained within one. The other issue with opaque ids arises when the section hasn't been translated yet. This can be seen in the second example, where the webpage displays the opaque id instead of a message. The same is visible within the messages.po file.
The last option is the quickest to type and avoids the issues the others have. When you change the string, the id changes automatically, so no stale text is retained that could end up being displayed. Additionally, the translator sees the original locale's string, and a coherent message can be gleaned from the id if it remains untranslated. However, every tiny change to the original string creates a new id, which has to be extracted and recompiled each time, and the translations have to be either copied to the new id or redone, even for a change as small as a missing period. Also, the old translations stay within the messages.po file but become commented out, and can start to clutter it if not cleaned out. This is what a removed translation looks like after yarn extract.
There could be additional issues or advantages to these ways of doing the msgid that I haven't thought of too.
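To make the trade-off between the three id styles concrete, here is a small stand-alone model of what a visitor sees when a translation is missing. It is a simplification and not Lingui's actual lookup code; `resolveMessage`, the catalog objects, and the English text behind Footer.Fake.Test are hypothetical.

```javascript
// Simplified model of catalog lookup (NOT Lingui's implementation):
// return the translation if one exists, otherwise fall back to the id itself.
function resolveMessage(catalog, id) {
  const translated = catalog[id];
  return translated !== undefined && translated !== "" ? translated : id;
}

// Literal-string id used for the third style.
const literalId = "This is a test where the msgid is not explicitly set.";

// Source-locale catalog: each id maps to the English text.
const en = {
  "26003": "10591",                                    // explicit id that differs from the string
  "Footer.Fake.Test": "Opaque id example text",        // opaque id (English text is hypothetical)
  [literalId]: literalId,                              // id generated from the string itself
};

// Target-locale catalog where nothing has been translated yet.
const fr = {
  "26003": "",
  "Footer.Fake.Test": "",
  [literalId]: "",
};

console.log(resolveMessage(en, "26003"));            // 10591
console.log(resolveMessage(fr, "26003"));            // 26003 (stale or unrelated id leaks into the page)
console.log(resolveMessage(fr, "Footer.Fake.Test")); // Footer.Fake.Test (opaque id shown to the visitor)
console.log(resolveMessage(fr, literalId));          // the readable English sentence
```

Only the literal-string id degrades to something a visitor can read when the translation is missing, which matches the behaviour seen in the screenshots.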
Another thing to consider when it comes to msgids is plurals. Here is a visual example of where plurals come into play:
The messages.po file for this is here and the English version. For the French version, the digital match does not have a translation and thus shows what the msgid contains. The function where the translation happens is here. As you can see within the code, it uses a t macro and a plural macro. While you do not need the t macro, the only way to set comments and explicitly give the message a different msgid is by using the t macro. The documentation for plurals is here.
I reached out to the PKP translation team and they strongly recommend against using opaque identifiers for the translation strings. PKP started their translations this way years ago, and now feel at odds with the bulk of the translation community, which generally expects the ids to be literal phrases in the original language.
That is, not:
msgid "admin.systemInfo.settingName"
msgstr "Nombre de configuración"
msgid "admin.systemInfo.settingValue"
msgstr "Valor de configuración"
but
msgid "Setting name"
msgstr "Nombre de configuración"
msgid "Setting value"
msgstr "Valor de configuración"
They also advise care with addresses, where the ordering of the address components varies by language and geographic convention. E.g. does the building unit come before or after the street address; does the postal code come before or after the city? This could suggest that the address is either: a) best suited as a single translatable entity, b) a configuration value rather than a translatable string, or c) processed through a function (like plurals and ordinals).
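As a rough illustration of option c), the address could be kept as structured data and rendered by a per-locale layout function. This is a sketch only: `formatAddress`, `ADDRESS_LAYOUTS`, and the sample address parts are hypothetical, and the two layouts just reflect the common convention that the postal code follows the city and state in US addresses and precedes the city in French ones.

```javascript
// Hypothetical per-locale layouts: each one decides the order of the components.
const ADDRESS_LAYOUTS = {
  // US style: city, region, then postal code.
  en: ({ street, city, region, postalCode }) =>
    `${street}\n${city}, ${region} ${postalCode}`,
  // French style: postal code before the city, no region.
  fr: ({ street, city, postalCode }) =>
    `${street}\n${postalCode} ${city}`,
};

// Render the address parts with the layout for `locale`,
// falling back to another layout when the locale is unknown.
function formatAddress(parts, locale, fallbackLocale = "en") {
  const layout = ADDRESS_LAYOUTS[locale] ?? ADDRESS_LAYOUTS[fallbackLocale];
  return layout(parts);
}

// Placeholder address; only the zip code comes from the example in this thread.
const address = {
  street: "1 Example Street",
  city: "Example City",
  region: "NY",
  postalCode: "10591",
};

console.log(formatAddress(address, "en"));
console.log(formatAddress(address, "fr"));
```

With this shape the address parts would be configuration values supplied by the implementor, and only the layout would vary by locale.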
For the identifiers, this is also what I've come across when looking for best practices. I do have a few questions about it, though.
A side comment about pluralization: for things that need exact matches, it would be better to use the JSX macro for plurals instead of the JS macro I used within the helper function. For example:
import { Plural } from "@lingui/macro"

<Plural
  value={count}
  offset={1}
  // when value == 0
  _0="Nobody arrived"
  // when value == 1
  _1="Only you arrived"
  // when value == 2
  // value - offset = 1 -> `one` plural form
  one="You and # other guest arrived"
  // when value >= 3
  other="You and # other guests arrived"
/>

/*
  This is transformed to Trans component with ID:
  {count, plural, offset:1 _0 {Nobody arrived}
    _1 {Only you arrived}
    one {You and # other guest arrived}
    other {You and # other guests arrived}}
*/
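To spell out how the exact matches and the offset interact, here is a stand-alone sketch of the selection logic using the built-in Intl.PluralRules. It is an approximation of the ICU plural behaviour for illustration, not Lingui's implementation; `selectPlural` and the `forms` object are hypothetical names, with the `_0` / `_1` keys mirroring the props above.

```javascript
// Pick a plural form for `value`:
// 1. an exact match (_0, _1, ...) on the raw value wins;
// 2. otherwise the locale's plural category of (value - offset) is used;
// 3. "#" is replaced with (value - offset).
function selectPlural(locale, value, forms, offset = 0) {
  const shifted = value - offset;
  const exactKey = `_${value}`;
  let template;
  if (Object.prototype.hasOwnProperty.call(forms, exactKey)) {
    template = forms[exactKey];
  } else {
    const category = new Intl.PluralRules(locale).select(shifted);
    template = forms[category] ?? forms.other;
  }
  return template.replace(/#/g, String(shifted));
}

const forms = {
  _0: "Nobody arrived",
  _1: "Only you arrived",
  one: "You and # other guest arrived",
  other: "You and # other guests arrived",
};

console.log(selectPlural("en", 0, forms, 1)); // Nobody arrived
console.log(selectPlural("en", 1, forms, 1)); // Only you arrived
console.log(selectPlural("en", 2, forms, 1)); // You and 1 other guest arrived
console.log(selectPlural("en", 3, forms, 1)); // You and 2 other guests arrived
```

Because the plural category comes from the locale's own rules, a translator for another language can supply a different set of categories without any change to the calling code.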
Is your feature request related to a problem? Please describe.
As others adopt this application, they will need a way to easily edit the text in the UI to reference their own institution's name, contact, policies, etc.
Describe the solution you'd like
I propose using i18n / l10n modules to address both the use cases of translation and customization. Abstracting the text to translation files allows the implementor a single area of concern for editing the UI.
Describe alternatives you've considered
Other solutions haven't really been considered; we don't have any React experience in house. This Issue is intended to generate the discussion to scope and strategize.
Additional context
This leaves outstanding the question of branding, which I see as a distinct challenge for implementors.