Translation flow - Githubissues

MortenHofft commented 3 years ago

There is currently no process set up to edit labels (English or other languages).

~Secondly, most of the texts in the file are placeholders like "Fill in some content here" or, even worse, are missing and hence just show as filter.hostKey.description~ Removing placeholder values is an issue of its own: https://github.com/gbif/hosted-portals/issues/134

tucotuco commented 3 years ago

We are able to do so by forking this repository and making pull requests though, no?

MortenHofft commented 3 years ago

We could (not this repo though, but this monorepo https://github.com/gbif/gbif-web/tree/master/packages/react-components), but it probably isn't the best way to get people involved. We have used Crowdin for GBIF.org and other projects, and my impression is that it works well. It has a lot of nice features you wouldn't get with pull requests: an easy way to discuss and share context, machine suggestions, and previews for messageformat strings with variables in them. Perhaps most importantly, there is no risk of broken syntax and no scary process for those who are less tech-savvy.

MattBlissett commented 3 years ago

Thinking about how best to reuse enumeration value translations (Preserved Specimen etc), which are already translated into several languages for GBIF.org, and need to be consistent between these.

We could add the additional languages (Samoan etc) to the GBIF.org Crowdin project, with instructions only to translate the "enums" folder.
We could take the enumeration translation files at HP component build time from the GBIF portal project, i.e. https://github.com/gbif/portal16/tree/master/locales/translations/fr/enums

The rest of the HP UI (Search, Basis of Record) could be a second Crowdin project. Crowdin should make suggestions where the text is already translated for GBIF.org, but it will require the translator to review and accept those suggestions.

(We could add the hosted portal translation keys to the GBIF.org translation project in Crowdin (it seems to be possible to connect two Git repositories), but I think this would be confusing to the existing GBIF.org translators.)

(For some of these enumerations, the best place to put translations would be the XML definitions of the vocabulary/thesaurus. That would make them reusable in the IPT and by anyone else.)

tucotuco commented 3 years ago

VertNet is looking for guidance about how best to help provide the original content in English that can go to Crowdin.

MortenHofft commented 3 years ago

The flow we have for GBIF.org - I imagine we do something similar

Some web developer (e.g. me) writes their best attempt (I refer to it as Danglish).
- I'm open to someone else correcting these through pull requests, or even just in issues, starting today. This way you can influence the next step ;)
This is then translated into proper English by the GBIF Secretariat comms team.
- My impression is that GBIF communications would prefer to control this step.
The corrected English is what translators see as the source for other languages.
- The translated texts are up to the community and are discussed and edited openly in Crowdin, but hopefully they reflect the source.

MortenHofft commented 3 years ago

I've updated this issue to be specific to the translation flow, as we already have another issue for removing/replacing placeholder texts. The two tasks are independent, and fixing placeholder texts should really be done before any translation starts.

tucotuco commented 3 years ago

I created a pull request with English text for as many terms as I could identify.

MortenHofft commented 3 years ago

Some thoughts on how this could/should be implemented. @thomasstjerne and @MattBlissett do you have any thoughts on how this should be done?

Browser perspective

I can see these options:

The lib user adds a script tag pointing to one or more translation file(s) (e.g. Spanish). The lib user then refers to that when configuring the data components.
The lib user simply states some locales they want to support and the component knows where to fetch a translation file.

Both approaches could work, but the second allows us to load them as needed. This could matter for performance if the site has many translations (say one translation is 50kb and the site loads 10 of them).

Common to both is that each translation is loaded as a single file. Our translation library expects a single JSON with all strings, so loading them as one seems the simplest and should perform better as well.

Where to translate this project

I too think it makes sense to have this project alongside the "gbif.org -ui" Crowdin project (and its English counterpart). The intention is that the two will align over time, and all the enumerations are already in that project.

Managing the translations

I find it easier to work with multiple files than with one huge file. Also, the existing enum translations from the "gbif.org" project are individual files, so we need to stitch them together. I imagine a build step doing that.
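A rough sketch of what such a stitching step could look like, written in Python purely for illustration (the real build would more likely live in the JavaScript toolchain). The folder layout, the file names and the dotted key format are assumptions; only the idea of merging many small files into one flat JSON comes from the discussion above.

```python
import json
import tempfile
from pathlib import Path


def flatten(node: dict, prefix: str = "") -> dict[str, str]:
    """Turn nested objects into dotted keys, e.g. {"a": {"b": "x"}} -> {"a.b": "x"}."""
    flat: dict[str, str] = {}
    for key, value in node.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


def stitch_translations(locale_dir: Path) -> dict[str, str]:
    """Merge every *.json file under locale_dir into one flat key -> text mapping."""
    merged: dict[str, str] = {}
    # Sort the paths so the result does not depend on filesystem order.
    for path in sorted(locale_dir.rglob("*.json")):
        part = json.loads(path.read_text(encoding="utf-8"))
        for key, text in flatten(part).items():
            if key in merged:
                # Two files defining the same key is almost certainly a mistake.
                raise ValueError(f"duplicate translation key {key!r} in {path}")
            merged[key] = text
    return merged


# Demo with a hypothetical layout: one enum file plus one UI file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "enums").mkdir()
    (root / "enums" / "basisOfRecord.json").write_text(
        json.dumps({"enums": {"basisOfRecord": {"PRESERVED_SPECIMEN": "Preserved specimen"}}}),
        encoding="utf-8",
    )
    (root / "search.json").write_text(
        json.dumps({"filter": {"year": {"name": "Year"}}}), encoding="utf-8"
    )
    merged = stitch_translations(root)

print(json.dumps(merged, indent=2))
```

Failing on duplicate keys is a design choice in this sketch: it surfaces clashes between the enum files and the UI files at build time rather than silently letting one overwrite the other.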

How much should go into the browser translation file

For gbif.org we have everything as part of the translation file. E.g. all languages and all GrSciColl collection disciplines (just to mention 2 enums that are rarely used).

Instead of including everything, we could also select those that we use most frequently and load the rest from a new endpoint. E.g. /translation/languages/abk would return Abkhazian. Similar to how we load dataset titles and scientific names.

Or we could explore the option to load all the values for an enum, but not do so until the enum is used. So the GrSciColl collection disciplines will not be loaded until you visit a GrSciColl institution page.

I'm honestly not sure how to do so technically or whether it is a huge task, but I like the idea of keeping the core translation file smaller by focusing on the UI elements and then loading enums for data presentation asynchronously. I imagine that this is a fairly simple thing to do.
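To make the "load an enum only when it is first used" idea concrete, here is a minimal sketch in Python. The class name, the method names and the fetcher signature are all hypothetical, and the sketch is synchronous for brevity, whereas a browser implementation would fetch asynchronously.

```python
from typing import Callable

# A fetcher takes (locale, enum_name) and returns value -> label for that enum.
Fetcher = Callable[[str, str], dict[str, str]]


class LazyEnumTranslations:
    """Serve UI strings from a small core mapping and fetch enum values on first use."""

    def __init__(self, locale: str, core: dict[str, str], fetch_enum: Fetcher) -> None:
        self.locale = locale
        self.core = core
        self.fetch_enum = fetch_enum
        self._enums: dict[str, dict[str, str]] = {}

    def ui(self, key: str) -> str:
        # Fall back to the key itself when a string is missing.
        return self.core.get(key, key)

    def enum(self, enum_name: str, value: str) -> str:
        # Fetch the whole enum once, the first time any of its values is needed.
        if enum_name not in self._enums:
            self._enums[enum_name] = self.fetch_enum(self.locale, enum_name)
        return self._enums[enum_name].get(value, value)


# Demo with a fake fetcher that records how often it is called.
calls: list[tuple[str, str]] = []


def fake_fetch(locale: str, enum_name: str) -> dict[str, str]:
    calls.append((locale, enum_name))
    return {"abk": "Abkhazian"} if enum_name == "languages" else {}


translations = LazyEnumTranslations("en", {"search.title": "Search"}, fake_fetch)
first = translations.enum("languages", "abk")
second = translations.enum("languages", "abk")
print(first, second, calls)
```

The second lookup is answered from the in-memory cache, so the rarely used enums only cost a request on the pages that actually show them.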

Performance

It is easy to worry too much about bandwidth and about loading a bloated translation file.

It might be worth remembering that the "giant" translation file for gbif.org is 150kb unzipped and 50kb zipped. The header image alone on gbif.org is 200kb, and we load 1.4 MB of images for our home page alone. Individual map tiles are as big as 152kb. If we can avoid blocking the first render, then perhaps we shouldn't worry about a 10kb vs 50kb translation file.

MortenHofft commented 3 years ago

This has progressed and is now in the staging environment. The current state is:

The library ships with English text included. We could consider removing that and showing a loader while the relevant translations load.
Any other language is loaded on demand if the locale is provided when mounting the widget.
- It does so by calling an uncached endpoint that provides the translation mapping, meaning a link to the translation file along with a hash of the content, e.g. fr: https://some.url/translations/fr.json?v=1230987243. That means the small mapping file is loaded on every request, but the large translation file can be cached, and we can do so without breaking the cache for the library. The downside is that all users must fetch the translationMap when the page is loaded. The benefit is that the library and the translations can be cached individually (meaning the French users do not need to fetch everything again just because a German translation is updated).
Translations are managed as for GBIF.org: there is one internal Crowdin project that translates developer English into proper English, and a secondary project that uses the proper English as source and translates it into all other languages.
- This means that the English version must be proofread by GBIF communications before it can be translated to other languages.
The hosted portals translations live in the same Crowdin project as GBIF.org. That means that duplicate strings can be autofilled. This is useful as many terms will be the same on the hosted portals and gbif.org. Most notably all enumerations.
The hosted portals will automatically see a translated version of the data widgets corresponding to the language code they have specified for the site. It is possible to overwrite the code in the site's _data/languages.yml.
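For readers wondering how the translation map with a content hash keeps caches independent, here is a minimal Python sketch. It is not the deployed implementation: the choice of SHA-256, the truncated hash length and the function name are assumptions, and https://some.url is the placeholder host from the example in the list above.

```python
import hashlib
import json

BASE_URL = "https://some.url/translations"  # placeholder host, not a real endpoint


def build_translation_map(translations: dict[str, dict[str, str]]) -> dict[str, str]:
    """Return locale -> URL, where the URL carries a hash of that locale's content."""
    result: dict[str, str] = {}
    for locale, strings in translations.items():
        # Serialise deterministically so identical content always hashes the same.
        payload = json.dumps(strings, sort_keys=True, ensure_ascii=False).encode("utf-8")
        version = hashlib.sha256(payload).hexdigest()[:10]
        result[locale] = f"{BASE_URL}/{locale}.json?v={version}"
    return result


# Only the German strings change between the two builds.
before = build_translation_map({"fr": {"search": "Rechercher"}, "de": {"search": "Suche"}})
after = build_translation_map({"fr": {"search": "Rechercher"}, "de": {"search": "Suchen"}})

print(before["fr"] == after["fr"])  # the French URL is unchanged, so its cache stays valid
print(before["de"] == after["de"])  # the German URL differs, so only German is refetched
```

Because the version is derived from the content of each file, updating one language changes only that language's URL, which is the caching behaviour described above.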

We still need to figure out what the best flow is for translators and how we can support/guide the effort. I will update the issue with more information when that is in place.

@daiesco you've asked about this recently. You will see that your site now appears partly translated.

daiesco commented 3 years ago

Thank you @MortenHofft, we will be moving forward with the translations in the Crowdin project.

langeveldNMR commented 3 years ago

I recently translated most of the relevant terms into Dutch, but I have however not started translating all country names into Dutch. Surely there must be a faster and less typo-prone way to do that? E.g. use ISO codes to import the translations? For example: https://nl.wikipedia.org/wiki/ISO_3166-1

MortenHofft commented 3 years ago

@langeveldNMR yes. You can upload translations to Crowdin. If you find a source and format it, then there is a button to do so. https://crowdin.com/project/gbif-portal/nl You can see what format they should be uploaded in if you download the file first.

[Image: Screenshot 2021-10-12 at 11 26 09]

It is difficult for me to evaluate the quality of the data sources in various languages, so I leave that to the translator.

MattBlissett commented 3 years ago

Please also note that Wikipedia uses short/informal names like "Bolivia" and "Noord-Korea" rather than "Bolivia, Plurinationale Staat" (or however that would be in Dutch) and "Korea, Democratische Volksrepubliek".

Many country names are in the Crowdin global dictionary, so you could also click the "Save" icon next to the suggestion and go through fairly quickly, one click for each country. (Still not perfect though; e.g. there are three different suggestions for Kyrgyzstan.)

langeveldNMR commented 3 years ago

Thanks for the information. I used the formal Dutch names kept at https://namen.taalunie.org/landen and matched those with the country codes provided in the json file. Easy and rather quick to do like this.
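For anyone wanting to repeat this for another language, the matching step could be scripted roughly as follows. This is a sketch under assumptions: it supposes the downloaded file maps country codes to English names and that the formal names are available keyed by the same codes. The actual Crowdin file layout should be checked by downloading it first, and the sample values below are illustrative only.

```python
def translate_by_code(
    source: dict[str, str], names_by_code: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Build a translated mapping for the codes in source; report codes with no match."""
    translated: dict[str, str] = {}
    missing: list[str] = []
    for code in source:
        name = names_by_code.get(code.upper())
        if name is None:
            # Leave the entry out so it still shows up as untranslated.
            missing.append(code)
        else:
            translated[code] = name
    return translated, missing


# Illustrative sample: codes -> English names, and codes -> Dutch names.
source = {"NL": "Netherlands", "BE": "Belgium", "ZZ": "Unknown"}
dutch_names = {"NL": "Nederland", "BE": "België"}

translated, missing = translate_by_code(source, dutch_names)
print(translated)
print("still to translate by hand:", missing)
```

Returning the unmatched codes separately makes it easy to see which entries still need manual attention after the upload.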

MortenHofft commented 3 years ago

We have decided on a process for translations and implemented it. I will close this issue. There will no doubt be issues related to translations in the future. Feel free to open a new issue.

langeveldNMR commented 2 years ago

I recently noted that the Dutch translations I prepared are already available through https://hp-nhm-rotterdam.gbif-staging.org/nl/data.html but are not yet deployed on the portal itself https://specimens.hetnatuurhistorisch.nl/nl/data.html (I did do a new release recently, but nothing changed). Should they first be approved on Crowdin, or is some other action necessary?

MortenHofft commented 2 years ago

I have to merge it into the web project and deploy it to master. So it isn't you, but me, who should monitor translations more carefully. I will redeploy it now, so it should be available in 5 minutes or so.

langeveldNMR commented 2 years ago

Just to make sure I am not missing anything: I did a new release recently, but nothing changed. On Crowdin https://crowdin.com/project/gbif-portal/nl the translations are still blue (not green). Please let me know if there is any other action required from our side.

MortenHofft commented 2 years ago

I cannot see any recent changes in the Dutch translations, but I might have missed them. Could you give me an example translation that I can check? And could you specify what "nothing changed" means: is it in staging but not in prod, or does it not show anywhere?

langeveldNMR commented 2 years ago

I have not added any new translations recently, and they are still showing in staging but not in prod.

MortenHofft commented 2 years ago

Thanks @langeveldNMR - could you please provide an example. That will make it easier for me to find the cause.

btw there are 2 types of translations:

Those that you do in Crowdin, which relate to the data widgets (occurrence search). Your releases have no impact on those. For those to go into prod, I have to merge the Crowdin changes and do a deployment to production. If I only deploy to staging, they will only show in staging. That could be the reason here, except that I have done a production deployment recently and do not see any changes since, which is why I'm confused.
And then there are the translations you handle in your GitHub repo, which are for everything else (menus, blog posts, prose etc.). Those will only show in production if you do a release.

langeveldNMR commented 2 years ago

It concerns the first type you list. Attached are four screenshots.

In the Dutch (NL) staging environment the data widgets are translated. However, they are not in the Dutch production environment, where they are identical to the English production environment.

[Images: staging NL, staging EN, prod NL, prod EN]

MortenHofft commented 2 years ago

ahhhh - there are no translations at all. I understood your message as "a few of the latest translations are missing". I see that I hadn't added NL as a supported language in prod. That is done now. Your translations are visible in your prod environment. Sorry about that - I misunderstood the problem.

gbif / hosted-portals

Translation flow #60
