gbif / hosted-portals

Support material for establishing the GBIF Hosted Portals
Apache License 2.0

Translation flow #60

Closed: MortenHofft closed this issue 3 years ago

MortenHofft commented 3 years ago

There is currently no process set up to edit labels (English or other languages).

~Secondly, most of the texts in the file are placeholders like "Fill in some content here" or, even worse, are missing and hence just show as filter.hostKey.description~ Removing placeholder values has an issue of its own: https://github.com/gbif/hosted-portals/issues/134

tucotuco commented 3 years ago

We are able to do so by forking this repository and making pull requests though, no?

MortenHofft commented 3 years ago

We could (not in this repo, though, but in this monorepo: https://github.com/gbif/gbif-web/tree/master/packages/react-components), but it probably isn't the best way to get people involved. We have used Crowdin for GBIF.org and other projects, and my impression is that it works well. It has a lot of nice features you wouldn't get with pull requests, such as an easy way to discuss and share context, machine suggestions, and previews for messageformat strings with variables in them. Perhaps most importantly, there is no risk of broken syntax and no scary process for those who are less tech-savvy.

MattBlissett commented 3 years ago

I'm thinking about how best to reuse the enumeration value translations (Preserved Specimen etc.), which are already translated into several languages for GBIF.org and need to be consistent between the two.

The rest of the HP UI (Search, Basis of Record) could be a second Crowdin project. Crowdin should make suggestions where the text is already translated for GBIF.org, but the translator will have to review and accept those suggestions.

(We could add the hosted portal translation keys to the GBIF.org translation project in Crowdin, since it seems to be possible to connect two Git repositories, but I think this would confuse the existing GBIF.org translators.)

(For some of these enumerations, the best place to put translations would be the XML definitions of the vocabulary/thesaurus. That would make them reusable in the IPT and by anyone else.)

tucotuco commented 3 years ago

VertNet is looking for guidance on how best to help provide the original English content that can go into Crowdin.

MortenHofft commented 3 years ago

This is the flow we have for GBIF.org; I imagine we would do something similar.

MortenHofft commented 3 years ago

I've updated this issue to be specific to the translation flow, as we already have another issue for removing/replacing placeholder texts. The two tasks are independent, and fixing the placeholder texts should really be done before any translation starts.

tucotuco commented 3 years ago

I created a pull request with English text for as many terms as I could identify.

MortenHofft commented 3 years ago

Some thoughts on how this could/should be implemented. @thomasstjerne and @MattBlissett do you have any thoughts on how this should be done?

Browser perspective

I can see these options:

Both approaches could work, but the second allows us to load translations as needed. This could matter for performance if the site has many translations (say one translation is 50 kB and we load 10 of them).
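To make the parenthetical example concrete, here is the arithmetic under the stated assumption of about 50 kB per language and 10 languages, comparing shipping every translation up front against loading only the language in use. Which of the two options above corresponds to "up front" is an assumption here, since only the second is described as loading on demand.

```latex
\underbrace{10 \times 50\ \text{kB}}_{\text{all 10 languages up front}} = 500\ \text{kB}
\qquad \text{vs.} \qquad
\underbrace{1 \times 50\ \text{kB}}_{\text{only the language in use}} = 50\ \text{kB}
```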

Common to both is that each loads a single file. Our translation library expects a single JSON object with all strings, so loading them as one seems simplest and should perform better as well.

Where to translate this project

I too think it makes sense to have this project alongside the "gbif.org -ui" Crowdin project (and its English counterpart). The intention is that the two will align over time; secondly, all the enumerations are already in that project.

Managing the translations

I find it easier to work with multiple files than with one huge one. Secondly, the existing enum translations from the "gbif.org" project are individual files, so we need to stitch them together. I imagine a build step doing that.
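A build step like the one imagined here could look roughly like the sketch below. It assumes, hypothetically, that each source file is a nested JSON object of strings and that each file is given a namespace (for example `enums.basisOfRecord`); the names `flatten` and `stitch` are illustrative and not part of gbif-web. The output is the single flat JSON object the translation library expects.

```typescript
// Hypothetical shape of one source file: nested objects whose leaves are strings.
type Messages = { [key: string]: string | Messages };

// Flatten nested objects into dot-separated keys,
// e.g. flatten({ a: { b: "x" } }) -> { "a.b": "x" }.
function flatten(
  messages: Messages,
  prefix = "",
  out: Record<string, string> = {},
): Record<string, string> {
  for (const [key, value] of Object.entries(messages)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (typeof value === "string") {
      out[path] = value;
    } else {
      flatten(value, path, out);
    }
  }
  return out;
}

// Stitch several [namespace, messages] files into one flat bundle.
function stitch(files: Array<[string, Messages]>): Record<string, string> {
  const bundle: Record<string, string> = {};
  for (const [namespace, messages] of files) {
    for (const [key, value] of Object.entries(flatten(messages, namespace))) {
      // Fail the build rather than let one file silently overwrite another's key.
      if (key in bundle) throw new Error(`Duplicate translation key: ${key}`);
      bundle[key] = value;
    }
  }
  return bundle;
}
```

For example, `stitch([["", { search: "Search" }], ["enums.basisOfRecord", { PRESERVED_SPECIMEN: "Preserved specimen" }]])` yields one object with the keys `search` and `enums.basisOfRecord.PRESERVED_SPECIMEN`, which could then be written out once per language.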

How much should go into the browser translation file

For gbif.org we have everything in the translation file, e.g. all languages and all GrSciColl collection disciplines (to mention just two enums that are rarely used).

Instead of including everything, we could also select the most frequently used values and load the rest from a new endpoint, e.g. /translation/languages/abk would return Abkhazian. This is similar to how we load dataset titles and scientific names.
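A minimal sketch of the per-value option, assuming the hypothetical `/translation/<enum>/<code>` endpoint from the example above. The fetcher is injected so the logic does not depend on how requests are actually made; `lookupEnumLabel` and `Fetcher` are illustrative names. Frequently used values come straight from the bundled file, and everything else is fetched once and cached.

```typescript
// Hypothetical fetcher; in a browser it might wrap fetch() and read the response text.
type Fetcher = (path: string) => Promise<string>;

// Cache promises rather than values so concurrent lookups for the same code share one request.
const labelCache = new Map<string, Promise<string>>();

function lookupEnumLabel(
  enumName: string,
  code: string,
  bundled: Record<string, string>, // frequently used values shipped in the core file
  fetcher: Fetcher,
): Promise<string> {
  const key = `${enumName}.${code}`;
  if (key in bundled) return Promise.resolve(bundled[key]);

  let pending = labelCache.get(key);
  if (!pending) {
    pending = fetcher(`/translation/${enumName}/${code}`).catch((err) => {
      // Do not keep failed requests cached, so a later lookup can retry.
      labelCache.delete(key);
      throw err;
    });
    labelCache.set(key, pending);
  }
  return pending;
}
```

The trade-off is one small request per rarely used value, in exchange for a core file that only grows with the values that are actually common.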

Or we could explore the option of loading all the values for an enum, but not until that enum is used. So the GrSciColl collection disciplines would not be loaded until you visit a GrSciColl institution page.

I'm honestly not sure how to do this technically or whether it is a huge task, but I like the idea of keeping the core translation file smaller by focusing on the UI elements and then loading enums for data presentation asynchronously. I imagine that this is a fairly simple thing to do.
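One way the per-enum option might work is sketched below, under the assumption of a hypothetical `load(enumName)` function that returns the full value-to-label map for one enum in the current locale (for instance all GrSciColl collection disciplines). Nothing is requested until the enum is first needed, and each enum is loaded at most once. `createEnumTranslations` and the example codes are illustrative, not existing gbif-web APIs or vocabulary values.

```typescript
// Hypothetical loader for one enum's complete value -> label map in the active locale.
type EnumLoader = (enumName: string) => Promise<Record<string, string>>;

function createEnumTranslations(load: EnumLoader) {
  // One pending load per enum, created lazily on first use.
  const requested = new Map<string, Promise<Record<string, string>>>();

  function ensure(enumName: string): Promise<Record<string, string>> {
    let pending = requested.get(enumName);
    if (!pending) {
      pending = load(enumName);
      requested.set(enumName, pending);
    }
    return pending;
  }

  return {
    // Resolve a label, falling back to the raw code when the enum has no entry for it.
    async label(enumName: string, code: string): Promise<string> {
      const values = await ensure(enumName);
      return values[code] ?? code;
    },
    // True once the enum has been requested (even if the load is still in flight).
    isRequested: (enumName: string): boolean => requested.has(enumName),
  };
}
```

A page such as a GrSciColl institution page would call `label("collectionDiscipline", code)` while rendering, and only then would that enum's file be fetched; the core translation file stays limited to UI strings.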

Performance

It is easy to worry too much about bandwidth and about loading a bloated translation file.

It might be worth remembering that the "giant" translation file for gbif.org is 150 kB unzipped and 50 kB zipped. The header image alone on gbif.org is 200 kB, and we load 1.4 MB of images for our home page alone. Individual map tiles are as big as 152 kB. If we can avoid blocking the first render, then perhaps we shouldn't worry about a 10 kB vs. a 50 kB translation file.
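The comparison can be made explicit with the figures quoted above, taking 1 MB as 1000 kB:

```latex
\frac{50\ \text{kB (zipped)}}{150\ \text{kB (unzipped)}} \approx 0.33,
\qquad
\frac{50\ \text{kB}}{1.4\ \text{MB}} = \frac{50\ \text{kB}}{1400\ \text{kB}} \approx 0.036,
\qquad
\frac{50\ \text{kB} - 10\ \text{kB}}{200\ \text{kB}} = 0.2
```

In other words, the zipped translation file is under 4% of the home page's images, and the 40 kB difference between a 10 kB and a 50 kB file is a fifth of the header image alone.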

MortenHofft commented 3 years ago

This has progressed and is now in the staging environment. The current state is:

We still need to figure out what the best flow is for translators and how we can support/guide the effort. I will update the issue with more information when that is in place.

@daiesco you've asked about this recently. You will see that your site now appears partly translated.

daiesco commented 3 years ago

Thank you @MortenHofft, we will be moving forward with the translations in the Crowdin project.

langeveldNMR commented 3 years ago

I recently translated most of the relevant terms into Dutch, but I have not yet started translating all the country names into Dutch. Surely there must be a faster and less typo-prone way to do that, e.g. using ISO codes to import the translations? For example: https://nl.wikipedia.org/wiki/ISO_3166-1

MortenHofft commented 3 years ago

@langeveldNMR yes. You can upload translations to Crowdin: if you find a source and format it, there is a button to do so at https://crowdin.com/project/gbif-portal/nl. If you download the file first, you can see the format the upload should be in.

[Screenshot: 2021-10-12 at 11:26:09]

It is difficult for me to evaluate the quality of the data sources in various languages, so I leave that to the translator.

MattBlissett commented 3 years ago

Please also note that Wikipedia uses short/informal names like "Bolivia" and "Noord-Korea" rather than "Bolivia, Plurinationale Staat" (or however that would be in Dutch) and "Korea, Democratische Volksrepubliek".

Many country names are in the Crowdin global dictionary, so you could also click the "Save" icon next to the suggestion and go through fairly quickly, one click for each country. (Still not perfect though; e.g. there are three different suggestions for Kyrgyzstan.)

langeveldNMR commented 3 years ago

Thanks for the information. I used the formal Dutch names kept at https://namen.taalunie.org/landen and matched those with the country codes provided in the json file. Easy and rather quick to do like this.
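The matching described here can also be scripted. The sketch below assumes, hypothetically, that the file downloaded from Crowdin is flat JSON whose country keys end in an ISO 3166-1 alpha-2 code after a fixed prefix; the prefix `enums.country.` and the function `fillCountryNames` are illustrative, so check the real key format in the downloaded file first. It returns only the keys it could fill, plus the codes that still need a manual translation.

```typescript
// Hypothetical key prefix; replace with whatever the downloaded file actually uses.
const COUNTRY_PREFIX = "enums.country.";

function fillCountryNames(
  source: Record<string, string>, // the file downloaded from Crowdin (key -> current text)
  namesByCode: Record<string, string>, // e.g. { NL: "Nederland" } from an external list
): { upload: Record<string, string>; missing: string[] } {
  const upload: Record<string, string> = {};
  const missing: string[] = [];
  for (const key of Object.keys(source)) {
    if (!key.startsWith(COUNTRY_PREFIX)) continue; // skip non-country strings
    const code = key.slice(COUNTRY_PREFIX.length).toUpperCase();
    const translated = namesByCode[code];
    if (translated) {
      upload[key] = translated;
    } else {
      missing.push(code); // left for the translator to fill in by hand
    }
  }
  return { upload, missing };
}
```

`JSON.stringify(upload, null, 2)` then gives a file in the same key format for the upload button; whether Crowdin treats a partial file like this exactly as expected is worth confirming on a small sample first.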

MortenHofft commented 3 years ago

We have decided on a process for translations and implemented it. I will close this issue. There will no doubt be issues related to translations in the future. Feel free to open a new issue.

langeveldNMR commented 2 years ago

I recently noticed that the Dutch translations I prepared are already available at https://hp-nhm-rotterdam.gbif-staging.org/nl/data.html but are not yet deployed on the portal itself, https://specimens.hetnatuurhistorisch.nl/nl/data.html (I did a new release recently, but nothing changed). Should they first be approved on Crowdin, or is some other action necessary?

MortenHofft commented 2 years ago

I have to merge them into the web project and deploy to master, so it isn't you but me who should monitor translations more carefully. I will redeploy now, so they should be available in 5 minutes or so.

langeveldNMR commented 2 years ago

Just to make sure I am not missing anything: I did a new release recently, but nothing changed. On crowdin https://crowdin.com/project/gbif-portal/nl the translations are still blue (not green). Please let me know if there is any other action required from our side.

MortenHofft commented 2 years ago

I cannot see any recent changes in the Dutch translations, but I might have missed them. Could you give me an example translation that I can check? And could you specify what "nothing changed" means: is it in staging but not in prod, or does it not show anywhere?

langeveldNMR commented 2 years ago

I have not added any new translations recently, and they are still showing in staging but not in prod.

MortenHofft commented 2 years ago

Thanks @langeveldNMR - could you please provide an example. That will make it easier for me to find the cause.

btw there are 2 types of translations:

langeveldNMR commented 2 years ago

It concerns the first type you list. Attached are four screenshots.

In the Dutch (NL) staging environment the data widgets are translated. However, they are not in the Dutch production environment, where they are identical to the English production environment.

[Screenshots: staging NL, staging EN, prod NL, prod EN]

MortenHofft commented 2 years ago

Ahhhh, there are no translations at all. I understood your message as "a few of the latest translations are missing". I see that I hadn't added NL as a supported language in prod. That is done now, and your translations are visible in your prod environment. Sorry about that; I misunderstood the problem.
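The failure mode resolved here (Dutch pages translated in staging but identical to English in prod) can be illustrated with a small sketch. It assumes, hypothetically, that the portal keeps a per-environment list of enabled locales and falls back to English for any locale not listed; `supportedLocales` and `resolveLocale` are illustrative names, not the actual gbif-web configuration.

```typescript
type Env = "staging" | "prod";

// Hypothetical per-environment config: only listed locales get their translations.
const supportedLocales: Record<Env, string[]> = {
  staging: ["en", "nl"],
  prod: ["en"], // "nl" missing here, so Dutch pages silently render in English
};

// Return the requested locale if it is enabled in this environment, else the fallback.
function resolveLocale(env: Env, requested: string, fallback = "en"): string {
  return supportedLocales[env].includes(requested) ? requested : fallback;
}
```

With this shape, the fix described above amounts to adding `"nl"` to the `prod` list. The silent fallback is also why the symptom looked like "nothing changed" rather than an error.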