Ouranosinc / xclim

Library of derived climate variables, i.e. climate indicators, based on xarray.
https://xclim.readthedocs.io/en/stable/
Apache License 2.0

Internationalization of metadata #352

Closed aulemahal closed 4 years ago

aulemahal commented 4 years ago

One need for internal Ouranosinc projects, and potentially for external users, is to have metadata available in a language other than English. As far as I know, there is no CF-standardized way of adding multilingual metadata to a netCDF file, so we're on our own for this. Also, while Python has GNU gettext support built in, it might not be what we want. I have found no examples of other Python packages similar to xclim implementing such a feature, so we're on our own here too.

In this thread, I'd like to have people's input on the way to go forward. I have narrowed it down to 3 main options + ancillary questions :

Internationalization support in atmos, seaIce and land

1) None. This is out of scope for xclim.
2) Gettext-based translation. We could ship pre-compiled gettext files with xclim and use Python's gettext to generate the metadata. We could use tools like babel to help.
3) JSON-based translation. As we have a limited and explicit list of text fields to translate, we could add JSON files to xclim, one for each locale, mapping indice names and fields to translated strings. We would need our own tools to generate the fields.
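To make option 3 more concrete, here is a minimal sketch of what a per-locale JSON file and a lookup helper could look like. The file layout, the helper names `load_locale` and `translate_field`, and the French strings are all assumptions for illustration, not an existing xclim API.

```python
import json

# Hypothetical content of a per-locale file such as "fr.json".
# The layout (indicator name -> field -> translated string) is an assumption.
FR_JSON = """
{
  "tg_mean": {
    "long_name": "Moyenne de la température journalière",
    "description": "Moyenne de la température journalière sur la période."
  }
}
"""


def load_locale(text):
    """Parse the JSON text of one locale into a nested dict."""
    return json.loads(text)


def translate_field(catalog, indicator, field, default=None):
    """Return the translated field, or `default` if no translation exists."""
    return catalog.get(indicator, {}).get(field, default)


catalog = load_locale(FR_JSON)
print(translate_field(catalog, "tg_mean", "long_name"))
```

Falling back to a default keeps indicators usable when a translation is missing, which matters if the translated fields are added gradually.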

How to (for 2 and 3):

A) Monkey-patching. When loading xclim and calling xclim.add_locale('fr'), all indices would be monkey-patched so that they contain the translated fields too.
B) set_locale + context. The Indicator class would be modified so that it adds the translations from the globally- or context-set locale option, similar to xarray's set_options.
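For option B, a rough sketch of the pattern, modelled on how xarray's `set_options` works both as a global setter and as a context manager. The names `OPTIONS` and `set_locales` are hypothetical, and how the Indicator class would read the option is left out.

```python
# Module-level options store (hypothetical name).
OPTIONS = {"locales": []}


class set_locales:
    """Set the active locales globally, or temporarily inside a `with` block."""

    def __init__(self, *locales):
        # Remember the previous value so the context manager can restore it.
        self._old = list(OPTIONS["locales"])
        OPTIONS["locales"] = list(locales)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        OPTIONS["locales"] = self._old


# Context usage: the locale is active only inside the block.
with set_locales("fr"):
    print(OPTIONS["locales"])  # ['fr']
print(OPTIONS["locales"])  # []

# Global usage: the locale stays active until changed again.
set_locales("fr")
```

An indicator would then consult `OPTIONS["locales"]` when building its output attributes, so nothing needs to be patched at import time.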

Metadata: I suggest adding "_fr" and similar to the localized attributes.
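As an illustration of the suffix idea, something like the following could build the output attributes. The helper `localized_attrs` and the sample strings are assumptions made for this sketch.

```python
def localized_attrs(attrs, translations):
    """Return a copy of `attrs` with '<name>_<locale>' entries added.

    `translations` maps a locale code to a dict of translated fields.
    Only fields already present in `attrs` get a localized twin.
    """
    out = dict(attrs)
    for locale, fields in translations.items():
        for name, text in fields.items():
            if name in attrs:
                out[f"{name}_{locale}"] = text
    return out


attrs = {"long_name": "Mean daily temperature", "units": "K"}
translations = {"fr": {"long_name": "Température journalière moyenne"}}
print(localized_attrs(attrs, translations))
```

The English attributes stay untouched, so tools that only read the standard names keep working.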

My favorite way is 3B. What do you think?

huard commented 4 years ago

Related: https://github.com/geopython/pywps/pull/510

Whatever we choose should allow a clean transfer of the xclim translations to pywps. At the moment, we scrape the Indicator metadata to fill the pywps metadata fields.

aulemahal commented 4 years ago

I see!

Right now I translate description, long_name and comment; all three are written to the output's attributes. What other fields should be translated? Should we have a way to translate argument names? Or documentation?

huard commented 4 years ago

I suggest we start with a small set of fields to translate and check how to pass those translations to finch. We'll expand later as needed. But yes, in @davidcaron's pywps PR, input and output metadata is also translated.

davidcaron commented 4 years ago

Regarding support for multiple languages in pywps, my intention was to provide only the descriptions of processes and of inputs/outputs in another language. Not in the netCDF metadata yet.

So... as David Huard mentioned, I would suggest starting with this: the indicator descriptions, and those of the inputs and outputs. For example, for the "threshold" argument, I would only provide a description in another language, but not translate the name "threshold" itself.

As for how to implement this, I think a JSON document would be a good choice... And provide a small interface in xclim to get these translations given a specific indicator function. Finch would use this API to set up the WPS processes in the correct language.
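A small sketch of what such an interface might look like from finch's side. Everything here is hypothetical: the function name `get_translations`, the indicator identifier, the catalog layout and the French text. Argument names stay untranslated and only their descriptions change.

```python
# Hypothetical in-memory catalog, as it might be loaded from one JSON file per locale.
TRANSLATIONS = {
    "fr": {
        "example_indicator": {
            "description": "Nombre de jours au-dessus d'un seuil.",
            "inputs": {"threshold": "Seuil de température à dépasser."},
        }
    }
}


def get_translations(indicator_id, locale):
    """Return translated descriptions for one indicator.

    The result keeps the original argument names as keys. If the locale
    or the indicator is unknown, an empty structure is returned so the
    caller can fall back to the English metadata.
    """
    entry = TRANSLATIONS.get(locale, {}).get(indicator_id, {})
    return {
        "description": entry.get("description"),
        "inputs": dict(entry.get("inputs", {})),
    }


info = get_translations("example_indicator", "fr")
print(info["description"])
print(info["inputs"]["threshold"])
```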