global-healthy-liveable-cities / global_scorecards

The code in this repository draws on results from the Global Healthy and Sustainable Cities Indicator Collaboration Study to generate policy 'scorecard' reports.
MIT License

Matplotlib tick units should be localised according to country and language #11

Open carlhiggs opened 2 years ago

carlhiggs commented 2 years ago

Currently, tick marks for the threshold plots are formatted using matplotlib's engineering formatter (EngFormatter):

# axis formatting (cax: the matplotlib Axes being formatted)
from matplotlib import ticker
cax.xaxis.set_major_formatter(ticker.EngFormatter())

For numbers in the thousands, this abbreviates values using the SI prefix 'k' (e.g. 5000 is shown as '5 k'), which is generally desirable, at least in English.
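For reference, the scaling EngFormatter performs can be restated in plain Python. This is a simplified sketch based on reading `EngFormatter.format_eng` in matplotlib v3.5; it ignores the `places`, `unit`, usetex and MathText options, and the names `SI_PREFIXES` and `format_eng` here are our own rather than matplotlib's public API.

```python
import math

# SI prefixes for powers of ten in steps of three, mirroring the table used
# by matplotlib's EngFormatter (v3.5 covers 10**-24 to 10**24).
SI_PREFIXES = {
    -24: "y", -21: "z", -18: "a", -15: "f", -12: "p", -9: "n",
    -6: "\N{MICRO SIGN}", -3: "m", 0: "", 3: "k", 6: "M", 9: "G",
    12: "T", 15: "P", 18: "E", 21: "Z", 24: "Y",
}


def format_eng(num, prefixes=SI_PREFIXES, sep=" "):
    """Simplified re-statement of EngFormatter's scaling, for illustration."""
    sign = -1 if num < 0 else 1
    num = abs(num)
    # Exponent rounded down to a multiple of three, clamped to the prefix range.
    pow10 = int(math.floor(math.log10(num) / 3) * 3) if num != 0 else 0
    pow10 = max(min(pow10, max(prefixes)), min(prefixes))
    mant = sign * num / 10.0 ** pow10
    # Avoid labels like "1000 k" when rounding pushes the mantissa up.
    if abs(float(format(mant, "g"))) >= 1000 and pow10 < max(prefixes):
        mant /= 1000
        pow10 += 3
    prefix = prefixes[pow10]
    # No trailing separator when there is no prefix (EngFormatter strips it too).
    return f"{mant:g}{sep}{prefix}" if prefix else f"{mant:g}"
```

For example, `format_eng(5000)` gives `"5 k"`, which is where the English-centric 'k' in our tick labels comes from.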

However, in Czech 'k' is not a natural or intuitive abbreviation; a better option would be "tisíce" (thousands).

There is a Python library for localising units, Babel, which we have already used elsewhere in this project; its format_unit() function could potentially be used here. However, I haven't seen examples of its use with matplotlib, or specifically with the engineering ticker. Addressing this may be beyond scope, but ideally we would handle internationalisation/localisation of these units as we do elsewhere in the project for translations.
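One way to connect any localising function (Babel-based or otherwise) to matplotlib is `matplotlib.ticker.FuncFormatter`, which calls a function `f(x, pos)` for each tick. The sketch below assumes a hypothetical per-locale table `LOCALE_LABELS`; only the Czech word comes from this issue, and `localised_label` / `use_localised_ticks` are illustrative names, not project code.

```python
# Hypothetical per-locale labels keyed by power of ten. The English entries are
# the SI prefixes; the Czech entry is the wording suggested in this issue.
LOCALE_LABELS = {
    "en": {3: "k", 6: "M", 9: "G"},
    "cs": {3: "tisíce"},
}


def localised_label(value, locale):
    """Scale value by the largest power of ten that has a label for locale."""
    labels = LOCALE_LABELS.get(locale, {})
    for pow10 in sorted(labels, reverse=True):
        if abs(value) >= 10 ** pow10:
            return f"{value / 10 ** pow10:g} {labels[pow10]}"
    # No applicable label: show the plain number.
    return f"{value:g}"


def use_localised_ticks(axis, locale):
    """Attach the label function to a matplotlib Axis, e.g. cax.xaxis."""
    from matplotlib import ticker  # imported here so the label logic needs no matplotlib

    axis.set_major_formatter(
        ticker.FuncFormatter(lambda x, pos: localised_label(x, locale))
    )
```

This does not localise the decimal separator (Czech uses a comma), and Czech number words also change form depending on the number (CLDR plural categories), which are further reasons to lean on CLDR data via Babel rather than a fixed table.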

carlhiggs commented 2 years ago

The code for the Matplotlib Engineering ticker is here: https://github.com/matplotlib/matplotlib/blob/v3.5.1/lib/matplotlib/ticker.py#L1311-L1459

It may be feasible to create a modified class with a 'locale' parameter that produces localised unit labels using Babel.
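A minimal sketch of such a class, assuming (from the v3.5.1 source linked above) that `EngFormatter.format_eng` looks prefixes up through `self.ENG_PREFIXES`, so an instance attribute can shadow the class-level SI table. `PREFIX_OVERRIDES`, `prefixes_for_locale` and `make_localised_eng_formatter` are hypothetical names; a real version might fill the table from Babel instead of hard-coding it.

```python
# Hypothetical locale overrides for EngFormatter's prefix table; only the
# Czech word for thousands comes from this issue.
PREFIX_OVERRIDES = {"cs": {3: "tisíce"}}


def prefixes_for_locale(base_prefixes, locale):
    """Return a copy of base_prefixes with any locale-specific words swapped in."""
    return {**base_prefixes, **PREFIX_OVERRIDES.get(locale, {})}


def make_localised_eng_formatter(locale, **kwargs):
    """Build an EngFormatter whose prefix table depends on locale (needs matplotlib)."""
    from matplotlib import ticker

    class LocalisedEngFormatter(ticker.EngFormatter):
        def __init__(self, locale, **kw):
            super().__init__(**kw)
            self.locale = locale
            # Assumption: format_eng() reads self.ENG_PREFIXES, so this instance
            # attribute replaces the class-level SI prefixes for this formatter.
            self.ENG_PREFIXES = prefixes_for_locale(
                ticker.EngFormatter.ENG_PREFIXES, locale
            )

    return LocalisedEngFormatter(locale, **kwargs)
```

Usage would then be along the lines of `cax.xaxis.set_major_formatter(make_localised_eng_formatter("cs"))`, keeping EngFormatter's scaling and rounding while changing only the prefix words.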

carlhiggs commented 2 years ago

Perhaps this could be done by mapping between the units used by the Eng ticker and those in the CLDR Unit Validity XML file that Babel uses (according to the format_unit() specification, linked above).
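A rough sketch of that mapping: compose CLDR-style identifiers (e.g. `length-kilometer`) for each EngFormatter exponent, then check which ones are missing from a set of valid IDs. The `known_ids` set in the example is illustrative only (it is not parsed from the CLDR file), the helper names are hypothetical, and the approach only works where there is a base unit to attach a prefix to.

```python
# SI prefix names by power of ten, matching EngFormatter's -24..24 range.
SI_PREFIX_NAMES = {
    -24: "yocto", -21: "zepto", -18: "atto", -15: "femto", -12: "pico",
    -9: "nano", -6: "micro", -3: "milli", 0: "", 3: "kilo", 6: "mega",
    9: "giga", 12: "tera", 15: "peta", 18: "exa", 21: "zetta", 24: "yotta",
}


def candidate_unit_ids(category, base_unit):
    """Compose CLDR-style ids such as 'length-kilometer' for each exponent."""
    return {p: f"{category}-{name}{base_unit}" for p, name in SI_PREFIX_NAMES.items()}


def unmatched_exponents(category, base_unit, known_ids):
    """Exponents whose composed id is absent from known_ids (for example, ids
    parsed from CLDR's unit validity data)."""
    candidates = candidate_unit_ids(category, base_unit)
    return sorted(p for p, uid in candidates.items() if uid not in known_ids)
```

With a small illustrative set such as `{"length-meter", "length-kilometer", "length-millimeter"}`, every exponent other than -3, 0 and 3 comes back unmatched, which is the kind of coverage gap discussed below.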

carlhiggs commented 2 years ago

The Matplotlib Eng ticker essentially scales values according to SI metric prefixes representing powers of 10 (from 10^-24 to 10^24), while Babel focuses on a broader set of units without necessarily covering all of these powers. Pint, another Python library for dealing with units, has created a plain-language lookup of these XML-derived terms here. Notably, there doesn't seem to be a way of handling, for example, 'septillionth'/'quadrillionth' (the short- and long-scale names for 'y', or 10^-24). At least in a naive test, Babel doesn't recognise those words as English (not surprising, since they aren't in the XML spec):

>>> format_unit(12,'septillionth',locale=locale,length=length)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/env/lib/python3.9/site-packages/babel/units.py", line 119, in format_unit
    raise UnknownUnitError(unit=measurement_unit, locale=locale)
babel.units.UnknownUnitError: septillionth is not a known unit in en
>>> format_unit(12,'quadrillionth',locale=locale,length=length)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/env/lib/python3.9/site-packages/babel/units.py", line 119, in format_unit
    raise UnknownUnitError(unit=measurement_unit, locale=locale)
babel.units.UnknownUnitError: quadrillionth is not a known unit in en

Entering 'y' doesn't work either: Babel appears to search for the closest matching unit, and for 'y' it picks 'days'. I think this search exists because the formal identifier for km is 'length-kilometer', yet it matches correctly if you enter 'kilometer' (but not 'kilometre').
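The behaviour above is consistent with a suffix search: use the ID if it is known exactly, otherwise take the shortest known ID that ends with the input. The sketch below assumes that is what Babel's unit lookup in `babel/units.py` does; `find_unit_by_suffix` and the sample IDs are illustrative, not Babel's API or data.

```python
def find_unit_by_suffix(unit_id, unit_patterns):
    """Return unit_id if known, else the shortest known id ending with it.

    Assumption: this mirrors the suffix search Babel appears to perform.
    """
    if unit_id in unit_patterns:
        return unit_id
    # sorted() is stable, so ids of equal length keep their original order.
    for pattern in sorted(unit_patterns, key=len):
        if pattern.endswith(unit_id):
            return pattern
    return None
```

Under this rule 'kilometer' resolves to 'length-kilometer', 'kilometre' matches nothing, and a single letter like 'y' simply lands on the shortest ID ending in 'y', such as 'duration-day'.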

Anyway - the short story is

But come to think of it, I'm not sure that Babel supports 'thousands' as a unit at all; I think I missed that when considering the above.
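Babel does include `babel.numbers.format_compact_decimal`, which formats numbers with CLDR's compact-number patterns ('thousand'-style words in the long format, abbreviations in the short format), so it may cover the 'thousands' case even though `format_unit()` does not. A minimal sketch, assuming that function is available in the installed Babel version; `compact_label` is a hypothetical helper and its fallback word table is illustrative only. How well it combines with EngFormatter's SI range has not been checked.

```python
def compact_label(value, locale="cs", format_type="long", fallback_words=None):
    """Label value using CLDR compact-number data via Babel when available.

    Falls back to a caller-supplied (hypothetical) word table otherwise.
    """
    try:
        from babel.numbers import format_compact_decimal
    except ImportError:  # Babel missing, or too old to provide this function
        format_compact_decimal = None
    if format_compact_decimal is not None:
        return format_compact_decimal(value, format_type=format_type, locale=locale)
    words = fallback_words or {3: "k"}
    for pow10 in sorted(words, reverse=True):
        if abs(value) >= 10 ** pow10:
            return f"{value / 10 ** pow10:g} {words[pow10]}"
    return f"{value:g}"
```

Because the locale-specific plural forms come from CLDR, this route would also avoid hard-coding the grammatical form of the Czech word for thousands.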