elixir-cldr / cldr_units

Unit formatting (volume, area, length, ...) functions for the Common Locale Data Repository (CLDR)

Unit reductions #4

Closed kipcole9 closed 5 years ago

kipcole9 commented 6 years ago

For a given unit value, for example "5,280 feet" or "1,000 metres", it is often appropriate and expected to reduce it to a more commonly used unit such as "1 mile" or "1 kilometre". This note explores an approach to implementing unit reductions using Cldr data.

Measurement systems

In our example it would be expected that "feet" would be reduced using the "imperial" or "US" systems whereas "metres" would be reduced using the "metric" system. The unit data for Cldr does not maintain a per-unit mapping of unit name to measurement system.

Some units, such as digital units, have only a single system.

Locale and measurement system

Cldr does provide a mapping of locale to measurement system, so we can identify the preferred measurement system for a given locale. This would allow a unit reduction to also convert to the appropriate measurement system.

Automatic reduction

The intent of reduction is to produce a result whose value lies in the range -10 < value < +10. To do so, identify the target unit and the reduction factor required, then convert to that unit. To identify the target unit we take into account the source unit's measurement system and attempt to find a reduction target in the same measurement system.

Example

iex> Cldr.Unit.reduce Cldr.Unit.new(:meter, 1100)
#Unit(:kilometer, 1.1)

# System conversion tries to keep a similar magnitude as the 
# source unit.  Convert to :US, :UK or :metric
iex> Cldr.Unit.convert_system Cldr.Unit.new(:meter, 1000), to: :US
#Unit(:yard, 1093.61)

# Round a unit. Rounding options are passed
# through to Cldr.Number
iex> Cldr.Unit.round Cldr.Unit.new(:yard, 1093.61)
#Unit(:yard, 1093.6)
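
To make the rule from "Automatic reduction" concrete, here is a minimal sketch of one way it could work. It is not the library's implementation: the module `ReduceSketch`, the `{unit, value}` tuple representation and the factor table are all assumptions made for illustration, and only metric length is covered. It picks the smallest unit whose converted magnitude is below 10 and falls back to the largest unit when none qualifies.

```elixir
defmodule ReduceSketch do
  # Hypothetical table (not Cldr data), ordered smallest to largest:
  # the size of each unit expressed in millimetres.
  @metric_length [millimeter: 1, centimeter: 10, meter: 1_000, kilometer: 1_000_000]

  # Takes a {unit, value} tuple and returns the reduced {unit, value} tuple.
  def reduce({unit, value}) do
    # Express the value in the common base unit (millimetres).
    base = value * Keyword.fetch!(@metric_length, unit)

    # Convert the base value into every candidate unit.
    candidates = for {name, factor} <- @metric_length, do: {name, base / factor}

    # First (smallest) unit whose magnitude is below 10,
    # otherwise the largest unit in the table.
    Enum.find(candidates, List.last(candidates), fn {_name, v} -> abs(v) < 10 end)
  end
end
```

With this rule, `ReduceSketch.reduce({:meter, 1100})` yields `{:kilometer, 1.1}`, while `ReduceSketch.reduce({:millimeter, 300})` yields `{:meter, 0.3}` because neither 300 mm nor 30 cm falls inside the range.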

Limitations

LostKobrakai commented 6 years ago

I really like the idea for converting to a different measurement system. This would certainly be a nice feature for internationalization. A mapping of system -> unit shouldn't be a difficult thing to add I'd imagine.

The automatic reduction, I feel, tries to be too automatic. E.g. I might want to reduce 300mm, but at least here in Germany it's uncommon to use decimeter, so "30 cm" is actually better than "3 dm". A similar case would be meter -> km. There's deca- and hectometer in between, but besides being unlikely in day-to-day use, they even seem to be absent in this package, which means "300 meters" would be an unreachable result (300m isn't within [-10, 10], while 0.3km is).

Another example might be a usage context that expects sizes to be in meters or larger. The 300mm example from above would then be better displayed as "0.3 meters", which is easier to compare with other lengths displayed in meters.

So I feel like a lot of use-cases would actually benefit from predefining the units used for reduction. I'd give the user the option to supply either a single list of units or a list of units for each measurement system, where cldr would choose the measurement system by the locale. For differentiating between e.g. "300 meters" and "0.3 km" there could be an option like the format ones of cldr, or one which specifically prefers/refuses values smaller than 1.

Also I'm not sure how well reduce would work in terms of api naming. For me it sounds way too much like what Enum does. I think the use-case is most often to show a "simpler", more quickly readable version of a measurement, so maybe something like simplify or similar.

About the limitations: The first one is a limitation you have to deal with anyway if you're using the package. And having base 2 for filesizes would indeed be a useful addition.

kipcole9 commented 6 years ago

Thanks for the very helpful comments which, as usual, make a lot of sense.

You say:

So I feel like a lot of use-cases would actually benefit from predefining the units used for reduction. I'd give the option to the user to either supply a single list of units or a list of units for each measurement system,

Can you give me an example of what the api might look like? (Agreed, reduce might not be such a good idea. Decimal.reduce uses it, but its context is less ambiguous.)

In release 1.0 there is Cldr.Unit.convert/2 which allows conversion so you can already:

iex> Cldr.Unit.convert Cldr.Unit.new(:millimeter, 300), :centimeter
#Unit<:centimeter, 30.0>

But you suggest a list of alternatives, which I understand conceptually, but I'm not sure how in practice you're suggesting selecting amongst the alternatives.

LostKobrakai commented 6 years ago

iex> units = [:centimeter, :meter]
iex> Cldr.Unit.convert_units Cldr.Unit.new(:millimeter, 3), units
#Unit<:centimeter, 0.3>
iex> Cldr.Unit.xyz Cldr.Unit.new(:millimeter, 3000), units
#Unit<:meter, 3>

iex> units = [:kilobyte, :megabyte, :gigabyte]
iex> Cldr.Unit.convert_units Cldr.Unit.new(:byte, 900), units
#Unit<:kilobyte, 0.9>
iex> Cldr.Unit.convert_units Cldr.Unit.new(:byte, 9_000_000_000_000), units
#Unit<:gigabyte, 9_000>

iex> metric = [:centimeter, :meter]
iex> uk = [:foot, :yard]
iex> us = [:foot, :yard]
iex> Cldr.Unit.convert_units_by_system Cldr.Unit.new(:millimeter, 3), metric: metric, uk: uk, us: us
#Unit<:foot, …>

With such a base api cldr units could also add "often used" lists of units. E.g. :byte..terabyte will probably be the most used digital unit range. Also, the smallest unit in the list would always be an acceptable choice (even for 0.… values), while larger units would not be chosen when the value would be < 1. I'm really not sure, though, what would be the best way to set the threshold for switching to the next bigger unit. That would probably need some more real use-cases to really be determined.
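
One possible selection rule for the list-based proposal above, sketched under stated assumptions: pick the largest candidate unit whose converted magnitude is at least 1, and fall back to the smallest candidate otherwise. The threshold of 1 is only one choice among many, as the comment itself notes. The module `PreferredUnitsSketch`, the `{unit, value}` tuples and the factor table (decimal prefixes for digital units) are hypothetical and not part of cldr_units.

```elixir
defmodule PreferredUnitsSketch do
  # Hypothetical factors (not Cldr data): the size of each unit in a base
  # unit, millimetres for length and bytes for digital units. The sketch
  # does not check that source and candidates share a unit category.
  @factors %{
    millimeter: 1,
    centimeter: 10,
    meter: 1_000,
    byte: 1,
    kilobyte: 1_000,
    megabyte: 1_000_000,
    gigabyte: 1_000_000_000
  }

  # `candidates` must be a non-empty list of unit names.
  def convert_units({unit, value}, candidates) do
    base = value * Map.fetch!(@factors, unit)

    # Convert into every candidate, ordered from smallest to largest unit.
    converted =
      candidates
      |> Enum.sort_by(&Map.fetch!(@factors, &1))
      |> Enum.map(fn name -> {name, base / Map.fetch!(@factors, name)} end)

    # Largest unit with magnitude >= 1, otherwise the smallest candidate.
    converted
    |> Enum.reverse()
    |> Enum.find(hd(converted), fn {_name, v} -> abs(v) >= 1 end)
  end
end
```

Under this rule the examples above come out as `{:centimeter, 0.3}` for 3 mm, `{:meter, 3.0}` for 3000 mm, `{:kilobyte, 0.9}` for 900 bytes and `{:gigabyte, 9000.0}` for 9_000_000_000_000 bytes (floats rather than integers, because the sketch uses `/`).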

kipcole9 commented 5 years ago

I have pushed a new release and published version 2.2.0, which I believe addresses this issue. I'd welcome your feedback. It's also clear that the strategy of embedding conversion factors in the source isn't a good idea. In the next release I will convert the factors to JSON which can be downloaded, so that updates to the conversion factor tables do not require a new release.

Enhancements

This release is primarily about improving the conversion of units without introducing precision errors that accumulate for floats. The strategy is to define the conversion value between individual unit pairs.
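
As a rough illustration of the pairwise idea, and not the table or code actually shipped in the release, a conversion can look up a single direct factor for a unit pair instead of chaining through an intermediate base unit, so each conversion involves only one arithmetic operation. The module `PairTableSketch`, its factor table and the `{unit, value}` tuples are assumptions made for this sketch.

```elixir
defmodule PairTableSketch do
  # Hypothetical direct factors (not the library's table):
  # multiply a value in `from` by the factor to get the value in `to`.
  @pairs %{
    {:foot, :inch} => 12,
    {:yard, :foot} => 3,
    {:mile, :foot} => 5280,
    {:kilometer, :meter} => 1000
  }

  # Converting a unit to itself leaves the value untouched.
  def convert({unit, value}, unit), do: {:ok, {unit, value}}

  def convert({from, value}, to) do
    case {Map.fetch(@pairs, {from, to}), Map.fetch(@pairs, {to, from})} do
      # A direct factor exists: a single multiplication.
      {{:ok, factor}, _} -> {:ok, {to, value * factor}}
      # Only the reverse pair exists: a single division.
      {_, {:ok, factor}} -> {:ok, {to, value / factor}}
      # No pair defined; the sketch does not try to chain conversions.
      _ -> {:error, {:no_conversion, from, to}}
    end
  end
end
```

For example, `PairTableSketch.convert({:foot, 5}, :inch)` returns `{:ok, {:inch, 60}}` with integer arithmetic throughout, and a pair missing from the table returns an error tuple rather than an approximated result.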

Currently the implementation uses a static map. To give users a better experience, a future release will allow mappings to be specified both as a parameter to Cldr.Unit.convert/2 and as compile-time configuration options, including the option to download conversion tables from the internet.