dividab / uom

Extensible unit of measure conversion with type safety for typescript
MIT License
18 stars 1 forks source link

Identification of Units #3

Open jonaskello opened 6 years ago

jonaskello commented 6 years ago

From offline discussion with @AdamLuotonen:

Regarding Unit and Units ....

Problems:

  1. PoundLbPerPoundLb and KilogramPerKilogram, etc. are reduced down to the same unit (One) because units are always reduced to their simplest form. This leads to trouble identifying which unit the user wants to display. The same value is displayed, but the unit label should be different (kg/kg vs lb/lb).

  2. Known units vs. unknown units. We define a large amount of known units (MeterPerSecond, Percent, LiterPerMinute, etc.). However, the library allows creation of new units by the functions of unit-times.ts and unit-divide.ts. Units are dynamic and can be created in the applications (but it is not used a lot in practice). When serializing these new units they cannot be serialized in the form 123.22:CentiMeter because that only works with known units.

  3. Performance. Today, we send around instances of Unit everywhere, mainly in Amount objects. Unit contains a lot of information and may be a deep hierarchy describing how the unit is derived. When searching for the known name for the unit (for example, serialization) we run the following code:

export function getStringFromUnit (unit: Unit.Unit <Quantity>): string {
  _ensureMetaAdded ();
  const name = _unitToString [getUnitKey (unit)];
  if (name === undefined) {
    throw new error ("unknown unit" + unit);
  }
  return name;
}

https://github.com/dividab/uom/blob/9c7a38cb7384e232cb9c42671ea267277ad17bf7/src/units.ts#L1382

function getUnitKey (unit: Unit.Unit <Quantity>): string {
  // Iterates whole array. ES6 does not have find
  const foundUnit = getAllUnits () .filter (u => Unit.equals (u, unit)) [0];
  if (! foundUnit) {
    throw new error ("Unknown Unit" + JSON.stringify (unit));
  }

  return JSON.stringify (foundUnit);
}

https://github.com/dividab/uom/blob/9c7a38cb7384e232cb9c42671ea267277ad17bf7/src/units.ts#L1432

getUnitKey is expensive (and therefore getStringFromUnit is also expensive) because we have to walk through all units and compare the structure. Even if we did not compare the structure it would be expensive, there is still a loop over a growing list of units. Running JSON.stringify is also not free (imagine thousands of calls to this).

Well. My suggestion is that instead of passing around the representation of the units in the application, I suggest that we only pass around the known unit names. When you want to do something with the unit, the library internally merges the unit structure into a map, and performs the same code as before. Serializing and deserializing becomes straightforward, a short string of unit name.

A potential problem with this may be that some applications may be dependent on Unit's internal structure. For example, Unit.quantity is available today, but it could be replaced by a single function call getQuantityForUnit(unit). Another potential issue may be if someone serializes and saves Unit with JSON.stringify, and then expects to be able to deserialize it and use it with the new code. There are some tests that suggest that:

A bonus with this new approach is that you can easily see which unit a volume has when debugging. Instead of a complicated JSON structure, one gets a name.

Basically, the problem is that we want to identify units by name, but at the same time have a dynamic unit system that can build new units on the fly. I think it's enough that an application could record some own units at boot, but do not use dynamic units in the calculations?

jonaskello commented 6 years ago

Offline comment from @geon:

pass around the known unit names

Would it not be better to compare references to known objects?

jonaskello commented 6 years ago

Offline comment from @AdamLuotonen:

That is what we are already doing, but if you check the unit tests we are supposed to support comparison of different references too. It would of course be most effective to just compare references, but that is not how the current thinking seems to be..

jonaskello commented 6 years ago

Offline comment from @AdamLuotonen:

If one should identity the unit in another way than a reference it has to have some identifier (for example a unit name) or you will have to keep track of where the information about it came from. Then you can separate PoundLbPerPoundLb from KilogramPerKilogram. I think my vote is for the appraoch with identifier since we just use one set of known units everywhere anyway. And when you have a unique identifier in the unit you can put the derivation of the units in a registry in the library.

jonaskello commented 6 years ago

My thinking is that it would be OK for an application to have a set of known units which it will have to keep within (which is what this proposal with identifier requires). In that case it should be up to each application to register the set of unit it wants to support. We could provide some units in the library, but the application could also add it's own to get a complete set. Perferable it would be possible to publish extra units as separate packages that are easily integrated with the uom core package.

jonaskello commented 6 years ago

Here is a screenshot from an actual application to illustrate the problem:

image

jonaskello commented 6 years ago

My suggestion is that instead of passing around the representation of the units in the application, I suggest that we only pass around the known unit names. When you want to do something with the unit, the library internally merges the unit structure into a map, and performs the same code as before. Serializing and deserializing becomes straightforward, a short string of unit name.

One solution that does not break compatibility could be to add an optional name to the unit while keeping the structural representation of the unit as it is today. That would probably clean up some of the code related to serialization. However this solution introduces more complexity as the unit now has another field and in some cases that field will not be set (when a new unit is the result of a calculation).

Jontem commented 5 years ago

@jonaskello @AdamLuotonen so can we please implement the suggested solution? I could give it a try but I don't feel like I have the full understanding of how it works today and how to implement it. We really need it to be resolved in a month or so

Jontem commented 5 years ago

I believe this is fixed now. Can we close this issue?

jonaskello commented 5 years ago

While the problem of identification is "solved", I'm still unsure if the solution is a good one. Having a name for each unit makes two equal units unequal just because they have different names. This is useful for some cases such as picking from a list where you need to identify each unit even if they are equal. But makes other cases impossible, such as deriving a new unit and finding out if it was the same as an existing unit.