Wikidata / Wikidata-Toolkit

Java library to interact with Wikibase
https://www.mediawiki.org/wiki/Wikidata_Toolkit
Apache License 2.0
372 stars 100 forks source link

Invalid serialization of quantity values with exponents #341

Open wetneb opened 6 years ago

wetneb commented 6 years ago

Way to reproduce the issue:

BigDecimal amount = new BigDecimal("12.34E+5");
QuantityValue value = Datamodel.makeQuantityValue(amount, "1");
ObjectMapper mapper = new ObjectMapper();
System.out.println(mapper.writeValueAsString(value));

You get {"value":{"amount":"+1.234E+6","unit":"1"},"type":"quantity"}, which is refused by the Wikibase API. One format that works is {"value":{"amount":"+1234000","unit":"1"},"type":"quantity"}, but I am wondering if this is distorting the value input by the user (as we are losing the precision information that is implicitly encoded in the engineering notation). Is there a sensible way to derive lower and upper bounds and set them silently for the user?

In the meantime client code can work around the issue by converting their BigDecimal objects like this: new BigDecimal(myBigDecimal.toPlainString()).

Tpt commented 6 years ago

Is there a sensible way to derive lower and upper bounds and set them silently for the user?

I believe that "12.34E+5" may be encoded as "+1234000±1000" (currently Wikibase user interface encodes it as "+1234000" without precision but it's not very nice). What do you think about it?

In the meantime client code can work around the issue by converting their BigDecimal objects like this: new BigDecimal(myBigDecimal.toPlainString()).

myBigDecimal.unscaledValue() does the same thing and is even more convenient.

Before thinking about how to properly handle these values if they are given to the data model constructors we could just check if bigDecimal.scale() == 0 and throw a meaningful error if if it not the case.

wetneb commented 6 years ago

I would say that it's rather "+1234000±500", no?

I'm confused - by rejecting everything that does not have scale 0, we would reject things like 12.345, right?

Tpt commented 6 years ago

I would say that it's rather "+1234000±500", no?

Yes, my bad. Engineering notation is indeed doing nearest-integer rounding.

I'm confused - by rejecting everything that does not have scale 0, we would reject things like 12.345, right?

Yes, indeed. If we want to add a filter, we need to find an other way. Maybe bigDec.toString().equals(bigDec.toPlainString())? But it's not very nice/efficient. An other way to fix this issue would be to do not reject any value and call toPlainString everywhere.

wetneb commented 6 years ago

I've made this client side: https://github.com/OpenRefine/OpenRefine/commit/39a3fedfa6abc2d84b923afe20ea292e5219c7d5 it seems to work okay: https://www.wikidata.org/wiki/Q697512#P3872.

I think that could potentially go in WDTK, but we would need to find a clean way to detect in a BigDecimal uses engineering notation (in my case I have access to the source string).