Open michaelhixson opened 9 years ago
Makes sense to me. The only question I have is whether the two should be bundled together; checks for non-numbers (inf, NaN) are quick to make.
If the quick check for longs is all there is to it, that's not an issue I guess. But would there be need to check for rare (?) cases of larger long
values that are, nonetheless, legal? If my memory of IEEE numbers serves me, as long as lowest bits are zeroes, scaling would actually allow accurate rendition of some of long numbers. But then again, perhaps quick check should suffice for the main purpose; if someone really wants more sophisticated checks, they could implement it themselves.
Now: beyond requirements, one open question would be where this would be implemented: in databind (by serializers), or at JsonGenerator
. This is a trade-off, with some pros and cons for either:
If done at databind, all long
/Long/
AtomicLong/
long[]serializers would need to do the check; and
jackson-module-afterburner` would need to honor same checks if and when overriding handling. This is not a huge thing to do, so it may be the way to go.
Conversely, if done at streaming JsonGenerator
, serializers would not need to care, and compatibility with Afterburner would work more reliably.
But the major downside now is that every single dataformat
module would need to apply the same checks, and this would lead to major copying of code (or coordination with helper methods).
Having written above, my first guess is that this should be handled at databind
level.
Re: performance of checking longs, I did some more digging and came up with a couple of implementations that are very fast. They're in the nanosecond range.
I actually have an open StackOverflow question about them: http://stackoverflow.com/questions/32411050/this-jmh-benchmark-is-inconsistent-across-machines-why
The two implementations:
public static boolean cast(long x) {
return x == (long) (double) x
&& x != Long.MAX_VALUE;
}
public static boolean zeros(long x) {
long a = Math.abs(x);
int z = Long.numberOfLeadingZeros(a);
return z > 10 || Long.numberOfTrailingZeros(a) > 10 - z;
}
Re: where this would go, I left a comment in Issue #912 that JsonGenerator
seemed more natural. I came to the same conclusion here, but you have a much better understanding than I do of the potential "blast radius" of this code.
NaN and ±Infinity should just be serialised as literal “null” instead. I’ve been searching for about half an hour for how to get Jackson to do that, i.e. behave as per the ECMAscript/JSON standard, already, with no success so far…
@mirabilos What should be output is really debatable; null
is one possibility but there are many others (throw exception; just output non-standard variant). Challenge with coercion to null is that this can hide legitimate problems.
On short term I would recommend you implement and register custom serializers for types you need (double
, float
, Double
, Float
) and add expected handling.
@michaelhixson Re-reading the issue again, I believe there should be multiple separate features, possibly:
NaN
numbers
null
(as per @mirabilos suggestion): if enabled AND "can write NaN" enabled, would write JSON null
instead of serialization as number (quoted or not).Also: I think these should be implemented as JsonGenerator.Feature
s, mostly to keep serializers simple. There will be some complications -- for example, coerced null
will not be filtered out with JsonInclude.Include.NON_NULL
, since Java value is not null, only serialization -- but overall I think that would be more reliable way to ensure that constraints are enforced.
Tatu Saloranta dixit:
@mirabilos What should be output is really debatable;
null
is one @possibility but there are many others (throw exception; just output @non-standard variant). Challenge with coercion to null is that this @can hide legitimate problems.
Yes, but the ECMAscript JSON standard requires it, and as such, it ought to be the default, or at the absolutely very least, an easily configurable setting.
On short term I would recommend you implement and register custom serializers for types you need (
double
,float
,Double
,Float
) and add expected handling.
That’s somewhat beyond me at the moment… Jackson is only used as part of jaxrs.
We ended up changing “double” to “Double” and doing something like:
dto.setFoo(Double.isFinite(foo) ? foo : null);
bye,
“It is inappropriate to require that a time represented as seconds since the Epoch precisely represent the number of seconds between the referenced time and the Epoch.” -- IEEE Std 1003.1b-1993 (POSIX) Section B.2.2.2
@mirabilos That works too if you can control POJOs.
Anyway, I agree with the notion that there should be a way to avoid producing non-standard output. In other respects Jackson does this and requires enabling specific features to force non-standard output.
@cowtowncoder Multiple features sounds good to me.
Also: I think these should be implemented as
JsonGenerator.Feature
So for the NaN = exception one, that would involve changes in this file, right? https://github.com/FasterXML/jackson-core/blob/12cea0336d6a71c5b8f843a3a500c0d72bc5cf2a/src/main/java/com/fasterxml/jackson/core/json/UTF8JsonGenerator.java#L858-L866
In other words, I filed this issue in the wrong Jackson sub-project? It's really a jackson-core
issue?
Regarding the longs fitting in doubles... I'm not as sure as I used to be about the solution. I actually wrote a blog post about this exact problem and ways to address it. Having an option to encode all longs as strings might be a better solution (not all numbers, just longs specifically). In any case, I think I'd like to withdraw that feature request.
Any progress here? I'm quite interested in the "support the JSON specification" model of casting NaN into null.
ECMAScript specification for JSON.stringify states
NOTE4: Finite numbers are stringified as if by calling ToString(number). NaN and Infiinity regardless of sign are represented as the String null.
@drekbour The usual practice is that if there is work being done, notes are added to the issue. Lack of notes suggests (correctly) that nothing has been done. I am not aware of anyone working on this; but if it was to be added, it would need to go in Jackson 3.0 as it is API addition.
As to handling of NaN: I would not consider Javascript spec to be canonical JSON Specification in any sense of the word, but it sounds like useful background information on considering possible ways to handling thigs.
However, I would think that serialization of a number as String
value would be quite confusing and creating perhaps more problems than it solves, at least in Java context. This is different from Javascript (and many other scripting langs) where coercions often exist to allow more seamless change across types..
We came across this because the sender is serialising a double and a receiver deserialising the same as a BigDecimal. Is you point out, the string "NaN" instead of a primitive double is not helpful
One other note, regarding requirements: what ECMAscript
does is not definitive in any sense of the word: Javascript is simply one of many, many languages/platforms that use JSON, with no special stature amongst them, except for its wide usage. I do not, essentially, care deeply whether it has trouble with 64-bit long
a (as in: it does -- that's platform limitation, not JSON limitation).
That said, I am open for additional features to help sub-standard platforms, such as ECMAScript, that are unfortunately "number-challenged". But they will not and should not define general limits since majority of languages/platforms are able to deal with larger sets of floating-point AND integer numbers; some up to unlimited (Java's BigInteger
/ BigDecimal
, as just one example).
So: if some limits to handling of 64-bit long
s are proposed, a separate request for Feature
s to add should be filed.
One other implementation idea: instead of modifying jackson-databind
(or streaming API), perhaps better way would be to go with a full Module (jackson-lang-javascript
project?).
Module would then register custom (de)serializers for relevant types; and even if it might require smaller changes to databind (or not, hard to say in advance), it would be more flexible in its dealings. It would also not affect other usage but just apply to use cases where Javascript client / server compatibility is very important.
“what ECMAscript does is not definitive in any sense of the word”
This is not quite right. JSON is a proper subset of ECMAscript except that it allows two more chars in string literals (which all modern encoders just emit as Unicode escapes, for JS compatibility), but JSON is defined in the ECMAscript standard, in a normative annex, and this is the entire and exact JSON specification. All other implementations must honour what is written there; only for things not specified there, they may diverge.
No, JSON has no real dependency to Javascript. While ECMAScript folks try to define their understanding of JSON, that is no more definitive than the original RFC; and limitations of ECMAScript certainly have no relevancy to its use outside of that platform.
So while it is true that ECMAScript specs also define JSON, that does not somehow make JSON subset or subservient to it. To me it means more that they specify handling details in that context. And on Java platform the only concern would be wrt inter-operability.
I mean Oracle could of course similarly specify their understanding of JSON, but it would be foolish to expect others to start considering that somehow general definitive view of JSON and limitations. I would not be pushing that definition on issues for, say, node.js, except if there are some aspects of interoperability that could be improved.
On Sun, 8 Apr 2018, Tatu Saloranta wrote:
So while it is true that ECMAScript specs also define JSON, that does not somehow make JSON subset
Agreed, except for the part in which the ECMAscript spec defines JSON in terms of ECMAscript (some parts of it; others, such as the string definition, aren’t).
@mirabilos Ok yes, agreed. Its definition is not limited to ECMAScript in that sense, or include concepts only found there.
NaN and ±Infinity should just be serialised as literal “null” instead. I’ve been searching for about half an hour for how to get Jackson to do that, i.e. behave as per the ECMAscript/JSON standard, already, with no success so far…
ended up using Annotation JsonComponent to do the customization
Custom serializer is the way to go for different representations of NaNs of various kinds.
Just remember to register for both primitive double
type (Double.TYPE
) and wrapper Double
type (Double.class
).
About,
Not all long values can be represented as 64-bit floating point numbers (double), which is the only number type in JavaScript.
Say like this, it sounds this is a language specific issue and I could understand that this is maybe not the Job of a JSON library to add specific feature for every language.
But reading The JavaScript Object Notation (JSON) Data Interchange Format - RFC8259 § 6. Numbers
This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available.
So maybe we can read it into "To maximize interoperability we advice to limit range and precision of numbers to IEEE 754 binary64 (double precision)"
Not directly linked, but note that also SenML-CBOR has this kind of advice :
For JSON Numbers, the CBOR representation can use integers, floating-point numbers, or decimal fractions (CBOR Tag 4); however, a representation SHOULD be chosen such that when the CBOR value is converted to an IEEE double-precision, floating-point value, it has exactly the same value as the original JSON Number converted to that form. ... .... (source : https://www.rfc-editor.org/rfc/rfc8428#section-6)
Just to say that maybe we could see that more like "a common way to maximize interoperability" rather than "a specific language limitation" ?
I don't think the main issue here is wordsmithing. What help is changing wording here?
What help is changing wording here?
I'm sorry if my comment was not helpful enough. I vaguely understand there was questions about where to put code about those features in jackson. Some jackson-lang-javascript
module idea was proposed. I don't know jackson code organization enough to give concrete help about that ... but I was just thinking that knowing if a feature is "pure JSON" or a "specific language" feature could help to know where the code should be put. And my point was the strict IEEE double-precision limitation for numbers is maybe more a pure JSON feature.
Ah ok @sbernard31 that makes sense. I agree. I think I misread your comment there, thank you for clarification.
This is a proposal for a new serialization feature,
SerializationFeature.STRICT_NUMBERS
.This feature would be disabled by default. With this feature enabled, the following actions would throw
JsonProcessingException
:+Infinity
,-Infinity
, orNaN
long
that cannot be represented as a numerically-equivalentdouble
+Infinity, -Infinity, NaN
Non-finite numbers cannot be represented in JSON. Jackson's current behavior is to serialize them as strings. This "works" if Jackson is on both ends (serializing and deserializing) but not necessarily when a different actor is deserializing. In particular, if client-side JavaScript is deserializing, the current behavior masks a class of bugs that might be difficult to detect -- putting strings where the client expects numbers.
Checking whether a number is finite is extremely low-cost (
Double.isFinite(x)
) so this part of the feature should not have a noticeable performance burden.Personally, I'm not intentionally serializing non-finite values anywhere and I'd prefer it if Jackson would throw an exception if I did so by mistake.
Longs that don't fit
Edit: I mistakenly called this a limitation of numbers in JSON, when it's actually about numbers in JavaScript.
Not all
long
values can be represented as 64-bit floating point numbers (double
), which is the only number type inJSONJavaScript. For example, casting these numbers todouble
in Java will result in loss of information:2^63 - 1
2^53 + 1
All
long
values whose absolute value is <=2^53
can be represented exactly asdouble
, so there is a fast path for validation of most commonlong
values. This will likely include all values that come from epoch millisecond timestamps or SQL auto-incrementing IDs. Randomly generatedlong
values, however, would likely require the slow path (if there is a slow path) and be invalid.All
short
andint
values are smaller than2^53
, so this feature should add no performance burden for them.Personally, I am accidentally serializing some
long
values that cannot be represented accurately as numbers inJSONJavaScript. Woops! I didn't notice until I implemented the strict numbers feature for myself.