FasterXML / jackson-databind

General data-binding package for Jackson (2.x): works on streaming API (core) implementation(s)
Apache License 2.0
3.53k stars 1.38k forks source link

Support `@JsonFormat` for `String`, numbers, using String.format() #1114

Open cowtowncoder opened 8 years ago

cowtowncoder commented 8 years ago

Seems like it would be nice to just support formats available via format Strings that String.format() uses. Note that existing support for date/time values use different formatting; this should be fine, and if we want to for some reason support something from java.text.Format it should be possible to use heuristics to determine intended type of format. However that may not be necessary.

Initially we should just support simple types:

mjustin commented 5 years ago

I was about to open an issue suggesting adding support for DecimalFormat as the format for numbers, but I guess String.format gets the job done too. I assume I shouldn't open a separate issue for that?

cowtowncoder commented 5 years ago

@mjustin Probably not. The main question is, I think, two-fold:

cowtowncoder commented 5 years ago

Note: @JsonFormat(shape = Shape.STRING) already works for all JDK numbers, so this is just for extending formatting capabilities.

mincong-h commented 4 years ago

Hi @cowtowncoder , does this issue mean supporting the use-case below, where @JsonFormat is used to contextualize the serialization?

   static class IntAsThousandSeparatedString {
       @JsonFormat(shape=Shape.STRING, pattern="%,d")
       @JsonProperty("value")
       public int foo = 3_000;
   }

And it should be serialized as follow:

{ "value" : "3,000" }

I understand that it should support other numeric types and different locales as well. I would like to contribute on this one. If you could give me some hints about where to get started, that will be great.

cowtowncoder commented 4 years ago

@mincong-h Yes, this is correct.

As to supported types: it is ok to implement support for smaller set of types, for example just for String, or, just for various numbers.

On hints, here are some thoughts:

  1. Ideally support should be both for serializers and deserializers... but former is much easier. Not sure if latter can be supported generally, probably need to consider on case by case basis. May need to start by only support serialization
  2. Detection of custom Shape (and other JsonFormat overrides) is typically handled from createContextual() method of JsonSerializer (and JsonDeserializer) -- see f.ex EnumSerializer for best access (annotation is accessed through AnnotationIntrospector but there are other complications and that is why helper method findFormatOverrides() is useful
  3. createContextual() MUST create a new instance, can not modify state of (de)serializer since instances are shared and should be immutable

Handling of various JsonFormat settings gets pretty complicated unfortunately, so changes should be well tested, using various ways to change settings, including "config overrides" (ObjectMapper.configOverride(String.class).setFormat(....)).

mincong-h commented 4 years ago

Thanks for your hints, @cowtowncoder. I had a draft here to modify the ToStringSerializer https://github.com/FasterXML/jackson-databind/compare/master...mincong-h:iss1114 But more I think about this issue, more I feel this feature might not be that easy. Serialization is completely fine, using String.format provides rich choices for users to format their objects. However, deserialization is a nightmare—there is no way to parse strings easily. Otherwise Java's would have done that already.

At a high-level overview, I believe users expect Jackson annotations to work for both serialization and deserialization. Meaning that when declaring a @JsonFormat(shape=Shape.STRING, pattern="%,d"), they will expect Jackson to serialize and deserialize correctly. This is very hard to support and can also have impact on performance. I guess for the deserialization, we need to first parse the format string, e.g. "%,d", to understand it syntax, then use it to deserialize the value. This is hard because:

I think it's better to have the following alternatives:

  1. Introduce @JsonSerializeFormat
  2. Introduce @JsonNumberFormat

Annotation @JsonSerializeFormat means that we only support the serialization, this is a one direction configuration. It's not super cool, but should work for some particular cases. This could be nice as the complementary of @JsonFormat. And more importantly, it states explicitly that the annotation only works in serialization. It's self-documented.

Another alternative I see is @JsonNumberFormat (or @DecimalFormat as @mjustin mentioned). I think people are interested when working with locale-specific numbers. Because their thousands-separator, decimal point, etc are different. This annotation can be particularly useful when the serialized data comes from external system and has a fixed format. What I imagine is something like Python Pandas' DataFrame, where user can control explicitly the options themselves, e.g. thousands-separator, decimal point:

@JsonNumberFormat(thousands=" ", decimal=",")
public double frenchValue = 1_234.56;  // "1 234,56"

In this way, each parameter seems easier to parse compared to the "generic" approach of String.format(). Also supporting the deserialization could be easier than @JsonFormat. WDYT?

cowtowncoder commented 4 years ago

I don't like the idea of datatype-specific formatters, mostly because that leads to complexity on handling side as well as on number of special-purpose annotations. One really nice feature of @JsonFormat itself is that initial handling as well as overrides (per-type config overrides, per-property) is automated and (de)serializers need not worry about details of where things come from. Adding new format annotations and logic would be significantly more work, and take formatting into directions that may not be something Jackson should provide.

At the same time it is true that there are issues with simple textual format definition: not so much performance since we never process format String more than once (it is done when constructing (de)serializer and resulting format object used afterwards), but it may be confusing to try to figure supported formats.

As to serialization-only vs read-write: it is difficult to say what users expect, in a way, since existing format string is only used with date/time values... and yes, they can be read and written. But number formatting is different as fundamentally alternate number serialization would make resulting JSON content non-compliant (but not necessarily XML or CSV content).

So... I don't know what would be a good way forward. Here are some thoughts:

  1. I agree that number-formatting looks like different case from any other choices (date/time, "standard" String serialization formatting)
  2. "standard" (or maybe just "general purpose") formatting for String-values (including not just Java Strings but "stringified" Numbers and other types) could fall into "easier" category: and one where we could perhaps support deserialization too (depending on what kind of notation was chosen)
  3. For either of above it might be easier to implement advanced functionality as add-on module and NOT try to build it in jackson-databind. Existing support for @JsonFormat could be used (or not), depending on needs.

From this, I guess earlier comment by @mjustin -- about whether to split issue(s) -- makes sense again: yes, I think there are separate concerns of number-specific possibility, and then more general-purpose "String decoration" capability.

Ikki-Dai commented 2 years ago

Hope this feature can be implement, spring framework has a @NumberFormat annotation provide similar feature!

hurelhuyag commented 11 months ago

My suggestion: https://github.com/FasterXML/jackson-databind/pull/4273

pjfanning commented 11 months ago

@hurelhuyag #4273 doesn't appear to be anything like what is described here. Try writing a test with a JsonFormat annotation in it and showing how that the test can be made to work with the changes you are suggesting. Adding new constructors doesn't help when the code that calls the constructors is done deep within the jackson-databind code.

Any PR that doesn't attempt to prove that the code does something useful with a test case - this is not a good start.

hurelhuyag commented 11 months ago

@pjfanning Everyone here wants to have number formatting instead of just calling toString() method which ToStringSerializer does.

There are 2 ways to do it.

  1. String.format()
  2. NumberFormat.format()

I suggest using the second method directly in NumberSerializer.

It is not a time to write tests. It is time, we have to decide which method to use and where to put the feature. After that, tests can be written.

pjfanning commented 11 months ago

@pjfanning Everyone here wants to have number formatting instead of just calling toString() method which ToStringSerializer does.

There are 2 ways to do it.

  1. String.format()
  2. NumberFormat.format()

I suggest using the second method directly in NumberSerializer.

It is not a time to write tests. It is time, we have to decide which method to use and where to put the feature. After that, tests can be written.

We'll need deserializer support too. Serializer support on its own is not enough.

yihtserns commented 11 months ago

It is not a time to write tests. It is time, we have to decide which method to use and where to put the feature. After that, tests can be written.

@hurelhuyag do you come from a coding culture where devs only write tests to capture regression?

The tests you're being asked here is to demonstrate usage so people can easily see (from a user's perspective) what you have proposed/enabled/implemented/fixed.

cowtowncoder commented 11 months ago

I 100% think that it is exactly time to write tests to show how things work (and IF they work) :)

I added some notes on PR itself; but there are fairly big questions to resolve:

  1. Format notation to use (DecimalFormat vs StringFormat)
  2. Output: JSON Number (JsonGenerator.writeNumber(String)) vs JSON String? (not all Strings are valid JSON Numbers of course)
  3. Only for serialization? Probably has to be... but how would they be read back in?

On (2), @JsonFormat also has shape property which could be combined to choose between methods to call; so the question would be that of what to default to (I am guessing NUMBER since these are Number (or subtype) fields?

cowtowncoder commented 11 months ago

@yihtserns @pjfanning You may want to have a look at PR. I think we can make this work, but I have one design concern wrt JsonFormat.shape. Basically, there are 2 ways to write formatted numbers:

  1. JsonGenerator.writeString(String), to output JSON String
  2. JsonGenerator.writeNumber(String), to try to output JSON Number -- not all backends support it, and for others (JSON esp.) it could produce invalid content

I can see both use cases useful for some usage. To me, use of JsonFormat.shape makes sense; and Shape.STRING vs Shape.NUMBER is simple. But what about the default case (Shape.ANY)? One possibility would be to always require specifying shape, otherwise failing: that is, not guessing but requiring explicit choice.

Other would be to default to one style or other. My initial leaning was that since we are dealing with Numbers, default should be to writeNumber(String). But then again, alternative of "as text unless told otherwise" is safer and avoids producing invalid content.

WDYT?

pjfanning commented 11 months ago

Formatted numbers should be rendered as JSON strings. Spec compliant JSON parsers will not parse the formatted numbers if they are rendered as JSON numbers (without quotes).

If we really want to support rendering them as JSON numbers (and users are happy that the consumers have JSON parsers that will accept them), then maybe we could add an extra param on @JsonFormat - maybe named renderFormattedNumbersAsStrings, defaulting to true.

cowtowncoder commented 11 months ago

Thank you @pjfanning.

I think it depends: I think it'd be possible someone wanted to specify precision and produce compliant number output. In that case it'd seem wrong to quote value. But yes, it depends on intent which is difficult to deduce.

However, I realized that maybe DecimalFormat doesn't have that option, compared to String.format() string? If so, n/m, and we should default to String.

As to overriding number, no need for a new setting, just use shape like I explained:

@JsonFormat(format="...." shape=JsonFormat.Shape.NUMBER)

But I guess assuming shape = JsonFormat.Shape.ANY to be same as Shape.STRING makes sense.

One final possibility is that there is new NumberInput.looksLikeNumber(String) (or similar) that I added which we also could call to determine. But maybe that's too complicated heuristic and more confusing than helpful.

yihtserns commented 11 months ago

Overview of current Pattern & Shape support by various data types

This is just so we're on the same page regarding the current state (actually mainly for my personal reference, but I thought I should just share here). Be warned that this was made from a quick check so it is likely not exhaustive - I'm sure I missed some things.

  Pattern (Ser) Pattern (Deser) STRING ARRAY BINARY BOOLEAN NATURAL NUMBER NUMBER_FLOAT NUMBER_INT OBJECT SCALAR
String                        
Boolean                
Calendar,Date SimpleDateFormat SimpleDateFormat            
Enum          
Collection                      
Map                      
Map.Entry                      
Integer                      
Long                      
Byte                      
Short                      
Float                      
Double                      
BigDecimal,BigInteger,Number subclasses                  
AtomicXXX                        
Bean              
UUID                  
InetAddress                
JSR310 DateTime DateTimeFormatter DateTimeFormatter          
JSR310 Duration ChronoUnit ChronoUnit          
JooHyukKim commented 11 months ago

@yihtserns is there written test code used to validate the research above? Asking this so that we may have process of expanding support/non-support

yihtserns commented 11 months ago

@yihtserns is there written test code used to validate the research above? Asking this so that we may have process of expanding support/non-support

Nope: it was generated using the help of IDE+👀. 😅

yihtserns commented 11 months ago

Output: JSON Number (JsonGenerator.writeNumber(String)) vs JSON String? (not all Strings are valid JSON Numbers of course) I think it depends: I think it'd be possible someone wanted to specify precision and produce compliant number output. In that case it'd seem wrong to quote value. But yes, it depends on intent which is difficult to deduce.

I was under the impression that this is about @JsonFormat(shape=STRING), and also because there's no precedence for pattern being used to generate anything other than String output (because there's no such need before, of course).

Perhaps question about generating non-String formatted value should be separated into another issue/ticket.

Only for serialization? Probably has to be... but how would they be read back in?

So far everything that supports pattern, supports it for both serialization & deserialization. Although there may eventually be cases where deserialization can't be supported, I think there won't be any technical blocker for deserializing a number-string using the same pattern used to serialize it...

...waitaminute is there an API to parse a number-string using String.format pattern?..

Format notation to use (DecimalFormat vs StringFormat)

I'm ashamed to say that I'm not terribly familiar with DecimalFormat (I don't remember needing to use it ever for my day jobs), but from a quick scan of the javadoc, I have few thoughts:

cowtowncoder commented 10 months ago

@yihtserns Right, most things try to support serialization and deserialization. But there is no hard limit of this having to be the case. In case of Numbers I suspect supporting round-tripping may well be difficult. Which is why I would suggest that we defer deserialization support.

But if we are to support deserialization, I think that:

  1. For "serialize as actual Number token" (JsonFormat.Shape.NUMBER and subtypes) it would have to rely on default number decoding by format backend -- no attempt made for other handling, as parsers do not necessarily expose Strings to databind. Or more importantly, they will need to be able to decode number token without knowledge of @JsonFormat info (given that it's databind level, not available for streaming core)
  2. For "serialize as String token" we would/could/(probably) should support decoding. Eventually.

As to configuration of DecimalFormat, yeah, I don't think it's quite worth the effort to try to expose anything that does not with with existing JsonFormat facets. But anything that is mappable (i.e. there's JsonFormat property AND DecimalFormat can be configured to use it).

Still, I think there are many "write-only" use cases where either:

  1. Value is not read by Jackson (but something else)
  2. Output is actually valid number (truncating precision); or need not be read back as Number