falkreon / Jankson

JSON5 / HJSON parser and preprocessor which preserves ordering and comments
MIT License

String with Russian characters is incorrectly parsed #42

Closed vaclavhodek closed 3 years ago

vaclavhodek commented 4 years ago

We use Jankson (1.2.0, the latest version) to read JSON, extract comments for further processing, and write the JSON back. However, the resulting JSON is malformed when the input contains Russian characters. See the example below.

Jackson JSON library with some extra settings (trailing commas, comments, etc.) can read and output the same string correctly. However, we prefer Jankson because of its support for comments.

Code:

val json = Jankson.builder().build().load(input)
println(json.toJson(false, true))

Input:

PopupSoundMessage: {
  en: "Play with sound?",
  ru: "Играть с музыкой?"
}

Output:

"PopupSoundMessage": { 
  "en": "Play with sound?",
  "ru": "3@0BL A <C7K:>9?",
}
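
As a side observation (not a confirmed diagnosis of Jankson's internals), the garbled `ru` value is exactly what you get when each UTF-16 code unit of the input is truncated to its low 8 bits. A minimal Kotlin sketch that reproduces it; `lowBytes` is a hypothetical helper written for this illustration and is not part of Jankson:

```kotlin
// Hypothetical helper, not part of Jankson: keep only the low 8 bits of each
// UTF-16 code unit, as a lossy char-to-byte narrowing would.
fun lowBytes(s: String): String =
    s.map { (it.code and 0xFF).toChar() }.joinToString("")

fun main() {
    val original = "Играть с музыкой?"
    val mangled = lowBytes(original)
    // 'И' (U+0418) collapses to the control character 0x18, so it is dropped
    // here for display; every other letter lands in printable ASCII.
    println(mangled.filter { it.code >= 0x20 }) // prints: 3@0BL A <C7K:>9?
}
```

For example, 'г' is U+0433 and its low byte 0x33 is the ASCII digit '3', which is the first visible character of the output above.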

falkreon commented 4 years ago

Interesting! This definitely should not be happening.

Just so I can go straight to the problem: is "input" a String, a File, or an InputStream? I've been meaning to switch over to Reader so characters like this are handled by Java instead of manual surrogate assembly, I just want to make sure it's within the library before I start tearing up the floorboards.

vaclavhodek commented 3 years ago

I use this:

val json = Jankson.builder().build().load(input)

where input is a plain Java String.

falkreon commented 3 years ago

Reproduced. I'll take care of it.

falkreon commented 3 years ago

The current behavior of a round-trip is for the serializer to escape the Cyrillic characters. This may not be ideal for your case, and I'd definitely be open to a second issue to talk out where the right place would be to configure that in the API. Maybe in JsonGrammar?
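
To illustrate what escaping means here, the sketch below turns every character outside printable ASCII into a JSON `\uXXXX` sequence. This is only an assumption about the general shape of such escaping, not Jankson's actual serializer code; `escapeNonAscii` is a hypothetical name introduced for this example.

```kotlin
// Hypothetical helper, not Jankson's serializer: escape quotes, backslashes,
// and every UTF-16 code unit outside printable ASCII as \uXXXX.
fun escapeNonAscii(s: String): String = buildString {
    for (ch in s) {
        when {
            ch == '"' -> append("\\\"")
            ch == '\\' -> append("\\\\")
            ch.code in 0x20..0x7E -> append(ch)
            else -> append("\\u").append(ch.code.toString(16).padStart(4, '0'))
        }
    }
}

fun main() {
    // prints: \u0418\u0433\u0440\u0430\u0442\u044c
    println(escapeNonAscii("Играть"))
}
```

The escaped form is still valid JSON, and any conforming parser decodes it back to the original text, so the round-trip is lossless even though the file is harder to read by eye.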

vaclavhodek commented 3 years ago

Sorry for the delayed answer. I have to find a bit of time to look at it.

Yes, making this configurable in JsonGrammar seems like a viable solution. It might also be viable to register a custom writer through JsonGrammar.

vaclavhodek commented 3 years ago

I just checked the latest version and I can confirm that it works great with Cyrillic. Thanks man!