Closed Magmaruss closed 5 months ago
First of all: thank you for contributing this feature.
Second: could you add a description that explains why this is needed, what is the use case (ideally with a simple example) and so on?
Description added.
First of all: apologies for ultra-slow follow up here. But I am ready to get this merged.
One minor legal thing first: if we haven't yet gotten a CLA, we'd need one from here (it's under Jackson project, that's fine):
https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf
and it's easiest to print, fill & sign, scan/photo, email to info
at fasterxml dot com.
(for project you can add Woodstox or FasterXML/Woodstox).
This is only needed once for all contributions to Woodstox, and once I get it, I can proceed with merge. I added some notes already; just need to re-read main logic and should be good to merge.
Thank you again for this contribution!
EDIT: I gave wrong email address -- CLA actually needs to be sent to info
at fasterxml dot com instead; cla email alias did not yet exist unlike I thought.
@Magmaruss quick note: I gave a wrong email address to send CLA to -- CLA actually needs to be sent to info
at fasterxml dot com instead; cla email alias did not yet exist unlike I thought.
So just in case you already sent one, could you please re-send it. Apologies for mix-up.
NOTE: CLA received; can now focus on finalizing PR.
@Magmaruss I hope you don't mind my applying changes I suggested here, wrt naming and adding @since
tags. I am open to different naming too if you strongly prefer yet different naming.
Beyond this, I think it'd be good to refactor decoding loop, possibly the way I suggested (but not necessarily, there may well be other ways to improve readability).
Also thank you for getting CLA; I am hoping we can get this PR merged soon!
@Magmaruss Ok, I did some more tweaks and I think things look good.
About the only thing I think that is needed would be a test or two for invalid cases; one for catching surrogate pairs without enabling their use; and another one for just invalid pairings with handling enabled. I hope to get this merged soon, and then soon after release 6.6.0.
Thank you again for contributing this!
I can write simple test for disabled option, but for now when disabled it's throwing "Illegal character entity..." - default behavior before introducing the feature. Should I replace this exception with something special when first valid high surrogate detected or leave the default behavior and add test only?
I can write simple test for disabled option, but for now when disabled it's throwing "Illegal character entity..." - default behavior before introducing the feature. Should I replace this exception with something special when first valid high surrogate detected or leave the default behavior and add test only?
Either way works for me: obviously improved error messages are always a plus, but if you want to limit the scope of changes & extra work, verifying default behavior is fine also.
Hi @Magmaruss! I was wondering if you might be able to add basic unit test? No rush if you don't have time -- I just hope to be able to merge this in, get 6.6.0 released, and then be able to do #134 (increase min JDK baseline to 8 for Woodstox).
Yes, sorry for that, but i am having a very difficult time. I will try to deliver it tonight or tomorrow at the latest
@Magmaruss Sorry did not mean to add pressure at all. I appreciate your help here and please take your time -- I don't want this to be another stress point.
Test case for disabled option added. Please tell if you need something more.
This feature is an answer for the issue #165. It's extending the XML reader functionality with a new configuration option (default disabled) to support reading XML with unicode characters written using surrogate pair (UTF-16 encoding). This feature is very important to correctly read XML from the legacy services which produces responses this way and there is no way to change the behavior of them.
Example: The external service produces a response with a unicode character written using UCS2:
Default behavior of XMLStreamReader is throwing an exception with message:
After setting the
P_RESOLVE_ENTITY_SURROGATE_PAIRS
parameter totrue
it will be readed with no exception converting the surrogate pair entities to the appropriate unicode (🎅)