FasterXML / jackson-dataformat-xml

Extension for Jackson JSON processor that adds support for serializing POJOs as XML (and deserializing from XML) as an alternative to JSON
Apache License 2.0
561 stars 221 forks source link

Fix possible IndexOutOfBoundsException for issue #618 #619

Closed arthurscchan closed 7 months ago

arthurscchan commented 7 months ago

This PR provides a possible fix for issue #618 by adding a checking criterion to the while loop to ensure no out-of-bounds read even if the input is invalid.

pjfanning commented 7 months ago

Would it be possible to add a test case based on the fuzz issue - or similar? I don't have permissions to see the oss fuzz issues on this repo.

pjfanning commented 7 months ago

@arthurscchan this code does not compile - see

Error:  /home/runner/work/jackson-dataformat-xml/jackson-dataformat-xml/src/main/java/com/fasterxml/jackson/dataformat/xml/deser/XmlTokenStream.java:[589,5] missing return statement
Error:  Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.11.0:compile (default-compile) on project jackson-dataformat-xml: Compilation failure
arthurscchan commented 7 months ago

I have also fixed additional location where unexpected IndexOutOfBoundsException are thrown.

cowtowncoder commented 7 months ago

Earlier fix looked better to me; I am -1 on catching IndexOutOfBoundsExceptions: nothing within implementation should be throwing those.

One way forward would be to add tests to show the problem first (can put them under src/test/java/.../failing to prevent CI from failing). And then apply necessary fix. This could be different PR for tests only, and then fix that solves failures (and moves them under "real" test class package, not failing/).

cowtowncoder commented 7 months ago

@arthurscchan Ok, modified things a bit. So:

  1. Yes, all 3 failures are observable if removing Woodstox dependency
  2. I added back variation of one of fixes you suggested, just for SJSXP case: I think that makes sense
  3. Will try to see how to address 3rd case (first test): it's bit trickier as I would want to isolate the work-around further
cowtowncoder commented 7 months ago

@arthurscchan Ok, was able to change things to catch all 3 SJSXP-dependant failure cases. So I think this PR could be read to go -- all I need is the CLA I mentioned earlier.

One downside is that I don't know of a way to easily verify this with Github Actions (CI) for the repo: it can be tested locally by temporarily commenting out Woodstox-dependency, but that would then prevent tests over Woodstox. I'm sure there must be a way to do this (optional dependencies) with Maven (probably with profiles) but for now this will have to do.

EDIT: Actually, locally commenting out one reference to Woodstox, leaving dependency out, leads to about 25% unit test failure rate:

Tests run: 360, Failures: 96, Errors: 39, Skipped: 0

so it's probably not something to enable any time soon.

arthurscchan commented 7 months ago

Hi @cowtowncoder, thanks for the suggestion and fixes. Sorry, I left them out yesterday night and only have time to handle them now and I see you already fixed them. Yes, I do agree that the problem happened because the OSS-Fuzz setting has no Woodstox dependencies thus it is using the "default" SJSXP. But I think it also points out that if the user of this Jackson library is not aware of this "requirement", it is possible that those problems that happened to SJSXP do throw up to their applications and become "unexpected" exceptions. But adding support to SJSXP may be digging into a rabbit hole, thus maybe adding some description in the documentation will be a better alternative.

BTW, I have sent the CLA this morning. Please do let me know if more actions or comments are needed from me on the CLA and those fixes PRs. Thanks.

cowtowncoder commented 7 months ago

@arthurscchan No problem at all wrt work -- your baseline was good for figuring out problems.

I concur with your assessment wrt SJSXP as well: ideally we would want to support it to degree possible, and try to avoid most drastic problems (like infinite loops!). But there are limits to how hard to go, given existing bugs (fun fact: I provided half a dozen of bug fixes years ago for pretty basic bugs, when project was open source).

So, I think with CLA I can go ahead and merge this fix.

But one question I have wrt OSS-Fuzz is, whether set up should include Woodstox as a dependency, and if so, when to change. I think there might be value running against SJSXP for little bit more, but for longer term I think Woodstox -- dependency that exists in Maven pom.xml, so users really should be getting it if using Maven or Gradle, unless they explicitly block it, is what we want to test against.

Is this change (to oss-fuzz project set up) something you could help with? If source build is needed, it's from here:

https://github.com/FasterXML/woodstox/

and default branch should be fine.

arthurscchan commented 7 months ago

@cowtowncoder I could definitely help on the OSS-Fuzz project set up, I will investigate on the changes. Thanks again for your comments and changes.

cowtowncoder commented 7 months ago

@arthurscchan Thank you for digging deep and figuring out these Fuzz failures! This is very valuable and increases value of fuzzing a lot. NPEs, for collection datatypes, for example, are great to catch. And of course edge conditions for format codecs too.

cowtowncoder commented 7 months ago

On jackson-dataformat-xml, I wonder what changed tho -- I don't think it used to default to SJSXP. Or maybe fuzzing set up was changed to do different kinds of operations.

cowtowncoder commented 6 months ago

Ok, OSS-Fuzz marked issues as closed. Yay!