FasterXML / woodstox

The gold standard Stax XML API implementation. Now at Github.
Apache License 2.0
220 stars 81 forks source link

add recursion limit of 500 for DTD parsing #159

Closed pjfanning closed 1 year ago

pjfanning commented 1 year ago

Relates to https://github.com/FasterXML/woodstox/issues/157

It's DTD parsing where the recursion happens and who really wants to parse DTDs any more, let alone deeply nested ones? So it seems adequate to just set a recursion limit instead of trying to rewrite the code not to recurse (and the extensive rewrite that would need).

With testing, I've tried a few of the files that supposedly cause the recursion issue but I all I get is UTF-8 parsing issues - so the code never even really gets to the recursion.

I modified the first XML file in ...5219006592450560.zip and removed the UTF-8 issues - this has given me a test case that does fail with the recursion limit kicking in.

cowtowncoder commented 1 year ago

@pjfanning Excellent, thank you for providing this!

The only thing I was wondering is whether this could be made configurable? I know 500 should typically be enough for real usage so maybe it's not a big deal -- and/or could add configurability if someone actually hits the limit -- but then again other processing limits are settable so there should be an existing pattern to follow. Although it may be (I didn't check the code) that perhaps config object is not being passed to validator.

Another question: if it's easy enough to rebase, could you rebase this against 5.3 branch? If not, not a big deal, but if trivially simple I might want to release 5.4.0 for any users that might still use that.

pjfanning commented 1 year ago

@cowtowncoder I changed the target branch to 5.3.

I added a static FullDTDReader.setDtdRecursionDepthLimit(final int limit)

cowtowncoder commented 1 year ago

Ok looks good: I will change configuration to use ReaderConfig class but implementation should be fine.

cowtowncoder commented 1 year ago

@pjfanning Ok, created #160 as the issue for PR, and changed configuration to be applied similar to all other Woodstox-specific properties (via XMLInputFactory.setProperty()).

Planning to release 5.4.0 and 6.4.0 (minor version upgrade since it's an API change even if minor (I guess one could argue that adding a property could be done in patch but whatever)).