Support custom resolvers as XXE solution

carlosame commented 2 years ago

I attempted to set up CodeQL in the EchoSVG project (css4j/echosvg#37) but found a false positive claiming a critical Java XXE vulnerability in:

https://github.com/css4j/echosvg/blob/f79f0b9e201ba927745d1645ead1879c8f89e981/echosvg-dom/src/main/java/io/sf/carte/echosvg/dom/util/SAXDocumentFactory.java#L463-L470

The code uses FEATURE_SECURE_PROCESSING (as well as other configurations) together with a custom resolver that, by default, is configured to not retrieve remote DTDs. The approach is described here:

https://css4j.github.io/resolver.html

And that's similar to the resolver approach that OWASP describes, but instead of a no-op it is using a preloaded subset of safe DTDs. That's consistent with SonarQube S2755:

And use an entity resolver (and optionally an XML Catalog) to resolve only trusted entities.

Given that CodeQL uses OWASP as a guide for the identification of security vulnerabilities, it would seem reasonable to have a way to avoid false positives when a custom EntityResolver is being used.

tausbn commented 2 years ago

Hello and thanks for your report!

If I understand your description correctly, this is actually a false positive, so I have changed the label accordingly, and I will bring it to the attention of the team in charge of the CodeQL Java analysis.

carlosame commented 2 years ago

If I understand your description correctly, this is actually a false positive, so I have changed the label accordingly

Yes, the "question" label was somehow put automatically after posting the issue.

and I will bring it to the attention of the team in charge of the CodeQL Java analysis.

Thank you.

smowton commented 2 years ago

Very likely we wouldn't be able to analyse the difference between a safe / no-op / restricted entity resolver and a resolver that can still be dangerous. I've passed your query to the security lab for their opinion on whether we'd see more false positives or negatives if we considered setEntityResolver to be always sanitizing.

carlosame commented 2 years ago

Very likely we wouldn't be able to analyse the difference between a safe / no-op / restricted entity resolver and a resolver that can still be dangerous.

I'm not familiar with CodeQL's analytical capabilities, but if it can determine that the resolver never returns null, that would be a good indication that the resolver may be safe.

I've passed your query to the security lab for their opinion on whether we'd see more false positives or negatives if we considered setEntityResolver to be always sanitizing.

Unfortunately there are quite a few cases out there where setEntityResolver is used with an unsafe resolver (because the other resolvers that I've seen can return null).

If the never-null check that I mention above could be done, then when such a resolver was set with setEntityResolver and FEATURE_SECURE_PROCESSING was also set, that would probably justify removing the warning or at least decreasing the severity.

EDIT: the main security issue with resolvers is when EntityResolver.resolveEntity returns null and systemId is not null. If systemId is null, the parser has no URL to open a connection to, so returning null does not matter.

smowton commented 2 years ago

Thanks for the constructive suggestion -- sounds like a workable solution, though a nontrivial one to implement and backlog. We'll take a look in more detail, but likely this won't be an immediate high priority, so in the meantime I'd recommend dismissing the false positive once you're sure it is indeed false.

carlosame commented 2 years ago

I'd recommend dismissing the false positive once you're sure it is indeed false.

Yes it is a false positive. Unfortunately there are also 29 "high severity" claimed vulnerabilities that are benign integer or floating point overflows. For example, the following is a high severity vulnerability according to CodeQL:

ang += Math.PI;

where ang is a float.

I can't cope with 30 false positives in a single project so I don't think that I'll use CodeQL there. (I have other projects where it can bring a value though, and the effort of the CodeQL team is appreciated)

smowton commented 2 years ago

I see a few of those overflow warnings which we could remove because the RHS is a small constant (e.g., pi). However about 25 of the 30 really are potentially large doubles, and generally we don't know that the overflow is benign, so we'd intend to warn about these in general.

I'll ask someone from the code scanning team to advise how to switch off the information-losing-cast warning if that would be helpful for this project.

smowton commented 2 years ago

OK, got this working:

https://github.com/smowton/echosvg/blob/master/.github/workflows/codeql-analysis.yml#L48 adds a custom config file,

https://github.com/smowton/echosvg/blob/master/.github/workflows/codeql/codeql-config.yml which says "disregard the default queries, and instead use...

https://github.com/smowton/echosvg/blob/master/.github/workflows/codeql/custom-queries.qls which disables the overflow query.

With regard to security relevance: I'd look out for overflows that could lead to non-termination of loops for example, which could present an avenue to DoS a server running your software. However if you want to go down the route of muting this particular query (or queries), this example should do the trick.

carlosame commented 2 years ago

OK, got this working:

Thank you, I pulled your files to master.

With regard to security relevance: I'd look out for overflows that could lead to non-termination of loops for example, which could present an avenue to DoS a server running your software.

Thanks for the advice. Several of the overflows do not really look exploitable (for example animations whose loops are based on timing functions and not on the quantities being overflown). A few of the overflows are less clear though, and for example I would be less confident about the possible exploitability of anything involving the long-deprecated SVG fonts (that only Inkscape, EchoSVG and Batik still support).

A full analysis done by a security researcher perhaps could find something exploitable, maybe, the bad news are that there are already known ways to perform a full-blown DoS on a server that uses EchoSVG or Batik for SVG transcoding, and perhaps at some point I should try to fix the known ones first...

Development resources in open source projects are limited, you know.

github / codeql

Support custom resolvers as XXE solution #7607