GateNLP / gate-core

The GATE Embedded core API and GATE Developer application
GNU Lesser General Public License v3.0
75 stars 29 forks source link

Would you consider upgrading gate-core from JDOM 1.1.3 to JDOM 2.0.6? #149

Open ejschoen opened 2 years ago

ejschoen commented 2 years ago

While it's not likely to be exploitable, CVE-2021-33813 has been published against JDOM through version 2.0.6, which is the latest version. Although there is not currently a version of JDOM 2 that is provably immune to XXE, at least version 2.0.6 has a security framework similar to XStream's that allows a caller to prevent entity expansion. Upgrading from jdom 1.1.3 to jdom 2.0.6 would put gate-core in a good position to support newer versions of jdom that have the security vulnerability fixed. (There is a new version; however, the maintainers of jdom are no longer able to publish it to the Sonatype repo--which is a long sad saga.). Unfortunately, there are namespace changes between versions 1 and 2 of jdom, so code updates and recompiles are required.

greenwoodma commented 2 years ago

In theory I can't see any reason why we shouldn't move to the latest JDOM assuming someone has the time to do so given the namespace changes. We'd also have to double check exactly where JDOM might be exposed as part of the public API as changing the namespace might also effect plugins compiled against GATE.

ianroberts commented 2 years ago

A cursory search of the GateNLP GitHub namespace suggests there's half a dozen or so of our own plugins that make use of JDOM, plus an unknown number of third party ones. All of these would need an update if gate-core moved, if only to add their own direct dependency on JDOM rather than relying on it as a transitive.

More generally though, this opens up a question about plugins making use of "accidental" transitive dependencies. Recent versions of Gradle split the "compile" scope and make a distinction between api dependencies that form part of the public API of a library (e.g. because they are used as parameters or return types in methods that are part of the published API of the library), and implementation dependencies that are required to be able to compile the library itself but shouldn't be relied on by other code that depends on the library. This distinction can't be enforced when pushing JARs to a Maven repository (they both map to <scope>compile</scope> in the POM) but maybe we should consider documenting more clearly which specific dependencies of gate-core are part of the public API and which aren't?

ejschoen commented 2 years ago

It certainly seems like a good idea to separate api and implementation dependencies. But just because a library isn’t exposed by an API doesn’t guarantee that a vulnerability in the library isn’t exploitable by nefarious and clever means. My concern is having to explain to clients who uncover vulnerabilities by scanning JAR files that there is no path by which untrusted (i.e. XML) data can be fed to JDOM, for instance. I’m just very sensitive to JDOM because it is widely used and has to be mitigated in many libraries that we use.

On a related note, thanks for moving away from Log4j in Gate 9.0. Similar issues there with Log4j 1.x that we had to explain away.

greenwoodma commented 5 months ago

This issues just came up again via the mailing list. Whilst we don't yet have a fix in place it's worth repeating the comment @ianroberts made that

we don't use JDOM when parsing documents, only when loading plugins and applications. As long as you are in control of the plugins and xgapp saved states that your app uses, and thus it's only documents that might come from an untrusted source, then this particular vulnerability should not be directly exploitable via gate-core.