GateNLP / gate-core

The GATE Embedded core API and GATE Developer application
GNU Lesser General Public License v3.0

The library contains vulnerabilities #166

Open Mabraygas opened 9 months ago

Mabraygas commented 9 months ago

From https://mvnrepository.com/artifact/uk.ac.gate/gate-core/9.0.1

The latest version of gate-core depends on com.thoughtworks.xstream 1.4.15, which is vulnerable. Is there any chance of upgrading it to 1.4.20? (com.thoughtworks.xstream 1.4.20 has no reported vulnerabilities.)

akkadbakkad89 commented 2 months ago

The number of vulnerabilities has increased as of June 2024: xstream, jackson-databind, tika-core, and commons-compress are all at lower than recommended versions.

greenwoodma commented 2 months ago

The problem with upgrading XStream is that we would still probably use a tiny blacklist rather than a whitelist as we explicitly allow any Java object to be used as an annotation feature and so need to be able to save/load any type. Unless you are exposing the ability to load an xgapp to the internet you will have control over the xgapp files you load so it's unlikely the XStream vulnerabilities are actually a big issue for GATE, although I agree it would be nice to upgrade so that people don't worry about it when looking at the dependencies.

The other libraries should be easier to update, and I will give those a try when I get the chance. We are open to PRs of course so if anyone else wants to try updating them and running the tests etc. then that would be great.

greenwoodma commented 2 months ago

I've upgraded jackson-databind as that was an easy fix (just a change in the gate-core pom.xml). It looks like commons-compress is pulled in by Tika as part of the support for Microsoft Office documents. The problem is that the newer Tika version has a different package structure, so it's not just an easy version number update. I need to look into exactly how to upgrade to the latest version properly.

greenwoodma commented 2 months ago

Also note that the current development version does actually use XStream 1.4.20 (and has done since last November), but we've just not done a release since then.

You can grab the latest snapshot build (which includes both the latest XStream and jackson-databind) from https://gate.ac.uk/download/#snapshots

akkadbakkad89 commented 1 month ago

Thanks for the heads up! However, I cannot find any snapshot build: the link points to version 9.0.0, but I am using 9.0.1, so I think that is the latest build. As a workaround, I changed my project's pom.xml to exclude the bundled vulnerable libraries and pull in the updated versions:

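    <dependency>
      <groupId>uk.ac.gate</groupId>
      <artifactId>gate-core</artifactId>
      <version>9.0.1</version>
      <exclusions>
        <exclusion>
          <groupId>com.thoughtworks.xstream</groupId>
          <artifactId>xstream</artifactId>
        </exclusion>
        <exclusion>
          <groupId>com.fasterxml.jackson.core</groupId>
          <artifactId>jackson-databind</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.poi</groupId>
          <artifactId>poi-ooxml-schemas</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.tika</groupId>
          <artifactId>tika-core</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml-schemas</artifactId>
      <version>4.1.2</version>
      <exclusions>
        <exclusion>
          <groupId>org.apache.commons</groupId>
          <artifactId>commons-compress</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-compress</artifactId>
      <version>1.26.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-core</artifactId>
      <version>1.28.5</version>
    </dependency>
    <dependency>
      <groupId>com.thoughtworks.xstream</groupId>
      <artifactId>xstream</artifactId>
      <version>1.4.20</version>
    </dependency>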

This resolved the vulnerability check but introduced runtime errors when I run my GATE application:

    gate.FeatureMap is available via both the system classpath and a plugin; the plugin (file:/Users/application/GATE-Pipeline/plugins/GateDependencyModifierPlugin/) classes will be ignored
    gate.creole.ExecutionException: java.util.concurrent.ExecutionException: java.lang.ClassCastException: class gate.stanford.DependencyRelation cannot be cast to class gate.stanford.DependencyRelation (gate.stanford.DependencyRelation is in unnamed module of loader gate.util.GateClassLoader @4993b0; gate.stanford.DependencyRelation is in unnamed module of loader gate.util.GateClassLoader @61c77807)

I believe this is caused by the same class files being loaded more than once from the jar.

greenwoodma commented 1 month ago

We don't push SNAPSHOT builds to Maven Central, but we do publish them to our own repo, http://repo.gate.ac.uk/content/groups/public/, if you want to reference them in your own pom.xml. If you include that repo in your build then you should be able to use 9.1-SNAPSHOT of gate-core without needing to mess with its dependencies. The link I gave in the previous comment was to the binary distributions of the full GATE Developer download rather than just the Maven artifacts.

As to the errors you get I can see two separate issues.

  1. I don't know where the plugin GateDependencyModifierPlugin comes from, but it looks like it has a compile-time dependency on gate-core, which it shouldn't have. That means you are getting all of gate-core pulled in as a dependency of that plugin; you might even be getting vulnerable transitive dependencies pulled in at runtime that way as well. You'll need to get the plugin maintainer to fix that plugin to avoid those problems.
  2. The ClassCastException suggests that you have two versions of Stanford CoreNLP being pulled in via different plugins. Because the different versions aren't API compatible, you end up mixing classes from both versions, which causes failures. I'm not sure whether this is related to the problems with the plugin described in (1).

Best thing would be to use our repo and pull 9.1-SNAPSHOT and see if that works, and then see if some/all of the errors go away -- although I think that plugin still needs fixing.
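As a rough sketch, the repository setup described above might look like the following in a project pom.xml. The repository URL and the 9.1-SNAPSHOT version come from this thread; the `gate-repo` id is a hypothetical label chosen for illustration, and the fragment has not been tested against a real build.

```xml
<!-- Sketch only: the <id> value is an arbitrary, hypothetical label -->
<repositories>
  <repository>
    <id>gate-repo</id>
    <url>http://repo.gate.ac.uk/content/groups/public/</url>
    <snapshots>
      <!-- SNAPSHOT builds are not on Maven Central, so enable them here -->
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>uk.ac.gate</groupId>
    <artifactId>gate-core</artifactId>
    <version>9.1-SNAPSHOT</version>
  </dependency>
</dependencies>
```

Maven 3.8.1 and later block plain-http repositories by default, so the URL may need to be changed to https if the server supports it.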

greenwoodma commented 1 month ago

Also from a very quick test, if you upgrade tika-core to 1.28.5 you'll find you've broken the ability to open documents that require it (looks like an incompatibility with the version of org.apache.poi). Not sure what, if anything else, might also be broken by using incompatible versions of the libraries we depend on.

greenwoodma commented 1 month ago

I've just pushed an update that uses newer versions of tika-core, poi, and commons-io and, more importantly, passes all our tests. It's already been built by the GitHub action and is in our snapshot repo, so that should fix most of the issues now. If you could give it a try, any feedback would be most appreciated.

akkadbakkad89 commented 1 month ago

Sure! I will check this in a while and keep you posted. I can see you updated poi to 5.2.2, but it uses commons-compress 1.21, which is vulnerable. Will upgrading that to 1.26.0 break anything?

greenwoodma commented 1 month ago

It looks like there is no issue with using commons-compress 1.26.2 (all the poi-based tests pass anyway), so I've excluded the transitive dependency and added an explicit dependency on the newer version inside the gate-core pom. It's currently rebuilding, but there should be a new snapshot version in our repo shortly.
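For readers who want to apply the same exclude-then-depend pattern in their own build, a minimal sketch follows. It assumes that commons-compress arrives transitively through the `poi-ooxml` artifact; the thread only says "poi", so the artifact name is a guess, and this is not a copy of the actual gate-core pom.

```xml
<!-- Assumption: poi-ooxml is the POI artifact that pulls in commons-compress -->
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi-ooxml</artifactId>
  <version>5.2.2</version>
  <exclusions>
    <!-- Drop the older transitive commons-compress -->
    <exclusion>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-compress</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<!-- Depend explicitly on the newer version instead -->
<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-compress</artifactId>
  <version>1.26.2</version>
</dependency>
```

Pinning a library to a version its dependent was not built against can break at runtime, as the tika-core experiment earlier in the thread shows, so the test suite is worth re-running after a change like this.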

akkadbakkad89 commented 1 month ago

The updated SNAPSHOT build shows no vulnerabilities, so that's good. I am still getting java.lang.ClassCastException even though I am pulling the same versions of Stanford corenlp 8.5.1 and gate-core 9.1-SNAPSHOT in different places. This was not an issue with previous builds. I suspect it might be due to a mismatch between the Java version specified in my pom.xml and the version used to export the runnable jars.

ianroberts commented 1 month ago

When you get a “cannot cast X to X” error it means you have the same class in two different classloaders. You either have two different versions of the GATE Stanford plugin being loaded by the same app and then something like a JAPE grammar trying to refer to the DependencyRelation class, or you have a compile scope dependency from your own plugin on the Stanford one.

If your plugin requires classes from the Stanford plugin then the correct way to do it is to declare the dependency (from your plugin on the Stanford one) as provided rather than compile and also add a cross-plugin dependency in creole.xml. This should allow your plugin to reference gate.stanford.DependencyRelation at compile time but not get the class cast exception at runtime.
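A hedged sketch of the two pieces described above follows. The coordinates `uk.ac.gate.plugins:stanford-corenlp:8.5.1` are an assumption pieced together from the version mentioned earlier in the thread, and the `REQUIRES` element reflects my understanding of the creole.xml format; check both against the GATE user guide before relying on them.

In the plugin's pom.xml, the Stanford plugin is available at compile time but not bundled:

```xml
<!-- Assumed coordinates for the GATE Stanford CoreNLP plugin -->
<dependency>
  <groupId>uk.ac.gate.plugins</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>8.5.1</version>
  <!-- provided: on the compile classpath, not packaged with the plugin -->
  <scope>provided</scope>
</dependency>
```

In the plugin's creole.xml, the cross-plugin dependency tells GATE to load the Stanford plugin first:

```xml
<CREOLE-DIRECTORY>
  <!-- Same assumed coordinates as in the pom.xml fragment -->
  <REQUIRES GROUP="uk.ac.gate.plugins" ARTIFACT="stanford-corenlp" VERSION="8.5.1" />
</CREOLE-DIRECTORY>
```

With this arrangement only one copy of gate.stanford.DependencyRelation should be loaded at runtime, which is what avoids the "cannot cast X to X" failure.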