ASSERT-KTH / sorald

Automatic repair system for static analysis warnings from SonarQube's SonarJava, TDSC 2022 http://arxiv.org/pdf/2103.12033
MIT License
90 stars 27 forks source link

Support for Sonar rule 2755 (XML parsers should not be vulnerable to XXE attacks) #194

Open slarse opened 4 years ago

slarse commented 4 years ago

This is related to introducing a processor for XML parsers should not be vulnerable to XXE attacks. It's a rather complicated processor, so I'm introducing gradually. As can be noted, it covers a variety of libraries, but there are only a couple of major variations in usage.

I'm initially targeting only the DocumentBuilderFactory use cases to see if this is even feasible to repair adequately. I've found 6 major variations to what it can look like. Here's an example repair of the simplest case:

+import javax.xml.XMLConstants;
 import org.w3c.dom.Document;
 import org.xml.sax.InputSource;

 import javax.xml.parsers.DocumentBuilder;
 import javax.xml.parsers.DocumentBuilderFactory;

 public class DocumentBuilderNoSecurity {
     public static Document parse(String xmlFile) throws Exception {
         DocumentBuilderFactory df = DocumentBuilderFactory.newInstance();
+        df.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
+        df.setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
         DocumentBuilder builder = df.newDocumentBuilder();
         return builder.parse(new InputSource(xmlFile));
     }
 }

The cases I've found are:

  1. A local variable of type DocumentBuilderFactory is declared and initialized. See example transformation above.
    • It's possible that the declaration and initialization are separated, but I've never seen that happen, and so it doesn't seem worth it to account for. As such, the current implementation deals with this case well enough.
  2. A local variable of type DocumentBuilder is created by chaining DocumentBuilderFactory.newInstance().newDocumentBuilder().
  3. Chaining everything together in one expression, like so: DocumentBuilderFactory.newInstance().newDocumentBuilder().parse("xmlfile.xml")
  4. Like 1, but instead a field is used.
    • Here, initialization may very well occur in a constructor, and so to be prudent one should attempt to locate any initialization of the field.
  5. Like 2, but instead a field is used.
  6. Like 3, but instead a field is used.

Currently, I've solved case 1 with #191 . It's rather simple and requires only the addition of a few statements. Cases 2-4 require rewriting the existing code. I think the simplest way to do this is to replace the statements with a method invocation. For example, for case 2 and 3, one could substitute DocumentBuilderFactory.newInstance().newDocumentBuilder() for createDocumentBuilder(), which is defined like so.

private static DocumentBuilder createDocumentBuilder() {
    DocumentBuilderFactory df = DocumentBuilderFactory.newInstance();
    df.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
    df.setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
    return df.newDocumentBuilder();
}

Cases 4-6 can all be solved by creating such a method, and substituting all initializations of the field with an invocation of said method.

slarse commented 4 years ago

After thinking this through over the weekend, I think I may have been overthinking it. It should be fully sufficient to replace every occurrence of DocumentBuilderFactory.newInstance() with createDocumentBuilderFactory(). We don't have to worry about whether or not it's related to fields or not.

slarse commented 4 years ago

As it turns out, the "major variations" outlined in the opening comment in this issue don't really matter. As explained in #201, all violations related to DocumentBuilderFactory.newInstance() can be solved by substituting the call to newInstance() for a call to a method that creates a "safe" factory. With #202, I've shown that it generalizes easily to at least TransformerFactory, and with some small tweaks it show work for most of the other libraries.

Fields, however, don't appear to work. The solution of method invocation substitution applies to fields as well, but Sonar does not flag them as violations, and so we can't detect them.

Here's the roadmap for implementing support, in the order in which the libraries are listed in rule 2755:

monperrus commented 4 years ago

Depends on https://github.com/INRIA/spoon/pull/3702, which will be merged soon.

monperrus commented 3 years ago

FYI, for the big picture about XML and security:

Coordinated disclosure of XML round-trip vulnerabilities in Go’s standard library https://mattermost.com/blog/coordinated-disclosure-go-xml-vulnerabilities/

slarse commented 3 years ago

Thanks @monperrus, I'll have a look