OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.
Other
833 stars 210 forks source link

Shim Java 10 collections APIs #328

Closed mikesamuel closed 3 months ago

mikesamuel commented 3 months ago

This commit adds Maven <modules> to the parent POM and two small sub-modules: java8-shim and java10-shim.

Java8-shim defines an interface with methods like .listCopyOf corresponding to Java 10's java.util.List.copyOf.

On Java 8-9, those APIs delegate to a homegrown equivalent. On Java 10 and later, those APIs just call out to the standard library functions.

I did the following experiment to see how well this JIT inlines. The trace far below is derived by adding the main method below to Java8Shim and running the command line second from bottom

    @Override public void main(String... argv) {
        List<List<Integer>> ls = new ArrayList<>();
        for (int i = 0; i < 10000000; ++i) {
            ls.add(get().listOf(i, i + 1, i + 2));
        }
    }

Getting JIT diagnostics:

javac --release 8 -d out src/main/java/org/owasp/shim/Java8Shim.java \
    src/main/java/org/owasp/shim/ForJava8.java
javac --release 10 -d out -cp out \
    src/main/java/org/owasp/shim/ForJava9AndLater.java
java -cp out \
    -XX:+UnlockDiagnosticVMOptions \
    -XX:CompileCommand=print,org/owasp/shim/ForJava9AndLater.listOf \
    org.owasp.shim.Java8Shim
Dump with -XX:CompileCommand=print ``` 0x00000001144e9380: ; ImmutableOopMap {rbp=Oop [0]=Oop [8]=Oop } ;*anewarray {reexecute=0 rethrow=0 return_oop=1} ; - java.util.List::of@1 (line 847) ; - org.owasp.shim.ForJava9AndLater::listOf@3 (line 21) 0x00000001144e9380: fc83 9bff | 488b d8e9 | 27ff ffff 0x00000001144e938c: ; {metadata('java/util/ImmutableCollections$ListN')} 0x00000001144e938c: 48be 2807 | 1b44 0100 | 0000 488b 0x00000001144e9398: ; {runtime_call _new_instance_Java} 0x00000001144e9398: eb66 90e8 0x00000001144e939c: ; ImmutableOopMap {rbp=Oop [8]=Oop } ;*new {reexecute=0 rethrow=0 return_oop=1} ; - java.util.ImmutableCollections::listFromTrustedArray@115 (line 220) ; - java.util.List::of@16 (line 847) ; - org.owasp.shim.ForJava9AndLater::listOf@3 (line 21) 0x00000001144e939c: 60ad 9bff | ebac bd01 | 0000 004c | 8bc9 eb05 | bd02 0000 | 004c 890c | 24eb 0633 | ed4c 8914 0x00000001144e93bc: 24be 45ff 0x00000001144e93c0: ; {runtime_call UncommonTrapBlob} 0x00000001144e93c0: ffff 90e8 0x00000001144e93c4: ; ImmutableOopMap {[0]=Oop [8]=Oop } ;*ifnonnull {reexecute=1 rethrow=0 return_oop=0} ; - (reexecute) java.util.Objects::requireNonNull@1 (line 208) ; - java.util.ImmutableCollections::listFromTrustedArray@42 (line 213) ; - java.util.List::of@16 (line 847) ; - org.owasp.shim.ForJava9AndLater::listOf@3 (line 21) 0x00000001144e93c4: 3842 91ff | 488b f0eb | 0348 8bf0 | 4883 c430 0x00000001144e93d4: ; {runtime_call _rethrow_Java} 0x00000001144e93d4: 5de9 26b0 0x00000001144e93d8: ; {internal_word} 0x00000001144e93d8: 9bff 49ba | 6093 4e14 | 0100 0000 | 4d89 9760 0x00000001144e93e8: ; {runtime_call SafepointBlob} 0x00000001144e93e8: 0300 00e9 | 1053 91ff | f4f4 f4f4 | f4f4 f4f4 | f4f4 f4f4 | f4f4 f4f4 [Exception Handler] 0x00000001144e9400: ; {no_reloc} 0x00000001144e9400: e97b f59a | ffe8 0000 | 0000 4883 0x00000001144e940c: ; {runtime_call DeoptimizationBlob} 0x00000001144e940c: 2c24 05e9 | 8c45 91ff | f4f4 f4f4 [/MachCode] ``` iiuc, when there is one implementation of an interface loaded, the interface methods can devirtualize and the below shows that ForJava9AndLater::listOf with 3 arguments inlines to a null check and delegates the rest to `listFromTrustedArray`. ``` 0x00000001144e93c4: ; ImmutableOopMap {[0]=Oop [8]=Oop } ;*ifnonnull {reexecute=1 rethrow=0 return_oop=0} ; - (reexecute) java.util.Objects::requireNonNull@1 (line 208) ; - java.util.ImmutableCollections::listFromTrustedArray@42 (line 213) ; - java.util.List::of@16 (line 847) ; - org.owasp.shim.ForJava9AndLater::listOf@3 (line 21) ```
mikesamuel commented 3 months ago

Obviates #321, #325

csware commented 3 months ago

Why is the code that requires Java 10 named ForJava9AndLater?

kwin commented 3 months ago

As long as you generate bytecode for Java 10 (https://github.com/OWASP/java-html-sanitizer/blob/43089899bae8fae0cb0016c5700beace7ddd26f0/owasp-java-html-sanitizer/pom.xml#L90) one can never run this with Java 8, even if you add shims for the missing API.

rombert commented 3 months ago

@kwin - I checked the files and they look like Java 8 bytecode to me

$ javap -cp out -verbose org.owasp.html.Trie
Classfile /home/robert/.m2/repository/com/googlecode/owasp-java-html-sanitizer/owasp-java-html-sanitizer/20240325.1/out/org/owasp/html/Trie.class
  Last modified Mar 25, 2024; size 5578 bytes
  SHA-256 checksum 01d0ecf39c2490ec1955ed90c692d6a439d64bb5094d56ab2bcabfa59a7efeca
  Compiled from "Trie.java"
final class org.owasp.html.Trie<T extends java.lang.Object> extends java.lang.Object
  minor version: 0
  major version: 52
  flags: (0x0030) ACC_FINAL, ACC_SUPER
  this_class: #19                         // org/owasp/html/Trie
  super_class: #41                        // java/lang/Object
  interfaces: 0, fields: 6, methods: 14, attributes: 3

According to https://docs.oracle.com/javase/specs/jvms/se21/html/jvms-4.html#jvms-4.1 , 52.0 is Java 8.

kwin commented 3 months ago

Ok, I see now that the effective pom contains https://github.com/OWASP/java-html-sanitizer/blob/f729a089b20aef49ed9ffd7ed1c7e207eee71dc5/pom.xml#L198 which takes precedence over source and target. That definitely deserves some cleanup.