OWASP / java-html-sanitizer

Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.
Other
834 stars 209 forks source link

Remove Guava dependency and update to Java 9 #272

Closed claudioweiler closed 6 months ago

claudioweiler commented 1 year ago

This PR remove Guava dependency and update Java target/source to 9.

Please note that:

  1. ~Two annotations from Guava are copied to source:~
    • ~@GwtCompatible: need only for @VisibleForTesting.~
    • ~@VisibleForTesting: used in 2 internal methods that should be visible only for test scope. Maybe there are a better approach.~
  2. ~Replaced Guava Preconditions with assert. I'm sure there are a better approach, but assert is already used in some places, so I stick with that.~
  3. ~Java sets do not ensures order, so unit testing eventually fail in HtmlPolicyBuilderTest when checking ordered noopener noreferrer, then just run again. Maybe review unit test.~
  4. Added configuration to maven-bundle-plugin because maven-verifier-plugin fails.

Fixes #157, fixes #162 Also fixes rebuilding this library, caused by CVEs found on Guava, with just version update.


Edit

  1. Rebased with last commits;
  2. Guava annotations removed;
  3. Some Set's replaced with List to ensure proper ordering of noopener noreferrer.

Edit 2

Replaced Guava Preconditions with control structures and explicit throws.

eivinhb commented 1 year ago

Since Guava has a security issue now, we are also looking into owasp html sanitizer getting rid of Guava. But releasing in java-html-sanitizer on java 9 is a no go for us. I guess that 8 should be used for some more years since java 8 is LTS and still get security updates. With regards to GwtCompatible and VisibleForTesting I believe that this compatibility should be removed all together. It is way to vendor specific imho.

claudioweiler commented 1 year ago

Understand @eivinhb

I got Java 9 because immutable collections, my preference was to go directly to Java 11.

With Java 8 there is no immutable collections, only unmodifiable collections. So to downgrade my PR to Java 8 will be necessary to check all usages for the real necessity of immutables and use unmodifiable with controlled copies. This is a lot of work to me now, I can do a simple downgrade to Java 8 and trust on unit testing results if this is acceptable. But if you guys can use this PR as reference for a better work it will be awesome too.

About VisibleForTesting annotation, it is used to mark 2 methods, one of then appears to have no need of this annotation, the other one is used in unit testing for sure. I don't messed up with this because I don't deep understand this. Removing this annotation, the other one is just consequence.

eivinhb commented 1 year ago

Yeah. Maby one of the big contributers (@mikesamuel for instance?) can say something about this. I am only a consumer but I can also contribute if we can agree on a direction. :)

eivinhb commented 1 year ago

I do see the point in having proper immutable collections. Maybe use Apache collections4 for that? It has a much lower impact being a smaller more 'do one thing only'-library. That might be a solution until march 31 2025 when java 8 receives no further public updates.

claudioweiler commented 1 year ago

For sure immutable collections have its usage. But immutable collections are needed only when you receive a collection from external usage, as you can't ensure proper collection handling. IMHO, simple handling with unmodifiable and copies when necessary are enough to this library usage, but this need a deep look on all usages to ensure this behavior. I take a quick look at collections4, and can't find immutable collections. Will try to create a simple test case.

talios commented 1 year ago

If the base version is going to move from Java 8, to Java 9 - can we please ensure that this is released as a major breaking version.

mikesamuel commented 1 year ago

Thanks for diving into the codebase, @claudioweiler


Java sets do not ensures order

I thought LinkedHashSet did.

https://docs.oracle.com/javase/7/docs/api/java/util/LinkedHashSet.html

This implementation differs from HashSet in that it maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order).

Another difference between Guava immutable collections and Java standard library ones is that the Guava ones reject null at construction time, so consumers know they're not going to see null when iterating keys/values/elements. Are there places where we might need additional null checks if we do this?


Two annotations from Guava are copied to source

I'd rather avoid this. Copying files may break JAR sealing. I suspect that, outside Google, no-one uses annotation processing to prevent references from non-test code that don't have private visibility so it only serves as documentation. So one alternative to @VisibleForTesting is // Only visible for testing.


If the base version is going to move from Java 8, to Java 9 - can we please ensure that this is released as a major breaking version.

iirc, the base version is Java 6. I think there's some uses of reflection to paper over the nastiest Java6/Java8 problems.

https://github.com/OWASP/java-html-sanitizer/blob/33d319f876abbb35cb95eddf9705c46bd96822bd/src/main/java/org/owasp/html/AutoCloseableHtmlStreamRenderer.java#L15-L26

I started this project before semver was a thing. I think there was some discussion of the difficulties that caused on #162. Afaict, the only way to a modern versioning scheme would be to transition to a new maven artifact name which would prevent updates from automatically reaching existing users.

Maybe the way to resolve whether to target 8 or 9 or 11 and how to deal with versioning problems would be to post an announcement and send a message to the mailing list like "speak now if you need JRE<=8 support or forever hold your peace" and alert people that an artifact name &| versioning scheme change is coming down the pipe.

mikesamuel commented 1 year ago

That might be a solution until march 31 2025 when java 8 receives no further public updates.

My reading of the JAVA SE LTS Roadmap is that JDK 8 is under support until 2030 at least. Am I missing something?

Oracle Java SE Support Roadmap

Release GA Date Premier Support Until Extended Support Until Sustaining Support
7 (LTS) July 2011 July 2019 July 2022 Indefinite
8 (LTS) March 2014 March 2022 December 2030 Indefinite
mikesamuel commented 1 year ago

I take a quick look at collections4, and can't find immutable collections.

I don't think they use the term "immutable", and I'm not very familiar with Apache Collections, but perhaps @eivinhb was referring to Unmodifiable*

Interface Unmodifiable

All Known Implementing Classes: UnmodifiableBag, UnmodifiableBidiMap, UnmodifiableBoundedCollection, UnmodifiableCollection, UnmodifiableEntrySet, UnmodifiableIterator, UnmodifiableList, UnmodifiableListIterator, UnmodifiableMap, UnmodifiableMapEntry, UnmodifiableMapIterator, UnmodifiableMultiSet, UnmodifiableMultiValuedMap, UnmodifiableNavigableSet, UnmodifiableOrderedBidiMap, UnmodifiableOrderedMap, UnmodifiableOrderedMapIterator, UnmodifiableQueue, UnmodifiableSet, UnmodifiableSortedBag, UnmodifiableSortedBidiMap, UnmodifiableSortedMap, UnmodifiableSortedSet, UnmodifiableTrie

public interface Unmodifiable

Marker interface for collections, maps and iterators that are unmodifiable.

claudioweiler commented 1 year ago

Hi @mikesamuel

I thought LinkedHashSet did.

Yeah, it does. But the original DEFAULT_RELS_ON_TARGETTED_LINKS Set is copied to others Sets and then it looses order. It demands a proper analysis of usages to add LinkedHashSet on all necessary places.

Another difference between Guava immutable collections and Java standard library ones is that the Guava ones reject null at construction time, so consumers know they're not going to see null when iterating keys/values/elements. Are there places where we might need additional null checks if we do this?

Yeah, Java 9 immutable disallows nulls. In Java 8 it will need proper handling on this.

I'd rather avoid this. Copying files may break JAR sealing. I suspect that, outside Google, no-one uses annotation processing to prevent references from non-test code that don't have private visibility so it only serves as documentation. So one alternative to @VisibleForTesting is // Only visible for testing.

As I don't know if there is any annotation processing I preferred not to mess with this. If it's ok I can remove this, much better.

I don't think they use the term "immutable", and I'm not very familiar with Apache Collections, but perhaps @eivinhb was referring to Unmodifiable*

The problem is that immutable and unmodifiable are different concepts. Unmodifiable collections do not protect from external changes and already exists on Java 8 native libraries.

If goal will be Java 8, then unmodifiable must be used, but with additional protections when creating/copying collections.

eivinhb commented 1 year ago

is that JDK 8 is under support until 2030 at least

I believe that you need to pay for support until 2030. But googling for resources about this is kinda confusing at the moment.

The problem is that immutable and unmodifiable are different concepts. Unmodifiable collections do not protect from external changes and already exists on Java 8 native libraries.

Yeah, I realise that now. Maybe java 11 is the way to go then, or even java 17 since many major libraries like Spring is doing the huge leap i these days. I guess OWASP need to make some decisions about this. Is it possible with the licences on Guava to copy the code for immutable collections and change the package name but keep the code unchanged? That will not break JAR sealing, true?

claudioweiler commented 1 year ago

Maybe java 11 is the way to go then, or even java 17 since many major libraries like Spring is doing the huge leap i these days.

Java 17 is a huge leap indeed!

I think Java 11 is a good step. But Java 8 is possible too, just need some more work.

Is it possible with the licences on Guava to copy the code for immutable collections and change the package name but keep the code unchanged?

Guava uses Apache License 2.0: https://tldrlegal.com/license/apache-license-2.0-(apache-2.0) I'm not confident here, but I think that it fits in "modify" case. But, I think that this is going to be much more work than just adapting to Java 8.

ThaKarakostas commented 1 year ago

Any update about this pull request.

alinbrici46 commented 1 year ago

Anyone have update on when this will be fixed?

claudioweiler commented 1 year ago

Updated this PR, see first post for details.

melloware commented 1 year ago

In the short term Guava 32.0.0-jre finally fixes the CVE if somone wants to update that and do a release?

claudioweiler commented 1 year ago

Removing Guava is a best way. Guava is a huge umbrella lib, that leads to constantly CVEs being found.

Also, current Guava version used by owasp-java-html-sanitizer is guava:30.1-jre, that is based on Java 8. Why this lib still on Java 6?

melloware commented 1 year ago

Agreed but a short term fix without a lot of regression testing right now would be to update to 32.0.0.jre but I like this PR as the long term fix.

mikesamuel commented 6 months ago

This was a lot of work. Thanks for putting that in.

Is there a stdlib way to do something like Preconditions? I worry that without -ea asserts don't run and it's often better to fail than to return without upholding a security property.

claudioweiler commented 6 months ago

Is there a stdlib way to do something like Preconditions? I worry that without -ea asserts don't run and it's often better to fail than to return without upholding a security property.

The standard is assert. But, IMHO, I would go with simples if...throw IllegalArgumentException... as pointed in guava docs.

I can change all assert points if you agree.

melloware commented 6 months ago

@claudioweiler i think we should go with simple if its cleaner code than all the asserts just my two cents.

claudioweiler commented 6 months ago

Changing my assertions to control structures.

Preconditions.checkArgument > throw IllegalArgumentException
Preconditions.checkState > throw IllegalStateException
Preconditions.checkNull > throw NullPointerException
Preconditions.checkElementIndex > throw IndexOutOfBoundsException

@mikesamuel , there are still asserts in code. Those are original asserts and I choose not to mess with it, but I can do it with some directions.

One example is this one, that looks to override line 643: https://github.com/OWASP/java-html-sanitizer/blob/5b420f983d4593bbd9a3172d284c23dbcad41e7a/src/main/java/org/owasp/html/CssTokens.java#L649

mikesamuel commented 6 months ago

I think fixing up other asserts is probably out of scope for this PR.

mikesamuel commented 6 months ago

Thanks @claudioweiler for the patch, and for keeping with it through my delays.

melloware commented 5 months ago

@mikesamuel is there any chance you can do a release? Our security team still keeps dinging this library as having the Guava vulnerability.