apache / accumulo-access

Apache Accumulo Access Control Library
https://accumulo.apache.org
Apache License 2.0
4 stars 11 forks source link

Authorizations set view could be sorted #58

Open ctubbsii opened 9 months ago

ctubbsii commented 9 months ago

Currently, Authorizations is implemented as an unordered set (Set.copyOf(authorizations)). It also has some unnecessary copying (Authorizations.of(Collection<String>) results in two separate calls to Set.copyOf, one of which is redundant). When calling asSet to view the result, the returned set could be in any order.

It would be nice to make some improvements, so that a particular set of strings had a canonical view. This may also have implications for hashCode and equals if the implementation stores them in different orderings, but I'm not certain about that. It's also just nice to have a sorted view of the authorizations when calling asSet or toString.

keith-turner commented 8 months ago

It also has some unnecessary copying (Authorizations.of(Collection) results in two separate calls to Set.copyOf, one of which is redundant).

The following is from the Java 11 javadoc for Set.copyOf()

If the given Collection is an unmodifiable Set, calling copyOf will generally not create a copy.

This means that when something produced by Set.of or Set.copyOf is passed to Set.copyOf it will not make another copy. So doing Set.copyOf(Set.copyOf(source)) is not a performance issues. It would be ok to remove the redundant calls, but they are not causing a problem.

This may also have implications for hashCode and equals if the implementation stores them in different orderings, but I'm not certain about that.

The javadoc for Set clearly defines how equals and hashCode should work for any implementation so that any implementation can be compared to another. So there is no need to have a sorted set for equals and hashcode to work.

It's also just nice to have a sorted view of the authorizations when calling asSet or toString.

The to string impl could sort when it constructs the string w/o the rest of the class having to use a sorted set. The sorted set introduces a O(log2(N)) lookup time vs the O(1) for a hashset. IMO it would be best to continue to use what Set.copyOf returns as its an immutable hashset that could be optimized for the exact data since its immutable.