NightOwl888 / J2N

Java-like Components for .NET
Apache License 2.0
51 stars 9 forks source link

Task: Create lower-level implementation of LinkedHashSet<T> #95

Open NightOwl888 opened 3 months ago

NightOwl888 commented 3 months ago

Our current LinkedHashSet<T> implementation uses a quirk of the HashSet<T> design to produce the correct ordering, but it involves expensive allocations when an item is removed followed by an add or insert. While this is somewhat unlikely in production, these allocations can add up and result in additional undesired GC overhead.

.NET 9.0 now includes an OrderedDictionary<TKey, TValue> that should provide the correct ordering functionality. It is MIT licensed, so there is no issue with copying it.

I believe we can just bring over the implementation of the hash code calculation and change the entries array from a KeyValuePair<TKey, TValue> to type T. But we need to make sure that a null value for T is supported.

We should keep the same API members (in the same order in the source code) that we have currently, but replace the implementation to use the new arrays instead of cascading the call to HashSet<T>. See the J2N and Microsoft implementations of HashSet<T> to get an idea of how each member can be implemented. We should be keeping all of the same optimizations that Microsoft has for these methods (looking at the latest .NET 8.0 implementation).

We already have tests in J2N.Tests.xUnit to confirm it behaves as expected, we just need to create an implementation that passes the tests.

NightOwl888 commented 1 month ago

The current thinking on this is to name the class derived from OrderedDictionary<TKey, TValue> as OrderedHashSet<T>. "Linked" refers to a specific implementation that uses linked lists (fragmented memory), and OrderedDictionary<TKey, TValue> instead uses blocks of adjacent memory elements. There are pros and cons to each approach, and both might be a reasonable replacement for LinkedHashSet<T> in different use cases. We could probably port the data structure of LinkedHashSet<T> from Apache Harmony later, but I think we should include the OrderedHashSet<T> based on OrderedDictionary<TKey, TValue> first since it is optimized for use in .NET.