dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Frozen collection follow-ups #77891

Open stephentoub opened 1 year ago

stephentoub commented 1 year ago

https://github.com/dotnet/runtime/pull/77799 adds new FrozenDictionary and FrozenSet types. These are in decent shape but not yet ready to ship. Several follow-ups are required:

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-collections See info in area-owners.md if you want to be subscribed.

Issue Details
https://github.com/dotnet/runtime/pull/77799 adds new FrozenDictionary and FrozenSet types. These are in decent shape but not yet ready to ship. Several follow-ups are required:

- [ ] **Construction performance.** The initial use cases for these types were long-lived processes, where overall it's a better use of resources to spend more time at construction to optimize the data structures and algorithms used for later data access. However, the current construction time is one or more orders of magnitude higher than for the standard dictionary/set types. We need to a) invest in driving down those construction costs, and b) consider making the default performance of ToFrozenDictionary/Set as close to that of Dictionary/Set as possible via an additional optional parameter to these methods, e.g.

  ```diff
    public static FrozenSet<T> ToFrozenSet<T>(
        this IEnumerable<T> source,
        IEqualityComparer<T>? comparer = null,
  +     double constructionOptimization = default // pick a better name
    );
  ```

  When `0.0`, this would effectively default to creating a FrozenSet/Dictionary that just wrapped a HashSet/Dictionary, providing the same FrozenSet/Dictionary semantics but primarily just as a wrapper, such that the only additional cost was cloning the data (assuming the Items/Keys/Values properties remain as they are today, they could be lazily created on first need). On a scale from 0.0 to 1.0, the larger the value, the more optimization would be performed at construction time, enabling the caller to choose how much time is spent at construction vs. how fast subsequent data access is, with 1.0 meaning construction would take the longest in exchange for the fastest data access. Having the default be fast construction would make this the least surprising for casual use and would avoid falling into pits of failure in the average case; consumers willing to spend more time at construction because they know their processes will be long-lived can then opt in to longer startup times.

- [ ] **Items/Keys/Values properties.** The current design has Items/Keys/Values properties that return `ImmutableArray<>`. This enables access to the data as a `ReadOnlySpan<>`, indexability, access via refs, and very fast enumeration. It does, however, require that implementations either store the keys/values/items separately and in contiguous memory or maintain a second copy of the data that does so. If we'd be willing to give up the `ReadOnlySpan<>` and indexing capabilities, these properties should instead return some `FrozenEnumerable<>`-like type that would allow different implementations to vary how the data is stored.

- [ ] **Small collections.** There's currently an optimized implementation for empty collections, but not for other small sizes, e.g. <= 8 elements. We should investigate adding simple implementations that don't do any hashing or other complicated storage for small sizes. This includes looking at various code bases / sources of information to determine the most common sizes to optimize for, e.g. is it worth having a dedicated implementation for collections of just 1 or 2 elements?

- [ ] **Other data structures and algorithms.** Would a perfect hashing implementation yield any benefits? Would tries be useful in some situations for strings? Should we consider doing reflection-emit code gen in some circumstances?

- [ ] **Source generation.** If/when Roslyn source generators support some kind of mechanism that would allow a source generator to look at constant data in the C# file and augment ToFrozenSet/Dictionary call sites in some manner, it would be beneficial if the optimizations could be performed at compile time, feeding something into an optional parameter to ToFrozenSet/Dictionary to enable them to be fast, with all the work supplied at build time. This would likely also require more work on the API to make the existing types extensible, which is something we've actively avoided so far.
Author: stephentoub
Assignees: -
Labels: `area-System.Collections`
Milestone: 8.0.0
stephentoub commented 1 year ago

cc: @geeknoid

geeknoid commented 1 year ago

Perfect hashing ends up being not a great choice. It usually involves a more complex hash function, and you'll end up with lower perf than what we have here. Its main benefit would potentially be for smaller data, by not having giant sparse lookup tables.

Another area worth looking into is further optimizing string-based keys (the overwhelmingly most common kind of key). There are a number of comparers in the current implementation to try to fine-tune how comparison works, but more are possible. For example, comparers that know about 2-character sequences explicitly. The motivation here is to reduce the cost of hashing incoming strings to a minimum.

Perhaps a broader problem than just the frozen collections: there is very poor interop between spans and collections. In particular, for string-based dictionaries, it would be terrific to be able to do a lookup using a `ReadOnlySpan<char>` as a key in addition to a string. This would avoid needing to materialize strings just to do a dictionary lookup. A perfect use for this is a string interning table for incoming REST/gRPC calls. You could go straight from the incoming buffer and do a lookup in the dictionary to get the final string object.
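A minimal sketch of the desired shape (the type and names here are hypothetical, not a real proposal): a string-keyed table that accepts a `ReadOnlySpan<char>` for lookups, so a caller can probe with a slice of an incoming buffer without first materializing a string. A real implementation would hash the span; this sketch just scans linearly to keep it short.

```csharp
using System;

sealed class SpanLookupTable
{
    private readonly string[] _keys;
    private readonly int[] _values;

    public SpanLookupTable(string[] keys, int[] values)
    {
        _keys = keys;
        _values = values;
    }

    public bool TryGetValue(ReadOnlySpan<char> key, out int value)
    {
        for (int i = 0; i < _keys.Length; i++)
        {
            // MemoryExtensions.SequenceEqual compares char by char, no allocation.
            if (key.SequenceEqual(_keys[i]))
            {
                value = _values[i];
                return true;
            }
        }
        value = 0;
        return false;
    }
}
```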

stephentoub commented 1 year ago

it would be terrific to be able to do a lookup using a ReadOnlySpan as a key in addition to a string

Yes, this has long been desired, and we currently lack a great answer. This is tracked by https://github.com/dotnet/runtime/issues/27229.

geeknoid commented 1 year ago

Frozen collection state serialization is another interesting possibility.

The idea is to make it possible to serialize the state of a frozen collection to a portable format. A source generator could then generate such serialized blobs to be loaded by a frozen collection at startup, providing cheap initialization for well-known data.

Additionally, said serialized data could actually be passed around in config blobs, so that even state that is config-driven can have the benefit of low startup time. You serialize the state offline and send it through the config system. This could in fact provide opportunities for even more aggressive optimizations, since startup time would not be an issue.

stephentoub commented 1 year ago

Frozen collection state serialization is another interesting possibility.

I intended the "Source generation" part of what I wrote above to cover this. But we'd also need to be really careful here. It'd be fairly easy to accidentally create a vulnerability.

iSazonov commented 1 year ago

Frozen collection state serialization is another interesting possibility.

I intended the "Source generation" part of what I wrote above to cover this. But we'd also need to be really careful here. It'd be fairly easy to accidentally create a vulnerability.

Does this exclude using perfect hash functions at all?

stephentoub commented 1 year ago

Frozen collection state serialization is another interesting possibility.

I intended the "Source generation" part of what I wrote above to cover this. But we'd also need to be really careful here. It'd be fairly easy to accidentally create a vulnerability.

Does this exclude using perfect hash functions at all?

That statement was unrelated to perfect hashing.

iSazonov commented 1 year ago

I mean that GetHashCode uses seeding today, but a perfect hash would not. So the question is whether we can use one or not.

iSazonov commented 1 year ago

double constructionOptimization = default

It could be an enum like CompressionLevel; that would be more user-friendly in a public API. Or it could be dynamic, following the tiered-compilation concept: fast startup followed by optimization in the background.

stephentoub commented 1 year ago

So the question is whether we can use a perfect hash or not.

Technically could we? Sure. Should we? That would depend on a variety of factors, not the least of which would be demonstrating it was a perf win. And Martin already commented on his experience with their perf in https://github.com/dotnet/runtime/issues/77891#issuecomment-1303687955.

iSazonov commented 1 year ago

That would depend on a variety of factors, not the least of which would be demonstrating it was a perf win. And Martin already commented on his experience with their perf in #77891 (comment).

I couldn't agree more. A simple example: if we have 256 strings, all with different lengths, the perfect hash function can be just the length of the string (with a byte lookup table). Is it possible to create anything faster?

Complex perfect hash functions are rather aimed at creating very compact data buffers. If we are not going to store them in the .NET runtime, we don't need to strive to make them extremely small, and hence we can use simpler and faster functions. We could study the gperf algorithms, if the license allows it, and create a source generator as a first step, if that is needed of course.

stephentoub commented 1 year ago

A simple example, if we have 256 strings all with different length

The implementation already does that.

omariom commented 1 year ago

It would be very helpful if the dictionary had such an indexer:

public ref readonly KeyValuePair<TKey, TValue> this[int entryIndex]

// for the set
public ref readonly T this[int entryIndex] 

It would allow storing references to entries as integers and later quickly retrieving them. If returning KeyValuePair imposes too much on the implementation, then maybe separate GetValue and GetKey methods?

The idea is to be able to get indexes of the entries and use them to retrieve keys and values.
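A rough sketch of that idea (the type and member names here are hypothetical): pair a key-to-index map with parallel entry arrays, so callers can retain a compact int instead of the key or value.

```csharp
using System.Collections.Generic;

// Hypothetical sketch: entries live in parallel lists, and a dictionary
// maps each key to its index, so an entry can be referenced by an int.
sealed class IndexedTable<TKey, TValue> where TKey : notnull
{
    private readonly Dictionary<TKey, int> _indexOf = new();
    private readonly List<TKey> _keys = new();
    private readonly List<TValue> _values = new();

    public int Add(TKey key, TValue value)
    {
        _indexOf[key] = _keys.Count;
        _keys.Add(key);
        _values.Add(value);
        return _keys.Count - 1;
    }

    public int IndexOf(TKey key) => _indexOf[key];
    public TKey GetKey(int index) => _keys[index];
    public TValue GetValue(int index) => _values[index];
}
```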

stephentoub commented 1 year ago

Would be very helpful if the dictionary had such indexer

What's the scenario for that such that the indexer returning a ref to the value is insufficient?

omariom commented 1 year ago

You mean this indexer? `public ref readonly TValue this[TKey key]`

I have a couple of scenarios.

  1. Reference data that can be looked up. For example, a list of countries or a list of currencies. The referencing data structure would keep entries' indexes, instead of the keys ("UK" or "GBP"). It would also be more efficient in terms of memory and more GC friendly.

    I implemented it for myself by just copying the existing HashSet and Dictionary and adding the required members.

  2. Another scenario (an imagined one in this case) is a network protocol that, at the start of a session, sends the initial snapshot of ref data and later refers to the entries by their indexes instead of the values in the hashset or the keys of the dictionary.

geeknoid commented 1 year ago

@omariom Does this make sense in the context of a frozen dictionary? Seems like you're using a dictionary to build up unique name/value pairs, and then when you're done you ditch the name part and access your values by index. Do you have scenarios that routinely require having both key-based and index-based access to the same values on an on-going basis?

geeknoid commented 1 year ago

@iSazonov For those cases where the input is known statically, a source generator that has a large suite of tricks to generate custom code seems like a great idea. This is not so much focused on the need for perfect hashing, it could be anything that makes the code fast.

In the scenarios where I've used frozen collections so far, the set of items in the collections have always come from config or some other dynamic source, so I haven't had the need/desire for a build-time generator. But for parsers which have a well-known set of tokens, there could be value in a highly optimizing source generator. It all depends on the actual gains you could achieve relative to what you already get with the hard-coded frozen collection implementations.

geeknoid commented 1 year ago

@stephentoub I have not experimented with purely length-based "hashing", which I think is a fascinating idea.

If there is sufficient variance in the length of the keys in the frozen collection, it should be possible to just use the string length as the hash code, completely eliminating the cost of hashing. It'd be trivial to write a new comparer implementation that just uses string length. I bet that could be a considerable win in many cases.
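As an illustration (this comparer is hypothetical, not the one in the runtime), the core of such a comparer is tiny: the hash code is just the string's length, so hashing an incoming key costs nothing beyond reading `Length`. It's only a win when the keys have sufficiently distinct lengths; equality still falls back to a full string comparison.

```csharp
using System.Collections.Generic;

// Hypothetical length-only comparer for string-keyed frozen lookups.
sealed class LengthOnlyStringComparer : IEqualityComparer<string>
{
    // Equality must still compare the full strings, since two distinct
    // strings can share a length.
    public bool Equals(string x, string y) => string.Equals(x, y);

    // "Hashing" is just reading the length: zero per-character cost.
    public int GetHashCode(string s) => s.Length;
}
```

Used with a regular `Dictionary<string, int>`, lookups bucket by length and only string-compare within a bucket, which is the same trade-off a length-bucketed frozen implementation would make.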

stephentoub commented 1 year ago

I have not experimented with purely length-based "hashing", which I think is a fascinating idea.

I spent some time on it, which is where https://github.com/dotnet/runtime/blob/main/src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/LengthBucketsFrozenDictionary.cs came from. There's of course more opportunity to experiment.

geeknoid commented 1 year ago

I have not experimented with purely length-based "hashing", which I think is a fascinating idea.

I spent some time on it, which is where https://github.com/dotnet/runtime/blob/main/src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/LengthBucketsFrozenDictionary.cs came from. There's of course more opportunity to experiment.

Ah, you're one step ahead. I'm surprised this required a new dictionary type, rather than just a string comparer implementation.

stephentoub commented 1 year ago

It didn't require it, but the comparer itself added overhead. It might still be useful to have the comparer with the frozen hashtable as a fallback if it doesn't meet the criteria for this impl.

geeknoid commented 1 year ago

Here's another idea:

If an integer set is created that contains a constrained range of values, it would be more efficient to just have an array of bits to hold the set's state.

The idea is to leverage this for sets of enums. FrozenSet can then be implemented using simple array lookups rather than hashing.

I just saw an enum-based set in some production code...
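A minimal sketch of that idea (names hypothetical), assuming the enum's underlying values fit in 64 bits: construction folds the members into a bit mask, and membership is a single shift-and-test instead of any hashing.

```csharp
using System;

// Hypothetical example enum with a small, contiguous underlying range.
enum Color { Red = 0, Green = 1, Blue = 2, Yellow = 3 }

static class BitSetSketch
{
    // Construction: OR each member's bit into a single mask.
    public static ulong Freeze(ReadOnlySpan<Color> items)
    {
        ulong bits = 0;
        foreach (var c in items) bits |= 1UL << (int)c;
        return bits;
    }

    // O(1) lookup: test one bit, no hashing at all.
    public static bool Contains(ulong bits, Color c) => (bits & (1UL << (int)c)) != 0;
}
```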

AlexRadch commented 1 year ago

I suggested providing optimized read-only collections for SortedSet and SortedDictionary. I made a separate topic for discussing the API and other issues related to implementing these collections, so as not to mix discussions of several different topics in this one issue.

Timovzl commented 1 year ago

The Adaptive Radix Tree could be suitable for (ordinal) string keys, especially to implement a sorted set.

A trade-off is retaining the actual string keys vs. keeping only their compacted bytes in the tree.

Timovzl commented 1 year ago

The Adaptive Radix Tree

Come to think of it, the adaptive part is unnecessary for frozen collections. Any radix tree would do.

geeknoid commented 1 year ago

@AlexRadch @Timovzl

I think the value in something like SortedSet is that it magically maintains sort order for you as you add/remove items. That magic is not interesting for collections that don't mutate over time. As such, I think the right approach to achieve similar behavior to a sorted collection is to create a normal FrozenSet/Dictionary instance alongside an ImmutableArray with the sorted items/keys (or KeyValuePair). You can then get fast lookup as well as sorted enumeration.

AlexRadch commented 1 year ago

@geeknoid Yes, I think we can create FrozenOrderedSet/Dictionary types with an additional Comparer<> and order items/keys by that comparer. And I think FrozenOrderedSet/Dictionary could also support the IReadOnlyList<> interface and a this[int] indexer.

geeknoid commented 1 year ago

@AlexRadch I was actually suggesting that there doesn't need to be a dedicated sorted set/dictionary since it's not really adding any value. The caller can create a frozen set/dictionary with a parallel sorted ImmutableArray on the side. There's no additional efficiency possible by creating a new type in the framework.

AlexRadch commented 1 year ago

@geeknoid The current frozen set/dictionary does not have a construction path that takes an additional order comparer. We could add a ToFrozenSet method with an orderComparer parameter, but these types do not implement the IReadOnlyList<> interface or a this[int] indexer, and I don't think we can add those to them.

So to support the IReadOnlyList<> interface and the this[int] indexer, they would need to be separate classes.

geeknoid commented 1 year ago

Let me state my point again: I don't believe the functionality of SortedFrozenSet and SortedFrozenDictionary should be added to the class framework. These are superfluous data structures. Specifically, if you start with an IEnumerable, application code merely needs to do:

var mySet = myEnumerable.ToFrozenSet();
var myArray = mySet.Items.OrderBy(x => x).ToImmutableArray();

And there you go, you can now do O(1) lookups by using mySet, and properly sorted enumeration of the items using myArray. There's little value in packaging these two things together in a single data structure.

iSazonov commented 1 year ago

I don't believe the functionality of SortedFrozenSet and SortedFrozenDictionary should be added to the class framework.

I agree, if there is a way to implement compile-time optimizations to avoid the runtime allocation and construction-time cost (important for startup scenarios).

AlexRadch commented 1 year ago

@geeknoid If we add this functionality to the framework, we get two advantages: creation time and memory allocation. The ordered collections can store keys/values in the right order internally, so we would not need to create a second array.

theodorzoulias commented 1 year ago

So to support IReadonlyList<> interface and this[int] indexer it should be separated classes.

@AlexRadch isn't this problematic for a FrozenDictionary<TKey, TValue> with TKey being of type int? I would expect compile-time errors because of the ambiguity between

public TValue this[TKey key]

and

public KeyValuePair<TKey, TValue> this[int index]

Maybe the derived classes could implement the IReadOnlyList<KeyValuePair<TKey, TValue>> interface, so casting from FrozenDictionary<> to IReadOnlyList<> would succeed, and the this[int] indexer would be available through the interface. But what is the scenario that the this[int] indexer would be useful? Isn't ordered enumeration enough?

AlexRadch commented 1 year ago

@AlexRadch isn't this problematic for a FrozenDictionary<TKey, TValue> with the TKey being of type int? I would expect compile-time errors because of the ambiguity between Maybe the derived classes could implement the IReadOnlyList<KeyValuePair<TKey, TValue>> interface, so casting from FrozenDictionary<> to IReadOnlyList<> would succeed, and the this[int] indexer would be available through the interface. But what is the scenario that the this[int] indexer would be useful? Isn't ordered enumeration enough?

Yes, you are right, so I think the IList<> interface will be enough. I also think a frozen OrderedDictionary should have a public ref readonly TValue GetValueRefByIndex(int index) method to get a reference to a value by index.

theodorzoulias commented 1 year ago

@AlexRadch personally, I've never used an OrderedDictionary; I don't think it's a useful data structure. I guess it can be useful if you want to paginate a dictionary and efficiently fetch items 9,901-10,000. That's the only usage scenario I can think of.

danmoseley commented 1 year ago

Is the scenario of frozen keys, mutable values supported by the new collections? Eg., in this scenario (on Kestrel hot path) a dictionary lookup is basically hand generated (these days maybe Roslyn would generate similar code from a switch statement) and the value may change. https://github.com/dotnet/aspnetcore/blob/41627fe50a7a71e395f2fdfccce9f8cbff10a49a/src/Shared/HttpSys/RequestProcessing/RequestHeaders.Generated.cs#L968

Edit: never mind, it uses fields for the values. But I'm interested in the answer anyway.

stephentoub commented 1 year ago

Is the scenario of frozen keys, mutable values supported by the new collections?

You can't mutate what value the dictionary/set stores for a key. If that value is a reference type, though, or if it contains a reference type, you can of course mutate the referenced instance. In other words, the immutability here is shallow.
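A quick illustration of what "shallow" means here, using ReadOnlyDictionary as a stand-in (the same point applies to any read-only wrapper over reference-typed values):

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

var inner = new List<int> { 1 };
var wrapper = new ReadOnlyDictionary<string, List<int>>(
    new Dictionary<string, List<int>> { ["k"] = inner });

// The key-to-reference mapping cannot change:
// wrapper["k"] = new List<int>(); // would not compile, indexer is get-only

// ...but the referenced instance is still freely mutable:
wrapper["k"].Add(2);
Console.WriteLine(string.Join(",", wrapper["k"])); // prints "1,2"
```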

danmoseley commented 1 year ago

Yeah in this case it's a struct. OK thanks.

Timovzl commented 1 year ago

@danmoseley I have definitely seen someone use GetValueRefOrNullRef() and mutate a struct value in a frozen dictionary somewhere…

stephentoub commented 1 year ago

I have definitely seen someone use GetValueRefOrNullRef() and mutate a struct value in a frozen dictionary somewhere…

That returns a ref readonly. Using a ref readonly to erroneously mutate the collection requires unsafe code. With unsafe code you can violate almost any kind of constraint, including immutability.

Timovzl commented 1 year ago

Using a ref readonly to erroneously mutate the collection requires unsafe code.

Ahhh, so that was the trick. Indeed, they used a method from the Unsafe class to do so, I now recall. So that was definitely not by design.

mjsabby commented 1 year ago

There is absolutely no requirement for perfect hashing to be complex. Take a look at https://gist.github.com/mjsabby/d7baf64d524176ee35f69ffa2c278179; it outperforms the current .NET 7 dictionary.

It implements the hash-and-displace algorithm.

I would implement Frozen Sets and Dictionaries with perfect hashing.

```csharp
// From the linked gist; Hash64 is defined there.
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
public static int Lookup(ReadOnlySpan<byte> key, ReadOnlySpan<int> values, ReadOnlySpan<int> seeds)
{
    ulong hash;

    unsafe
    {
        fixed (byte* ptr = &MemoryMarshal.GetReference(key))
        {
            hash = Hash64(ptr, key.Length);
        }
    }

    var size = (ulong)values.Length;
    var index = hash & (size - 1);
    var seed = seeds[(int)index];

    // Negative seed: the bucket maps directly to a value slot.
    if (seed < 0)
    {
        return values[0 - seed - 1];
    }

    // Otherwise mix the per-bucket seed into the hash to displace
    // colliding keys into distinct final slots.
    var x = (ulong)seed + hash;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;

    index = (x * 0x2545F4914F6CDD1D) & (size - 1);
    return values[(int)index];
}
```

YoniFeng commented 1 year ago

Perfect hashing ends up being not a great choice. Perfect hashing usually involves a more complex hash function, and you'll end up with lower perf than what we have here. Their main benefit would potentially be in smaller data by not having giant sparse lookup tables.

@geeknoid I haven't taken the time to look at the frozen collections' code (I'd love a TL;DR/gist, but I'll have to take a good look regardless), but a PHF (perfect hash function) definitely has value. I'm surprised you seem to dismiss it for poor key-lookup performance (did I infer wrong?); I'd think the only reason to avoid a PHF at runtime is that construction time can be very slow.

For those cases where the input is known statically, a source generator that has a large suite of tricks to generate custom code seems like a great idea. This is not so much focused on the need for perfect hashing, it could be anything that makes the code fast.

A source generator for static sets could be fantastic. It could make top-of-the-line lookup perf easily accessible; it would essentially be an integrated gperf. An existing example (similar to gperf in concept, not implementation) is rust-phf. Its downsides are that it doesn't allow customizing the underlying hash function and comparer, it is minimal, and Rust's default hasher is unnecessarily slow because it provides cryptographic guarantees that aren't needed for static sets.

Being able to customize the underlying hash function and key comparer opens the door for super-efficient implementations that make use of the static set's properties and/or assumptions.

About minimal: I've yet to do thorough benchmarking of the PHF area (I'm planning to), but my hunch is that a minimal PHF results in worse performance. It optimizes for space efficiency (binary size / table stored in the data segment) over time efficiency (lookup speed): with a non-minimal table, a non-keyword hitting an empty slot can be rejected faster (in the case of strings, with a faster string comparison or none at all). Granted, with a fast hash function (or even something ad hoc) the distribution probably isn't great, so the lookup-performance improvement from increasing the table size probably tends to zero.

Is the scenario of frozen keys, mutable values supported by the new collections? Eg., in this scenario (on Kestrel hot path) a dictionary lookup is basically hand generated (these days maybe Roslyn would generate similar code from a switch statement) and the value may change. https://github.com/dotnet/aspnetcore/blob/41627fe50a7a71e395f2fdfccce9f8cbff10a49a/src/Shared/HttpSys/RequestProcessing/RequestHeaders.Generated.cs#L968

Edit: never mind, it uses fields for the values. But I'm interested in the answer anyway.

@danmoseley Can you elaborate a little on how HTTP request keyword/verb matching is used in Kestrel? Why does the value need to be mutated? (Like above, I need to find the time to dig into the code myself.) I was looking for this the other day, because I wanted to see how Kestrel/ASP.NET match HTTP keywords and/or common request headers.

Edit: Looked at the code. Like Stephen said for the frozen collections, a source-generated PHF-based solution would also allow for shallow immutability. If the value can be an index into an array, that's probably best.

This is the sort of use case I'd expect could see great benefit from a source-generated PHF set/dictionary. Assessing whether a string is a reserved keyword (e.g. in compilers) is exactly the use case where a compile-time PHF-based structure can shine.

Come to think of it - how does Roslyn do keyword matching?

danmoseley commented 1 year ago

@YoniFeng I’m not familiar with that code. Of course any contributions or analysis you may have are welcome.

rampaa commented 1 year ago

Source generation If/when Roslyn source generators support some kind of mechanism that would allow a source generator to look at constant data in the C# file and augment ToFrozenSet/Dictionary call sites in some manner, it would be beneficial if the optimizations could be performed at compile-time, feeding something into an optional parameter to ToFrozenSet/Dictionary to enable them to be fast, with all the work supplied at build time. This would likely also require more work on the API to make the existing types extensible, which is something we've actively avoided today.

Is this on the radar for .NET 8?

stephentoub commented 1 year ago

Is this on the radar for .NET 8?

No. It requires exposing the details of the abstraction, and we want some bake time on that before we do. It's something we'll consider for .NET 9.

eiriktsarpalis commented 1 year ago

@stephentoub are there any work items tracked in this issue that we're planning to deliver in .NET 8? If not we should update the milestone.

stephentoub commented 1 year ago

Everything for 8 is done

andrewjsaid commented 8 months ago

Another way to deliver on the premise of improving lookup time at a potential one-time construction cost would be some kind of mechanism that tracks usage patterns. This could be provided either at construction time (e.g. an array of key lookups) or by enabling some kind of sampling to be run to track lookup keys.

The frozen collection would then have a method to re-optimize to speed up lookups based on the usage pattern.

Here are some metrics that would be great to know ahead of time for string-keyed FrozenDictionaries, in order to create more optimal and targeted lookups:

  1. Are most lookups for the same key or small subset of keys? - e.g. a FrozenDictionary where the key is country code and most of the time it's the same country code being looked up. Maybe there could be two dictionaries - a faster one with fewer items and a regular one with the entire set.
  2. Changing the number of buckets to ensure that frequently accessed keys are not in the same bucket.
  3. Rotating the order of keys within buckets to sort by most frequently accessed
  4. Not using max length checks if we know them to be ineffective with that particular data set.
  5. Are we ignoring case unnecessarily most of the time? If so maybe there could be two dictionaries here too.

I'm sure that some of these ideas aren't suitable or are slow to compute, but equally I'm sure there are others I haven't thought of that could be perfect optimizations.

I guess the problem with this suggestion is that I'm not entirely sure how FrozenDictionaries are used in the wild, so maybe it's better to have this conversation once some common scenarios exist.
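A sketch of idea 1 from the list above (the type name and shape are hypothetical; whether the extra probe pays off would need measuring): keep a tiny "hot" table of the most frequently requested keys in front of the full table, so that when a handful of keys dominate, the first probe almost always hits.

```csharp
using System.Collections.Generic;

// Hypothetical two-tier lookup: a small hot-key table probed before the full one.
sealed class HotPathDictionary<TKey, TValue> where TKey : notnull
{
    private readonly Dictionary<TKey, TValue> _hot;   // few items, fast to probe
    private readonly Dictionary<TKey, TValue> _full;  // everything

    public HotPathDictionary(Dictionary<TKey, TValue> full, IEnumerable<TKey> hotKeys)
    {
        _full = full;
        _hot = new Dictionary<TKey, TValue>();
        foreach (var k in hotKeys)
            _hot[k] = full[k]; // duplicate the hot entries up front
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        if (_hot.TryGetValue(key, out value)) return true;
        return _full.TryGetValue(key, out value);
    }
}
```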