Zhentar commented 6 years ago

API Proposal for #2406

Rationale

The BCL Dictionary<,> class serves fairly well as a "safe default" hashmap implementation. But that flexibility and resistance to misuse comes at a substantial cost to best case performance, and it is further weighed down by the need to maintain backwards compatibility with decisions that, after 15+ years of hindsight, seem rather less than ideal. The performance ceiling for even naive hash map implementations is easily several times faster than Dictionary<K,V>; scenarios that demand high performance need better performing collections and will gladly tolerate fewer guard rails to get them.

Goals

Performance. All of the performance. "Roll your own" should not be an easy performance win, with benefits too substantial too ignore.
- No mandatory virtual dispatch on hot paths. Put the code size vs. abstraction trade-off in the hands of consumers.
- No modulo prime bucket mapping (simple bitmasks & power of 2 sizes)
- No separate hash code & key storage & comparison by default - (int, TKey) tuple keys can be used in the rare scenarios where it does make sense
- Skip the versioning. Trust me, I know what I'm doing... or I'm at least willing to accept unpredictable runtime behavior if I screw up in exchange for that extra bit of performance.
- Maximize cache friendliness for large maps (minimal per-entry overhead costs - use open addressing scheme that gracefully handles high capacity factor, e.g. robin hood or hopscotch)
Pay for play AOT compilation - minimal API surface area, minimal supporting types; leave functionality to extension methods where possible.
Support specialized hash maps (e.g. incrementers) without needing to replicate hash algorithm implementations.
Safe external API - keep dangerous & unsafe behaviors protected so that library builders (such as myself) can safely & confidently hand these out to consumers without copying.

Proposed API

Base hashmap class. No public members - support a quality hashmap implementation to serve as the base for a wide variety of specialized collections, both power collections library provided and consumer implemented.

public abstract class BaseDictionary<TKey, TValue>
{
  protected BaseDictionary(int initialSize);

  protected interface IKeyHandler<TKey>
  {
    uint MapKeyToBucket(TKey key);
    bool KeysEqual(TKey lhs, TKey rhs);
  }

  protected struct EquatableKeyHandler : IKeyHandler<TKey>
  {
    public uint MapKeyToBucket(TKey key) => (uint)EqualityComparer<TKey>.Default.GetHashCode(key);
    public bool KeysEqual(TKey lhs, TKey rhs) => EqualityComparer<TKey>.Default.Equals(lhs, rhs);
  }
  protected uint EntryCount { get; private set; }

  protected void Initialize(int powerOf2Size);
  protected void RemoveAll();
  protected bool RemoveEntry<THandler>(TKey key, THandler keyHandler) where THandler : IKeyHandler<TKey>;
  protected ref TValue FindEntry<THandler>(TKey key, bool insertIfNotFound, out bool found, THandler keyHandler) where THandler : IKeyHandler<TKey>;
  private void Resize<THandler>(THandler keyHandler) where THandler : IKeyHandler<TKey>;

  public struct Enumerator : IEnumerator<KeyValuePair<TKey, TValue>> { /*Enumerator Stuff */ }
}

Basic Dictionary implementation, except faster & with single lookup add & update support

public class BoringButFastDictionary<TKey, TValue> : BaseDictionary<TKey, TValue>, IDictionary<TKey, TValue> where TKey : IEquatable<TKey>
{
  public TValue AddOrUpdate(TKey key, Func<TKey, TValue> addValueFactory, Func<TKey, TValue, TValue> updateValueFactory);

  public TValue GetOrAdd(TKey key, Func<TKey, TValue> valueFactory);

  //Everything IDictionary<,> demands
}

Counting Dictionary

public class CountingDictionary<TKey> : BaseDictionary<TKey, int>, IEnumerable<KeyValuePair<TKey, int>> where TKey : IEquatable<TKey>
{
  public Enumerator GetEnumerator();
  public int Increment(TKey key);
  public int Decrement(TKey key);

  public int this[TKey key] { get; set; }
}

TransformationCache - where you lookup up transformed values using the transformation input (e.g. a span of UTF8 bytes looking up strings)

public class TransformationCache<TKey, TValue> : BaseDictionary<TKey, TValue> where TKey : IEquatable<TKey>
{
    public interface ITransformHandler<in TSource>
    {
        TKey KeyForValue(TSource value);
        TValue Transform(TSource source);
    }

    public TValue GetOrAdd<TSource, TTransformHandler>(TSource sourceValue, TTransformHandler transformer) where TTransformHandler : ITransformHandler<TSource>;

}

Indexed Set - support indexed lookup, where the key is derived from the value (e.g. msbuild PropertyDictionary)

\\TODO

Super Basic, Broken Prototype

This prototype is provided as a comprehension aid & basis for discussion; it is entirely untested and has rather substantial deficiencies. https://gist.github.com/Zhentar/eac2d9078860c29c58575e04fbe1deca

Open Questions

Initialization & Resizing - I know there are plenty of very good reasons to want something other than the basic "allocate one contiguous array" backing scheme. But I haven't given this much consideration because it isn't important in my personal use cases. How can alternative strategies be supported without overhead on lookups? Is there a better scheme to use by default?
Is it a good idea to have a fully functional IDictionary<K,V> implementation? or should it be read only given less robust handling of poor implementations?
- Make it a constructor parameter?
Value indexes. I haven't thought through any implementation yet, but PropertyDictionary reminded me that it's not an uncommon scenario (I've done it plenty of times myself), and would be easily supported
Is there a good way to keep a safe, clean public API for GetOrAdd & AddOrUpdate without delegate allocations?

danmoseley commented 6 years ago

Thanks for kicking this off.

API

No mandatory virtual dispatch on hot paths. Put the code size vs. abstraction trade-off in the hands of consumers.

I assume you mean on the comparers, because you made FindEntry and RemoveEntry virtual. For the comparer, you are calling through an interface. I am not sure of the relative cost of that vs. a delegate or virtual call (@jkotas?). If it's non trivial one could imagine a dictionary specialized specifically for, say, StringComparer.OrdinalIgnoreCase. As an alternative to inheritance, partial classes might be sufficient to create specializations like this. There is always a tradeoff between customization and performance, I suggest strong bias here to performance rather than trying to find another balance different to Dictionary<K,V>.

As a general point ideally the API would be as similar as possible to the existing Dictionary<K, V> to make it easier for existing code to use. There should be good reason for divergence.

Performance

All that matters for performance is actual measurements (especially against Dictionary<K,V>) but it would be nice to pick a reasonable reference implementatoin to measure, based on data others have gathered already. There is plenty. I found this an interesting performance overview of collision, growth, and layout strategies: https://bigdata.uni-saarland.de/publications/p249-richter.pdf (It also compares hash functions, which are fixed for our work here)

For collision - chaining, linear probing, quadratic probing, robin hood, cuckoo etc - it finds robin hood fastest or near fastest for lookup heavy work, and competitive for read/write. Footprint seems comparable for all probing strategies. For robin hood, it discarded various probing heuristics in favor of linear search, and rehash instead of tombstones (I am not sure whether this is the same as this).

For growth - if I read correctly (Fig 5), performance at 90% capacity does not seem much worse than 70% so perhaps resizing can be delayed a little longer than the 72% that System.Collections.Hashtable used.

For layout, your prototype has key and value together (AoS). Since we are getting a chance to modernize,, it would be nice to consider possible SIMD friendliness - the paper suggests SoA is more friendly because the keys are already packed. Without SIMD, they are similar except at low load factors where AoS wins.

No modulo prime bucket mapping (simple bitmasks & power of 2 sizes)

Any particular reason? Some collision resolution strategies require a primal size, right? And growth strategy should be based on measurements.

We should probably use ArrayPool - although if the arrays are of structs specific to this implementation, it would need its own pool.

Depending on the real world importance of excess space and the cost of expansion, it may be useful to be able to specify initial capacity, possibly to offer EnsureCapacity() and TrimExcess() (as Dictionary<K,V> now has). Hashtable also accepts a load factor hint - that seems a bit like "throwing up hands".

Performance scenarios

It would be helpful to create (or reuse) some benchmark scenarios, bearing in mind -

How important is lookup vs. insertion vs remove? My guess for typical .NET use is that although there are plenty of static lookup tables, dictionaries are very often dynamic
How often are lookups successful?
What are the ranges of typical entry counts/sizes of real world dictionaries?
What are the typical specializations of Dictionary<K,V> and with what comparers? It may be interesting to optimize for the most common K's - I would guess string is top.

For the last two we can probably dig up some statistical data based on a corpus we have.

@benaadams @jkotas @vancem for thoughts.

danmoseley commented 6 years ago

Is it a good idea to have a fully functional IDictionary<K,V> implementation? or should it be read only given less robust handling of poor implementations?

I think the most common case by far is to use a dictionary internally rather than pass it to 3rd party API. Also as a datapoint in a quick grep of corefx\src*\ref**cs, not many API accept IDictionary<K,V>, especially if you exclude those that are just initializing themselves. So perhaps you can ignore the interface for now.

Value indexes. I haven't thought through any implementation yet, but PropertyDictionary reminded me that it's not an uncommon scenario (I've done it plenty of times myself), and would be easily supported

Do you mean a bidirectional map? That seems something to worry about later. (If by PropertyDictionary you mean this, then it is not a bidirectional map, it is a map of objects that know their keys).

I think the way forward is to "modernize" the API of Dictionary<K,V> if it would help performance (eg maybe https://github.com/dotnet/corefx/issues/31598) and then try to evolve to something as fast as possible. Thoughts from others?

Zhentar commented 6 years ago

@danmosemsft

I assume you mean on the comparers, because you made FindEntry and RemoveEntry virtual. For the comparer, you are calling through an interface.

I was not intending for FindEntry and RemoveEntry to be virtual; is there something that implies virtual that I am not aware of?

The comparer interface is applied as a generic type parameter parameter, so that a struct comparer implementation can be used to get the compiler to specialize, de-virtualize, and likely inline as well without too much API compromise.

Any particular reason? Some collision resolution strategies require a primal size, right?

Modulo prime is a pretty good strategy for handling hash keys with poor entropy in the low bits (memory addresses of objects, for example). The problem is it it requires a DIV instruction with a best case latency of several dozen cycles, which is an unwelcome price to pay when you've already got good low bit entropy. A less resilient but faster entropy mixing function like multiplying by 2654435769 (2^32/phi) can be used, or left to the consumer.

Do you mean a bidirectional map? That seems something to worry about later. (If by PropertyDictionary you mean this, then it is not a bidirectional map, it is a map of objects that know their keys).

I mean a map of objects that know their keys (which at least in my use cases means a set of values/objects with an "index" of some trait for fast lookup).

I would like a bidirectional map too, but I think it would have to be a different structure.

danmoseley commented 6 years ago

@safern

dotnet / corefxlab

Power Collections - Specialized Hashmaps API Proposal #2415

Rationale

Goals

Proposed API

Super Basic, Broken Prototype

Open Questions

API

Performance

Performance scenarios