Reimplementation a Hash using open addressing

akzhan commented 7 years ago

Crystal Hash(K, V) should be reimplemented using open addressing and powers of two.

To be read:

Thanks to @RX14, @konovod, @yxhuvud for links.

Refs #2817, #3932, #2331 and https://github.com/crystal-lang/crystal/pull/4033#issuecomment-307779149 to continue discussion and implementation.

funny-falcon commented 7 years ago

Questions to be cleared:

current Hash implementation preserves order of insertion. Should new implementation do the same?

If so, then single valuable design is that is adopted by PHP, Python and Ruby: "open-addressing hash array" consists indexes to entries array. With this design there is no much profit from clever alhorithms like "robin-hood hashing". Correctly implemented double hashing looks to be enough.

should Hash be able to contain more than 2^31 entries? It will cost additional memory and/or complexity.
should index array be compressed for small hashes? Or is it ok to alqays store indices as uint32?
which Key types could live without saved hash value? Obviously, integer types. Is it ok to always store hash for non-integer keys? But even integer keys have to have bitmap for deleted entries.

akzhan commented 7 years ago

We usually follow Ruby behavior in these cases:

"Hashes enumerate their values in the order that the corresponding keys were inserted."

2^31 seems like reasonable limit (but you can propose 2^62 or something else).
Small hashes are common just like a Ruby.
Every object has the hash method that responds to get a hash value. It's OK to use it only. Just to explain: LLVM usually inline Int32.hash etc. calls in release mode because Hash(K, V) is statically typed generic class.

/cc @asterite

RX14 commented 7 years ago

With this design there is no much profit from clever alhorithms like "robin-hood hashing". Correctly implemented double hashing looks to be enough.

Why is this and are there benchmarks to back it up?

funny-falcon commented 7 years ago

Every object has the hash method that responds to get a hash value. It's OK to use it only. Just to explain: LLVM usually inline Int32.hash etc. calls in release mode because Hash(K, V) is statically typed generic class.

@akzhan , but Crystal's current hash functions are just awful for hash table with "power of two". Especially awful Float "identity" functions:

$ crystal eval 'puts 1.0.hash.to_s(16)'
3ff0000000000000
yura@dell:~/Project/postgresql/build$ crystal eval 'puts 2.0.hash.to_s(16)'
4000000000000000

Every hash function is 'ok' if hash table is by prime numbers. But for "power of two" more work should be done.

And, by the way, looks like Crystal is the only contemporary language of "application level" that doesn't fear of HashDos:

neither hash seeded in any way (with global or per-table seed),
nor "strong" hash algorithms used.

Is it intentionally? Is Crystal recommended for public interface, or is it for private tools only?

funny-falcon commented 7 years ago

Why is this and are there benchmarks to back it up?

@RX14 , what cool algorithms (like "robin-hood" hashing) for? RR is for:

cache-locality on lookup
high-density - big load factor
and short lookup chain with two previous property preserved.

With ordered hash table implemented as "indices array + entries array" both first two properties are not preserved:

cache-locality is destroyed by lookup to entries array with index. It could be fixed if hash value is stored alongside with index. But it will cost memory footprint then.
load-factor is dictated by "entries array" reallocation strategy, and doesn't depends much on cleverness of "indices hash" algorithm. (Exception from the is Hash(Int32,Int32) - entries array comparable with indices hash array. But it whould be pity to introduce additional algorithmic complexity just cause of this case).
with low load factor (0.33-0.66) , lookup chain is usually short regardless of algorithm cleverness. And since indices array usually smaller than entries array, it is better to simply reduce maximum load factor. With clever trick, first two lookup are could be made into consecutive cells of indices array even if "double-hashing" is used.
deletion is already have to be handled within entries array (for correct iteration). And rehashing have to be done when entries array is full. So no need in clever deletion algorithm.

I agree that clever algorithms are useful. But it will be useful for UnorderedHash, if some will add it to Crystal. Ordered Hash table destroys much of gain from cleverness.

RX14 commented 7 years ago

I was sure we further hashed the #hash return value with a hash with better distribution. I must be going crazy.

@funny-falcon Why do we need an entries array instead of simply using a doubly linked list inside an Entry struct as we do now? Doesn't this solve most of the problems you listed.

akzhan commented 7 years ago

@funny-falcon

Thanks for deep into FloatXX.hash implementations. It's yet another reason to reimplement Float64.hash using something like _Py_HashDouble. Yes, WIP (introducing of frexp was first step #4560). There is related #3932 issue. Anyway XXX.hash reimplementation is related, but another task.
Hash value must be seeded by Hash class itself. It is not implemented yet, but we have #3932.

akzhan commented 7 years ago

OK, @funny-falcon - just to clarify, what do you recommend to implement following https://github.com/crystal-lang/crystal/issues/4557#issuecomment-308253835?

Anyway object hash functions will be adjusted.

akzhan commented 7 years ago

Powers of two is not a dogma. It's just faster in some scenarios.

Open addressing looks like just fit to modern processors.

asterite commented 7 years ago

@funny-falcon Crystal is a work in progress, totally not production ready. Not a lot of thought (in fact I would say "none") has been given to hash functions, they "just work". Pull requests and discussions are welcome to discuss and enhance this, though I personally know nothing about the subject.

luislavena commented 7 years ago

@asterite can we assume Hash(T) to should not honor ordering? (aka: recent Ruby behavior vs legacy 1.8.x one).

Working with that assumption, several changes can be recommended for Hash(T) without introducing yet another base class.

Thank you.

funny-falcon commented 7 years ago

@RX14

I was sure we further hashed the #hash return value with a hash with better distribution. I must be going crazy.

You are not crazy. Just with "prime numbers" it was not strictly necessary. But both "power of two" and trick with multiplying instead of modulo requires that step, if original hash function is such dumb.

@funny-falcon Why do we need an entries array instead of simply using a doubly linked list inside an Entry struct as we do now? Doesn't this solve most of the problems you listed.

But it introduce new problems: how reliably and efficiently combine double-linked list and moving semantic of Robin-Hood hashing? At least it will demand additional memory for list. Should it be compressed for small hashes? If so, then you will have a lot of branches to deal with. And you will ruin insertion performance.

Consider that most hashes in applications are "delete-less" ie they are only filled and (rarer) updated. Especially web applications (that selects a lot of db records, and only shows them). For such hashes, "indices hash + entries array" is most efficient data structure (if order of insertion should be preserved).

funny-falcon commented 7 years ago

@akzhan

OK, @funny-falcon - just to clarify, what do you recommend to implement following #4557 (comment)? Anyway object hash functions will be adjusted.

First, I believe it is better to have:

  def hash_inc(state : HashState)
    state = state.mix(v1)
    state = state.mix(v2)
    state
  end
  # default implementation inherited from Object
  def hash
     state = HashState.new
     state = hash_inc(state)
     state.finalize
  end

For backward compatibility, it could be reverted:

   def hash
     ...
   end
   # also inserited from Object. Is it possible to inherit both default hash and hash_inc?
   def hash_inc(state : HashState)
      state.mix(hash)
   end

(If HashState is a struct. I believe, optimizer has more freedom with such design. But it is not dogma.)

And then, Hash will initialize HashState with global or per-table random seed.

Float's hash then could be implemented as:

struct Float64
   def hash_inc(state)
      state.mix(unsafe_as(Int64))
   end
end

Second, (a bit of self promoting) I "recommend" to use my hash function: https://github.com/funny-falcon/funny_hash/blob/master/funny_hash.h It is simple, fast enough, has state of just two variables, and has fast and simple incremental mode. Probably it is not "bullet proof" as SipHash, but I believe it is just enough for hash-table hash-function.

I'd be happy to implementation all above if you like the design.

Powers of two is not a dogma. It's just faster in some scenarios. Open addressing looks like just fit to modern processors.

With sane hash functions, "power of two" becomes sane choice.

RX14 commented 7 years ago

@funny-falcon yeah it didn't hit me that Entry wouldn't be in the heap any more (if we wanted cache locality at least), you're right.

I think we should implement an ordered and unordered hash and see how fast we can make each one. Once we have data on how much of a performance difference it makes for insertion and deletion, we can decide. I'd like to see it kept ordered but if it's going to be 2x slower I don't think it's worth it.

RX14 commented 7 years ago

Also power of 2 means that the hash table can have a load factor as low as 0.5 right? I'd prefer if our hash implementation was memory efficient.

funny-falcon commented 7 years ago

@luislavena

@asterite can we assume Hash(T) to should not honor ordering? (aka: recent Ruby behavior vs legacy 1.8.x one). Working with that assumption, several changes can be recommended for Hash(T) without introducing yet another base class.

@RX14

I think we should implement an ordered and unordered hash and see how fast we can make each one. Once we have data on how much of a performance difference it makes for insertion and deletion, we can decide. I'd like to see it kept ordered but if it's going to be 2x slower I don't think it's worth it.

Back in the time, I thought it was huge mistake to introduce ordered hash for default Hash in Ruby 1.9 , cause double linked list eats a lot of memory. But then PHP adopted this new design, and then Python, and now Ruby. Now I'm not sure.

Now I believe, two base classes is "righter" way to go. Question is: should OrderedHash or UnorderedHash be default Hash?

funny-falcon commented 7 years ago

@RX14

Also power of 2 means that the hash table can have a load factor as low as 0.5 right? I'd prefer if our hash implementation was memory efficient.

In fact, prime numbers are less flexible in terms of memory/performance efficiency. Yes, you may have more prime numbers, but then you will pay for rehashing.

Power of two allows Linear Hashing. It is straight forward to implement with chaining. I implemented it once with Cuckoo Hashing. And I know implementation with Coalesce Chaining.

In any way, when we talk about OrderedHash implemented as "indices array + entries array", hash load factor affects only "indices array", and so doesn't matter that much.

And another point: memory efficiency is almost always result of tradeoff - you pay raw performance for memory efficiency. Cuckoo Hashing and Robin Hood hashing trades insertion performance for lookup performance and memory efficiency. I doubt usual application will win from this tradeoff.

I believe, there should not be "Ultimate Swiss Knife" Hash for every kind of usage. It should be convenient enough, fast enough and memory efficient enough for regular usage. It should not be "super convenient", nor "super fast", nor "super memory efficient", cause either "super" trades off from other. And don't forget about code complexity: it should be simple enough to be confident in.

Those rare men, who need particular "super" should implement it by them-self, because they knows better what they want.

funny-falcon commented 7 years ago

Ahh, I talk too much. Lets go to work.

RX14 commented 7 years ago

I didn't mean to accept both ordered and unordered implementations into the stdlib, just to benchmark both and use that to decide which to accept later.

funny-falcon commented 7 years ago

Then bechmark should a real application, not "microbenchmark".

Unordered open-addressing will beat ordered hash in almost every micro-benchmark, except, probably, iteration (and maybe insertion).

But will real application gain from this "victory"? It is a good question.

And OrderedHash-es are prooved to be useful, otherwise there weren't such hash in many standard libraries (Python had inefficient implementation for a while; before Ruby 1.9 became wildly adopted, ActiveSuport implemented ordered hash by itself).

So, certainly OrderedHash should be in stdlib. If it will perform at 90% of UnorderedHash performance, shouldn't it be default Hash? (imho, it should) 80%? 70%? (then choice is not so obvious to me :-) )

RX14 commented 7 years ago

@funny-falcon that's what I was getting at, of course unordered will beat ordered in the benchmarks the question is how much.

I'm not entirely sure how much the compiler uses hashes but it might be a decent candidate for benchmarking (if you disable the LLVM parts and just dump the bc). Otherwise a web application uses hashes for http headers (and params but they're lazily parsed so you'll have to actually access those). Ideally we'd have both micro benchmarks and real applications being benchmarked with changes.

akzhan commented 7 years ago

@sdogruyol, @eliasjp, @elorest, @drujensen and others: can You help to choose Hash implementation? There is good question to be cleared:

current Hash implementation preserves order of insertion. Should new default implementation do the same?

Please read discussion (there are many questions).

funny-falcon commented 7 years ago

Rust tries to be system language, and tries to be as efficient as possible. And then it adopted strange choses as SipHash24 and 92% load factor. I proud I convinced them to switch to SipHash13 (but I'm sure, simpler hash function will be enough). And they are ready to reduce load factor (not my proud :-) ).

What is Crystal's intention? Is it application language, or system language? Is it "convinient C/C++" or "fast Ruby"?

While Crystal certainly could be used for both, I believe that intention dictates defaults and stdlib.

(But yeah, C++ std:vunordered_hash is awful :-) )

System language doesn't need bullet-proof super-convinient Hash as default. It needs fast Hash (and hash functions).

Application language should not demand programmer to think about such issue: Hash should be convinient and safe.

akzhan commented 7 years ago

@funny-falcon

Just my opinion:

Default - Crystal is not system language. It's application/framework one (Crystal may be embeddable into Ruby applications later, as I have seen on Roadmap - "DSL for writing Ruby extensions").

Also looks like 50-75% load factor is OK. Anyway we can implement OrderedMap and BusyMap, for example, as additional classes. But it is just my opinion, Crystal compiler itself will explode with these assumptions :)

RX14 commented 7 years ago

@funny-falcon what are the tradeoffs between Rust's hash map and what you're proposing for crystal. I'm not sure what you mean by " strange choices".

funny-falcon commented 7 years ago

But it is just my opinion, Crystal compiler itself will explode with these assumptions :)

OrderedMap implemented in Ruby/PHP/Python way usually consumes less memory than current chaining approach with double-linked list for ordering. Of couse, it depends much on size of Entry (ie on sizeof Key + sizeof Value): if Entry is huge (much larger than size of 3 pointers), then chaining could be better. How often Crystal Hash is used with huge Structs as a key or value?

funny-falcon commented 7 years ago

@funny-falcon what are the tradeoffs between Rust's hash map and what you're proposing for crystal.

Rust positions itself as a system language. But then they

blindly adopted slow hash function as default for hash,
blindly decided they have to be "memory efficient", and set up load factor > 90% without measurment of its impact on performance.

Now they corrects both decision to more sane:

default is SipHash13 instead of SipHash24, and there is easy way to use simpler/faster hash functions,
looks like they will reduce load factor down to 80-85%. Robin-Hood fills much better with it, and it is still considered "memory efficient".

Tradeoff of OrderedHash compared to RHH is, on the one hand, performance for convenience - less cache locality on fetch for ordering. On the other hand, insertion performance should be comparable. And main gain: much simpler implementation (but probably I just fear to implement RHH :-) it looks to me a bit complex).

Note, that it is hard to discuss load factor tradeoff: load factor of indices hash affects memory consumption less compared to unordered openaddressing, cause indices are almost always smaller than Entry. Load factor of Entries array could be made grainer easy: size chain could be 4,6,8,12,16,32,48...2^n,2^n+2^(n-1),2^(n+1) . So, if no deletion performed, load factor is in range 0.66-1.0 . For comparison RHH with maximum load factor 0.9 has real range of 0.45-0.9 . So, what you loose cause of indices hash, you gain with load factor of Entries array.

akzhan commented 7 years ago

How often Crystal Hash is used with huge Structs as a key or value?

Feel free to Rails-like hash usage. A majority of hashes has Symbols and Strings as keys (one int32 or pointer), and String or other References as values.

There are only two known exceptions - Unicode support and compiler internals - with big values.

RX14 commented 7 years ago

I would like a memory efficiency of 80-85% on any hash table implementation crystal chooses. Of course if we go the index map approach that includes the insertion-order array too so it shouldn't be too hard even if the actual hash table has a load factor of maybe 50%.

konovod commented 7 years ago

There is only two known exceptions - Unicode support and compiler internals - with big values.

I have checked compiler internals and haven't found any big entries - Hash(String, TypeDeclarationWithLocation) has 3 pointers+1 bool, others are less. Maybe i've missed something though.

akzhan commented 7 years ago

@konovod

Thanks, I was wrong. Majority of structs have 128bit or less.

@funny-falcon

Anyway we need no backward compatibility at all. Feel free to change contracts. HashState is OK.

I "recommend" to use my hash function: https://github.com/funny-falcon/funny_hash/blob/master/funny_hash.h It is simple, fast enough, has state of just two variables, and has fast and simple incremental mode. Probably it is not "bullet proof" as SipHash, but I believe it is just enough for hash-table hash-function.

I suppose that's subject for another issues - reworking of concrete hashing algorithms.

asterite commented 7 years ago

Let's please keep one Hash, ordered, just like now. Later we can (or not) other hashes more suited for other purposes, like without order, etc. I'd like Hash to be the most convenient and mostly fast (but if it's not the fastest that's ok).

The compiler uses lots of hashes, though I believe they are generally small. We could probably use the compiler for a real-world benchmark.

asterite commented 7 years ago

By the way, Crystal is an apps programming language, not a system language. It's meant to do high-level apps. So sure, you can try to write a kernel or write shared libraries, but at least I won't give that much support because it's not the language's goal.

RX14 commented 7 years ago

@funny-falcon I think that implementing hash like this would be best and in-line with how to_s is implemented:

class object
  def hash
    hasher = SomeHash.new
    hash(hasher)
    hasher.finalize
  end

  def hash(hasher)
    hasher << object_id
  end
end

class MyClass
  property foo : Foo
  property bar : Bar

  def hash(hasher)
    hasher << @foo
    hasher << @bar
  end
end

and obviously update the macros which generate a def hash for you to do this.

btw @bew @akzhan what did you dislike about @asterite's comment?

akzhan commented 7 years ago

@RX14 I just could not see what kind of smile it was from the phone :)

elorest commented 7 years ago

I would prefer to keep it ordered unless there is a large memory or speed hit for doing so.

On Jun 14, 2017 1:46 AM, "Akzhan Abdulin" notifications@github.com wrote:

@sdogruyol https://github.com/sdogruyol, @eliasjp https://github.com/eliasjp, @elorest https://github.com/elorest and others: can You help to choose Hash implementation? There is good question to be cleared:

current Hash implementation preserves order of insertion. Should new default implementation do the same?

Please read discussion (there are many questions).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/crystal-lang/crystal/issues/4557#issuecomment-308348811, or mute the thread https://github.com/notifications/unsubscribe-auth/AAHmpN22O6AbP1fmnuU99BioQOryqtdVks5sD4_KgaJpZM4N4Jpe .

bew commented 7 years ago

@RX14 It's a bit off-topic, it was about " Crystal is an apps programming language, not a system language ... I won't give that much support because it's not the language's goal.". I understand the comment in the context of this issue, but I dislike it in a general context, because I'd like to do system programming stuff with crystal and doing it while it's not "supported" isn't the ideal situation I think. But let's not talk about that here, it's not where I wanted my reaction to go. (if it makes sense)

konovod commented 7 years ago

In a desire to estimate performance potential i've implemented unordered hash table with open addressing, powers of two and quadratic probing, but performance was almost same as current implementation (like x1.3 faster on very big tables with 10k elements). Then i implemented RHH (with backward shift deleting), but... performance is still almost the same. In some situations it is worse than default, in some - slightly faster on big tables. Well, I've tried. Perhaps I suck at optimizations or benchmarking or just missing something. Code is here, relatively messy as it was just an experiment (and a funny one).

I've also found someone else attempt to implement better hash - https://github.com/yxhuvud/crystal-hash, it is also showing worse results than default implementation. Haven't checked it deeply though, only compiled and run benchmark.

My conclusion is that making Hash faster isn't as easy as it may seem, good luck @funny-falcon or anyone else who is going to try it.

akzhan commented 7 years ago

@konovod Firstly, thanks for links. Subject to read. But:

Powers of two need another hash algorithm (https://github.com/crystal-lang/crystal/issues/4557#issuecomment-308260625).
Crystal need another hash algorithm too in terms of security.

konovod commented 7 years ago

Hashing algorithm is additional problem. I've measured Hash(Int32, Int32) to don't deal with hashing quirks. Changing hash algorithm will slow down things more, even if it would be fast.

funny-falcon commented 7 years ago

I've measured Hash(Int32, Int32) to don't deal with hashing quirks. Changing hash algorithm will slow down things more, even if it would be fast.

That is unavoidable evel :-( I will try to special-case hashing Int32 key, but still it will be slower than no hashing.

funny-falcon commented 7 years ago

I've also found someone else attempt to implement better hash - https://github.com/yxhuvud/crystal-hash,

Thanks for link. It will short my study of Crystal's idioms.

funny-falcon commented 7 years ago

Code is here,

And for this.

yxhuvud commented 7 years ago

"I've also found someone else attempt to implement better hash - https://github.com/yxhuvud/crystal-hash, it is also showing worse results than default implementation. "

@konovod well, it seems to be worse for small hashes. It is multiple times faster for big hashes. Also it should be noticed that I haven't really done proper measurement of how fast or slow it is, so any claims of speed or slowness may be wrong. I also haven't done any benchmark driven optimizations at all, so it may very well contain stupid mistakes.

@funny-falcon Note that it (ie Hash2b) is almost a line for line translation of the ruby implementation (except for the index compaction), with some ideomatic changes to allow for usage of structs and objects (hopefully where appropriate). But as I havn't benchmarked, some of that may be quite misguided.

Another note is that it changes the hash contract slightly compared to the current crystal implementation. It does an additional check that the hashes are the same before comparing equality when doing lookup. That is probably a good thing, but it should be noted.

konovod commented 7 years ago

@yxhuvud I've tried to benchmark you table with my (pretty arbitrary) microbenchmark (consisting of adding N random values, searching N random values, deleting N/2 random values, adding N/3 values, looking up 3*N values, all values are rnd.rand(N) with fixed seed), and found that performance is same as default hash (slightly worse on hashes with N=100, same with N=10 and N=10000). Maybe that means that my benchmark is broken and measures something else, as it's seems to have same results for any hash implementation, lol. I have an idea of better test, will try it soon.

funny-falcon commented 7 years ago

@yxhuvud

is almost a line for line translation ... with some ideomatic changes

almost, but not exact :-( . When the performance is taken into account, there is no room for idiomatic changes :-( . Entries should be mutable structs, and filled by key and value separately.

Any way, it could happen that I can not do better :-) Time will tell.

funny-falcon commented 7 years ago

Is there a way to make conditional compilation based on generic argument?

class HashTbl(K, V)
  # pseudo code, not compiles now
  if K == UInt32
   struct Entry(K, V)
    property key : K
    property val : V
   end
  else
   struct Entry(K, V)
    property key : K
    property hash : UInt32
    property val : V
   end
  end
end

funny-falcon commented 7 years ago

Ok, got it:

class HashTbl(K, V)
  KEY = K # need to reassign to access in macro
  {% if KEY == UInt32 %}
   struct Entry(K, V)
    property key : K
    property val : V
   end
  {% else %}
   struct Entry(K, V)
    property key : K
    property hash : UInt32
    property val : V
   end
  {% end %}
end

Could it be noted in documentation? Or why generic argument is not accessible as constant in a macro?

funny-falcon commented 7 years ago

ah, no, it doesn't work: KEY is assigned at first invocation, and not per Generic instance :-( It quite limits Generics usability.

funny-falcon commented 7 years ago

More: struct Entry is also could not be overriden :-(

crystal-lang / crystal

Reimplementation a Hash using open addressing #4557