Open neilconway opened 12 years ago
Custom hash and equality functions on the tuple don't work because the tuple doesn't know its keycols. It cannot know its keycols because we pass a tuple around by ref.
Custom hash and equality functions in the collection don't work well; I experimented with a set data structure written in Ruby with schema-specific hash and equals support, but it doesn't match the built-in Ruby hash for performance.
The fastest method I have seen so far (staying in pure Ruby) is to wrap the tuple with an object with schema-specific hash and equals, but which doesn't extract the fields. This is faster than our current method of creating an array: on jruby it is twice is fast, but marginal (10%) on ruby 1.8.7 and 1.9.3
Given, table :foo, [:k1, :k2] => [:v]
generate a key class as follows
class Foo_Key
def initialize(tup)
@tup = tup
end
def hash
@hash = @tup.k1.hash * 31 + @tup.k2.hash // good to cache the hashcode.
end
def ==(other)
other.class <= Foo_Key and @tup.k1 = other.tup.k1 and @tup.k2 == other.tup.k2
end
end
By "custom hash and equality functions", I was assuming we'd be able to control the hash/equality function APIs, so it would be easy to arrange to pass a reference to the hash table/collection when invoking the function. That's what I did in C4.
Right -- to beat the performance of the builtin Hash, we'd probably want a C extension.
We currently use a Ruby Hash to map keys to tuples in Bud collections. This works fine, but there is room to optimize Bud collections by using a custom hash-like type:
In the grand scheme of things, this isn't a priority but might be fun.