amuletml / amulet

An ML-like functional programming language
https://amulet.works/
BSD 3-Clause "New" or "Revised" License
324 stars 14 forks source link

A terrible hash map implementation #273

Closed SquidDev closed 4 years ago

SquidDev commented 4 years ago
plt-amy commented 4 years ago

I wonder if a HAMT implementation wouldn't be better?

SquidDev commented 4 years ago

But I guess it doesn't matter if everything's out of bounds for it..

Yeah, I'm fairly sure this is a safe optimisation as empty array can't be modified or indexed. Java does it, though I realise that's hardly a great commendation.

I wonder if a HAMT implementation wouldn't be better?

I don't know if it's worth having both? I definitely want some kind of mutable map structure, though am not exactly wedded to this particular design - might try out something closer to Lua's hash implementation.

davidgarland commented 4 years ago

If performance is your aim, I might recommend looking into robin hood hashing with backwards shift deletion. This also implies switching to open addressing and linear probing, which are better for cache. (Though, I think you ought to make the key/value pair into two individual arrays for the usual SoA benefits-- <$> on hashmaps would then no longer need to load the keys into cache as well, for instance, and the same goes for iterating over keys only.)

SquidDev commented 4 years ago

Thanks for the links David! I was vaguely aware of robin hood hashing, but those are articles were very useful.

I must confess this implementation was largely written from whatever I remember of algorithms courses, so is definitely not as sane as it should be. I definitely want to replace this with a better implementation.

That said, performance is a bit tricker - the backend spits out absolutely terrible Lua code, which means any memory optimisations are overshadowed by the number of closures/tables we're allocating for every call. I really need to fix this before doing any serious profiling:

-- map.[1] compiles to:
local tmp0 = _dot_28_290({
  hash = function(x) return x end,
  ["Hashable$agcy"] = {
    ["=="] = int_eq0,
    ["<>"] = function(bqq)
      return function(bqr)
        return bqq ~= bqr
      end
    end
  }
})(map)(1)

Looks like it's time to do some optimiser work again :D:.