cedar-policy / cedar-go

Golang implementation of the Cedar Policy Language
Apache License 2.0
85 stars 9 forks source link

Improve `Set.Contains()` performance by making `Set` a hash set-like data structure #38

Closed patjakdev closed 2 months ago

patjakdev commented 2 months ago

Issue #, if available: None

Description of changes:

This PR changes types.Set into a cheeseball open-addressable hash set by changing the underlying storage from a slice to a map[uint64]types.Value where the uint64 key is a hash of the corresponding types.Value.

This required us to add a hash() function to the Value() interface. Non-trivial implementations of the hash() function use the fnv hash from the Go standard library, which is apparently the same one used (at least at one time) for Go's internal implementation of map.

In order to make this safe, we also had to make Record and Set immutable, which was generally a good quality of life improvement in terms of cognitive overhead when thinking about how to deal with Values and allowed us to remove the deepClone() method of the Value interface. It does, however, mean that in order to construct a Set or Record, one now needs to go through the NewSet() or NewRecord() constructor function, respectively.

Collisions are resolved by simply adding 1 to the computed hash of the Value until a free slot is found in the map. This works trivially, so long as the Set remains immutable because to no values can be removed from the map and thus no holes in the open addressing chain can be created.

patjakdev commented 2 months ago

I'm considering using the maphash package for the non-trivial hashing instead of fnv. It seems like it's fit for purpose.

The "drawback" is that it forces you to have a random seed, which makes testing of the rendering of Sets into Cedar text or JSON a bit more difficult as the order is unpredictable.

apg commented 2 months ago

We could. Seems cheaper to sum hash() rather than build a string and hash that though. Summing hash() seems generic enough to work for all non-atomic types.On Sep 18, 2024, at 17:02, Patrick Jakubowski @.***> wrote: @patjakdev commented on this pull request.

In types/set.go:

-type Set []Value +// A Set is an immutable collection of elements that can be of the same or different types. +type Set struct {

  • s map[uint64]Value
  • hashVal uint64 +}
  • +// NewSet takes a slice of Values and stores a clone of the values internally. +func NewSet(v []Value) Set {

  • orig := make([]Value, len(v))
  • set := make(map[uint64]Value, len(v))
  • for i, vv := range v {
  • orig[i] = vv
  • hash := vv.hash()

Another thought. Could we just call vv.String() here and hash the string output of the Value? We'd have to check if the String() output is always the same for any equivalent instantiation of Value... Just a thought before I step away for the day...

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: @.***>