canonical / chisel

GNU Affero General Public License v3.0
270 stars 42 forks source link

lib/sortedset: Add string sorted set #83

Closed woky closed 1 year ago

woky commented 1 year ago

Often we have a set of strings consisting of just one or only a few elements. In the paths to slices relationship there's usually just one slice owning a path. Similarly, in package source paths to extract target paths relationship, there's usually just one target path for each source path. But this is not always. E.g. /usr/ directory may be owned by several slices. And /usr/bin/foo may be copied to /usr/bin/foo and /bin/foo. It'd be wasteful to record such small sets as map[string]bool.

This commits adds sortedset package which currently contains only the string sorted set. It's used in forthcoming commits to record paths to slices relationship in the final Chisel DB implementation.

The package sortedset was placed under internal/lib/ directory which seemed like a good place to shove such code which would not deserve a box in a chisel architecture diagram.

woky commented 1 year ago

I'm not convinced that this is an improved over a map+slice for cases we care about, which is already a pattern well established in the code in several places. The "wasteful" bit is a trade-off: a hash map vs. a sorted list, and that's even ignoring the fact this logic is specific to strings at the moment.

With that said, if this is not just a "it seems wasteful" but rather you have an actual case that was investigated, happy to talk about it.

The rationale for this type is:

  1. 99% of all sets are going to be single element sets (100% with current chisel-release as we don't copy a package path to more than one place)
  2. the underlying slice data type can be directly assigned into the model type used for jsonwall serialization (added by https://github.com/canonical/chisel/pull/71, Path.Slices field)

I don't get what you meant my map+slice because here the set type would be just map[string]bool (or map[string]any, ...). If I were to use map[string]bool then I'd have to convert it into a slice anyway and the map won't give me any benefit. This type is only used in one file added by https://github.com/canonical/chisel/pull/78. @niemeyer Would you agree if I move it into it as private type?

woky commented 1 year ago

This type is only used in one file added by https://github.com/canonical/chisel/pull/78. @niemeyer Would you agree if I move it into it as private type?

I went ahead and moved that into #78 (as slicer.stringSet). This type is just an implementation detail of path tracking, so it doesn't need a separate commit. I've retained the test though.