andyferris / Dictionaries.jl

An alternative interface for dictionaries in Julia, for improved productivity and performance
Other
282 stars 28 forks source link

Add UnorderedIndices and UnorderedDictionary #106

Closed andyferris closed 2 years ago

andyferris commented 2 years ago

These can be a bit faster than Dict for insertion/deletion heavy workloads, particularly when iteration speed/order is unimportant. E.g. caches. Based on the Base.Dict implementation with cruft relating to WeakKeyDict removed.

codecov[bot] commented 2 years ago

Codecov Report

Merging #106 (93078c9) into master (436bc11) will increase coverage by 1.61%. The diff coverage is 87.22%.

@@            Coverage Diff             @@
##           master     #106      +/-   ##
==========================================
+ Coverage   77.13%   78.75%   +1.61%     
==========================================
  Files          18       20       +2     
  Lines        2003     2320     +317     
==========================================
+ Hits         1545     1827     +282     
- Misses        458      493      +35     
Impacted Files Coverage Δ
src/Indices.jl 88.05% <66.66%> (-0.23%) :arrow_down:
src/UnorderedDictionary.jl 83.52% <83.52%> (ø)
src/UnorderedIndices.jl 88.68% <88.68%> (ø)
src/AbstractIndices.jl 79.08% <90.00%> (+0.24%) :arrow_up:
src/ArrayDictionary.jl 85.29% <100.00%> (+0.44%) :arrow_up:
src/reverse.jl 55.55% <0.00%> (+2.77%) :arrow_up:
src/insertion.jl 73.96% <0.00%> (+2.95%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

andyferris commented 2 years ago

Hmm... some of the benchmark results seemed good but for others we see some performance issues (e.g. filter (few), count, filter-map-reduce, etc). I'll need to investigate what's going on before merging.

For size 10k:

filter (few) (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(6.580 μs)
  "UnorderedIndices" => Trial(70.250 μs)
  "OrderedSet" => Trial(4.464 μs)
  "Set" => Trial(15.640 μs)

intersect (empty) (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(53.100 μs)
  "UnorderedIndices" => Trial(28.220 μs)
  "OrderedSet" => Trial(15.650 μs)
  "Set" => Trial(30.320 μs)

setdiff (empty) (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(137.280 μs)
  "UnorderedIndices" => Trial(123.350 μs)
  "OrderedSet" => Trial(332.490 μs)
  "Set" => Trial(113.709 μs)

copy (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(16.880 μs)
  "UnorderedIndices" => Trial(14.990 μs)
  "OrderedSet" => Trial(222.020 μs)
  "Set" => Trial(8.384 μs)

unique/distinct (high uniqueness, sorted) (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(145.790 μs)
  "Vector (unique)" => Trial(160.699 μs)
  "OrderedSet" => Trial(175.620 μs)
  "Set" => Trial(95.610 μs)

unique/distinct (low uniqueness, sorted) (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(7.032 μs)
  "Vector (unique)" => Trial(31.380 μs)
  "OrderedSet" => Trial(41.730 μs)
  "Set" => Trial(44.160 μs)

filter (half) (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(113.660 μs)
  "UnorderedIndices" => Trial(148.210 μs)
  "OrderedSet" => Trial(107.360 μs)
  "Set" => Trial(116.950 μs)

copy and filter! (most) (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(41.930 μs)
  "UnorderedIndices" => Trial(75.190 μs)
  "OrderedSet" => Trial(225.049 μs)
  "Set" => Trial(22.510 μs)

copy and empty by deletion (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(115.810 μs)
  "UnorderedIndices" => Trial(52.030 μs)
  "OrderedSet" => Trial(317.030 μs)
  "Set" => Trial(47.440 μs)

copy and filter! (half) (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(31.780 μs)
  "UnorderedIndices" => Trial(68.120 μs)
  "OrderedSet" => Trial(274.690 μs)
  "Set" => Trial(45.970 μs)

copy and filter! (few) (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(20.790 μs)
  "UnorderedIndices" => Trial(71.090 μs)
  "OrderedSet" => Trial(335.039 μs)
  "Set" => Trial(118.840 μs)

intersect (whole) (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(121.050 μs)
  "UnorderedIndices" => Trial(118.210 μs)
  "OrderedSet" => Trial(255.700 μs)
  "Set" => Trial(170.530 μs)

filter (most) (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(231.479 μs)
  "UnorderedIndices" => Trial(172.200 μs)
  "OrderedSet" => Trial(218.740 μs)
  "Set" => Trial(143.920 μs)

setdiff (whole) (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(55.260 μs)
  "UnorderedIndices" => Trial(30.360 μs)
  "OrderedSet" => Trial(130.990 μs)
  "Set" => Trial(43.360 μs)

symdiff (right half) (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(167.280 μs)
  "UnorderedIndices" => Trial(166.539 μs)
  "OrderedSet" => Trial(295.940 μs)
  "Set" => Trial(268.800 μs)

symdiff (left half) (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(83.690 μs)
  "UnorderedIndices" => Trial(41.920 μs)
  "OrderedSet" => Trial(313.889 μs)
  "Set" => Trial(255.240 μs)

intersect (half) (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(41.240 μs)
  "UnorderedIndices" => Trial(27.240 μs)
  "OrderedSet" => Trial(153.280 μs)
  "Set" => Trial(200.220 μs)

union (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(116.400 μs)
  "UnorderedIndices" => Trial(67.670 μs)
  "OrderedSet" => Trial(220.510 μs)
  "Set" => Trial(167.510 μs)

unique/distinct (high uniqueness, unsorted) (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices (distinct)" => Trial(179.100 μs)
  "Vector (unique)" => Trial(167.270 μs)
  "OrderedSet" => Trial(183.550 μs)
  "Set" => Trial(84.140 μs)

not in (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(92.230 μs)
  "UnorderedIndices" => Trial(26.799 μs)
  "OrderedSet" => Trial(108.740 μs)
  "Set" => Trial(95.810 μs)

build by insertion (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(209.740 μs)
  "UnorderedIndices" => Trial(100.910 μs)
  "OrderedSet" => Trial(198.530 μs)
  "Set" => Trial(111.860 μs)

unique/distinct (low uniqueness, unsorted) (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(53.290 μs)
  "Vector (unique)" => Trial(41.310 μs)
  "OrderedSet" => Trial(55.020 μs)
  "Set" => Trial(41.680 μs)

sum (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(2.164 μs)
  "UnorderedIndices" => Trial(34.580 μs)
  "OrderedSet" => Trial(4.310 μs)
  "Set" => Trial(12.680 μs)

insertion/deletion tests (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(1.522 ms)
  "UnorderedIndices" => Trial(1.462 ms)
  "OrderedSet" => Trial(2.054 ms)
  "Set" => Trial(1.356 ms)

count (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(4.310 μs)
  "UnorderedIndices" => Trial(56.350 μs)
  "OrderedSet" => Trial(4.123 μs)
  "Set" => Trial(13.150 μs)

setdiff (half) (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(107.390 μs)
  "UnorderedIndices" => Trial(146.060 μs)
  "OrderedSet" => Trial(177.150 μs)
  "Set" => Trial(165.630 μs)

symdiff (whole) (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(118.290 μs)
  "UnorderedIndices" => Trial(58.800 μs)
  "OrderedSet" => Trial(247.310 μs)
  "Set" => Trial(176.010 μs)

symdiff (empty) (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(141.280 μs)
  "UnorderedIndices" => Trial(121.040 μs)
  "OrderedSet" => Trial(381.310 μs)
  "Set" => Trial(324.680 μs)

in (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(23.080 μs)
  "UnorderedIndices" => Trial(25.220 μs)
  "OrderedSet" => Trial(63.930 μs)
  "Set" => Trial(29.820 μs)

foreach (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(4.504 μs)
  "UnorderedIndices" => Trial(59.870 μs)
  "OrderedSet" => Trial(2.159 μs)
  "Set" => Trial(12.120 μs)

constructor (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(33.720 μs)
  "UnorderedIndices" => Trial(75.020 μs)
  "OrderedSet" => Trial(203.020 μs)
  "Set" => Trial(94.620 μs)

filter-map-reduce via generator (10000) => 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(6.468 μs)
  "UnorderedIndices" => Trial(54.610 μs)
  "OrderedSet" => Trial(5.395 μs)
  "Set" => Trial(25.560 μs)
andyferris commented 2 years ago

OK I identified that a lot of the slow benchmarks is because iterations is slow. Partly this seems to be the extra indirection between iterate and iteratetokens is hard for the compiler to optimize. I have reduced the overhead somewhat, found a missing inbounds, etc, but there is still an obvious gap in iterating values as you can see in e.g. the sum benchmark below.

I'm not sure what else I can do to improve things... possibly implement iterate directly and see if that helps?

Pair{Any, Any}("filter (few) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(6.579 μs)
  "UnorderedIndices" => Trial(16.820 μs)
  "OrderedSet" => Trial(4.469 μs)
  "Set" => Trial(14.940 μs))

Pair{Any, Any}("intersect (empty) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(49.680 μs)
  "UnorderedIndices" => Trial(25.110 μs)
  "OrderedSet" => Trial(15.960 μs)
  "Set" => Trial(30.250 μs))

Pair{Any, Any}("setdiff (empty) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(126.710 μs)
  "UnorderedIndices" => Trial(112.900 μs)
  "OrderedSet" => Trial(327.860 μs)
  "Set" => Trial(114.440 μs))

Pair{Any, Any}("copy (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(8.250 μs)
  "UnorderedIndices" => Trial(8.640 μs)
  "OrderedSet" => Trial(216.880 μs)
  "Set" => Trial(4.540 μs))

Pair{Any, Any}("unique/distinct (high uniqueness, sorted) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(141.070 μs)
  "Vector (unique)" => Trial(155.860 μs)
  "OrderedSet" => Trial(175.280 μs)
  "Set" => Trial(84.540 μs))

Pair{Any, Any}("unique/distinct (low uniqueness, sorted) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(8.850 μs)
  "Vector (unique)" => Trial(31.550 μs)
  "OrderedSet" => Trial(41.320 μs)
  "Set" => Trial(40.470 μs))

Pair{Any, Any}("filter (half) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(102.430 μs)
  "UnorderedIndices" => Trial(138.660 μs)
  "OrderedSet" => Trial(103.630 μs)
  "Set" => Trial(115.830 μs))

Pair{Any, Any}("copy and filter! (most) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(25.820 μs)
  "UnorderedIndices" => Trial(25.380 μs)
  "OrderedSet" => Trial(220.150 μs)
  "Set" => Trial(19.670 μs))

Pair{Any, Any}("copy and empty by deletion (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(111.770 μs)
  "UnorderedIndices" => Trial(44.630 μs)
  "OrderedSet" => Trial(316.890 μs)
  "Set" => Trial(44.870 μs))

Pair{Any, Any}("copy and filter! (half) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(20.380 μs)
  "UnorderedIndices" => Trial(28.460 μs)
  "OrderedSet" => Trial(270.760 μs)
  "Set" => Trial(43.100 μs))

Pair{Any, Any}("copy and filter! (few) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(12.080 μs)
  "UnorderedIndices" => Trial(56.850 μs)
  "OrderedSet" => Trial(332.640 μs)
  "Set" => Trial(116.530 μs))

Pair{Any, Any}("intersect (whole) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(109.010 μs)
  "UnorderedIndices" => Trial(112.109 μs)
  "OrderedSet" => Trial(253.880 μs)
  "Set" => Trial(169.040 μs))

Pair{Any, Any}("filter (most) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(219.689 μs)
  "UnorderedIndices" => Trial(164.740 μs)
  "OrderedSet" => Trial(216.220 μs)
  "Set" => Trial(146.280 μs))

Pair{Any, Any}("setdiff (whole) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(53.830 μs)
  "UnorderedIndices" => Trial(25.140 μs)
  "OrderedSet" => Trial(127.980 μs)
  "Set" => Trial(39.800 μs))

Pair{Any, Any}("symdiff (right half) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(160.240 μs)
  "UnorderedIndices" => Trial(163.010 μs)
  "OrderedSet" => Trial(296.140 μs)
  "Set" => Trial(272.100 μs))

Pair{Any, Any}("symdiff (left half) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(68.510 μs)
  "UnorderedIndices" => Trial(33.920 μs)
  "OrderedSet" => Trial(309.000 μs)
  "Set" => Trial(256.510 μs))

Pair{Any, Any}("intersect (half) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(35.340 μs)
  "UnorderedIndices" => Trial(28.200 μs)
  "OrderedSet" => Trial(150.400 μs)
  "Set" => Trial(194.509 μs))

Pair{Any, Any}("union (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(104.490 μs)
  "UnorderedIndices" => Trial(61.790 μs)
  "OrderedSet" => Trial(216.589 μs)
  "Set" => Trial(165.860 μs))

Pair{Any, Any}("unique/distinct (high uniqueness, unsorted) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices (distinct)" => Trial(182.530 μs)
  "Vector (unique)" => Trial(165.410 μs)
  "OrderedSet" => Trial(181.170 μs)
  "Set" => Trial(82.500 μs))

Pair{Any, Any}("not in (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(89.770 μs)
  "UnorderedIndices" => Trial(22.260 μs)
  "OrderedSet" => Trial(105.230 μs)
  "Set" => Trial(93.350 μs))

Pair{Any, Any}("build by insertion (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(196.320 μs)
  "UnorderedIndices" => Trial(100.510 μs)
  "OrderedSet" => Trial(194.810 μs)
  "Set" => Trial(110.330 μs))

Pair{Any, Any}("unique/distinct (low uniqueness, unsorted) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(52.100 μs)
  "Vector (unique)" => Trial(41.550 μs)
  "OrderedSet" => Trial(53.100 μs)
  "Set" => Trial(40.640 μs))

Pair{Any, Any}("sum (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(2.179 μs)
  "UnorderedIndices" => Trial(38.850 μs)
  "OrderedSet" => Trial(2.179 μs)
  "Set" => Trial(11.080 μs))

Pair{Any, Any}("insertion/deletion tests (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(1.523 ms)
  "UnorderedIndices" => Trial(1.488 ms)
  "OrderedSet" => Trial(2.056 ms)
  "Set" => Trial(1.357 ms))

Pair{Any, Any}("count (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(4.319 μs)
  "UnorderedIndices" => Trial(17.530 μs)
  "OrderedSet" => Trial(4.130 μs)
  "Set" => Trial(13.080 μs))

Pair{Any, Any}("setdiff (half) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(100.810 μs)
  "UnorderedIndices" => Trial(149.830 μs)
  "OrderedSet" => Trial(174.180 μs)
  "Set" => Trial(168.460 μs))

Pair{Any, Any}("symdiff (whole) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(107.500 μs)
  "UnorderedIndices" => Trial(57.250 μs)
  "OrderedSet" => Trial(244.540 μs)
  "Set" => Trial(176.350 μs))

Pair{Any, Any}("symdiff (empty) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(135.250 μs)
  "UnorderedIndices" => Trial(114.209 μs)
  "OrderedSet" => Trial(379.320 μs)
  "Set" => Trial(327.730 μs))

Pair{Any, Any}("in (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(23.100 μs)
  "UnorderedIndices" => Trial(29.659 μs)
  "OrderedSet" => Trial(46.570 μs)
  "Set" => Trial(27.900 μs))

Pair{Any, Any}("foreach (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(4.319 μs)
  "UnorderedIndices" => Trial(33.560 μs)
  "OrderedSet" => Trial(2.169 μs)
  "Set" => Trial(11.190 μs))

Pair{Any, Any}("constructor (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(21.870 μs)
  "UnorderedIndices" => Trial(57.880 μs)
  "OrderedSet" => Trial(198.710 μs)
  "Set" => Trial(82.070 μs))

Pair{Any, Any}("filter-map-reduce via generator (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(4.330 μs)
  "UnorderedIndices" => Trial(37.490 μs)
  "OrderedSet" => Trial(5.430 μs)
  "Set" => Trial(24.500 μs))
andyferris commented 2 years ago

OK, that was a rabbit hole that ended anticlimactically. The hash table for UnorderedIndices was sometimes too large, making the benchmarks worse. New benchmarks:

Pair{Any, Any}("filter (few) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(6.580 μs)
  "UnorderedIndices" => Trial(11.740 μs)
  "OrderedSet" => Trial(4.469 μs)
  "Set" => Trial(15.650 μs))

Pair{Any, Any}("intersect (empty) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(42.340 μs)
  "UnorderedIndices" => Trial(25.740 μs)
  "OrderedSet" => Trial(16.340 μs)
  "Set" => Trial(31.050 μs))

Pair{Any, Any}("setdiff (empty) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(148.090 μs)
  "UnorderedIndices" => Trial(93.990 μs)
  "OrderedSet" => Trial(327.560 μs)
  "Set" => Trial(114.150 μs))

Pair{Any, Any}("copy (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(12.350 μs)
  "UnorderedIndices" => Trial(6.010 μs)
  "OrderedSet" => Trial(214.030 μs)
  "Set" => Trial(5.670 μs))

Pair{Any, Any}("unique/distinct (high uniqueness, sorted) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(141.500 μs)
  "Vector (unique)" => Trial(156.460 μs)
  "OrderedSet" => Trial(174.710 μs)
  "Set" => Trial(83.660 μs))

Pair{Any, Any}("unique/distinct (low uniqueness, sorted) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(8.860 μs)
  "Vector (unique)" => Trial(31.550 μs)
  "OrderedSet" => Trial(41.040 μs)
  "Set" => Trial(40.600 μs))

Pair{Any, Any}("filter (half) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(103.490 μs)
  "UnorderedIndices" => Trial(88.360 μs)
  "OrderedSet" => Trial(102.239 μs)
  "Set" => Trial(116.190 μs))

Pair{Any, Any}("copy and filter! (most) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(29.810 μs)
  "UnorderedIndices" => Trial(13.720 μs)
  "OrderedSet" => Trial(218.620 μs)
  "Set" => Trial(20.780 μs))

Pair{Any, Any}("copy and empty by deletion (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(132.370 μs)
  "UnorderedIndices" => Trial(58.440 μs)
  "OrderedSet" => Trial(308.490 μs)
  "Set" => Trial(50.170 μs))

Pair{Any, Any}("copy and filter! (half) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(23.140 μs)
  "UnorderedIndices" => Trial(19.120 μs)
  "OrderedSet" => Trial(265.089 μs)
  "Set" => Trial(41.720 μs))

Pair{Any, Any}("copy and filter! (few) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(15.080 μs)
  "UnorderedIndices" => Trial(18.249 μs)
  "OrderedSet" => Trial(326.650 μs)
  "Set" => Trial(117.460 μs))

Pair{Any, Any}("intersect (whole) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(72.070 μs)
  "UnorderedIndices" => Trial(90.230 μs)
  "OrderedSet" => Trial(252.840 μs)
  "Set" => Trial(172.940 μs))

Pair{Any, Any}("filter (most) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(221.480 μs)
  "UnorderedIndices" => Trial(127.660 μs)
  "OrderedSet" => Trial(211.660 μs)
  "Set" => Trial(146.710 μs))

Pair{Any, Any}("setdiff (whole) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(55.990 μs)
  "UnorderedIndices" => Trial(25.840 μs)
  "OrderedSet" => Trial(130.540 μs)
  "Set" => Trial(39.700 μs))

Pair{Any, Any}("symdiff (right half) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(162.810 μs)
  "UnorderedIndices" => Trial(217.970 μs)
  "OrderedSet" => Trial(293.040 μs)
  "Set" => Trial(269.780 μs))

Pair{Any, Any}("symdiff (left half) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(73.910 μs)
  "UnorderedIndices" => Trial(30.790 μs)
  "OrderedSet" => Trial(308.750 μs)
  "Set" => Trial(253.750 μs))

Pair{Any, Any}("intersect (half) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(36.030 μs)
  "UnorderedIndices" => Trial(21.040 μs)
  "OrderedSet" => Trial(151.370 μs)
  "Set" => Trial(201.440 μs))

Pair{Any, Any}("union (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(106.320 μs)
  "UnorderedIndices" => Trial(48.430 μs)
  "OrderedSet" => Trial(212.570 μs)
  "Set" => Trial(166.340 μs))

Pair{Any, Any}("unique/distinct (high uniqueness, unsorted) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices (distinct)" => Trial(179.780 μs)
  "Vector (unique)" => Trial(164.750 μs)
  "OrderedSet" => Trial(179.790 μs)
  "Set" => Trial(82.870 μs))

Pair{Any, Any}("not in (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(90.450 μs)
  "UnorderedIndices" => Trial(78.080 μs)
  "OrderedSet" => Trial(103.940 μs)
  "Set" => Trial(93.940 μs))

Pair{Any, Any}("build by insertion (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(195.320 μs)
  "UnorderedIndices" => Trial(98.720 μs)
  "OrderedSet" => Trial(192.430 μs)
  "Set" => Trial(112.099 μs))

Pair{Any, Any}("unique/distinct (low uniqueness, unsorted) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(53.990 μs)
  "Vector (unique)" => Trial(40.460 μs)
  "OrderedSet" => Trial(52.240 μs)
  "Set" => Trial(40.580 μs))

Pair{Any, Any}("sum (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(2.179 μs)
  "UnorderedIndices" => Trial(9.760 μs)
  "OrderedSet" => Trial(2.179 μs)
  "Set" => Trial(12.250 μs))

Pair{Any, Any}("insertion/deletion tests (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(1.526 ms)
  "UnorderedIndices" => Trial(1.479 ms)
  "OrderedSet" => Trial(2.035 ms)
  "Set" => Trial(1.360 ms))

Pair{Any, Any}("count (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(4.319 μs)
  "UnorderedIndices" => Trial(11.290 μs)
  "OrderedSet" => Trial(4.060 μs)
  "Set" => Trial(14.930 μs))

Pair{Any, Any}("setdiff (half) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(106.980 μs)
  "UnorderedIndices" => Trial(148.099 μs)
  "OrderedSet" => Trial(171.980 μs)
  "Set" => Trial(164.540 μs))

Pair{Any, Any}("symdiff (whole) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(107.500 μs)
  "UnorderedIndices" => Trial(46.450 μs)
  "OrderedSet" => Trial(242.180 μs)
  "Set" => Trial(176.100 μs))

Pair{Any, Any}("symdiff (empty) (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(156.500 μs)
  "UnorderedIndices" => Trial(102.930 μs)
  "OrderedSet" => Trial(378.090 μs)
  "Set" => Trial(328.079 μs))

Pair{Any, Any}("in (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(21.550 μs)
  "UnorderedIndices" => Trial(29.640 μs)
  "OrderedSet" => Trial(43.380 μs)
  "Set" => Trial(29.840 μs))

Pair{Any, Any}("foreach (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(4.319 μs)
  "UnorderedIndices" => Trial(11.320 μs)
  "OrderedSet" => Trial(2.169 μs)
  "Set" => Trial(12.820 μs))

Pair{Any, Any}("constructor (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(28.750 μs)
  "UnorderedIndices" => Trial(61.610 μs)
  "OrderedSet" => Trial(194.279 μs)
  "Set" => Trial(82.820 μs))

Pair{Any, Any}("filter-map-reduce via generator (10000)", 4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Indices" => Trial(4.339 μs)
  "UnorderedIndices" => Trial(23.780 μs)
  "OrderedSet" => Trial(5.060 μs)
  "Set" => Trial(24.530 μs))

It seems better all around now, so we'll merge this.

Note: there are further optimizations to the hash table implementation in julia-1.9 (dev) which we could probably port at some stage.

PallHaraldsson commented 1 year ago

@andyferris I made a comment here to the README, but can't see it here (because I commeted since PR merged?), so pointing to here just in case, you didn't get notified: https://github.com/andyferris/Dictionaries.jl/commit/dc48ba1b4af555274fa9c2c28f902f5f1452abb6