advancedSTORE / lib_tcstring

IAB TCString Library in Rust
Apache License 2.0
4 stars 0 forks source link

Consider guaranteeing vendor list sort order #162

Open clbarnes opened 3 months ago

clbarnes commented 3 months ago

The vendor consent vecs will almost always be used for membership checks. If encoding is ever added to this crate, guaranteeing ascending order and non-repeats through modification makes that much easier. You'd hope that people would encode their TC strings as sorted (always true for a bitfield, but possibly not true for ranges), but they might not, so it would be good to guarantee it.

Using a BTreeSet instead of a Vec guarantees ascending order and uniqueness, even if people modify it for re-encoding later. Otherwise, a sorted vec is an improvement as it allows binary searches. An &mut self method to sort all the internal vendor lists would be minimally invasive.

friedemannsommer commented 3 months ago

Hello @clbarnes,

There are currently no plans to implement encoding for this crate. We’re open for contributions if you (or anyone else reading this) want(s) to implement TCString encoding for this crate.

I agree with you that bitfield lists should be sorted, but I (personally) would prefer a solution which is optional either via an option or crate feature. With the current implementation it is not possible to pass options, be it a callback function that can be used to sort the bitfield lists or an option to signal the implementation, it should sort these lists. So it would require a rewrite of the user-facing API as well.

clbarnes commented 3 months ago

The sorting was a bit of an aside - switching out the vec of vendor IDs for a btreeset would coincidentally guarantee order while primarily making containment checks more efficient (order being a useful tiebreaker vs a hashset). Containment checks are the entire purpose of vendor list so it seems like a use case worth prioritising. And serde serialises them exactly the same way so no changes needed there!

I only mentioned sorting as well because in a read-only workflow as currently supported, a sorted vec is probably about as good as a btreeset because you can binary search it.

friedemannsommer commented 3 months ago

The overhead for using the BTreeSet is probably also beneficial for the allowed_vendors and disclosed_vendors not just the vendors_consent and vendors_li_consent.

I will have to do some testing whether this overhead is worth placing behind an option / feature or if it should be the new default behavior.

friedemannsommer commented 2 months ago

The overhead for using BTreeSet is roughly 39%.

baseline (without BTreeSet) ``` TCString V2 (core only) time: [346.08 ns 346.62 ns 347.23 ns] Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild TCString V2 (core + disclosed vendors) time: [1.5183 µs 1.5202 µs 1.5225 µs] Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe TCString V2 (core + disclosed vendors + allowed vendors) time: [2.9701 µs 3.0044 µs 3.0392 µs] Found 24 outliers among 100 measurements (24.00%) 21 (21.00%) low severe 2 (2.00%) low mild 1 (1.00%) high severe TCString V2 (core + publisher tc) time: [574.17 ns 574.92 ns 575.73 ns] Found 6 outliers among 100 measurements (6.00%) 1 (1.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild 1 (1.00%) high severe TCString V2 (core + disclosed vendors + allowed vendors + publisher tc) time: [779.89 ns 780.60 ns 781.31 ns] Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild ```
with BTreeSet ``` TCString V2 (core only) time: [361.11 ns 361.55 ns 362.04 ns] change: [+4.4258% +4.7797% +5.1308%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low mild 7 (7.00%) high mild 2 (2.00%) high severe TCString V2 (core + disclosed vendors) time: [2.2539 µs 2.2557 µs 2.2576 µs] change: [+48.396% +48.672% +48.959%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe TCString V2 (core + disclosed vendors + allowed vendors) time: [5.0106 µs 5.0166 µs 5.0229 µs] change: [+65.906% +67.190% +68.548%] (p = 0.00 < 0.05) Performance has regressed. Found 19 outliers among 100 measurements (19.00%) 1 (1.00%) high mild 18 (18.00%) high severe TCString V2 (core + publisher tc) time: [715.91 ns 717.07 ns 718.23 ns] change: [+24.120% +24.467% +24.861%] (p = 0.00 < 0.05) Performance has regressed. Found 5 outliers among 100 measurements (5.00%) 4 (4.00%) high mild 1 (1.00%) high severe TCString V2 (core + disclosed vendors + allowed vendors + publisher tc) time: [1.1880 µs 1.1980 µs 1.2080 µs] change: [+51.470% +52.049% +52.751%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) high mild 10 (10.00%) high severe ```

So this would need to be placed behind an opt-in feature. The code used in the latter benchmark can be found here: https://github.com/advancedSTORE/lib_tcstring/commit/09a54c9dbeacfa4e228a2b97ce8aaab09b4b4e52