KonradHoeffner / hdt

Library for the Header Dictionary Triples (HDT) compression file format for RDF data.
https://crates.io/crates/hdt
MIT License

Code review sophia #28

Closed pchampin closed 1 year ago

pchampin commented 1 year ago

This is an attempt at improving the Sophia implementation in HdtGraph (and also the implementation of Hdt).

The general idea is to use Arc instead of MownStr, because an Arc can be cloned without actually copying the underlying text and does not require a persistent "owner" of the text.

The performance gain is not obvious, because most of the time is spent in FourSectDict::id_to_string, i.e. the reconstruction of the string from the compressed dictionary. But it still seems that with Arc we spend less time allocating memory than with MownStr.
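
For context, a minimal standalone sketch (not code from this PR) of the difference: cloning an Arc<str> only increments a reference count, whereas handing out an owned copy of the text requires allocating and copying the bytes again.

    use std::sync::Arc;

    fn main() {
        // Stand-in for a term string reconstructed by FourSectDict::id_to_string.
        let s: String = "http://example.org/resource/Vincent_Vega".to_string();

        // String -> Arc<str> copies the bytes once into a shared allocation.
        let shared: Arc<str> = s.into();

        // Every clone afterwards only bumps the reference count; the text is never
        // copied again, and no separate "owner" has to outlive the clones.
        let t1 = shared.clone();
        let t2 = shared.clone();
        assert_eq!(Arc::strong_count(&shared), 3);
        assert_eq!(&*t1, &*t2);
    }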

This PR also contains a number of minor improvements in the use of the Sophia API.

KonradHoeffner commented 1 year ago

Benchmarks

before

S??/1.1 (vincent, ?, ?) triple IDs
                        time:   [1.5088 µs 1.5094 µs 1.5101 µs]
S??/1.2 (vincent, ?, ?) str triples
                        time:   [3.2658 µs 3.2679 µs 3.2701 µs]
S??/1.3 (vincent, ?, ?) Sophia triples
                        time:   [4.5061 µs 4.5081 µs 4.5100 µs]

?P? 1429618 triples/2.1 (?, type, ?) triple IDs
                        time:   [347.29 ms 347.52 ms 347.80 ms]
?P? 1429618 triples/2.2 (?, type, ?) str triples
                        time:   [619.79 ms 620.15 ms 620.50 ms]
?P? 1429618 triples/2.3 (?, type, ?) Sophia triples
                        time:   [651.66 ms 651.96 ms 652.25 ms]

??O 1414214 triples/3.1 (?, ?, person) triple IDs
                        time:   [139.74 ms 139.79 ms 139.84 ms]
??O 1414214 triples/3.2 (?, ?, person) str triples
                        time:   [469.73 ms 469.80 ms 469.87 ms]
??O 1414214 triples/3.3 (?, ?, person) Sophia triples
                        time:   [495.47 ms 495.55 ms 495.62 ms]

?PO 1414214 triples/4.1 (?, type, person) triple IDs
                        time:   [117.97 ms 118.02 ms 118.08 ms]
?PO 1414214 triples/4.2 (?, type, person) str subjects
                        time:   [312.98 ms 313.03 ms 313.08 ms]
?PO 1414214 triples/4.3 (?, type, person) str triples
                        time:   [323.11 ms 323.24 ms 323.38 ms]
?PO 1414214 triples/4.4 (?, type, person) Sophia triples
                        time:   [351.52 ms 351.66 ms 351.81 ms]

after

S??/1.1 (vincent, ?, ?) triple IDs
                        time:   [1.4797 µs 1.4815 µs 1.4835 µs]
                        change: [-1.7325% -1.5302% -1.3180%] (p = 0.00 < 0.05)
                        Performance has improved.
S??/1.2 (vincent, ?, ?) str triples
                        time:   [3.5434 µs 3.5484 µs 3.5539 µs]
                        change: [+8.9300% +9.1300% +9.3659%] (p = 0.00 < 0.05)
                        Performance has regressed.
S??/1.3 (vincent, ?, ?) Sophia triples
                        time:   [4.1382 µs 4.1424 µs 4.1472 µs]
                        change: [-8.1291% -7.9905% -7.8638%] (p = 0.00 < 0.05)
                        Performance has improved.

?P? 1429618 triples/2.1 (?, type, ?) triple IDs
                        time:   [347.97 ms 348.47 ms 349.09 ms]
                        change: [+0.1106% +0.2719% +0.4549%] (p = 0.01 < 0.05)
                        Change within noise threshold.
?P? 1429618 triples/2.2 (?, type, ?) str triples
                        time:   [649.61 ms 650.02 ms 650.51 ms]
                        change: [+4.7266% +4.8163% +4.9268%] (p = 0.00 < 0.05)
                        Performance has regressed.
?P? 1429618 triples/2.3 (?, type, ?) Sophia triples
                        time:   [724.92 ms 726.32 ms 728.73 ms]
                        change: [+11.177% +11.406% +11.761%] (p = 0.00 < 0.05)
                        Performance has regressed.

??O 1414214 triples/3.1 (?, ?, person) triple IDs
                        time:   [140.49 ms 140.60 ms 140.71 ms]
                        change: [+0.4949% +0.5784% +0.6626%] (p = 0.00 < 0.05)
                        Change within noise threshold.
??O 1414214 triples/3.2 (?, ?, person) str triples
                        time:   [508.25 ms 509.08 ms 509.98 ms]
                        change: [+8.1956% +8.3621% +8.5548%] (p = 0.00 < 0.05)
                        Performance has regressed.
??O 1414214 triples/3.3 (?, ?, person) Sophia triples
                        time:   [569.15 ms 569.64 ms 570.18 ms]
                        change: [+14.865% +14.951% +15.058%] (p = 0.00 < 0.05)
                        Performance has regressed.

?PO 1414214 triples/4.1 (?, type, person) triple IDs
                        time:   [118.33 ms 118.59 ms 118.97 ms]
                        change: [+0.2940% +0.4841% +0.6997%] (p = 0.00 < 0.05)
                        Change within noise threshold.
?PO 1414214 triples/4.2 (?, type, person) str subjects
                        time:   [315.39 ms 315.82 ms 316.28 ms]
                        change: [+0.7473% +0.8894% +1.0319%] (p = 0.00 < 0.05)
                        Change within noise threshold.
?PO 1414214 triples/4.3 (?, type, person) str triples
                        time:   [344.67 ms 344.81 ms 344.95 ms]
                        change: [+6.6081% +6.6732% +6.7300%] (p = 0.00 < 0.05)
                        Performance has regressed.
?PO 1414214 triples/4.4 (?, type, person) Sophia triples
                        time:   [366.34 ms 367.00 ms 367.65 ms]
                        change: [+4.1570% +4.3632% +4.5522%] (p = 0.00 < 0.05)
                        Performance has regressed.
KonradHoeffner commented 1 year ago

Benchmarking ??? pattern (all triples) on i9-10900k

Before

??? (all)/0.1 all triple IDs
                        time:   [1.6171 s 1.6180 s 1.6189 s]
??? (all)/0.2 all str triples
                        time:   [6.3724 s 6.3750 s 6.3785 s]
??? (all)/0.3 all Sophia triples
                        time:   [7.8375 s 7.8492 s 7.8624 s]

After

??? (all)/0.1 all triple IDs
                        time:   [1.6182 s 1.6336 s 1.6480 s]
                        change: [+0.1297% +0.9638% +1.8168%] (p = 0.07 > 0.05)
                        No change in performance detected.
??? (all)/0.2 all str triples
                        time:   [7.1611 s 7.2483 s 7.3353 s]
                        change: [+12.424% +13.699% +14.932%] (p = 0.00 < 0.05)
                        Performance has regressed.
??? (all)/0.3 all Sophia triples
                        time:   [8.5335 s 8.5736 s 8.6102 s]
                        change: [+8.7127% +9.2289% +9.6978%] (p = 0.00 < 0.05)
                        Performance has regressed.

Comparison

It seems to be around 10% slower even when the triple cache is used, which surprises me. The triple ID measurements didn't change, which is expected because the code for them hasn't changed either.

KonradHoeffner commented 1 year ago

Maybe the idx variable needs to be updated in the else part of the get_x_string function?

KonradHoeffner commented 1 year ago
    fn get_x_string(&mut self, i: usize, pos: usize, kind: &'static IdKind) -> Result<Arc<str>, DictErr> {
        debug_assert!(i != 0);
        if self.idx[pos] == i {
            // cache hit: reuse the Arc decoded for this ID on the previous call
            Ok(self.arc[pos].as_ref().unwrap().clone())
        } else {
            // cache miss: decode the string from the dictionary and remember it
            let ret: Arc<str> = self.hdt.dict.id_to_string(i, kind)?.into();
            self.arc[pos] = Some(ret.clone());
+           self.idx[pos] = i;
            Ok(ret)
        }
    }

??? (all)/0.1 all triple IDs
                        time:   [1.6308 s 1.6329 s 1.6366 s]
                        change: [+0.7702% +0.9257% +1.1700%] (p = 0.00 < 0.05)
                        Change within noise threshold.
??? (all)/0.2 all str triples
                        time:   [5.8885 s 5.8956 s 5.9026 s]
                        change: [-7.6339% -7.5201% -7.3964%] (p = 0.00 < 0.05)
                        Performance has improved.
??? (all)/0.3 all Sophia triples
                        time:   [6.9292 s 6.9454 s 6.9653 s]
                        change: [-11.755% -11.514% -11.252%] (p = 0.00 < 0.05)
                        Performance has improved.

Now it's better, but I wonder whether it is possible to avoid losing ~10% performance in the first place; then the triple cache would bring an even larger speedup.

pchampin commented 1 year ago

Maybe the idx variable needs to be updated in the else part of the get_x_string function?

Duh! So silly of me... :facepalm: Thanks for fixing this.

KonradHoeffner commented 1 year ago

Cases

After benchmarking and thinking about this there seem to be the following cases:

  1. triple pattern with three constants, SPO: this is just an ASK query, so performance shouldn't differ much regardless of the implementation.
  2. triple patterns with two constants and one variable, SP?, S?O, ?PO: Arc won't help because the variable never has the same value twice, and the constants we just want to return from the parameters, for which MownStr is perfect. This case is my biggest worry, because these queries can take a long time to execute and I don't like losing 10% performance because of Arc over MownStr. I don't understand why the overhead of Arc is that high; I wonder if there is some way to mitigate it.
  3. triple patterns with one constant and two adjacent variables, S?? and ??O: there is one constant which we want to pass back, one variable which is easy to deduplicate (P in S?? and S in ??O), and one variable which is hard to deduplicate (a sketch of the deduplication idea follows this list).
  4. triple pattern with one constant and two disjoint variables, ?P?: there is one constant that we want to pass back and two variables that are hard to deduplicate.
  5. triple pattern with three variables, ???: the best case for Arc, but probably not used that often; it is equivalent to the triples() method.
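
To make the deduplication in case 3 concrete, here is a self-contained sketch (not the actual HdtGraph code) of the one-entry-per-position cache idea also used by get_x_string above: because the triple IDs arrive sorted, the same ID repeats in runs, so remembering the last decoded (id, Arc<str>) pair avoids hitting the compressed dictionary again. The decode closure is a hypothetical stand-in for FourSectDict::id_to_string.

    use std::sync::Arc;

    /// One-entry cache for a single triple position: remembers the last (id, string) pair.
    struct PosCache {
        last_id: usize,
        last_str: Option<Arc<str>>,
    }

    impl PosCache {
        fn new() -> Self {
            PosCache { last_id: 0, last_str: None }
        }

        /// Returns the cached string if `id` repeats, otherwise decodes and caches it.
        fn get(&mut self, id: usize, decode: impl Fn(usize) -> String) -> Arc<str> {
            if self.last_id == id {
                if let Some(s) = &self.last_str {
                    return s.clone(); // cheap: only bumps the Arc reference count
                }
            }
            let s: Arc<str> = decode(id).into();
            self.last_id = id;
            self.last_str = Some(s.clone());
            s
        }
    }

    fn main() {
        // Sorted triple IDs from an S?? query: the subject ID is constant and the
        // predicate ID repeats in runs, so both benefit from the cache; the object
        // ID rarely repeats and would mostly miss.
        let ids = [(1, 3, 7), (1, 3, 9), (1, 4, 2)];
        let decode = |id: usize| format!("http://example.org/term/{id}"); // stand-in decoder
        let mut subject_cache = PosCache::new();
        for (s, _p, _o) in ids {
            let subject = subject_cache.get(s, &decode);
            println!("{subject}");
        }
    }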

Ideas for Solutions (numbering unrelated to the cases above)

  1. Use Arc as is: just accept the tradeoff of worse performance in some cases for benefits in others.
  2. Keep MownStr and keep the good performance on two-variable queries, but lose the potential benefits in the other cases (especially the two-variable cases would need to be deduplicated and benchmarked).
  3. Return normal references instead of Arc, together with some kind of string pool. However, this is hard to implement because Rust's borrow checker does not allow returning references whose lifetime is tied to something in the same struct. I tried passing the pool as an initially empty Vec parameter, but it is not allowed to modify a Vec while there are references into it. This also gets complicated with iterators. (A minimal sketch of this borrow-checker conflict follows the list.)
  4. Create a new MownStr variant that, when cloned, does not clone the Box (which is slow) but instead creates a reference to the Box.
  5. Try Rc instead of Arc; but that may harm applications that use multithreading.
  6. Don't try to optimize it and instead tell users to use specific methods for each case when performance is an issue. This is feasible for Hdt but not for the Sophia HdtGraph.

I favor solution 4, but I'm not sure whether that is possible in Rust; my second choice is solution 3, but that may be difficult to do as well.
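
To illustrate the borrow-checker conflict behind solution 3, here is a minimal sketch with a hypothetical string pool (not code from this PR): handing out a reference into the pool keeps the pool borrowed, so it cannot be mutated again while that reference is alive.

    // Hypothetical string pool illustrating solution 3's problem.
    struct Pool {
        strings: Vec<String>,
    }

    impl Pool {
        /// The returned &str borrows from `self`, so the pool stays borrowed
        /// for as long as the caller holds on to the reference.
        fn intern(&mut self, s: &str) -> &str {
            self.strings.push(s.to_string());
            self.strings.last().unwrap()
        }
    }

    fn main() {
        let mut pool = Pool { strings: Vec::new() };
        let a = pool.intern("subject");
        // Uncommenting the next line is rejected with E0499 ("cannot borrow `pool`
        // as mutable more than once at a time") because `a` still borrows the pool,
        // which is exactly the "modify a Vec while there are references to it" issue:
        // let b = pool.intern("predicate");
        println!("{a}");
    }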

KonradHoeffner commented 1 year ago

I just forked your MownStr repository because I wanted to change clone() so that it always returns a borrowed MownStr, but then I found the borrowed method. Won't that solve our use case without needing Arc?

KonradHoeffner commented 1 year ago

Unfortunately, borrowed returns MownStr instead of MownStr<'a>, so it can't be returned, but I successfully added the lifetime to a local copy of MownStr. Is this feasible to add, or will it break something?

KonradHoeffner commented 1 year ago

I benchmarked it on my laptop without Sophia (due to MownStr incompatibilities with the locally patched version), and strangely there wasn't much difference between using borrowed() and clone():

 cargo bench  -- --load-baseline borrow  --baseline main                                                                                                                                                                   mownstr-borrow
    Finished bench [optimized] target(s) in 0.04s
     Running benches/bench.rs (target/release/deps/bench-561ba078d93cf93c)
??? (all)/0.1 all triple IDs
                        time:   [2.0441 s 2.0450 s 2.0458 s]
                        change: [-1.4375% -0.9551% -0.5860%] (p = 0.00 < 0.05)
                        Change within noise threshold.
??? (all)/0.2 all str triples
                        time:   [8.0467 s 8.0638 s 8.0855 s]
                        change: [-1.1257% -0.6344% -0.1437%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high severe

S??/1.1 (vincent, ?, ?) triple IDs
                        time:   [2.1847 µs 2.1909 µs 2.2014 µs]
                        change: [-0.2736% -0.0057% +0.2519%] (p = 0.97 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe
S??/1.2 (vincent, ?, ?) str triples
                        time:   [5.1951 µs 5.2018 µs 5.2099 µs]
                        change: [+0.3254% +0.6764% +1.0194%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe

?P? 1429618 triples/2.1 (?, type, ?) triple IDs
                        time:   [513.92 ms 514.61 ms 515.54 ms]
                        change: [-0.4190% -0.2138% -0.0151%] (p = 0.06 > 0.05)
                        No change in performance detected.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe
?P? 1429618 triples/2.2 (?, type, ?) str triples
                        time:   [968.11 ms 969.58 ms 971.04 ms]
                        change: [+0.2843% +0.4650% +0.6533%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) low mild
  1 (10.00%) high mild

??O 1414214 triples/3.1 (?, ?, person) triple IDs
                        time:   [209.90 ms 210.17 ms 210.62 ms]
                        change: [+0.0380% +0.1802% +0.3899%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe
??O 1414214 triples/3.2 (?, ?, person) str triples
                        time:   [746.20 ms 754.78 ms 765.05 ms]
                        change: [+1.2616% +2.5159% +3.7646%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 23 outliers among 100 measurements (23.00%)
  3 (3.00%) high mild
  20 (20.00%) high severe

?PO 1414214 triples/4.1 (?, type, person) triple IDs
                        time:   [174.76 ms 174.93 ms 175.38 ms]
                        change: [+1.0783% +1.6029% +2.2160%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high severe
?PO 1414214 triples/4.2 (?, type, person) str subjects
                        time:   [468.19 ms 469.26 ms 470.53 ms]
                        change: [-0.7916% -0.5703% -0.2976%] (p = 0.00 < 0.05)
                        Change within noise threshold.
?PO 1414214 triples/4.3 (?, type, person) str triples
                        time:   [471.40 ms 471.85 ms 472.34 ms]
                        change: [-2.6150% -2.4799% -2.3429%] (p = 0.00 < 0.05)
                        Performance has improved.
pchampin commented 1 year ago

Is this feasible to add or will this break something?

It will break something. Consider the following code (assuming that borrowed returns MownStr<'a>):

    let m1: MownStr<'static>;
    {
        let m2: MownStr<'static> = "hello world".to_string().into();
        m1 = m2.borrowed();
        println!("{}", m1);
    }
    println!("{}", m1);

It would compile, but the second println! would print garbage, because the text owned by m2 and borrowed by m1 is deallocated at the closing curly bracket...

KonradHoeffner commented 1 year ago

This is difficult to benchmark with Criterion.rs due to the variance in runtimes and the long overall duration caused by the multiple runs. In addition to Criterion.rs, I therefore benchmarked some cases with Iai, which uses Valgrind to estimate the number of CPU cycles.
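
For reference, here is a minimal sketch of what such an Iai benchmark could look like; it is not the repository's actual bench file, and the file path, setup, and function names are hypothetical. Iai runs each function once under Valgrind, so a separate load-only benchmark can be measured and its cycle count subtracted to isolate the query cost, which is what the "− load" columns below show.

    // Hypothetical Iai benchmark sketch; the real bench file, path, and names differ.
    use std::fs::File;
    use std::io::BufReader;

    use hdt::Hdt;

    const PATH: &str = "tests/resources/persondata_en_10k.hdt"; // hypothetical path

    // Measured separately so its cycle count can be subtracted from the other benchmarks.
    fn load_only() -> Hdt {
        Hdt::new(BufReader::new(File::open(PATH).expect("HDT file"))).expect("valid HDT")
    }

    // Loading plus iterating all triples (the ??? pattern via the triples() method).
    fn load_and_iterate_all() -> usize {
        load_only().triples().count()
    }

    // Iai executes each function once under Valgrind/Cachegrind and reports estimated cycles.
    iai::main!(load_only, load_and_iterate_all);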

persondata 10k triples

(Columns: estimated CPU cycles, cycles with the load-only cost subtracted, and the ratio of the latter to the mownstr baseline.)

| query | variant | cycles | cycles − load | (cycles − load) / mownstr |
|---|---|---|---|---|
| po sophia | mownstr | 32025873 | 6453360 | 1 |
| po sophia | rc | 31805323 | 6232810 | 0.9658240049 |
| po sophia | arc | 31831044 | 6258531 | 0.9698096805 |
| po | mownstr | 30946097 | 5373584 | 1 |
| po | rc | 31146148 | 5573635 | 1.037228598 |
| po | arc | 31195529 | 5623016 | 1.046418182 |
| o sophia | mownstr | 35733746 | 10161233 | 1 |
| o sophia | rc | 36615963 | 11043450 | 1.086821845 |
| o sophia | arc | 36664104 | 11091591 | 1.091559558 |
| o | mownstr | 34930677 | 9358164 | 1 |
| o | rc | 35477675 | 9905162 | 1.058451423 |
| o | arc | 35528558 | 9956045 | 1.063888707 |
| all sophia | mownstr | 168333124 | 142760611 | 1 |
| all sophia | rc | 144665536 | 119093023 | 0.8342148592 |
| all sophia | arc | 145241514 | 119669001 | 0.8382494314 |
| all | mownstr | 143615972 | 118043459 | 1 |
| all | rc | 126231005 | 100658492 | 0.8527240124 |
| all | arc | 126843252 | 101270739 | 0.8579106361 |

rc / arc (on cycles − load): po sophia 0.9958902496, po 0.9912180581, o sophia 0.9956596849, o 0.9948892356, all sophia 0.9951869073, all 0.9939543544

persondata 100k triples

| query | variant | cycles | cycles − load | (cycles − load) / mownstr |
|---|---|---|---|---|
| po sophia | mownstr | 302179289 | 64166525 | 1 |
| po sophia | rc | 299909076 | 61896312 | 0.964619979 |
| po sophia | arc | 300229249 | 62216485 | 0.9696096991 |

rc / arc (on cycles − load): 0.994853888

persondata 1M triples

| query | variant | cycles | cycles − load | (cycles − load) / mownstr |
|---|---|---|---|---|
| po sophia | mownstr | 3017508492 | 697058710 | 1 |
| po sophia | rc | 2993299362 | 672849580 | 0.9652695969 |
| po sophia | arc | 2996034366 | 675584584 | 0.9691932319 |

rc / arc (on cycles − load): 0.9959516483

Conclusions

KonradHoeffner commented 1 year ago

Enabling the triple cache for all patterns with at least two variables gave a massive boost to the tested ??O query (down to 69% of the mownstr baseline):

| query | variant | cycles | cycles − load | (cycles − load) / mownstr |
|---|---|---|---|---|
| o | mownstr | 34930677 | 9358164 | 1 |
| o | rc | 35477675 | 9905162 | 1.058451423 |
| o | arc | 35528558 | 9956045 | 1.063888707 |
| o | cached arc | 31998853 | 6426340 | 0.686709487 |

rc / arc (on cycles − load): 0.9948892356