Closed pchampin closed 1 year ago
S??/1.1 (vincent, ?, ?) triple IDs
time: [1.5088 µs 1.5094 µs 1.5101 µs]
S??/1.2 (vincent, ?, ?) str triples
time: [3.2658 µs 3.2679 µs 3.2701 µs]
S??/1.3 (vincent, ?, ?) Sophia triples
time: [4.5061 µs 4.5081 µs 4.5100 µs]
?P? 1429618 triples/2.1 (?, type, ?) triple IDs
time: [347.29 ms 347.52 ms 347.80 ms]
?P? 1429618 triples/2.2 (?, type, ?) str triples
time: [619.79 ms 620.15 ms 620.50 ms]
?P? 1429618 triples/2.3 (?, type, ?) Sophia triples
time: [651.66 ms 651.96 ms 652.25 ms]
??O 1414214 triples/3.1 (?, ?, person) triple IDs
time: [139.74 ms 139.79 ms 139.84 ms]
??O 1414214 triples/3.2 (?, ?, person) str triples
time: [469.73 ms 469.80 ms 469.87 ms]
??O 1414214 triples/3.3 (?, ?, person) Sophia triples
time: [495.47 ms 495.55 ms 495.62 ms]
?PO 1414214 triples/4.1 (?, type, person) triple IDs
time: [117.97 ms 118.02 ms 118.08 ms]
?PO 1414214 triples/4.2 (?, type, person) str subjects
time: [312.98 ms 313.03 ms 313.08 ms]
?PO 1414214 triples/4.3 (?, type, person) str triples
time: [323.11 ms 323.24 ms 323.38 ms]
?PO 1414214 triples/4.4 (?, type, person) Sophia triples
time: [351.52 ms 351.66 ms 351.81 ms]
S??/1.1 (vincent, ?, ?) triple IDs
time: [1.4797 µs 1.4815 µs 1.4835 µs]
change: [-1.7325% -1.5302% -1.3180%] (p = 0.00 < 0.05)
Performance has improved.
S??/1.2 (vincent, ?, ?) str triples
time: [3.5434 µs 3.5484 µs 3.5539 µs]
change: [+8.9300% +9.1300% +9.3659%] (p = 0.00 < 0.05)
Performance has regressed.
S??/1.3 (vincent, ?, ?) Sophia triples
time: [4.1382 µs 4.1424 µs 4.1472 µs]
change: [-8.1291% -7.9905% -7.8638%] (p = 0.00 < 0.05)
Performance has improved.
?P? 1429618 triples/2.1 (?, type, ?) triple IDs
time: [347.97 ms 348.47 ms 349.09 ms]
change: [+0.1106% +0.2719% +0.4549%] (p = 0.01 < 0.05)
Change within noise threshold.
?P? 1429618 triples/2.2 (?, type, ?) str triples
time: [649.61 ms 650.02 ms 650.51 ms]
change: [+4.7266% +4.8163% +4.9268%] (p = 0.00 < 0.05)
Performance has regressed.
?P? 1429618 triples/2.3 (?, type, ?) Sophia triples
time: [724.92 ms 726.32 ms 728.73 ms]
change: [+11.177% +11.406% +11.761%] (p = 0.00 < 0.05)
Performance has regressed.
??O 1414214 triples/3.1 (?, ?, person) triple IDs
time: [140.49 ms 140.60 ms 140.71 ms]
change: [+0.4949% +0.5784% +0.6626%] (p = 0.00 < 0.05)
Change within noise threshold.
??O 1414214 triples/3.2 (?, ?, person) str triples
time: [508.25 ms 509.08 ms 509.98 ms]
change: [+8.1956% +8.3621% +8.5548%] (p = 0.00 < 0.05)
Performance has regressed.
??O 1414214 triples/3.3 (?, ?, person) Sophia triples
time: [569.15 ms 569.64 ms 570.18 ms]
change: [+14.865% +14.951% +15.058%] (p = 0.00 < 0.05)
Performance has regressed.
?PO 1414214 triples/4.1 (?, type, person) triple IDs
time: [118.33 ms 118.59 ms 118.97 ms]
change: [+0.2940% +0.4841% +0.6997%] (p = 0.00 < 0.05)
Change within noise threshold.
?PO 1414214 triples/4.2 (?, type, person) str subjects
time: [315.39 ms 315.82 ms 316.28 ms]
change: [+0.7473% +0.8894% +1.0319%] (p = 0.00 < 0.05)
Change within noise threshold.
?PO 1414214 triples/4.3 (?, type, person) str triples
time: [344.67 ms 344.81 ms 344.95 ms]
change: [+6.6081% +6.6732% +6.7300%] (p = 0.00 < 0.05)
Performance has regressed.
?PO 1414214 triples/4.4 (?, type, person) Sophia triples
time: [366.34 ms 367.00 ms 367.65 ms]
change: [+4.1570% +4.3632% +4.5522%] (p = 0.00 < 0.05)
Performance has regressed.
Benchmarking ??? (all)/0.1 all triple IDs: Warming up for 3.0000 s
??? (all)/0.1 all triple IDs
time: [1.6171 s 1.6180 s 1.6189 s]
??? (all)/0.2 all str triples
time: [6.3724 s 6.3750 s 6.3785 s]
??? (all)/0.3 all Sophia triples
time: [7.8375 s 7.8492 s 7.8624 s]
??? (all)/0.1 all triple IDs
time: [1.6182 s 1.6336 s 1.6480 s]
change: [+0.1297% +0.9638% +1.8168%] (p = 0.07 > 0.05)
No change in performance detected.
??? (all)/0.2 all str triples
time: [7.1611 s 7.2483 s 7.3353 s]
change: [+12.424% +13.699% +14.932%] (p = 0.00 < 0.05)
Performance has regressed.
??? (all)/0.3 all Sophia triples
time: [8.5335 s 8.5736 s 8.6102 s]
change: [+8.7127% +9.2289% +9.6978%] (p = 0.00 < 0.05)
Performance has regressed.
It seems to be around 10% slower even when the triple cache is used, which surprises me. Triple ID measurements didn't change which is expected because the code for them hasn't changed as well.
Maybe the idx variable needs to be updated in the else part of the get_x_string function?
fn get_x_string(&mut self, i: usize, pos: usize, kind: &'static IdKind) -> Result<Arc<str>, DictErr> {
debug_assert!(i != 0);
if self.idx[pos] == i {
Ok(self.arc[pos].as_ref().unwrap().clone())
} else {
let ret: Arc<str> = self.hdt.dict.id_to_string(i, kind)?.into();
self.arc[pos] = Some(ret.clone());
+ self.idx[pos] = i;
Ok(ret)
}
??? (all)/0.1 all triple IDs
time: [1.6308 s 1.6329 s 1.6366 s]
change: [+0.7702% +0.9257% +1.1700%] (p = 0.00 < 0.05)
Change within noise threshold.
??? (all)/0.2 all str triples
time: [5.8885 s 5.8956 s 5.9026 s]
change: [-7.6339% -7.5201% -7.3964%] (p = 0.00 < 0.05)
Performance has improved.
??? (all)/0.3 all Sophia triples
time: [6.9292 s 6.9454 s 6.9653 s]
change: [-11.755% -11.514% -11.252%] (p = 0.00 < 0.05)
Performance has improved.
Now it's better but I wonder if it is possible to not lose 10% performance in the first place, then the triple cache would bring even more speed.
Maybe the idx variable needs to be updated in the else part of the get_x_string function?
Duh! So silly of me... :facepalm: Thanks for fixing this.
After benchmarking and thinking about this there seem to be the following cases:
I favor solution 4 but I'm not sure if that's possible with Rust and as second favorite solution 3, but that may be difficult to do as well.
I just forked your MownStr repository because I wanted to change clone()
so that always returns a borrowed MownStr but I found the borrowed
method. Won't that solve our use case without needing Arc?
Unfortunately, borrowed returns MownStr instead of Mownstr<'a> so it can't be returned but I successfully changed added the lifetime to a local copy of MownStr. Is this feasible to add or will this break something?
Benchmarking it on my Laptop without Sophia (due to MownStr incompatibilities with the local patched version) and strangely, there wasn't much difference with using borrowed() instead of clone():
cargo bench -- --load-baseline borrow --baseline main mownstr-borrow
Finished bench [optimized] target(s) in 0.04s
Running benches/bench.rs (target/release/deps/bench-561ba078d93cf93c)
??? (all)/0.1 all triple IDs
time: [2.0441 s 2.0450 s 2.0458 s]
change: [-1.4375% -0.9551% -0.5860%] (p = 0.00 < 0.05)
Change within noise threshold.
??? (all)/0.2 all str triples
time: [8.0467 s 8.0638 s 8.0855 s]
change: [-1.1257% -0.6344% -0.1437%] (p = 0.03 < 0.05)
Change within noise threshold.
Found 2 outliers among 10 measurements (20.00%)
2 (20.00%) high severe
S??/1.1 (vincent, ?, ?) triple IDs
time: [2.1847 µs 2.1909 µs 2.2014 µs]
change: [-0.2736% -0.0057% +0.2519%] (p = 0.97 > 0.05)
No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
S??/1.2 (vincent, ?, ?) str triples
time: [5.1951 µs 5.2018 µs 5.2099 µs]
change: [+0.3254% +0.6764% +1.0194%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe
?P? 1429618 triples/2.1 (?, type, ?) triple IDs
time: [513.92 ms 514.61 ms 515.54 ms]
change: [-0.4190% -0.2138% -0.0151%] (p = 0.06 > 0.05)
No change in performance detected.
Found 2 outliers among 10 measurements (20.00%)
1 (10.00%) high mild
1 (10.00%) high severe
?P? 1429618 triples/2.2 (?, type, ?) str triples
time: [968.11 ms 969.58 ms 971.04 ms]
change: [+0.2843% +0.4650% +0.6533%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 2 outliers among 10 measurements (20.00%)
1 (10.00%) low mild
1 (10.00%) high mild
??O 1414214 triples/3.1 (?, ?, person) triple IDs
time: [209.90 ms 210.17 ms 210.62 ms]
change: [+0.0380% +0.1802% +0.3899%] (p = 0.03 < 0.05)
Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) high mild
7 (7.00%) high severe
??O 1414214 triples/3.2 (?, ?, person) str triples
time: [746.20 ms 754.78 ms 765.05 ms]
change: [+1.2616% +2.5159% +3.7646%] (p = 0.00 < 0.05)
Performance has regressed.
Found 23 outliers among 100 measurements (23.00%)
3 (3.00%) high mild
20 (20.00%) high severe
?PO 1414214 triples/4.1 (?, type, person) triple IDs
time: [174.76 ms 174.93 ms 175.38 ms]
change: [+1.0783% +1.6029% +2.2160%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 10 measurements (20.00%)
2 (20.00%) high severe
?PO 1414214 triples/4.2 (?, type, person) str subjects
time: [468.19 ms 469.26 ms 470.53 ms]
change: [-0.7916% -0.5703% -0.2976%] (p = 0.00 < 0.05)
Change within noise threshold.
?PO 1414214 triples/4.3 (?, type, person) str triples
time: [471.40 ms 471.85 ms 472.34 ms]
change: [-2.6150% -2.4799% -2.3429%] (p = 0.00 < 0.05)
Performance has improved.
Is this feasible to add or will this break something?
It will break something. Consider the following code (assuming that borrowed
returns MownStr<'a>
:
let m1: MownStr<'static>;
{
let m2: MownStr<'static> = "hello world".to_string().into();
m1 = m2.borrowed();
println!("{}", m1);
}
println!("{}", m1);
It would compile, but the second println!
would print garbage, because the text owned by m2
and borrowed by m1
is unallocated at the closing curly bracket...
This is difficult to benchmark with Criterion.rs due to variance in runtimes and the long time due to the multiple runs. In addition to Criterion.rs, I benchmarked some cases with Iai, which uses Valgrind to estimate the number of CPU cycles.
po sophia | po sophia - load | / mownstr | po | po - load | / mownstr | o sophia | o sophia - load | / mownstr | o | o - load | / mownstr | all sophia | all sophia - load | / mownstr | |||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
mownstr | 32025873 | 6453360 | 1 | 30946097 | 5373584 | 1 | 35733746 | 10161233 | 1 | 34930677 | 9358164 | 1 | 168333124 | 142760611 | 1 | ||||
rc | 31805323 | 6232810 | 0.9658240049 | 31146148 | 5573635 | 1.037228598 | 36615963 | 11043450 | 1.086821845 | 35477675 | 9905162 | 1.058451423 | 144665536 | 119093023 | 0.8342148592 | ||||
arc | 31831044 | 6258531 | 0.9698096805 | 31195529 | 5623016 | 1.046418182 | 36664104 | 11091591 | 1.091559558 | 35528558 | 9956045 | 1.063888707 | 145241514 | 119669001 | 0.8382494314 | ||||
rc / arc | 0.9958902496 | rc / arc | 0.9912180581 | rc / arc | 0.9956596849 | rc / arc | 0.9948892356 | rc / arc | 0.9951869073 |
all | all - load | / mownstr |
---|---|---|
143615972 | 118043459 | 1 |
126231005 | 100658492 | 0.8527240124 |
126843252 | 101270739 | 0.8579106361 |
rc / arc | 0.9939543544 |
po sophia | po sophia - load | / mownstr | |
---|---|---|---|
mownstr | 302179289 | 64166525 | 1 |
rc | 299909076 | 61896312 | 0.964619979 |
arc | 300229249 | 62216485 | 0.9696096991 |
rc / arc | 0.994853888 |
po sophia | po sophia - load | / mownstr | |
---|---|---|---|
mownstr | 3017508492 | 697058710 | 1 |
rc | 2993299362 | 672849580 | 0.9652695969 |
arc | 2996034366 | 675584584 | 0.9691932319 |
rc / arc | 0.9959516483 |
Enabling the triple cache for all patterns with at two variables gave a massive boost (down to 69%) to the tested ??O query:
o | o - load | / mownstr | |
---|---|---|---|
mownstr | 34930677 | 9358164 | 1 |
rc | 35477675 | 9905162 | 1.058451423 |
arc | 35528558 | 9956045 | 1.063888707 |
cached arc | 31998853 | 6426340 | 0.686709487 |
rc / arc | 0.9948892356 |
This is an attempt at improving the Sophia implementation in HdtGraph (and also the implementation of Hdt).
The general idea is to use Arc instead of MownStr, because Arc can be cloned without actually copying the underlying text, and do not require a persistent "owner" of the text.
The performance gain is not obvious, because what takes most time is we spend less allocating memory than when using MownStr.
FourSectDict::id_to_string
, i.e. the reconstruction of the string from the compressed dictionary. But it still seems that by using ArcThis PR also contains a number of minor improvements in the use of the Sophia API.