PLC-lang / rusty

Structured Text Parser and LLVM Frontend
GNU Lesser General Public License v3.0

Index look-up performance improvements #1189

Open volsa opened 1 month ago

volsa commented 1 month ago

Related to https://github.com/PLC-lang/rusty/issues/1185, we did some further investigation into potential resolver performance improvements and realized that there are no obvious bottlenecks per se (at least the flamegraph didn't show any). Looking at the call graph, however, the to_lowercase() method becomes suspicious: according to callgrind it is called around ~24 million times in an internal project.

Specifically, find_type (and many other index methods) calls to_lowercase() before looking up strings in a HashMap. It might be interesting to see whether removing all to_lowercase() calls from look-ups improves the plc check performance. If it does, we may be able to improve the performance by implementing some tweaks in the index while keeping the compiler case-insensitive (because the norm defines it as such); some such tweaks are explored in the comments below.
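To make the pattern concrete, here is a minimal sketch of the kind of look-up described above; the field names and the DataType placeholder are illustrative, not the actual index code:

    use std::collections::HashMap;

    struct DataType; // placeholder for whatever the index actually stores

    struct Index {
        // assumption for this sketch: keys are stored lowercased
        types: HashMap<String, DataType>,
    }

    impl Index {
        fn find_type(&self, name: &str) -> Option<&DataType> {
            // to_lowercase() allocates a fresh String on every look-up; each call is
            // cheap, but over ~24 million calls the allocations and case conversions
            // add up.
            self.types.get(&name.to_lowercase())
        }
    }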

For what it's worth, here are the call graph and flame graph (note the somewhat narrow spikes in the flame graph, indicating that no real bottlenecks are present). [call graph and flame graph images attached]

volsa commented 1 month ago

Some quick and dirty benchmarks in which I made index look-ups case-sensitive, indicating that removing the to_lowercase() calls could indeed improve performance: single-threaded wall-clock time drops from ~4.3 s to ~3.0 s. See also https://github.com/PLC-lang/rusty/compare/master...case-sensitive-index-lookup

Master Branch

~/Development/rusty on  master [$?] is 📦 v0.2.0 via 🦀 v1.77.1 
➜ ./profile.sh 
    Finished release [optimized] target(s) in 0.07s
Benchmark 1: ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=1
  Time (mean ± σ):      4.340 s ±  0.039 s    [User: 4.064 s, System: 0.274 s]
  Range (min … max):    4.295 s …  4.411 s    10 runs

Benchmark 2: ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=2
  Time (mean ± σ):      2.906 s ±  0.049 s    [User: 4.135 s, System: 0.278 s]
  Range (min … max):    2.835 s …  2.973 s    10 runs

Benchmark 3: ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=4
  Time (mean ± σ):      2.110 s ±  0.018 s    [User: 4.108 s, System: 0.286 s]
  Range (min … max):    2.087 s …  2.143 s    10 runs

Benchmark 4: ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=8
  Time (mean ± σ):      1.775 s ±  0.025 s    [User: 4.222 s, System: 0.328 s]
  Range (min … max):    1.745 s …  1.816 s    10 runs

Summary
  ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=8 ran
    1.19 ± 0.02 times faster than ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=4
    1.64 ± 0.04 times faster than ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=2
    2.45 ± 0.04 times faster than ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=1

Case Sensitive Branch

~/Development/rusty on  master [$!?] is 📦 v0.2.0 via 🦀 v1.77.1 
➜ ./profile.sh 
    Finished release [optimized] target(s) in 0.07s
Benchmark 1: ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=1
  Time (mean ± σ):      3.035 s ±  0.060 s    [User: 2.736 s, System: 0.292 s]
  Range (min … max):    2.956 s …  3.133 s    10 runs

Benchmark 2: ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=2
  Time (mean ± σ):      2.051 s ±  0.022 s    [User: 2.765 s, System: 0.261 s]
  Range (min … max):    2.027 s …  2.105 s    10 runs

Benchmark 3: ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=4
  Time (mean ± σ):      1.569 s ±  0.021 s    [User: 2.769 s, System: 0.293 s]
  Range (min … max):    1.538 s …  1.593 s    10 runs

Benchmark 4: ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=8
  Time (mean ± σ):      1.337 s ±  0.017 s    [User: 2.842 s, System: 0.311 s]
  Range (min … max):    1.319 s …  1.374 s    10 runs

Summary
  ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=8 ran
    1.17 ± 0.02 times faster than ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=4
    1.53 ± 0.02 times faster than ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=2
    2.27 ± 0.05 times faster than ./target/release/plc check ~/Desktop/internal-project/conf/plc.json --error-config=internal-project-profile.json --threads=1

riederm commented 1 month ago

Maybe it is also worth looking into HashMap implementations that use case-insensitive hashing/equality to accomplish the case-insensitive look-ups. One option would be the unicase crate, though it is not clear yet whether something like unicase is actually cheaper than the to_lowercase() calls.
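For illustration, a small sketch of what case-insensitive hashing/equality with unicase looks like (the map contents here are made up):

    use std::collections::HashMap;
    use unicase::UniCase;

    fn main() {
        // UniCase hashes and compares case-insensitively, so no lowercased copy of
        // the key has to be allocated at look-up time.
        assert_eq!(UniCase::new("FooBar"), UniCase::new("foobar"));

        let mut map: HashMap<UniCase<&str>, i32> = HashMap::new();
        map.insert(UniCase::new("FooBar"), 0);
        assert_eq!(map.get(&UniCase::new("FOOBAR")), Some(&0));
    }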

volsa commented 1 week ago

For completeness' sake, here are some benchmarks comparing how the different variants perform. The cached look-up comes out roughly 2.2× faster than the plain to_lowercase() variant, with the two unicase variants in between; the benchmark code follows the results.

test benches::cached_lookup         ... bench:  77,384,767 ns/iter (+/- 682,933)
test benches::unicase_cow_lookup    ... bench: 128,091,353 ns/iter (+/- 475,277)
test benches::unicase_string_lookup ... bench: 152,539,598 ns/iter (+/- 531,244)
test benches::lowercase_lookup      ... bench: 172,139,632 ns/iter (+/- 6,025,281)

test result: ok. 0 passed; 0 failed; 1 ignored; 4 measured; 0 filtered out; finished in 159.87s
  // Assumption: `FrozenMap` here is elsa's append-only map, which allows inserts
  // through a shared reference and therefore works inside `get(&self)`.
  use elsa::FrozenMap;
  use std::collections::HashMap;

  /// Wraps a HashMap keyed by lowercased strings and memoizes look-ups per
  /// original spelling, so `to_lowercase()` runs at most once per distinct key.
  struct CacheMap<K, V> {
      map: HashMap<K, V>,
      cache: FrozenMap<K, Box<Option<V>>>,
  }

  impl CacheMap<String, i32> {
      fn new() -> Self {
          CacheMap {
              map: HashMap::new(),
              cache: FrozenMap::new(),
          }
      }

      fn insert(&mut self, key: &str, value: i32) {
          self.map.insert(key.to_lowercase(), value);
      }

      fn get(&self, key: &str) -> Option<&i32> {
          if let Some(cached) = self.cache.get(key) {
              return cached.as_ref();
          }

          let lowered = key.to_lowercase();
          match self.map.get(&lowered) {
              Some(entry) => {
                  self.cache.insert(key.to_string(), Box::new(Some(entry.clone())));
                  return Some(entry);
              }
              None => {
                  self.cache.insert(key.to_string(), Box::new(None));
                  return None;
              }
          }
      }
  }
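A quick usage sketch for the CacheMap above (keys and values are made up); it shows that to_lowercase() is only paid on the first look-up per spelling and that negative results are cached too:

  fn main() {
      let mut cm: CacheMap<String, i32> = CacheMap::new();
      cm.insert("FooBar", 0);

      assert_eq!(cm.get("FOOBAR"), Some(&0));   // cache miss: lowercases once, caches the hit
      assert_eq!(cm.get("FOOBAR"), Some(&0));   // cache hit: no new allocation
      assert_eq!(cm.get("DoesNotExist"), None); // misses are cached as well
  }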

    // These benches assume nightly's `test` crate (`#![feature(test)]`, `extern crate test;`)
    // plus `use std::{borrow::Cow, collections::HashMap}; use test::Bencher; use unicase::UniCase;`.
    #[bench]
    fn lowercase_lookup(bench: &mut Bencher) {
        let mut hm: HashMap<String, i32> = HashMap::new();
        hm.insert("FooBar".to_string(), 0);

        bench.iter(|| {
            for _ in 0..=1_000_000 {
                hm.get(&"FooBar".to_lowercase());
                hm.get(&"FOoBar".to_lowercase());
                hm.get(&"FoOBar".to_lowercase());
                hm.get(&"Foobar".to_lowercase());
                hm.get(&"FooBAr".to_lowercase());
                hm.get(&"FooBaR".to_lowercase());
            }
        });
    }

    #[bench]
    fn unicase_string_lookup(bench: &mut Bencher) {
        let mut hm: HashMap<UniCase<String>, i32> = HashMap::new();
        hm.insert(UniCase::new(String::from("FooBar")), 0);

        bench.iter(|| {
            for _ in 0..=1_000_000 {
                hm.get(&UniCase::new(String::from("FooBar")));
                hm.get(&UniCase::new(String::from("FOoBar")));
                hm.get(&UniCase::new(String::from("FoOBar")));
                hm.get(&UniCase::new(String::from("Foobar")));
                hm.get(&UniCase::new(String::from("FooBAr")));
                hm.get(&UniCase::new(String::from("FooBaR")));
            }
        });
    }

    #[bench]
    fn unicase_cow_lookup(bench: &mut Bencher) {
        let mut hm: HashMap<UniCase<Cow<str>>, i32> = HashMap::new();
        hm.insert(UniCase::new(Cow::Owned("FooBar".to_string())), 0);

        bench.iter(|| {
            for _ in 0..=1_000_000 {
                hm.get(&UniCase::new(Cow::Borrowed("FooBar")));
                hm.get(&UniCase::new(Cow::Borrowed("FOoBar")));
                hm.get(&UniCase::new(Cow::Borrowed("FoOBar")));
                hm.get(&UniCase::new(Cow::Borrowed("Foobar")));
                hm.get(&UniCase::new(Cow::Borrowed("FooBAr")));
                hm.get(&UniCase::new(Cow::Borrowed("FooBaR")));
            }
        });
    }

    #[bench]
    fn cached_lookup(bench: &mut Bencher) {
        let mut hm: CacheMap<String, i32> = CacheMap::new();
        hm.insert("FooBar", 0);

        // Each spelling misses the cache on its first call and is served from the cache
        // afterwards, so `to_lowercase()` runs exactly once per distinct spelling. Every
        // look-up here resolves to the single "foobar" entry, i.e. Some(&0).
        bench.iter(|| {
            for _ in 0..=1_000_000 {
                hm.get("FooBar");
                hm.get("FooBar");
                hm.get("FOoBar");
                hm.get("FoOBar");
                hm.get("Foobar");
                hm.get("FooBAr");
                hm.get("FooBaR");
            }
        });
    }