This seems to be by far the most expensive part of the script, so here are a few ideas that might speed it up (I'm focusing on trigrams because they're the most expensive).

### Replacing `groupby_sum(`

Currently this happens:

1. The trigrams' `layerkey_indices` are generated (they're a `Vec`).
2. In `groupby_sum(`, they get added into a `HashMap`, which filters out duplicates.
3. In `groupby_sum(`, the `HashMap` gets transformed back into a `Vec`.

This seems somewhat redundant to me. I am not sure how much those additional loops impact performance, but it seems to me like we could make the following improvement: in `split_trigram_modifiers(`, we don't fill up a `Vec`; instead, we fill up a `HashMap` and use it later on. (This removes `groupby_sum(`.) This idea seems to let us remove two of the three loops through all trigrams.
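The idea above can be sketched as follows. This is a minimal, self-contained sketch, not the project's actual code: `TrigramIndices` is a simplified stand-in for the real types, and `with_groupby_sum`/`direct_hashmap` are hypothetical names for the two strategies being compared.

```rust
use std::collections::HashMap;

// Hypothetical trigram type: three layerkey indices plus a weight.
type TrigramIndices = (u8, u8, u8);

// Current approach (sketch): first generate a Vec, then let groupby_sum
// fold duplicates into a HashMap, then convert back into a Vec.
fn with_groupby_sum(trigrams: &[(TrigramIndices, f64)]) -> Vec<(TrigramIndices, f64)> {
    let vec: Vec<_> = trigrams.to_vec(); // loop 1: generate the Vec
    let mut m: HashMap<TrigramIndices, f64> = HashMap::new();
    for (t, w) in vec {
        *m.entry(t).or_insert(0.0) += w; // loop 2: group duplicates
    }
    m.into_iter().collect() // loop 3: back into a Vec
}

// Proposed approach (sketch): accumulate into the HashMap while generating,
// so the intermediate Vec and the back-conversion loop disappear.
fn direct_hashmap(trigrams: &[(TrigramIndices, f64)]) -> HashMap<TrigramIndices, f64> {
    let mut m = HashMap::with_capacity(trigrams.len());
    for &(t, w) in trigrams {
        *m.entry(t).or_insert(0.0) += w; // single loop
    }
    m
}

fn main() {
    let trigrams = [((1, 2, 3), 1.0), ((1, 2, 3), 2.0), ((4, 5, 6), 0.5)];
    let grouped_old = with_groupby_sum(&trigrams);
    let grouped_new = direct_hashmap(&trigrams);
    // both approaches produce the same grouped weights
    assert_eq!(grouped_old.len(), grouped_new.len());
    assert_eq!(grouped_new[&(1, 2, 3)], 3.0);
    println!("{:?}", grouped_new);
}
```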
### Directly send LayerKeys to `bigram_mapper::add_secondary_bigrams_from_trigrams(`

At this part of `mapped_ngrams`…

```rust
// (if enabled) add bigrams consisting of first and third trigram symbols to vec of bigrams
bigram_mapper::add_secondary_bigrams_from_trigrams(
    &mut bigram_key_indices,
    &trigram_key_indices,
    &self.config.secondary_bigrams_from_trigrams,
    layout,
);
```
…we already have access to the list of trigram-layerkeys. We received them a few lines up:

```rust
let trigrams = OnDemandTrigramMapper::layerkeys(&trigram_key_indices, layout);
```

However, we gloss over that fact and send the indices to `bigram_mapper::add_secondary_bigrams_from_trigrams(`, just to turn the indices back into `LayerKey`s:
```rust
/// Add secondary bigrams from the first and third symbol of a trigram (if they belong to the same hand).
pub fn add_secondary_bigrams_from_trigrams(
    bigram_keys: &mut BigramIndices,
    trigram_keys: &TrigramIndices,
    config: &SecondaryBigramsFromTrigramsConfig,
    layout: &Layout,
) {
    if !config.enabled {
        return;
    }

    // there are many duplicates in the secondary bigrams -> using a hashmap is cheaper
    let mut m = FxHashMap::with_capacity_and_hasher(trigram_keys.len(), Default::default());
    trigram_keys
        .iter()
        .map(|((idx1, idx2, idx3), w)| {
            (
                (
                    (idx1, layout.get_layerkey(idx1)),
                    (idx2, layout.get_layerkey(idx2)),
                    (idx3, layout.get_layerkey(idx3)),
                ),
                w,
            )
        })
        .filter(
```
I feel like sending the `LayerKey`s straight to `bigram_mapper::add_secondary_bigrams_from_trigrams(` might save some execution time. At the moment, I don't see a downside to that improvement.

Edit: turns out this `layout.get_layerkey()` is pretty well optimized. However, immediately passing the `Vec` of `LayerKey`s should still give a performance boost, as it allows for immediate filtering and pretty much removes the need for the `.map()`.

(I think it should be called `map_ngrams`, right?)
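A sketch of what the signature change could look like. This is an assumption-laden illustration, not the project's real code: the `LayerKey` struct with a `hand` field is a simplified stand-in, the bigram map is reduced to a plain `HashMap`, and the same-hand filter is a guess at what the truncated `.filter(` above checks.

```rust
use std::collections::HashMap;

// Simplified stand-in for the real LayerKey struct.
#[derive(Clone, Copy, Debug)]
struct LayerKey {
    hand: u8,
}

// Proposed shape (sketch): the caller passes the already-resolved LayerKeys
// alongside their indices, so the body can filter immediately -- no .map()
// pass calling layout.get_layerkey() for every index.
fn add_secondary_bigrams_from_trigrams(
    bigram_keys: &mut HashMap<(usize, usize), f64>,
    trigram_layerkeys: &[((usize, LayerKey), (usize, LayerKey), (usize, LayerKey), f64)],
) {
    for &((i1, k1), _mid, (i3, k3), w) in trigram_layerkeys {
        // filter right away: keep only outer pairs on the same hand
        if k1.hand == k3.hand {
            *bigram_keys.entry((i1, i3)).or_insert(0.0) += w;
        }
    }
}

fn main() {
    let mut bigrams = HashMap::new();
    let trigrams = [
        ((0usize, LayerKey { hand: 0 }), (1, LayerKey { hand: 1 }), (2, LayerKey { hand: 0 }), 1.0),
        ((3, LayerKey { hand: 0 }), (4, LayerKey { hand: 0 }), (5, LayerKey { hand: 1 }), 2.0),
    ];
    add_secondary_bigrams_from_trigrams(&mut bigrams, &trigrams);
    assert_eq!(bigrams.get(&(0, 2)), Some(&1.0)); // same-hand outer pair kept
    assert!(bigrams.get(&(3, 5)).is_none()); // different hands -> filtered out
    println!("{:?}", bigrams);
}
```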