Closed Glitchy-Tozier closed 2 years ago
One last idea I have is the following: For the characters that are part of `fixed_keys:` or part of `fixed_layers: [2, 3]`, we only map the unfixed chars in `.map_unigrams` etc. and finally add them to the fixed ones. The fixed chars & fixed layers include the following characters:
…_[]^!<>=&ſ\/{}*?()-:@#$|~`+%"';1234567890¡¿:−ªº№·£¤\t\n/*−°§ℓ»«$€„“”₁₂₃♀♂⚥ϰ⟨⟩₀‑¬∨∧⊥∡∥→∞∝ ∅
When using regex and searching for trigrams which consist only of those characters

```
[…_\[\]\^!<>=&ſ\\/\{\}\*\?\(\)-:@#\$\|~`\+%"';1234567890¡¿:−ªº№·£¤\n\t/\*−°§ℓ»«\$€„“”₁₂₃♀♂⚥ϰ⟨⟩₀‑¬∨∧⊥∡∥→∞∝ ∅]{3}\n
```

we find that about 16_000 of the 290_000 trigrams contain only fixed chars. Thus, this idea might boost performance by an additional ~5%. That being said, I'm not sure how to implement this concept, so I'll leave it be.
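The counting described above can be sketched as follows (a hedged sketch: the fixed-character set is abbreviated here, and the real trigram list comes from the corpus files):

```rust
use std::collections::HashSet;

fn main() {
    // Abbreviated stand-in for the full fixed-character set quoted above.
    let fixed_chars: HashSet<char> = "1234567890!?()[]{}".chars().collect();

    // Stand-in trigram list; the real list has ~290_000 entries.
    let trigrams = vec!["123", "ab1", "()!", "abc"];

    // Count trigrams consisting only of fixed characters,
    // mirroring the `[…]{3}` regex search described above.
    let fixed_only = trigrams
        .iter()
        .filter(|t| t.chars().all(|c| fixed_chars.contains(&c)))
        .count();

    assert_eq!(fixed_only, 2); // "123" and "()!"
}
```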
I have mixed feelings about this pull request.

On one hand, I see that using a `HashMap` instead of the `Vec` for the `*GramIndices` saves some conversions. But I am a little uncomfortable with then having both structures there. Is it possible to get rid of the `*GramIndicesVec` version and only convert them into a `Vec` in the last step, when the values are being placed into the `MappedNgrams`? Then we would only keep the `HashMap` version and thus save the conversions.

Also, I see that there are now multiple instances where an `AHashMap` is used in favor of the `FxHashMap`. I would prefer being consistent here and using `AHashMap` everywhere (if performance does not degrade too much) while completely getting rid of the `FxHashMap`. This then also removes the `rustc_hash` dependency. Did you check what the performance impact would be using only `AHashMap`s?

Finally, I see your reasoning for implementing the `PartialEq` manually. However, I do not like having the `index` inside the `LayerKey`. I understand that it is only meant to be a "unique" value. But using the `LayerKeyIndex` is very misleading as it encodes information "from the outside". How much time does using the "automatic" implementation of `PartialEq` actually cost?
Hi :)

Your commit: The `/ 3` in

```rust
let mut bigram_w_map = AHashMap::with_capacity(bigrams.len() / 3);
```

was intentional. Due to the fact that we do not fill the hashmap with any duplicates (and due to the fact that we only fill it with Layer1-letters), it's usually smaller than 1/3rd or even 1/4th or 1/5th of `bigrams.len()`. Thus I added the `/ 3` because it seems to be the most efficient thing to do.

I'd prefer keeping the `/ 3`, but if you know of scenarios I forgot, I'm happy to remove it.
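The pre-sizing being discussed can be illustrated with a plain `std::collections::HashMap` (a sketch only; the real code uses `AHashMap` and fills the map with Layer1 bigrams):

```rust
use std::collections::HashMap;

fn main() {
    // Stand-in bigram list; duplicates mean the deduplicated
    // map ends up much smaller than the input.
    let bigrams: Vec<(&str, f64)> = vec![
        ("ab", 1.0), ("ab", 2.0), ("cd", 1.0),
        ("ab", 0.5), ("cd", 0.5), ("ef", 1.0),
    ];

    // Pre-size to a third of the input, as in the discussed line:
    // AHashMap::with_capacity(bigrams.len() / 3)
    let mut bigram_w_map: HashMap<&str, f64> = HashMap::with_capacity(bigrams.len() / 3);
    for (bigram, weight) in &bigrams {
        *bigram_w_map.entry(*bigram).or_insert(0.0) += *weight;
    }

    // Far fewer entries than bigrams.len(), so the smaller capacity suffices.
    assert_eq!(bigram_w_map.len(), 3);
}
```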
> On one hand, I see that using a `HashMap` instead of the `Vec` for the `*GramIndices` saves some conversions. But I am a little uncomfortable with then having both structures there. Is it possible to get rid of the `*GramIndicesVec` version and only convert them into a `Vec` in the last step, when the values are being placed into the `MappedNgrams`? Then we would only keep the `HashMap` version and thus save the conversions.
I share your feeling that this is ugly. In my tests this actually had a performance impact, and was the main thing that saved time. (The more correct version of that sentence is: "Changing this into a HashMap slowed everything back down, nullifying the other improvements".) This might be due to one of two reasons:

1. Inserting items into a HashMap might be slower than pushing them into a Vec (which would make sense, as additional hashing-computations are performed). (This happens repeatedly in `fn map_unigrams(`, `fn map_bigrams(`, and `fn map_trigrams(`.)
2. Iterating through a HashMap might be slower (no idea how reasonable this idea is).

That being said:

1. my tests might have been imperfect
2. we could just remove the type but still use a Vec to return the values from that one function.
> Also I see that there are now multiple instances where an `AHashMap` is used in favor of the `FxHashMap`. I would prefer being consistent here and using `AHashMap` everywhere (if performance does not degrade too much) while completely getting rid of the `FxHashMap`. This then also removes the `rustc_hash` dependency.
I started using `AHashMap` due to their comparison to other hashers. Accordingly, I did my best to consistently use one or the other, following this rule:

- FxHash for primitive keys
- AHash for more complex keys such as Tuples or Strings.
> Did you check what the performance impact would be using only `AHashMap`s?
I did not, and it's probably negligible. We can use `AHashMap` everywhere, if you prefer. :)

If we decide on removing FxHash…:

- Let's wait until the two PRs are gone, to avoid constant merge-conflicts
- It's probably most efficient to replace the really small `FxHashSets` (such as `exclude_rows` or `initial_pause_indicators`) with Vecs

> Finally, I see your reasoning for implementing the `PartialEq` manually. However, I do not like having the `index` inside the `LayerKey`. I understand that it is only meant to be a "unique" value. But using the `LayerKeyIndex` is very misleading as it encodes information "from the outside". How much time does using the "automatic" implementation of `PartialEq` actually cost?

> How much time does using the "automatic" implementation of `PartialEq` actually cost?
Not much. I assume this is because most of the time the first fields are not equal to each other, and thus the "comparison-train" is stopped preemptively.
> But using the `LayerKeyIndex` is very misleading as it encodes information "from the outside".
I don't quite understand this part. Is it problematic because it references a "higher order concept", such as the layout? `LayerKeyIndex` = "This is my position in the Layout"?

In that case, my proposal is the following: Rename `pub index: LayerKeyIndex` into `pub id: u16`. This way, we do not imply any relation to the layout (and thus are not misleading), but are still able to do efficient comparisons. What do you think?
Actually … there's no need at all to fill either a `Vec` or a `HashMap` in `fn map_trigrams`. It's redundant. We use the value only once (to iterate over it), thus we can simply return an Iterator. This skips

1. pushing > 200_000 trigrams into a vec
2. turning that vec back into an iterable.

I've got a fun & concise idea on how to implement this, if you give me the go :)
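The iterator idea might look roughly like this (a hedged sketch with simplified stand-in types, not the actual `map_trigrams` signature):

```rust
// Collecting version: allocates an intermediate Vec that is consumed once.
fn map_trigrams_vec(trigrams: &[(u8, u8, u8)]) -> Vec<(u8, u8, u8)> {
    trigrams.iter().map(|&(a, b, c)| (c, b, a)).collect()
}

// Iterator version: no intermediate allocation; the caller iterates directly.
fn map_trigrams_iter<'a>(
    trigrams: &'a [(u8, u8, u8)],
) -> impl Iterator<Item = (u8, u8, u8)> + 'a {
    trigrams.iter().map(|&(a, b, c)| (c, b, a))
}

fn main() {
    let trigrams = vec![(1, 2, 3), (4, 5, 6)];
    let from_vec: Vec<_> = map_trigrams_vec(&trigrams);
    let from_iter: Vec<_> = map_trigrams_iter(&trigrams).collect();
    // Both produce the same mapping; only the allocation pattern differs.
    assert_eq!(from_vec, from_iter);
}
```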
> Hi :)
>
> Your commit: The `/ 3` in `let mut bigram_w_map = AHashMap::with_capacity(bigrams.len() / 3);` was intentional. Due to the fact that we do not fill the hashmap with any duplicates (and due to the fact that we only fill it with Layer1-letters), it's usually smaller than 1/3rd or even 1/4th or 1/5th of `bigrams.len()`. Thus I added the `/ 3` because it seems to be the most efficient thing to do. I'd prefer keeping the `/ 3`, but if you know of scenarios I forgot, I'm happy to remove it.
I am okay with this, if you checked that these relations are indeed as you suggest. My understanding was that a single ngram with higher-layer symbols will be "expanded" into several new ngrams involving modifiers. But I do not remember the quantities anymore.
> On one hand, I see that using a `HashMap` instead of the `Vec` for the `*GramIndices` saves some conversions. But I am a little uncomfortable with then having both structures there. Is it possible to get rid of the `*GramIndicesVec` version and only convert them into a `Vec` in the last step, when the values are being placed into the `MappedNgrams`? Then we would only keep the `HashMap` version and thus save the conversions.
>
> I share your feeling that this is ugly. In my tests this actually had a performance impact, and was the main thing that saved time. (The more correct version of that sentence is: "Changing this into a HashMap slowed everything back down, nullifying the other improvements".) This might be due to one of two reasons:
>
> 1. Inserting items into a HashMap might be slower than pushing them into a Vec (which would make sense as additional hashing-computations are performed) (which happens repeatedly in `fn map_unigrams(`, `fn map_bigrams(`, and `fn map_trigrams(`)
> 2. Iterating through a HashMap might be slower (no idea how reasonable this idea is)
>
> That being said:
>
> 1. my tests might have been imperfect
> 2. we could just remove the type but still use a Vec to return the values from that one function.

If the performance gains/losses are not orders of magnitude, I definitely prefer understandable and clean code over "little" performance differences; be it with a Vec or a HashMap.
> Also I see that there are now multiple instances where an `AHashMap` is used in favor of the `FxHashMap`. I would prefer being consistent here and using `AHashMap` everywhere (if performance does not degrade too much) while completely getting rid of the `FxHashMap`. This then also removes the `rustc_hash` dependency.
>
> I started using `AHashMap` due to their comparison to other hashers. Accordingly, I did my best to consistently use one or the other, following this rule:
>
> - FxHash for primitive keys
> - AHash for more complex keys such as Tuples or Strings.

With the same reasoning as above (prefer clean code over only little performance gains), I would suggest just using `AHashMap`/`AHashSet`.
> Did you check what the performance impact would be using only `AHashMap`s?
>
> I did not, and it's probably negligible. We can use `AHashMap` everywhere, if you prefer. :) If we decide on removing FxHash…:
>
> - Let's wait until the two PRs are gone, to avoid constant merge-conflicts
Ok.
> - It's probably most efficient to replace the really small `FxHashSets` (such as `exclude_rows` or `initial_pause_indicators`) with Vecs
I am not sure. With HashMaps we get an `O(1)` lookup. Maybe for small Vecs this is still slower... haven't checked.
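A minimal illustration of the trade-off (hypothetical simplified stand-ins for the real config fields): both containers answer the same membership question; for a handful of elements a linear scan over a `Vec` is often at least as fast in practice (better cache locality, no hashing), despite the `HashSet`'s O(1) average lookup.

```rust
use std::collections::HashSet;

fn main() {
    // HashSet lookup: O(1) on average, but every query pays hashing cost.
    let exclude_rows: HashSet<u8> = [0, 4].into_iter().collect();
    assert!(exclude_rows.contains(&4));

    // Vec lookup: O(n) linear scan, but n is tiny here.
    let exclude_rows_vec: Vec<u8> = vec![0, 4];
    assert!(exclude_rows_vec.contains(&4));
    assert!(!exclude_rows_vec.contains(&2));
}
```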
> Finally, I see your reasoning for implementing the `PartialEq` manually. However, I do not like having the `index` inside the `LayerKey`. I understand that it is only meant to be a "unique" value. But using the `LayerKeyIndex` is very misleading as it encodes information "from the outside". How much time does using the "automatic" implementation of `PartialEq` actually cost?
>
> > How much time does using the "automatic" implementation of `PartialEq` actually cost?
>
> Not much. I assume this is because most of the time the first fields are not equal to each other, and thus the "comparison-train" is stopped preemptively.
>
> > But using the `LayerKeyIndex` is very misleading as it encodes information "from the outside".
>
> I don't quite understand this part. Is it problematic because it references a "higher order concept", such as the layout? `LayerKeyIndex` = "This is my position in the Layout"? In that case, my proposal is the following: Rename `pub index: LayerKeyIndex` into `pub id: u16`. This way, we do not imply any relation to the layout (and thus are not misleading), but are still able to do efficient comparisons. What do you think?
The main danger that I see is that with an `id` or `index` we allow misuse of the interface. If a user of the `LayerKey` sets a wrong `id` (the same `id` for different `LayerKey`s), he gets wrong results (comparison being true even though the `LayerKey`s are not equal). Having such a field always requires that it is unique, but we leave it to the user to actually enforce this. To me this screams "bad/dangerous interface".
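A minimal sketch of the hazard (the field names here are assumptions; the real `LayerKey` has different fields): if equality is defined only over a caller-supplied `id`, two structurally different keys compare equal as soon as the caller reuses an `id`.

```rust
// Hypothetical, simplified stand-in for LayerKey.
struct LayerKey {
    id: u16,      // caller-supplied "unique" value
    symbol: char, // actual payload
}

// Manual PartialEq that only compares the id, as discussed in the thread.
impl PartialEq for LayerKey {
    fn eq(&self, other: &Self) -> bool {
        self.id == other.id
    }
}

fn main() {
    // Misuse: the same id on two different keys…
    let a = LayerKey { id: 7, symbol: 'a' };
    let b = LayerKey { id: 7, symbol: 'b' };
    // …makes them compare equal even though their payloads differ.
    assert!(a == b);
    assert_ne!(a.symbol, b.symbol);
}
```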
> Actually … there's no need at all to fill either a `Vec` or a `HashMap` in `fn map_trigrams`. It's redundant. We use the value only once (to iterate over it), thus we can simply return an Iterator. This skips
>
> 1. pushing > 200_000 trigrams into a vec
> 2. turning that vec back into an iterable.
>
> I've got a fun & concise idea on how to implement this, if you give me the go :)
Skipping 1. would be nice. Point 2 is not expensive.
Again, if the idea that you have keeps the code simple, please give it a go. If it gets too complex and the gains are small (<15% or so), I would be uncomfortable with it.
1. I gave up on the iterator-thing (they're messy). However, I was able to prevent the >200000 pushes by using `filter_map().collect()`.
2. I moved `BigramIndicesVec` to the respective files, as it's not used anywhere else. In theory we could do the same thing with `BigramIndices`, I think. Anyway, what is your opinion on the setup with the moved `BigramIndicesVec`-type definition? If you still dislike it, tell me and I'll replace it with a `HashMap`.
3. I tried to remove the index, but then found out that my pretty little improvement of `pub fn add_secondary_bigrams_from_trigrams()` gets destroyed by removing the index. This was the original reason for adding the index. Do you still want to remove it?
> 1. I gave up on the iterator-thing (they're messy). However, I was able to prevent the >200000 pushes by using `filter_map().collect()`.

This looks cleaner. I like it. (The pushes are still done within `collect`, though.)
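As a hedged sketch of the pattern in question (names and types simplified; the real code maps trigrams to LayerKeys): `filter_map().collect()` folds the filtering and mapping into one pass and removes the explicit push-loop.

```rust
fn main() {
    // Stand-in data: (ngram, weight) pairs; a weight of 0.0 marks entries to drop.
    let raw: Vec<(&str, f64)> = vec![("abc", 1.0), ("abd", 0.0), ("xyz", 2.5)];

    // Push-loop version:
    let mut mapped_loop = Vec::new();
    for (ngram, weight) in &raw {
        if *weight > 0.0 {
            mapped_loop.push((ngram.to_uppercase(), *weight));
        }
    }

    // Equivalent filter_map().collect() version: one pass, no manual pushes.
    let mapped: Vec<(String, f64)> = raw
        .iter()
        .filter_map(|(ngram, weight)| {
            (*weight > 0.0).then(|| (ngram.to_uppercase(), *weight))
        })
        .collect();

    assert_eq!(mapped, mapped_loop);
    assert_eq!(mapped.len(), 2);
}
```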
> 2. I moved `BigramIndicesVec` to the respective files, as it's not used anywhere else. In theory we could do the same thing with `BigramIndices`, I think.
Good idea. Having them there was a historical artefact from a time when there was a "not-on-demand" NGramMapper variant.
> Anyway, what is your opinion on the setup with the moved `BigramIndicesVec`-type definition? If you still dislike it, tell me and I'll replace it with a `HashMap`.
Having these types only in this module makes it alright, I think. No replacement required.
> 3. I tried to remove the index but then found out that my [pretty little improvement of `pub fn add_secondary_bigrams_from_trigrams()`](https://github.com/dariogoetz/keyboard_layout_optimizer/pull/31/files#diff-9310aba2f5f0f1f27a7ab6df86be60e2678da8b8a12c3b94fffbcce3629cc512L96-L140) gets destroyed by removing the index. This was the original reason for adding the index. Do you still want to remove it?
> I would still like to see it removed.

Ok

> Good idea. Having them there was a historical artefact from a time when there was a "not-on-demand" NGramMapper variant.

Do you also want me to move the "normal" `BigramIndices`?
> > I would still like to see it removed.
>
> Ok
>
> > Good idea. Having them there was a historical artefact from a time when there was a "not-on-demand" NGramMapper variant.
>
> Do you also want me to move the "normal" `BigramIndices`?

Yes, please.
I moved

```rust
// if the same modifier appears consecutively, it is usually "hold" instead of repeatedly pressed
// --> remove
let trigrams = trigrams
    .into_iter()
    .filter(|((k1, k2, k3), _)| !(k2.is_modifier && (k1 == k2 || k2 == k3)))
    .collect();
```

into `OnDemandTrigramMapper::layerkeys(&trigram_key_indices, layout);`. I'd do the same thing for bigrams as well.
This would prevent another allocation of a Vec and make the code in `map_ngrams()` more uniform. See the two most recent commits. It seems more efficient in every way: less work is done, fewer vectors are filled.
The only question is whether you feel that this might be a misuse of `OnDemandTrigramMapper::layerkeys(&trigram_key_indices, layout);`. To me it seems fine. Many other methods (such as `bigram_mapper::add_secondary_bigrams_from_trigrams()` or `split_trigram_modifiers()`) also perform internal filtering.
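A hedged sketch of the relocation (signatures heavily simplified; the real `layerkeys` maps trigram indices to `LayerKey`s via the layout): the consecutive-modifier filter runs inside the mapping function instead of as a separate pass.

```rust
#[derive(Clone, Copy, PartialEq)]
struct Key {
    is_modifier: bool,
    ch: char,
}

// Simplified stand-in for OnDemandTrigramMapper::layerkeys: maps trigrams
// and drops those where the same modifier appears consecutively ("hold").
fn layerkeys(trigrams: &[((Key, Key, Key), f64)]) -> Vec<((Key, Key, Key), f64)> {
    trigrams
        .iter()
        .filter(|((k1, k2, k3), _)| !(k2.is_modifier && (k1 == k2 || k2 == k3)))
        .cloned()
        .collect()
}

fn main() {
    let shift = Key { is_modifier: true, ch: '⇧' };
    let a = Key { is_modifier: false, ch: 'a' };
    let b = Key { is_modifier: false, ch: 'b' };
    let trigrams = vec![((shift, shift, a), 1.0), ((a, shift, b), 2.0)];
    let mapped = layerkeys(&trigrams);
    // (shift, shift, a) is dropped: k2 is a modifier and k1 == k2.
    assert_eq!(mapped.len(), 1);
}
```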
It is certainly not beautiful. Optimally, all kinds of filtering or additions to the set of ngrams would be within functions that obviously do so. `add_secondary...` and `split...modifiers` obviously change the ngram set, so there it is okay.
But probably it's fine here.
Alright. Later on today I'll at least add a doc-comment to those functions, explaining that there's also some slight filtering going on. After that, I'll re-test and merge the PR unless anything new comes up.
The purpose of this PR is to improve performance while still keeping the script elegant and clean. I did my best to rename the affected functions to indicate their added functionality. If there's anything else you dislike, please tell me.
I've found and added a way to use `filter_map()` and `.collect()` without losing performance. The problem was that `filter_map()` (and the regular `filter()`) set the minimum of the iterator's `size_hint()` to 0, meaning that – as opposed to other circumstances – `.collect()` will not pre-allocate any memory.
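The `size_hint()` behavior described here can be observed directly (a minimal demonstration with a plain slice iterator):

```rust
fn main() {
    let v = vec![1, 2, 3, 4, 5];

    // A slice iterator knows its exact length.
    assert_eq!(v.iter().size_hint(), (5, Some(5)));

    // filter() cannot know how many items will pass, so the lower
    // bound drops to 0 — collect() cannot pre-allocate capacity.
    let filtered = v.iter().filter(|&&x| x % 2 == 0);
    assert_eq!(filtered.size_hint(), (0, Some(5)));
}
```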
A more elegant solution might be to write our own `collect_for_capacity()`-method, which roughly will look like this:

```rust
trait CapacityCollect: Iterator + Sized {
    fn collect_for_capacity(self, capacity: usize) -> Vec<Self::Item> {
        let mut result = Vec::with_capacity(capacity);
        result.extend(self);
        result
    }
}

impl<I: Iterator> CapacityCollect for I {}
```
The purpose of this extraction would be that we could use it in other contexts; not necessarily to improve speed, but to simplify code like this, where there's actually not much going on, but it could use some decluttering:

```rust
let mut metric_costs: Vec<MetricResult> = Vec::new();
for (weight, normalization, metric) in self.layout_metrics.iter() {
    let (cost, message) = metric.total_cost(layout);
    metric_costs.push(MetricResult {
        name: metric.name().to_string(),
        cost,
        weight: *weight,
        normalization: normalization.clone(),
        message,
    });
}
```
The only problem is that

1. I don't know how you feel about this idea (I myself think that it might be overkill)
2. I don't know where this implementation should be placed to be accessible in different modules

Finally, two questions:

1. What are your thoughts on implementing a simple validation of the (keyboard-)config? I could add it after this PR.
2. If that config was added, would you be down to keep the less redundant `PartialEq`-implementation, as there's no more possibility of the user to mess it up? (We would check for duplicate matrix-positions in the validation.)

1.
> The purpose of this PR is to improve performance while still keeping the script elegant and clean. I did my best to rename the affected functions to indicate their added functionality. If there's anything else you dislike, please tell me.

The naming is better now. I'm fine with it.
2.

> I've found and added a way to use `filter_map()` and `.collect()` without losing performance. The problem was that `filter_map()` (and the regular `filter()`) set the minimum of the iterator's `size_hint()` to 0, meaning that – as opposed to other circumstances – `.collect()` will not pre-allocate any memory.
>
> A more elegant solution might be to write our own `collect_for_capacity()`-method, which roughly will look like this:
>
> ```rust
> trait CapacityCollect: Iterator + Sized {
>     fn collect_for_capacity(self, capacity: usize) -> Vec<Self::Item> {
>         let mut result = Vec::with_capacity(capacity);
>         result.extend(self);
>         result
>     }
> }
>
> impl<I: Iterator> CapacityCollect for I {}
> ```
>
> The purpose of this extraction would be that we could use it in other contexts; not necessarily to improve speed, but to simplify code like this, where there's actually not much going on, but it could use some decluttering:
>
> ```rust
> let mut metric_costs: Vec<MetricResult> = Vec::new();
> for (weight, normalization, metric) in self.layout_metrics.iter() {
>     let (cost, message) = metric.total_cost(layout);
>     metric_costs.push(MetricResult {
>         name: metric.name().to_string(),
>         cost,
>         weight: *weight,
>         normalization: normalization.clone(),
>         message,
>     });
> }
> ```
>
> The only problem is that
>
> 1. I don't know how you feel about this idea (I myself think that it might be overkill)
> 2. I don't know where this implementation should be placed to be accessible in different modules
I like the solution that you found (`::with_capacity` and then `.extend`). But I feel that having it in a separate trait is unnecessary. Keep it as it is, I would say.
3.
> 1. What are your thoughts on implementing a simple validation of the (keyboard-)config? I could add it after this PR.
This is a good idea. Checking user-provided configs is good in any case.
> 2. If that config was added, would you be down to keep the less redundant `PartialEq`-implementation, as there's no more possibility of the user to mess it up? (We would check for duplicate matrix-positions in the validation.)
I still dislike the manual `PartialEq`-implementation. The "user" in the sense of "user of the application" may not, but the "user of the data structure" might still run into the confusions mentioned above.
PS: Sorry. I was logged into the wrong github-account.
Closes #29
Turns out the ideas proposed in #29 don't make that big of a difference. A short optimization run went from ~24.5s to ~23.0s, which is roughly a 6% speed increase. This is the command I used for testing: (with 201 iterations)
In addition to the ideas proposed in #29, I also manually implemented `PartialEq` for `LayerKey`, as deriving it means that all fields get compared. This used to result in redundant comparisons (assuming that the `LayerKey`'s `index` is unique to a particular `LayerKey`).

As a safety-check, I compared the result of an evaluation before and after this PR:
Before (no changes)
After PR