bclnr / kb-layout-evaluation

Evaluate ergonomic keyboard layouts over multiple languages
MIT License
92 stars 12 forks source link
colemak dvorak ergo ergonomic-keyboard ergonomics keyboard keyboard-layout

Keyboard layout evaluation

An evaluation of existing keyboard layouts over multiple languages, focused on ergonomic keyboards.

Many keyboard layouts are designed (by hand or generated by an algorithm) to improve the ergonomics of Qwerty. However, they are typically assessed for typing in a single language, and on a standard keyboard. This analysis evaluates those layouts over several languages, and for an ergonomic keyboard.

The method uses statistics of bigram use (sets of 2 letters) for each language, and grades them according to subjective "weights" (depending on the keys positions), to calculate a comparative difficulty between layouts.

It is focused on writing text, not code. Programming requires special characters more suited to an additional layer; this script focuses on the base layer. Also, no attempt was made to generate a new "optimum" layout, as there would be a lot of variation depending on subjective parameters. Existing layouts already perform very well here, including some generated ones.

The results show that any alternative layout gives a significant ergonomic advantage over Qwerty. Several options give good results, in particular Colemak DH which also brings good familiarity, accessible shortcuts, and positive user feedback.

results

This project has also been modified and used here.

Table of contents

Character statistics

Contained in folder character_stats.

The layout evaluation needs bigram frequencies (sets of 2 letters) for each language.

The frequencies come from Practical Cryptography for English, French, Spanish, German, and Swedish; and from Norvig for English (to compare).

For comparison, my own corpus is also analyzed (for English and French); made of my emails, some texts from free books, and some internet articles.

count.py

Requirements: Python 3, Pandas.

The script count.py takes the text files in the data folder and outputs the character counts in chars.csv, and the bigram counts in bigrams.csv. Upper case is converted to lower case.

The list of characters to take into account is configurable in the code, in the list chars. Currently it takes the basic alphabet, plus éèêàçâîô.,-'/.

The provided chars.csv and bigrams.csv files were generated with a personal corpus of emails (mails_en and mails_fr, 300\~400kB of raw text each) and various free books and articles (vrac_en and vrac_fr, 200\~400kB each).

Spreadsheet analysis

This analysis is done in the Libreoffice spreadsheet stats.ods.

Character counts

The character frequencies for both English and French are quite consistent between the sources and my own corpus.

chars_en

Here is the same chart for French.

Bigram counts

The bigram counts show more discrepancies. The charts below show the top 80 bigrams (sorted by the average of both sources for English).

bigram_en

Here is the same chart for French.

Punctuation

The "theory" numbers from Practical Cryptography and Norvig do not contain data on punctuation characters such as .,-'/. However while the use of . and , depend a lot on each author's style, taking into account a non-zero frequency is essential due to their frequency.

The relevant bigram frequencies are not present in any existing statistics, but for French the Bépo layout project gives character frequencies from a Wikipedia 2008 dump.

For English, Vivian Cook's analysis gives frequencies by word. This University of Maryland paper (Table 1) gives a frequency for . of 1.151% (from Google Ngram data).

The frequencies from the personal corpus contain those characters so it can be used as a control. However, it is useful to have more realistic "theory" results. The approach is to "fix" the missing frequencies by copying the relevant ones from the personal corpus and normalizing the column.

To check that those inputs make sense, they can be compared to the few data available.

For English, we can estimate by dividing the frequencies from Vivian Cook by the average word length of 4.79 letters (from Norvig).

For English Personal corpus Frequency / Word length Google Ngram
Period . 1.6 % 1.36 % 1.151 %
Comma , 1.2 % 1.29 %
For French Personal corpus Wikipedia dump
Period . 1.2 % 0.83 %
Comma , 1.5 % 1.02 %

Some variation can be observed, but the numbers from the personal corpus pass that sanity check. It seems a bit more punctuation is used compared to the average literature, which isn't bad to consider.

Takeaway

For the evaluation, the "theory" numbers will be the average from both sources for English, and the only source I have for other languages. But the differences with my own corpus show the sensitivity of those inputs, therefore the results should be taken with some tolerance.

As the "theory" numbers do not contain characters such as .,-'/, those will be copied from the personal corpus data then normalized.

Layout evaluation

Contained in folder layout_evaluation.

Focus definition

The influence of the physical keyboard is on the weights and penalties. The algorithm is more or less the same otherwise.

The chosen weights are for an ergonomic keyboard, so they are symmetrical. It would be similar for any non-staggered keyboard (no horizontal shift between rows).

Only the keys on the main 3×12 matrix are taken into account, which are reachable by a finger easily. The "numbers" row is ignored as we focus on the alphabetical layout.

Thumb keys available on ergonomic keyboards are ignored as I arbitrarily prefer not to place any alphanumeric character on them.

layout_matrix

The keys are designated by a code (hand, row, column). The numbering includes space for some currently-unused keys, in case of evolution (like for an Ergodox-like keyboard).

Evaluation principle

For each language, the bigram frequencies are imported from the character statistics, as a percentage of use.

The principle is to assign a difficulty (weight) to a bigram (two keys typed consecutively). The bigram weight is multiplied by its frequency, and all the results are summed up to get a general difficulty value of the layout.

Weightlayout = sum( Weightbigram × Probabilitybigram )

The bigram weight is composed of:

Weightbigram = Weightkey1 + Weightkey2 + Penaltykey1 & key2

The results for all layouts and languages are finally normalized compared to Qwerty in English (at 100%).

Key base weights

The base weights are shown below. The home row is identified by a red border.

They represent the relative difficulty to hit a single key. The proposed values are for an ergonomic keyboard, with vertical columns and a comfortable home row position.

weights

Penalties

The penalties represent the additional difficulty of hitting 2 keys consecutively. They come on top of the base weight. They are only taken into account if the 2 keys are hit by the same hand (and are not the same key, like "aa"). By default, they are given a slightly positive value in order to favor hand alternation.

Generally, the penalties are higher if the same finger is used, and the more rows separate the 2 keys. They can be negative if the relative position makes the motion easy, such as a close "inward roll" (like "sd" on Qwerty).

First finger Second finger Same row 1 row jump 2 rows jump Comment
Index Index 2.5 3.5 4.5 Same finger
Index Middle 0.5 1.0 2.0
Index Ring 0.5 0.8 1.5
Index Pinky 0.5 0.8 1.1
Middle Index -1.5 -0.5 1.5 Inward roll
Middle Middle N/A 3.5 4.5 Same finger
Middle Ring 0.5 1.0 2.0
Middle Pinky 0.5 0.8 1.5
Ring Index -1.5 -0.5 1.5 Inward roll
Ring Middle -2.0 -0.5 1.2 Inward roll
Ring Ring N/A 3.5 4.5 Same finger
Ring Pinky 1.0 1.5 2.5
Pinky Index -1.0 0.0 1.0 Inward roll
Pinky Middle -1.0 0.0 1.5 Inward roll
Pinky Ring -1.0 0.0 1.5 Inward roll
Pinky Pinky 3.0 4.0 5.5 Same finger

Limits

Bigram frequencies variations

The results are approximate as the bigram frequencies aren't a precise and objective number for everyone.

However, the results for English and French show very little difference between the "theory" values and my personal corpus. Therefore it seems the variation in bigram use doesn't affect the final grade very significantly.

Accented characters

The results for languages outside English are slightly off because most accented characters are not taken into account.

Currently, the ignored characters are êàçâîôñäöüß/å, mainly because those characters are absent from most considered layouts. The characters é and è were added manually to the layouts (on unused keys, on the vowel side if there's one) because I particularly care about French, and due to their high frequency (2.85%).

The characters ' and - were also added when missing, on unused keys.

The issue mainly affects German (äöüß, 1.56% of the characters), Swedish (äöå, 4.45%), but also French (êàçâîô, 0.75%) and Spanish (ñ, 0.22%).

To mitigate this, the bigram frequencies are normalized after removing the ignored characters, so the summed grade is still calculated over 100%.

script.py

Requirements: Python 3, Pandas, Matplotlib.

script.py uses the bigram statistics from stats.csv, and config.txt (key weights, penalties, and layouts definitions) to generate the results (table and plot).

To customize the script, edit config.txt and have a look at the main() function.

The code isn't very efficient as it iterates through dataframes to generate the results. In practice it executes in \~10s so it doesn't really matter.

Results

Here are the full results, for all languages and including my personal corpus for English and French.

results

In addition, here are the results for only English and only French. For comparison this includes the results before adding the punctuation as described earlier. The necessity of taking into account the punctuation is clear. Both the "theory" and personal corpus give very similar grades, so results are quite consistent.

Here is the complete results list. The layouts can be seen in config.txt.

Layout English English perso French French perso Spanish German Swedish
MTGAP 64.48 63.17 69.73 69.20 66.14 67.15 67.38
BEAKL 19bis 64.51 63.47 66.87 65.81 64.45 67.92 69.00
Colemak DH mod 64.54 64.18 67.32 66.75 64.18 64.70 68.16
Engram 2.0 64.59 63.32 71.17 70.28 66.52 69.11 69.47
Colemak DH 64.66 64.31 67.88 67.21 64.18 64.70 68.16
The-1 64.80 63.20 72.64 71.75 68.97 67.54 70.26
APT 64.81 63.69 70.29 70.02 66.56 64.68 65.86
MTGAP 2.0 64.91 64.44 67.21 65.77 64.01 64.98 66.77
MTGAP "ergonomic" 64.99 64.96 69.18 68.63 64.81 66.23 66.72
White 65.10 64.00 73.60 73.76 68.09 66.50 69.55
Kaehi 65.56 64.18 70.35 69.09 65.92 67.83 67.94
x1 65.63 64.87 69.76 69.23 66.28 68.03 67.91
Workman 65.83 65.51 71.42 70.57 66.85 66.93 71.31
MTGAP "standard" 65.84 65.24 68.35 67.27 64.43 66.78 66.74
Soul mod 65.89 65.53 68.96 68.00 64.71 64.38 68.37
BEAKL 19 65.98 65.00 70.36 68.28 66.12 66.99 68.85
MTGAP "shortcuts" (ROTS) 66.02 65.59 68.24 67.30 62.72 65.44 66.34
BEAKL 19 Opt French 66.32 65.23 67.12 65.96 65.85 65.31 69.32
Uciea 66.43 66.14 69.36 68.61 65.01 65.99 68.32
Oneproduct 66.44 66.07 73.48 72.44 68.07 68.45 69.22
Boo 66.54 65.87 70.78 69.42 66.04 67.03 68.68
Hands down 66.64 66.20 68.97 67.21 66.10 63.14 66.66
MTGAP "Easy" 66.78 66.44 68.63 67.15 64.55 64.97 65.35
Colemak 67.15 67.08 68.40 67.00 65.37 67.77 68.82
Niro mod 67.41 67.39 70.47 69.30 66.58 67.71 70.41
BEAKL 15 67.43 66.98 71.86 69.85 66.64 69.37 68.59
Halmak 67.94 67.45 71.84 70.78 66.87 69.60 70.68
Three 68.23 67.51 73.43 72.53 69.46 71.00 70.95
Norman 68.34 67.74 74.01 72.48 71.24 69.86 72.10
Semimak 68.85 68.87 72.62 72.63 68.21 64.74 66.94
ASSET 68.88 68.29 69.42 67.53 66.69 70.35 71.03
Notarize 69.45 68.83 70.76 68.81 67.68 69.21 67.94
Optimal digram 70.64 70.21 72.36 71.53 68.94 70.21 70.11
qgmlwyfub 70.83 70.39 75.87 76.11 70.24 72.16 73.71
Optimot 70.98 70.79 65.24 63.64 65.56 70.85 72.55
Carpalx 71.02 70.70 76.25 76.62 70.86 74.00 74.53
Qwpr 71.69 71.30 73.36 72.93 69.44 73.07 71.69
Coeur 72.29 72.46 67.35 65.66 67.07 71.04 73.48
Bépo keyberon 72.79 72.71 68.15 66.78 68.53 72.07 73.49
Minimak-8key 72.94 72.37 74.81 73.48 71.94 75.20 73.55
Bépo 40% 73.20 73.19 67.97 66.61 68.54 73.10 73.56
Dvorak 73.74 72.79 78.15 76.07 75.34 74.59 78.53
Neo 76.31 75.50 76.37 75.37 74.56 71.62 74.16
Qwertz 98.56 97.63 98.12 97.59 93.66 98.31 95.30
Qwerty 100.00 99.17 98.90 98.64 92.45 99.74 96.08
Azerty 105.44 104.64 104.18 103.88 102.40 102.81 102.72

As the "Colemak DH" layout gave the most interesting results, a personal modified version was added to replace some characters like ; (that can be on a layer like Shift + ,) by the French é.

Conclusion

All the alternative layouts perform very significantly better than the traditional ones (Qwerty and similar). The biggest interest for ergonomics is using an alternative layout at all, the selected choice matters a lot less.

results

The options that perform the best are some Colemak variants like DH (image below), the obscure BEAKL 19bis, the Engram, as well as the layouts generated by Mathematical multicore (MTGAP).

Colemak DH

In conclusion, the Colemak DH layout is particularly recommended for its good results and better shortcuts access, familiarity with Qwerty, and larger positive user feedback.