HughP / tcf-james-bigramed

Bigram Counting for tcf james

Mephaa keyboard analysis

Language: [tcf]

Text source: Mark Weathers

Text content: Epistle of James

Keyboard layout history: The Spanish Windows ANSI keyboard layout was co-opted, with redundant Spanish characters replaced by Meꞌphaa characters. This was done together with replacing the glyphs in a special font, so that no keyboard setting needed to be changed on the computer: users simply type as normal on a Spanish keyboard and use the Mephaa font. The resulting keyboard layout has been used for some years by some members of the Meꞌphaa community. The layout for this keyboard is presented below and is available as JSON from here.

Meꞌphaa keyboard layout images:


Base State

Shift State

Alt State

Mephaa Required Usage


Text processing steps:

  1. Text received as JAS_TCF.txt.

  2. Moved characters from the hacked font's code points to their proper Unicode values using TECkit (mapping files: me'phaa.map and me'phaa.tec).

  3. Scripture texts go through a very formal typesetting process: paragraph, book, chapter, and verse markers are all indicated by a reverse solidus (\). All of these markers were removed by hand.

  4. Replaced every character in the Meꞌphaa text with the character that the same keypress would produce on an English QWERTY keyboard (done by hand via search and replace). Resulting file: tcf-on-QWERTY.txt. This allows Typing to process the characters (in the mental model of typing, it is really processing keypresses, not characters). Typing processes characters only as single bytes, so two- and three-byte characters do not work with the program. However, this means that a language corpus converted from its orthographic representation can be re-rendered as a keypress representation. This keypress representation just happens to use QWERTY code points; the result is not English, but rather the language rendered as gobbledygook. Another way to think about this would be to use the ISO 9995 names of the keys.

  5. tcf-on-QWERTY-UCC.txt is a quick check showing that all characters in the file are in the single-byte range.

  6. Typing requires a list of character bigrams and a list of character counts. The default way to produce these is an application by Michael Dickens called Frequency. Hugh had some difficulty getting it to compile, and it was not guaranteed to work with multi-byte characters, which was another requirement. So, in lieu of Frequency, Hugh started down the path of step seven. Typing assumes a one-to-one correspondence between each single-byte character and each keystroke. The conversion in step four ensures that all multi-byte characters are converted to single-byte characters at their corresponding key positions. This allows Typing to give us a fitness value (by running its tests against the existing QWERTY setting), and it also allows Typing to propose a keyboard layout using its simulated annealing algorithm.

  7. To create the bigram list and the character counts, the following scripts were used:

    ./bigrams.py tcf-on-QWERTY.txt > allDigrams.txt

    Then, to get the character counts:

    UnicodeCCount.pl -n tcf-on-QWERTY.txt | cut -f 2,3 | tr "\t" " " > allCharacters.txt && sed -i '1d' allCharacters.txt

    Then the newline character had to be added to the top line as \n.

  8. In the end, only KLA (Keyboard Layout Analyzer) was used, with the text from tcf-on-QWERTY.txt as input. This text was then re-encoded back to Meꞌphaa and an image was created via KLE (Keyboard Layout Editor). The KLA analysis is presented below.

  9. Assumed total keypress count for Meꞌphaa was computed as follows:

    UnicodeCCount.pl tcf-on-QWERTY.txt | tail -n +2 | cut -f 3 | paste -sd+ | bc

    This results in a total of 22235 keypresses. Because we are counting the text after it has been converted to QWERTY, it is assumed that we are no longer counting characters but what they represent: keypresses. By using the following command

    UnicodeCCount.pl mephaa3-unicode.txt  | tail -n +2 | cut -f 3 | paste -sd+ | bc

    we see that there are only 21294 characters (NFC) in the Meꞌphaa text. If we then take out the 1879 instances of U+0331 (a combining character), we get the total number of "reading characters" (like letters, but without evoking the idea of functional units, punctuation, or non-visible characters; I count diacritics together with their bases, and I do not count ñ as a separable character). That gives a grand total of 19415 letters: 22235 keypresses to produce 19415 letters, a ratio of 1.145 keys per letter. There are 4176 diacritics in total, for a diacritic-to-character density of 21.51%, or one diacritic per 4.65 letters (not including ñ). If we include ñ, the total rises to 4296 diacritics, the density to 22.13%, and the rate to one diacritic per 4.52 "letters".
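The steps above were carried out with TECkit, hand edits, search and replace, bigrams.py and UnicodeCCount.pl. Purely as an illustration, the following Python sketch approximates steps 3, 4, 5, 7 and 9 in one place. It is not the code that was used: the `KEYPRESSES` table is hypothetical (placeholder entries, not the real Meꞌphaa layout), the marker regex only approximates the manual removal of backslash markers, the `<item> <count>` output format is an assumption rather than the exact format Typing expects, and all function names are illustrative.

```python
import re
import unicodedata
from collections import Counter

# Step 3 (approximation of the manual edit): drop backslash markers such as
# \c 1, \p or \v 12, together with a chapter/verse number that follows them.
MARKER_RE = re.compile(r"\\[A-Za-z]+[0-9]*\*?(?:[ \t]+[0-9]+(?:-[0-9]+)?)?")

def strip_markers(text):
    text = MARKER_RE.sub("", text)
    # Normalize spacing and drop lines left empty by the marker removal.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)

# Step 4: HYPOTHETICAL table from Meꞌphaa characters to the QWERTY characters
# produced by the same physical keys. Placeholder entries only.
KEYPRESSES = {
    "\u00f1": ";",   # ñ: on a Spanish keyboard this key sits where ';' is on US QWERTY
    "\u0331": "[",   # combining macron below (placeholder key)
    "\ua78c": "]",   # saltillo ꞌ (placeholder key)
    "\u00e1": "'a",  # á as dead key + base letter = two keypresses (placeholder)
}

def to_keypresses(text):
    """Re-render NFC text as one QWERTY character per keypress."""
    nfc = unicodedata.normalize("NFC", text)
    return "".join(KEYPRESSES.get(ch, ch) for ch in nfc)

def non_single_byte(text):
    """Step 5: list characters still outside the single-byte (ASCII) range."""
    return sorted({ch for ch in text if not ch.isascii()})

def count_units(text):
    """Step 7: count single characters and adjacent character pairs (bigrams)."""
    chars = Counter(text)
    bigrams = Counter(a + b for a, b in zip(text, text[1:]))
    return chars, bigrams

def escape_newline(s):
    # Write a newline as the two characters backslash + n, as done by hand in step 7.
    return s.replace("\n", "\\n")

def format_counts(counter):
    """One '<item> <count>' line per entry, most frequent first."""
    return "\n".join(f"{escape_newline(item)} {n}" for item, n in counter.most_common())

def reading_stats(keypresses, nfc_chars, combining_marks, diacritics):
    """Step 9: letters = NFC characters minus combining marks, plus derived ratios.

    keypresses      ~ len(to_keypresses(text))
    nfc_chars       ~ len(unicodedata.normalize("NFC", text))
    combining_marks ~ text.count("\u0331")
    """
    letters = nfc_chars - combining_marks
    return {
        "letters": letters,
        "keys_per_letter": keypresses / letters,
        "diacritic_density": diacritics / letters,
        "letters_per_diacritic": letters / diacritics,
    }

# Figures reported in step 9 (ñ not counted as a diacritic).
stats = reading_stats(keypresses=22235, nfc_chars=21294, combining_marks=1879, diacritics=4176)
# letters == 19415, keys_per_letter ≈ 1.145, diacritic_density ≈ 0.2151,
# letters_per_diacritic ≈ 4.65
```

Chaining `to_keypresses(strip_markers(raw_text))` plays the role of tcf-on-QWERTY.txt; `non_single_byte` on the result should return an empty list before the text is handed to Typing or KLA.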

Keyboard Layout Analysis

Statistical analysis of existing and optimized Meꞌphaa keyboards using Keyboard Layout Analyzer (KLA)

Using the text transformation methods outlined above, the following keyboard statistics become available from KLA.

KLA also suggests an "optimized" keyboard, which serves as the reference keyboard layout in the following graphs. This is contrasted with the existing Meꞌphaa keyboard shown above. In the diagrams, the KLA-optimized keyboard is labeled Personalized, while the existing Meꞌphaa layout is labeled QWERTY. Note that KLA itself acknowledges that its optimization is not very aggressive. One place where Hugh sees room for further optimization is the tone marks: both could be moved to the right hand to achieve a better cadence. Where the tone marks currently sit, a high level of outward rolls occurs.

proposed optimized keyboard
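The outward-roll observation can be checked against the bigram list from step 7. The sketch below assumes the common keyboard-layout definition of a roll (two consecutive keys typed by different fingers of the same hand; inward means moving toward the index finger, outward toward the pinky) and the conventional US QWERTY finger assignment. KLA's exact definitions may differ, and the names here are hypothetical. Because the input is the QWERTY-encoded text, the counts describe physical key positions on the existing Meꞌphaa layout.

```python
from collections import Counter

# Conventional US QWERTY touch-typing assignment: (hand, finger), with fingers
# numbered from index (1) to pinky (4). Assumption: KLA's model may differ.
COLUMNS = {
    ("L", 4): "`1qaz", ("L", 3): "2wsx", ("L", 2): "3edc", ("L", 1): "45rtfgvb",
    ("R", 1): "67yuhjnm", ("R", 2): "8ik,", ("R", 3): "9ol.", ("R", 4): "0-=p[]\\;'/",
}
KEY_TO_FINGER = {key: hand_finger for hand_finger, keys in COLUMNS.items() for key in keys}

def roll_counts(bigram_counts):
    """Count same-hand, different-finger bigrams as inward or outward rolls."""
    rolls = Counter()
    for bigram, n in bigram_counts.items():
        first, second = (KEY_TO_FINGER.get(c.lower()) for c in bigram)
        if first is None or second is None:
            continue  # space, newline or another unmapped key
        if first[0] != second[0] or first[1] == second[1]:
            continue  # hand alternation or same-finger bigram, not a roll
        # A higher finger number is closer to the pinky, i.e. moving outward.
        rolls["outward" if second[1] > first[1] else "inward"] += n
    return rolls
```

For example, on the left hand "sa" (ring to pinky) counts as outward and "as" as inward.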

The distance that the typist's fingers need to travel is greater for the existing Meꞌphaa layout.

Mephaa distance traveled

As the previous heat map for the existing Meꞌphaa keyboard shows, the frequently used keys are on the periphery of the typing area, significantly overloading the weaker fingers. In actual typing of Meꞌphaa, Hugh has observed this to contribute to hunt-and-peck style typing.

However, just how much more work Meꞌphaa takes to type is not revealed until we compare the text input task on the Meꞌphaa keyboard with other keyboards. There are two ways to compare:

  1. Compare with other keyboard layouts supporting Meꞌphaa, within the Meꞌphaa language.
  2. Compare with other language-based options that Meꞌphaa speakers might use to communicate the same message.

The following graphs illustrate both points.

Meꞌphaa distance traveled

Meꞌphaa distance traveled charts

In terms of workload percentages, we can see where on the hands the two keyboards are "balancing" the workload.

Meꞌphaa distance traveled
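Per-finger workload of this kind can be approximated directly from the character counts of tcf-on-QWERTY.txt. The following is a minimal sketch, assuming the conventional US QWERTY touch-typing finger assignment; KLA's own finger model and weighting may differ, and the function names are hypothetical. Uppercase letters are folded onto their key, and the extra Shift press is not counted.

```python
from collections import Counter

# Conventional touch-typing finger assignment on US QWERTY (unshifted keys).
# Assumption: KLA's own finger model may differ in details.
FINGER_KEYS = {
    "left pinky": "`1qaz",
    "left ring": "2wsx",
    "left middle": "3edc",
    "left index": "45rtfgvb",
    "right index": "67yuhjnm",
    "right middle": "8ik,",
    "right ring": "9ol.",
    "right pinky": "0-=p[]\\;'/",
}
KEY_TO_FINGER = {key: finger for finger, keys in FINGER_KEYS.items() for key in keys}

def finger_load(char_counts):
    """Percentage of mapped keypresses handled by each finger."""
    load = Counter()
    for ch, n in char_counts.items():
        finger = KEY_TO_FINGER.get(ch.lower())
        if finger is not None:  # space, newline and shifted symbols are skipped
            load[finger] += n
    total = sum(load.values())
    if total == 0:
        return {}
    return {finger: 100 * n / total for finger, n in load.items()}

def hand_balance(load):
    """Split a finger-load table into (left %, right %)."""
    left = sum(p for f, p in load.items() if f.startswith("left"))
    right = sum(p for f, p in load.items() if f.startswith("right"))
    return left, right
```

Passing `Counter(text)` for the contents of tcf-on-QWERTY.txt gives the per-finger shares for the existing layout; a heavy share on the pinkies corresponds to the peripheral overloading visible in the heat map.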

Finally, in terms of row usage, we can see where the high-frequency targets are.

Meꞌphaa distance traveled
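Row usage follows the same pattern: each keypress in the QWERTY-encoded text is assigned to the physical row of its key. A minimal sketch, assuming the standard US QWERTY rows of the main key block (the function name is hypothetical, and KLA's row statistics may treat the space bar and modifiers differently):

```python
from collections import Counter

# Physical rows of the US QWERTY main block (unshifted keys).
ROW_KEYS = {
    "number": "`1234567890-=",
    "top": "qwertyuiop[]\\",
    "home": "asdfghjkl;'",
    "bottom": "zxcvbnm,./",
}
KEY_TO_ROW = {key: row for row, keys in ROW_KEYS.items() for key in keys}

def row_usage(char_counts):
    """Percentage of mapped keypresses that land on each row."""
    usage = Counter()
    for ch, n in char_counts.items():
        row = KEY_TO_ROW.get(ch.lower())
        if row is not None:  # space and newline have no row in this sketch
            usage[row] += n
    total = sum(usage.values())
    return {row: 100 * n / total for row, n in usage.items()} if total else {}
```

A high share outside the home row indicates that high-frequency targets sit away from the resting hand position.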


Handedness on keyboards