Open leogama opened 1 year ago
@binarybottle, feel free to move this issue to binarybottle/engram if you think it's appropriate (there would be more people to discuss).
@leogama -- Thank you very much for the suggestion! I like the idea of a Latin-script, Esperanto-esque optimized key layout. I'm not sure about calling it "Latin" because it wouldn't be optimized for the Latin language, but other possible names include "esp" (for "Esperanto", reading minds, or "especial"), "indoeuro" or "ie", or "ESF" from "English-Spanish-French".
For such an undertaking, we would need a representative corpus combining all three languages, and I know of no one better suited to the challenge than @iandoug. Ian -- what do you think???
FWIW, I have been busy for a while with a similar project. It started with a desire to support our 11 official languages, but we have so many other people from north of us here too, as well as moves to introduce Swahili as a language in schools, that I thought I might as well look at the wider Southern-African region.
The relevance here is that Afrikaans already uses a lot of diacritics. We have German speakers in Namibia, Portuguese in Angola and Mozambique, and one country speaking Spanish.
Afrikaans is close to Dutch. West Africa is French.
So I added all the diacritics to my Poqtea layout, and tested it against a small corpus with all the languages (using the Universal Declaration of Human Rights available from Unicode).
Poqtea does well with all of the languages except one, where it is average (rather than good or terrible).
But the corpus is not very good, and needs to be bigger.
I have already collected the files for various languages from the Uni Leipzig site; I still need to clean them and get the character and bigram frequencies.
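For anyone following along, here is a minimal sketch of what getting character and bigram frequencies from a cleaned text could look like. It is not Ian's actual tooling: the function name and the sample sentence are made up for illustration, and it assumes bigrams should not span spaces or punctuation.

```python
from collections import Counter
import unicodedata

def char_and_bigram_counts(text):
    """Count letters and letter bigrams (hypothetical helper, not from the thread)."""
    # Normalize to NFC so "e" + combining acute is counted as the single letter "é".
    text = unicodedata.normalize("NFC", text).lower()
    chars = Counter()
    bigrams = Counter()
    prev = None
    for ch in text:
        if ch.isalpha():
            chars[ch] += 1
            if prev is not None:
                bigrams[prev + ch] += 1
            prev = ch
        else:
            # Assumption: a space or punctuation mark breaks the bigram chain.
            prev = None
    return chars, bigrams

sample = "Die kat sit op die mat."
chars, bigrams = char_and_bigram_counts(sample)
print(chars.most_common(3))
print(bigrams.most_common(3))
```

Dividing each count by the total gives relative frequencies, which is what you would want when comparing corpora of different sizes.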
Let me see what I have in that regard.
Attached layout is current version of design. It allows typing in a multitude of Southern African languages, as well as (at least) the following Euro languages:
English, (Afrikaans), Portuguese, Spanish, French, Italian, German, Dutch, Turkish.
The design supports dead keys via the "compose" function, but diacritic letters for these languages are also directly typeable via the Blue (a renamed AltGr) and Green (new) modifiers.
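As a rough illustration of what a dead key does at the Unicode level (a sketch, not how any particular OS implements Compose): the dead key contributes a combining mark, and NFC normalization folds base letter plus mark into a precomposed letter when one exists. The `DEAD_KEYS` table and `compose` function are hypothetical names for this example only.

```python
import unicodedata

# Hypothetical dead-key table: the key pressed first -> the combining mark it adds.
DEAD_KEYS = {
    "´": "\u0301",  # combining acute accent
    "`": "\u0300",  # combining grave accent
    "^": "\u0302",  # combining circumflex accent
    "¨": "\u0308",  # combining diaeresis
    "~": "\u0303",  # combining tilde
}

def compose(dead_key, base):
    """Return base + mark, folded into one code point if Unicode has one."""
    mark = DEAD_KEYS[dead_key]
    return unicodedata.normalize("NFC", base + mark)

for dead, base in [("´", "e"), ("^", "o"), ("¨", "u"), ("~", "a")]:
    result = compose(dead, base)
    # A length of 1 means a precomposed letter exists for this pair.
    print(dead, base, "->", result, len(result))
```

Pairs with no precomposed form (say, acute plus q) stay as two code points, which is one reason font support matters for the rarer combinations.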
The main character area is 48 keys as per ISO boards, so they could be put on ISO. Just need to repurpose some of the useless Windows keys to be Compose and Green.
I realise this is not Engram, but it provides something to measure against.
For Poster's needs, we can remove the diacritics used in the African languages like ṋ, ṱ, š, etc.
French project:
https://en.wikipedia.org/wiki/AZERTY#/media/File:Azerty_NFZ71-300.png
https://norme-azerty.fr/en/
Scroll down to the Documents section at the bottom to see how they want to handle Greek, currencies, etc. They included Bitcoin, which I don't think will survive the rise of Central Bank digital currencies (they will simply outlaw all non-official ones), but missed the Thai Baht.
German project: https://en.wikipedia.org/wiki/German_keyboard_layout https://de.wikipedia.org/wiki/DIN_2137
But these are basically tweaks to AZERTY and QWERTZ.
US International: https://en.wikipedia.org/wiki/QWERTY#US-international
Proposal for Italian: https://www.farah.cl/DistribucionesDeTeclado/NuovItal/index_en.html
ADNW for German and more. http://www.adnw.de/index.php?n=Main.HomePage
Letters: My research so far shows that you would need to support this:
Latin alphabet A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z
Indo-Arabic digits 0 1 2 3 4 5 6 7 8 9
Other general punctuation and symbols ! @ # & * ^ % ( ) { } [ ] - _ / ? \ | ' " , < . > ; : = + ~ `
English á à ç é è ê ë ï ñ ô ö æ œ Á À Ç É È Ê Ë Ï Ñ Ô Ö Æ Œ
Portuguese á â ã à ç é ê è í ì ï ó ô õ ò ú ù ü Á Â Ã À Ç É Ê È Í Ì Ï Ó Ô Õ Ò Ú Ù Ü
French é à è ù â ê î ô û ë ï ü ÿ ç É À È Ù Â Ê Î Ô Û Ë Ï Ü Ÿ Ç
German ä ö ü ß Ä Ö Ü ẞ
Italian é ó à è ì ò ù î É Ó À È Ì Ò Ù Î
Turkish ç ğ ı ş Ç Ğ İ Ş
Dutch á â ä é è ê ë í ï ó ô ö ú û ü ij ȷ Á Â Ä É È Ê Ë Í Ï Ó Ô Ö Ú Û Ü IJ
Other characters which appear on various European keyboards: ñ Ñ ç Ç £ € ‹ › « » ª º § ¿ ¡ ¬ ° μ ¤
Assorted combining diacritics.
So it's a lot.
@iandoug -- Amazing work!!! Thank you for sharing your progress.
Re (1), Africa is huge with a multitude of languages, not all of which use the Latin alphabet.
Nigeria tried to create a pan-Nigerian layout: https://en.wikipedia.org/wiki/Pan-Nigerian_alphabet
Basically hacked QWERTY, not optimised at all.
A few years back I worked with Hugh from SIL on two projects, one of which was a keyboard for one African language. They use tone marks etc, which are more frequent than most letters.
https://github.com/HughP/dnj-corpus/issues/20 https://en.wikipedia.org/wiki/Dan_language
And that's just one language. Pan-African support is not viable, which is why I limited myself to "most" Southern African languages (basically from around the equator south) that use the Latin alphabet.
Down here, there is another twist: Bantu and KhoiSan languages use an assortment of click sounds, which are not on the keyboard (they are in IPA). https://en.wikipedia.org/wiki/Khoisan_languages
If you go north you hit Arabic. Go east, hit Ethiopic. https://en.wikipedia.org/wiki/Tigre_language
So Pan-African non-trivial.
Re (2), Yes, need to make wider one for my own needs. Will weight the parts by number of L1 and L2 speakers.
Re (3), Will make intersection.
Will also scan font collection to see what font support is like.
Cheers, Ian
BTW did we do something with Russian? Can't find link but I did get corpus, and recently discovered some errors (handling unicode properly..)
Might even be similar issues with the Spanish and Polish. Will relook at it.
@iandoug --
German diacritics are few compared to Dutch/French/Portuguese.
I had a comment made to me on Reddit that EURkeys (mentioned above) covers Western Europe rather than Eastern/Northern Europe (those using Latin script).
That may suggest a logical split. The new German layouts are attempting to cater to the languages around them, including things like Sami.
There are so many diacritics though. Even adding them all as dead keys, and labelling the caps, leads to a cluttered keyboard. I know my Janiso layout is also very cluttered, that's partly because I was trying to avoid dead keys (or Linux Combine method) and instead place all the needed diacritic versions on a layer.
Even our current US/UK layouts do not support English fully. Adding the missing bits á à ç é è ê ë ï ñ ô ö æ œ Á À Ç É È Ê Ë Ï Ñ Ô Ö Æ Œ
takes you a long way to supporting French, Dutch, Spanish and German. Portuguese is another level up.
Am building a database query tool to see what is needed for the different languages.
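A query of that kind can be sketched with plain set arithmetic. This is only an illustration, not the actual tool: it uses the lower-case letter lists posted earlier in this thread (Dutch "ij" and "ȷ" left out for simplicity), and `letter_set` and `missing_after` are made-up names.

```python
import unicodedata

# Lower-case letters as listed earlier in this thread (Dutch "ij" and "ȷ" omitted).
LETTERS = {
    "English": "á à ç é è ê ë ï ñ ô ö æ œ",
    "French": "é à è ù â ê î ô û ë ï ü ÿ ç",
    "German": "ä ö ü ß",
    "Dutch": "á â ä é è ê ë í ï ó ô ö ú û ü",
    "Portuguese": "á â ã à ç é ê è í ì ï ó ô õ ò ú ù ü",
}

def letter_set(language):
    # NFC guards against the same letter arriving in decomposed form.
    return set(unicodedata.normalize("NFC", LETTERS[language]).split())

def missing_after(base, target):
    """Letters the target language needs that the base set does not provide."""
    return letter_set(target) - letter_set(base)

for lang in LETTERS:
    if lang != "English":
        extra = missing_after("English", lang)
        print(lang, len(extra), " ".join(sorted(extra)))
```

The same pattern extends to intersections (letters every language needs) or unions (everything a combined layout has to carry).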
Thank you, @iandoug -- I appreciate your building a query tool to distinguish between the languages!
Diacritics on a dedicated layer makes sense to me to avoid conflicts and clutter.
From the diacriticals you shared, it looks like we would need the following to support every (?) W European language:
If two dedicated keys were used in conjunction with A, E, I, O, N, and two arbitrary letters (for æ and œ), then they could cover all of the above except for e/E umlaut and e/E circumflex.
Bepo.
Personally, I think it may be too complicated. According to English WP, both this and revised Azerty have been accepted by French standards authorities, but have had little uptake.
I think manufacturers are reluctant to retool. Or the need for new drivers for Windoze is holding them back.
linked from https://bepo.fr/wiki/Accueil
then Googlised.
On related issue, have you got your layouts using magic diacritic key working in practice?
The truth is somewhere between what these guys:
http://diacritics.typo.cz/index.php?id=49
and these guys:
https://hyperglot.rosettatype.com/
and the various WP pages (eg, "spanish alphabet") say.
I've posted some corrections to Hyperglot for Afrikaans, so other languages may also have issues.
Rather than fixing an international design that's balanced between several languages but hardly excels at any, I'd suggest we shift our focus towards making some kind of software that spits out the optimized layout based on the 1 or 2 languages the user is going to be typing most of the day. So for example, I write 80% of my day in English and 20% in Portuguese. If I could insert those 2 languages and their weights into a keyboard analyzer using the Engram math/logic, it would give me the perfect layout for my use case, whilst for another user it could be a different layout at 10% English and 90% Portuguese, etc.
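To make the weighting idea concrete, here is a minimal sketch of the input side only: blending per-language letter frequencies by a user's language mix. It does not implement any of the Engram scoring, the `blend` function is a hypothetical name, and the frequency numbers are placeholders for illustration, not measured data.

```python
def blend(frequencies, weights):
    """Weighted average of per-language relative letter frequencies."""
    total = sum(weights.values())
    mixed = {}
    for lang, weight in weights.items():
        for letter, freq in frequencies[lang].items():
            mixed[letter] = mixed.get(letter, 0.0) + freq * weight / total
    return mixed

# Placeholder numbers for illustration only, not measured frequencies.
frequencies = {
    "en": {"e": 0.5, "t": 0.3, "a": 0.2},
    "pt": {"a": 0.5, "e": 0.3, "ã": 0.2},
}

# 80% English, 20% Portuguese, as in the example above.
mixed = blend(frequencies, {"en": 80, "pt": 20})
for letter, freq in sorted(mixed.items(), key=lambda item: -item[1]):
    print(letter, round(freq, 3))
```

The blended table (and a bigram table built the same way) would then be what an optimizer consumes, so changing the weights to 10/90 only changes the input, not the optimization code.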
Ok, after much effort I put together an adaptation of the English Engram that has all the diacritical accents needed by Portuguese speakers (while still being optimized for the English language). Here is the keylayout file (for Mac OS): https://drive.google.com/file/d/1DBLHpBnFlDoDfmZ38-y7qsPiZweY2yFM/view?usp=sharing
Hello, there. I've been an enthusiastic user of the (programmer's) Dvorak layout for almost a decade now, and learning it was a huge improvement over good ol' QWERTY. However, while it is really widespread and readily available on most current systems, its performance for the English language is sub-optimal. Also, its variations for languages with similar alphabets (like my dear Portuguese) are still "super-terrible" (a bit less terrible than QWERTY due to the vowels at the left home row).
The elephant in the room
I took a look at some of these newer designs, including yours. Congratulations, by the way! Amazing work. But the OP touched a very important point that is still unaddressed by all of these: we live in an international, interconnected world now. Until the early 2000's, it wasn't a problem to have totally different keyboard layouts for every language. We even used different, incompatible text encodings! But now the most used encoding in both new devices and the Internet is Unicode. I believe the same transition should happen to keyboard layouts.
But is there a need for it? Well, most professionals that type a lot (journalists, academics, programmers, etc.) will need to either create content in more than one language, usually in their native one and in English, or at least communicate with foreigners through text often. It applies even to countries that have English as their primary language, like the US, where there are more and more people speaking Spanish as a primary or secondary language each year (> 50 million today).
Is an "international" keyboard layout possible?
I know that many languages use completely different alphabets and, even when they use similar ones (like variations of the Latin or Cyrillic scripts), they have extra characters and wildly varying letter/n-gram frequencies. Therefore, there can't be a truly international base layout for keyboards. But can we do better?
Starting from English, the de facto international language, a non-monolingual layout can't be much distant from ASCII. Looking at the languages with most speakers in the world that use a Latin script alphabet, we have in the top positions (Wikipedia/Ethnologue 2022):
I think it would be feasible to analyse these 5 languages, from two branches of the same language family (you already did it for two of them), and find a design that is awesome for one (likely English) but doesn't suck for all the others.
A "Latin" or "Romance-Germanic" base keyboard layout
For whoever is interested, I propose the development of a base layout using the Latin alphabet that is optimized for all of these 5 languages. It wouldn't be a simple weighted optimization though. What I would expect to achieve with this design is:
Steps necessary to achieve these goals:
Advantages
I'm seriously considering learning a new keyboard layout once more, but it would have to be a killer layout. It would have to be one to rule them all.
I am willing to dedicate some time to this idea if there are others interested. If not, maybe I'll end up trying to create my own Portuguese or Portuguese-English Engram layout.
Greetings from Brazil! 🇧🇷
Originally posted by @leogama in https://github.com/binarybottle/engram-es/issues/40#issuecomment-1526811911