Helium314 / HeliBoard

Customizable and privacy-conscious open-source keyboard
Apache License 2.0
2.3k stars 89 forks

Add priority popup keys to it, pms and tr #824

Closed glemco closed 2 months ago

glemco commented 4 months ago

This PR adds the priority popup characters for Italian, Piedmontese and Turkish, based on which characters with diacritics are used in each language.

The following screenshots show it in action: by disabling all indicators besides the language (priority) ones, we can see that the desired characters are correctly displayed as overlay.

Perhaps a stupid question, but while playing with the files I tried changing the extra_keys, thinking they were related. Apparently they aren't. What is that section for?

Helium314 commented 4 months ago

Extra keys are the keys that are added in the default layout when it ends in +. I did this because there are a lot of languages that use an existing qwerty/qwertz/azerty layout and add keys on the right side. These keys on the right side are the extra keys (see layouts.md).
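As a rough illustration of the `+` convention described above, the sketch below appends extra keys to the right side of a base row when the layout name ends in `+`. The function name, the row data and the example extra key are made up for illustration; this is not HeliBoard's actual parser or data format.

```python
def apply_extra_keys(layout_name, base_row, extra_keys):
    """Return the row with extra keys appended when the layout name ends in '+'.

    Hypothetical helper: names and data shapes are assumptions, not HeliBoard code.
    """
    if layout_name.endswith("+"):
        return list(base_row) + list(extra_keys)
    return list(base_row)


top_row = list("qwertyuiop")
print(apply_extra_keys("qwerty+", top_row, ["å"]))  # extra key lands on the right
print(apply_extra_keys("qwerty", top_row, ["å"]))   # plain layout is unchanged
```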

I am currently revising the locale key texts, so I would like to avoid PRs on this until #659 is done (its title is misleading; it's about removing all diacritics not specific to the locale). Do you think the priority keys should still be defined in the locale key texts after #659 is done? I am considering removing the possibility of using % for this, and instead declaring all keys in locale key texts as priority.


Btw, I didn't find a good source for which diacritics are necessary for Piedmontese. My usual source only has a relatively short list of 30k sentences from Wikipedia. From your priority list, I would guess the letters to add are à, è, é, ë, ì, ö, ò, ü, ù. But I checked the use counts, and ü and ö only have 95 and 85 uses respectively in the source list, while ó, which is not listed, has 371.

What would you recommend here? Wikipedia is not really a great source for this sort of estimation, e.g. ė has 61 uses because there are many mentions of Lithuanian cities.
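For anyone wanting to reproduce this kind of use count on their own sentence list, here is a minimal sketch. It assumes the corpus is available as an iterable of strings and treats any letter whose canonical decomposition contains a combining mark as a diacritic; the sample sentences are placeholders, not real corpus data, and this is not the tool used for the numbers above.

```python
from collections import Counter
import unicodedata


def count_diacritics(sentences):
    """Count lowercase letters that decompose into a base letter plus combining mark(s)."""
    counts = Counter()
    for sentence in sentences:
        # NFC first, so that each accented letter is a single code point
        for ch in unicodedata.normalize("NFC", sentence.lower()):
            decomposed = unicodedata.normalize("NFD", ch)
            if ch.isalpha() and any(unicodedata.combining(c) for c in decomposed):
                counts[ch] += 1
    return counts


# Placeholder input; replace with the real sentence list
sample = ["àà è ò", "Ė x ò"]
for letter, uses in count_diacritics(sample).most_common():
    print(letter, uses)
```

Letters without a canonical decomposition, such as ø, are not caught by this check and would need to be listed explicitly.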

glemco commented 4 months ago

Mmh sure, makes sense. I'm not sure what the consequence of removing the % logic would be: would that imply all diacritics ever reachable via popup are going to be priority? My main concern when opening this PR was that I would like to have the diacritics as the first suggestion for popups in the above languages. I did that by enabling the Language source (since Language (priority) was empty without the changes in this PR), but that also enabled them for English (which has no diacritics, but one may want to type letters like ñ or ø every now and then). I find Language (priority) powerful because it tells which diacritics are definitely needed, while the layout may still allow typing others. I don't fully get whether your proposed change would still allow this.


Regarding Piedmontese, the story is a little complicated: it's a language that is mostly spoken, and many speakers don't know exactly how to write it. The standard writing system is somewhat closer to French, and it is the one strictly used by Wikipedia. In an attempt to make writing easier for Italian speakers (the vast majority of Piedmontese speakers), a foundation created a different writing system (using, among others, letters like ö and ü in place of eu and u).

Now, that foundation's website is about the most advanced Piedmontese source you can find on the web. I asked them for their dataset, and it seems Microsoft also did the same for SwiftKey. To create the dictionary, I combined their dataset with just about everything I could find on their website, converting as much as I could to their writing system (with their permission and license, of course). It's still likely that the dictionary is a little inconsistent (not everything can be inferred that easily).

That said, the letters that I put as priority are the ones mostly used in that writing system, but I also left others belonging to the other system, just in case (ó for instance, but I might have picked the wrong direction of the accent, my bad).

Considering the lack of standardisation in the language's writing, I don't think it's a big deal, but it's probably good if I check the dataset again. I would say that ö and ü should have priority over the others appearing on o and u. But don't take my word for it yet; I should adjust the dictionary and check more consistently. Regarding e, we have several (I'd say é being the most common), but does it make a difference between the second and third more common or is the keyboard going to show only the first?

That said, we can close this PR or just keep it open for me to leave more details after I check the dictionary, I assume all changes here will not be needed the way they are anyway.

Helium314 commented 4 months ago

would that imply all diacritics ever reachable via popup are going to be priority

The diacritics for the enabled locale(s) would be priority, others added via the show more letters with diacritics setting would be non-priority. I think that matches what you want?
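A minimal sketch of that split, under the assumption that every key coming from the enabled locale(s) is priority and everything that only comes from the broader "more letters" list is non-priority. The function and the example lists are illustrative only, not HeliBoard's implementation or data.

```python
def build_popup(locale_keys, more_keys):
    """Return (priority, non_priority) popup keys for one base letter.

    Assumption: locale keys are all priority; keys found only in the
    broader list are non-priority. Duplicates are dropped, order is kept.
    """
    priority = list(dict.fromkeys(locale_keys))
    non_priority = [k for k in dict.fromkeys(more_keys) if k not in priority]
    return priority, non_priority


# Piedmontese-style "o" keys plus a made-up broader list
priority, rest = build_popup(["ö", "ò"], ["ò", "ó", "ô", "ö", "õ", "ø"])
print(priority + rest)  # priority keys come first in the popup
```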

Main changes compared to your proposed priority keys is that for tr, î and û would also be priority keys.

but I might have picked the wrong direction of the accent

Certainly not, ò has 17k uses vs the 371 of ó. I just wanted to mention it because it has higher usage than ü and ö. But with your explanation I see that it does not really matter. Your proposed priority keys for Piedmontese should be fine then, I'm not qualified to comment on this anyway.

does it make a difference between the second and third more common or is the keyboard going to show only the first?

The first is going to be shown as hint, otherwise it does not matter. All letters are shown in the popup.

That said, we can close this PR or just keep it open for me to leave more details after I check the dictionary, I assume all changes here will not be needed the way they are anyway.

I'd like to keep it open as a reminder, at least until I update #659

glemco commented 4 months ago

The diacritics for the enabled locale(s) would be priority, others added via the show more letters with diacritics setting would be non-priority. I think that matches what you want?

Yeah seems good actually, thanks!

Main changes compared to your proposed priority keys is that for tr, î and û would also be priority keys.

Well, I'm no native speaker and have never come across those two (â is also rather rare), but indeed they seem to exist.

Your proposed priority keys for Piedmontese should be fine then, I'm not qualified to comment on this anyway.

For now I'd say yes, but I'd definitely need to clean it up a little; I made the dictionary a few years ago and didn't bother much. If I see it doesn't fit, I can submit another PR after you refactor the whole thing.

Anyway, thanks for your quick support and keep it up!

glemco commented 3 months ago

I checked the pms dictionary and the writing guidelines again, and it seems relatively accurate. According to the dictionary, these are the occurrences of the diacritics:

ü has 11754 occurrences
ì has 2912 occurrences
à has 2369 occurrences
ö has 1501 occurrences
ë has 1469 occurrences
é has 1416 occurrences
ò has 259 occurrences
è has 49 occurrences
ù has 9 occurrences

So I'd keep the order I mentioned before (regarding e, it doesn't change much, but é feels more familiar and I'd put it first):

a -> à
e -> é ë è
i -> ì
o -> ö ò
u -> ü ù
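The order can be derived from the occurrence counts with one manual override, as in the sketch below. The counts are the ones reported in this thread; the helper names and the override table are assumptions made for illustration, and the first key of each group is the one that would be shown as the hint.

```python
import unicodedata
from collections import defaultdict

# Occurrence counts as reported in this thread for the pms dictionary
COUNTS = {"ü": 11754, "ì": 2912, "à": 2369, "ö": 1501, "ë": 1469,
          "é": 1416, "ò": 259, "è": 49, "ù": 9}

# Hand-picked choice from the thread: é first on "e",
# although ë is slightly more frequent
PREFERRED_FIRST = {"e": "é"}


def base_letter(ch):
    """Strip combining marks: 'ü' -> 'u'."""
    return unicodedata.normalize("NFD", ch)[0]


def popup_order(counts, preferred_first):
    """Group diacritics by base letter, most frequent first, then apply overrides."""
    groups = defaultdict(list)
    for ch, _ in sorted(counts.items(), key=lambda item: -item[1]):
        groups[base_letter(ch)].append(ch)
    for base, first in preferred_first.items():
        if first in groups[base]:
            groups[base].remove(first)
            groups[base].insert(0, first)
    return dict(groups)


for base, keys in sorted(popup_order(COUNTS, PREFERRED_FIRST).items()):
    print(f"{base} -> {' '.join(keys)}  (hint: {keys[0]})")
```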

Helium314 commented 3 months ago

Great, this fits with the preliminary list I have in #659 (https://github.com/Helium314/HeliBoard/pull/659/commits/61eed812a127bb57dc15b4851bb321885f81aab5)

glemco commented 2 months ago

Closing since https://github.com/Helium314/HeliBoard/pull/659 was merged