Helium314 / HeliBoard

Customizable and privacy-conscious open-source keyboard
Apache License 2.0
2.33k stars, 92 forks

Remove fully capitalized words? #780

Open · vinoff opened this issue 5 months ago

vinoff commented 5 months ago

Spellchecker is correcting words such as "btw" to "BTW". This is a bit infuriating as I keep having to press backspace in order to uncapitalize. No one writes "BTW", or "AFK". People write it lower-case.

"I will be there a bit later today BTW, I will be going AFK now. See ya." just looks wrong. "I will be there a bit later today btw, I will be going afk now. See ya." is how one would type it.

Is this already possible to do and I am just missing the option in the settings?

Helium314 commented 5 months ago

You can long-press the suggestion, then a delete icon appears that will add the selected word to a blacklist.

vinoff commented 5 months ago

I still want the suggestion though... And there are so many words for which this happens. It would be nice to have an option in the settings: "Disable uppercase in abbreviations".

Helium314 commented 5 months ago

If you want the suggestions, but not the autocorrect, you can either try reducing auto-correction confidence, or just correct it manually to btw, usually after the first or second time the app should remember your choice (requires personalized suggestions of course).
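To illustrate why lowering auto-correction confidence can keep "btw" while still offering "BTW", here is a minimal sketch under a simplified, hypothetical model: every suggestion has a score in [0, 1], and autocorrect only replaces the typed word when the best score reaches a threshold. The names `Suggestion`, `commit_on_space`, and the numeric threshold are assumptions for illustration, not HeliBoard's actual code or settings.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    word: str
    score: float  # hypothetical confidence in [0, 1]


def commit_on_space(typed: str, suggestions: list[Suggestion], threshold: float) -> str:
    """Return the word committed when space is pressed (hypothetical model).

    A higher threshold means less aggressive autocorrect: the best suggestion
    only replaces the typed word if its score reaches the threshold.
    """
    if not suggestions:
        return typed
    best = max(suggestions, key=lambda s: s.score)
    if best.word != typed and best.score >= threshold:
        return best.word  # autocorrect fires
    return typed  # typed word is kept; the suggestion stays in the strip


candidates = [Suggestion("BTW", 0.7), Suggestion("bow", 0.2)]
print(commit_on_space("btw", candidates, threshold=0.6))  # BTW
print(commit_on_space("btw", candidates, threshold=0.9))  # btw
```

In this model, the less aggressive setting changes only what happens on space; "BTW" is still available as a tappable suggestion.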

vinoff commented 5 months ago

I still think it would be nice to have an option in the settings. Regardless, I don't think manually correcting it is working as intended.

Here are my findings:

What am I doing wrong? How am I supposed to manually correct it? Can you please give it a try?

ghost commented 5 months ago

I've been working on new dictionaries, but I don't have a computer to compile them into binary format. The current dictionaries are very bad for many reasons.

See here

And my repository here

There are lots of fundamental design flaws in how dictionaries are created and used in Android keyboards.

flashymittens commented 5 months ago

No one writes "BTW", or "AFK".

I do 🤷

Helium314 commented 4 months ago

It is still suggesting BTW as the first option.

If it's just a suggestion but not autocorrect, what is the problem? You already typed the word you want. By default the typed word is not shown in the suggestion strip, unless it would be autocorrected to something else.

vinoff commented 4 months ago

When I press space, it autocorrects to BTW. I then have to use backspace or change it manually, which is annoying.

Helium314 commented 4 months ago

Interesting, for me it doesn't autocorrect any more after I typed btw 3 times.

knutid commented 4 months ago

BTW is not a "capitalized word" it's an acronym, a type of abbreviation. The most common capitalization scheme seen with acronyms is all-uppercase, ref. Wikipedia - Acronym.

My theory is that people incorrectly write acronyms in lower case because they're lazy. 😀

So, IMHO it's working correctly.

Lppsoeht commented 4 months ago

This happens with other languages as well. One example: I'm trying to type "Ma" (in Italian) as the first word of a sentence, and it autocorrects to "MA", whose meaning I frankly don't know. Maybe it's an English acronym... I'm using an Italian+English+Spanish keyboard. It doesn't learn from the numerous times I've tried to type "Ma", and I have personalised suggestions on.

Commenter25 commented 3 months ago

This happens very frequently to me, because I will write casually entirely in lowercase, but sometimes in all caps when excited. This causes it to save words separately in all caps, and occasionally try to switch to them in my normal tone.

The most annoying example is capitalization of the letter I. I specifically choose to make it lowercase in many cases. But it saves the uppercase variant from when I go all caps, and then tries to override it. No matter how many times I delete the suggestion, or attempt to add overrides to prevent this, it continues to happen.

This is a frequent annoyance I have to manually fix which slows down my typing. This is not a problem exclusive to me, other people I've spoken to encounter the same problem. Honestly, this is likely an issue for most people of my generation.

In my opinion, it makes no sense to save both uppercase and lowercase variations of the same word. The dictionary should be mostly case-insensitive, especially since there is already a system to make suggestions all caps when caps lock is enabled.

(yes, i'm writing more properly here, being on github mentally puts me in formal writing mode for some reason)

Helium314 commented 3 months ago

A simple solution would be using a custom dictionary. There you can remove capitalized words/shortcuts, and you can modify the entry for word=i so it doesn't try correcting to I any more.

Making the dictionary case-insensitive is not something I will do, because there simply are words starting with an uppercase letter.
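The "remove capitalized words" step of building a custom dictionary could look roughly like the filter below. It assumes the AOSP-style combined wordlist layout (a header line, then ` word=...,f=...` entries, each optionally followed by more deeply indented child lines such as `shortcut=` or `bigram=`); treat that layout and the helper names as assumptions to check against the actual wordlist files.

```python
import re
from collections.abc import Iterable, Iterator

# Assumed layout: header line, then " word=...,f=..." entries whose child
# lines ("  shortcut=...", "  bigram=...") are indented further.
WORD_RE = re.compile(r"^\s*word=([^,]+),")


def is_all_caps(word: str) -> bool:
    """True for "BTW" or "AFK"; False for "I", "btw", or "McDonald"."""
    letters = [c for c in word if c.isalpha()]
    return len(letters) > 1 and all(c.isupper() for c in letters)


def filter_wordlist(lines: Iterable[str]) -> Iterator[str]:
    """Yield wordlist lines, dropping all-caps entries and their child lines."""
    skipping = False
    for line in lines:
        match = WORD_RE.match(line)
        if match:
            skipping = is_all_caps(match.group(1))
        elif not line.startswith(" "):
            skipping = False  # header or other top-level line: always keep it
        if not skipping:
            yield line
```

The filtered lines would then be written to a new file and compiled with dicttool as described later in the thread; the exact dicttool invocation is not reproduced here. Single letters such as "I" are deliberately kept, since the `word=i` entry is meant to be edited rather than removed.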

Commenter25 commented 3 months ago

Not fully case insensitive, I mean that it should not consider fully lowercase and fully uppercase identical words to be different. So like, ball and BALL should be the same word.
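The proposal (which Helium314 disagrees with below) can be stated precisely as a storage key for learned words: all-lowercase and all-uppercase spellings map to one entry, while mixed case keeps its own. This is a hypothetical sketch of the proposal only, not how HeliBoard stores learned words; `history_key` and `learned_counts` are illustrative names.

```python
from collections import Counter


def history_key(word: str) -> str:
    """Storage key for a learned word under the proposed (hypothetical) rule.

    All-lowercase and all-uppercase spellings share one entry, so "ball" and
    "BALL" count as the same word, while mixed case such as "Ma" or "iPhone"
    keeps its own entry.
    """
    letters = [c for c in word if c.isalpha()]
    uniform = bool(letters) and (
        all(c.islower() for c in letters) or all(c.isupper() for c in letters)
    )
    return word.lower() if uniform else word


def learned_counts(typed_words: list[str]) -> Counter:
    """Count how often each word was typed, merging uniform-case variants."""
    return Counter(history_key(w) for w in typed_words)


print(learned_counts(["ball", "BALL", "Ball"]))
```

Under this rule, "I" typed in all-caps mode would also merge into "i", which is the behaviour Commenter25 asks for; the cost is that genuine acronyms typed in uppercase lose their separate entry.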

nanopone commented 3 months ago

the behaviour that heliboard has does not match many popular soft keyboards (namely google keyboard and the ios keyboard), so i feel like there must be a compromise that accommodates all typing styles. i personally never had an issue with acronyms on google keyboard and don't generally see other people typing them in all caps, so it seems to be extremely uncommon for soft keyboards to do this, and i think heliboard should be the same

Helium314 commented 3 months ago

So like, ball and BALL should be the same word.

I disagree. It might be true for your usage, but other users use uppercase acronyms that should not be considered the same as lowercase words.

and i think heliboard should be the same

See my comment above. You may not like it, but using a custom dictionary will work for you. The ability to use any dictionary you like is the compromise you want, as far as I see.

Commenter25 commented 3 months ago

I don't think expecting users to make a custom dictionary is an intuitive workaround, and would be a lot of duplicated effort for people who speak in lowercase.

Lppsoeht commented 3 months ago

What kind of compromise is this? Look at any other modern keyboard. A keyboard has to learn from user input, correctly. If it doesn't, it should at least offer a better way to add preferred words while typing, not via a separate dictionary.

Yayroos commented 3 months ago

Language changes, and if your tools for writing it don't accommodate that then they're causing unnecessary friction and frustration. Most young people on the internet (and many older people who engage with youth and pop culture) use lowercase acronyms for things like 'btw' 'fyi' 'afk' 'lol' and so on. This has been the case for years, and will continue to be the case. As others in the thread have stated, using them uppercase in the middle of a sentence feels wrong, and that's the number one indicator to a fluent speaker that they're using something in a way that doesn't mean exactly what they want it to mean. Having that wrong feeling means that the usage has changed, and the two aren't exactly the same, so choosing between them is an act of sentence construction, as much as any other word choice is.

BTW, btw, and "by the way" all function subtly differently in a sentence. I read btw as casual and relaxed, adding some extra information. I would read BTW much more in the realm of an 'um, actually', an almost passive-aggressive correction, the 'as per my last email' of letting someone know there's extra information to come, with the full "by the way" sitting very much neutral between those two points. My interpretations may not be universal, but they also aren't unrelated to how other people use them. I've learned this stuff alongside the rest of the users of the modern internet, in the places where casual language happens.

People complain about language changing because they can't keep up, but for the most part they can't keep up because they already dismissed the new words and grammars and syntaxes as being nonsense and refused to engage, because they're being used by young people, and queer people, and POC, on the internet, and so those uses of language are inherently silly or wrong or just kid stuff.

And, what people call laziness in language is efficiency, of the kind we all naturally chase. I guarantee that when you speak out loud, you do not carefully pronounce every single syllable of every single word, because nobody does, because that's how speech works, we conserve energy and save time and give only the information needed to convey the thing we are trying to convey. Written text works the same, we drop letters and create acronyms, because typing three keys is a lot faster than typing 10, and then we get to the age of mobile phones and capitalising stuff is kinda a pain, and actually no information is lost when you go lowercase because 'btw' is not easily confused with any other word, so you drop the capitalisation, and then when you really want to reinforce a point, you bring it back to serve a purpose, you're putting in effort to emphasise the word, like you would when you capitalise any other word in ordinary text.

Not everyone uses acronyms this way, sure, but a lot of people do, and that number will only increase as more and more young people get online. All the major software keyboards and autocorrect schemes for phones accommodate this, because a keyboard is a tool for expressing language and if your keyboard can't express language the way people want to, then it may as well be missing a letter.

ghost commented 3 months ago

Not to get into any of this debate about language...

A big frustration I've had with most Android keyboards is the dictionary — specifically the excessive size that leads to bad gesture typing, bad word suggestions, and bad text prediction.

I wrote about it in the discussions — how to improve dictionaries, toolbar, and voice input.

Fundamentally keyboards are saddled with a static binary dictionary — words cannot be deleted, you can't choose the size, etc. This has been fundamental in all soft keyboards — HeliBoard is one of the few keyboards you can use without an installed binary dictionary though there are issues with that as well.

To the point: IMO, this design requires developers to maintain binary dictionaries and unnecessarily complicated word lists, which provide no long-term benefit to the user and only headaches to the maintainer.

We could instead move to a text-based word list as the dictionary, similar to the TT9 keyboard, that users can import. Users could then choose whatever lists they want, and words would be deletable, easily synced, etc. That would put an end to all these types of issues 🤷🏼‍♂️

Now back to arguing about language ...
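A text-based word list of the kind proposed above could be as simple as one word per line with an optional tab-separated frequency. The format below is a hypothetical illustration (TT9's actual file format is not verified here); it shows why deleting or exporting words becomes trivial when the dictionary is plain text.

```python
def load_wordlist(text: str, default_freq: int = 1) -> dict[str, int]:
    """Parse a hypothetical plain-text list: one word per line, optional tab + frequency."""
    words: dict[str, int] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        word, _, freq = line.partition("\t")
        words[word] = int(freq) if freq else default_freq
    return words


def dump_wordlist(words: dict[str, int]) -> str:
    """Serialize back to the same format, sorted for stable diffs."""
    return "".join(f"{w}\t{f}\n" for w, f in sorted(words.items()))


words = load_wordlist("# sample\nbtw\t90\nthe\t250\n\nafk\n")
del words["afk"]  # deleting a word is a plain dictionary operation
print(dump_wordlist(words), end="")
```

A binary format is typically chosen for lookup speed and size on device; a plain-text list like this trades some of that for easy editing and syncing.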

Yayroos commented 3 months ago

As a side comment, on whether or not 'i' should be capitalised: in many sans-serif fonts popular on the web, a capital I actually risks losing information and introducing confusion between I and l, so the tendency towards lowercase i makes even more sense than lowercase acronyms (which, the more i think of it, aren't really acronyms at all, since the -nym means name, aka nouns, and things like 'btw' or 'lol' aren't nouns at all; 'lol' is a verb (i would propose the term acroverb if i didn't think someone much more qualified in linguistics had probably already had this idea and given it a better name). Acronyms for nouns are much more likely to stick as uppercase, eg the banks NAB and ANZ, compared to the acroverbs lol, lmao, rofl and so on, and the other structures i won't try to name right now, because i'd need to go figure out a bunch of names for different kinds of things we shorten like this, and i have errands to run today so i can't spend it all going down linguistic rabbit holes.)

vinoff commented 3 months ago

Damn, my issue sure derailed.

The original issue was to implement a function to uncapitalize acronyms. This is especially useful for people who write in two languages: an abbreviation in one language is often an acronym in another, which leads to annoyances. This would, of course, be a toggleable option. It goes without saying that I do not want to force this behaviour on anyone.

Writing fully in lowercase is not in the scope of this issue, not to me at least.

If you want the suggestions, but not the autocorrect, you can either try reducing auto-correction confidence, or just correct it manually to btw, usually after the first or second time the app should remember your choice (requires personalized suggestions of course).

Also, this is not happening for me. The dictionary doesn't seem to be learning. Can anyone please confirm whether it works for them?
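The toggleable "uncapitalize acronyms" option could be reduced to one rule applied before committing a correction: if the suggestion differs from the typed word only by being all uppercase, keep what the user typed. This is a hypothetical sketch; `apply_case_preference` and `keep_typed_case` are illustrative names, not an existing HeliBoard setting.

```python
def apply_case_preference(typed: str, suggestion: str, keep_typed_case: bool) -> str:
    """Return what gets committed when `suggestion` would replace `typed`.

    With the hypothetical option enabled, a suggestion that differs from the
    typed word only by being all uppercase ("btw" -> "BTW", "Ma" -> "MA") is
    ignored and the typed word is kept. Real corrections are unaffected.
    """
    only_case_differs = suggestion != typed and suggestion.lower() == typed.lower()
    if keep_typed_case and only_case_differs and suggestion.isupper() and not typed.isupper():
        return typed
    return suggestion


print(apply_case_preference("btw", "BTW", keep_typed_case=True))   # btw
print(apply_case_preference("Ma", "MA", keep_typed_case=True))     # Ma
print(apply_case_preference("teh", "the", keep_typed_case=True))   # the
```

Because the check only fires when the letters are identical, it also covers the Italian "Ma" case from the multilingual setup without blocking ordinary spelling corrections.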

vinoff commented 3 months ago

This happens with other languages as well. One example: I'm trying to type "Ma" (in Italian) as the first word of a sentence, and it autocorrects to "MA", whose meaning I frankly don't know. Maybe it's an English acronym... I'm using an Italian+English+Spanish keyboard. It doesn't learn from the numerous times I've tried to type "Ma", and I have personalised suggestions on.

This is exactly my issue as well. This uppercase behaviour for acronyms is especially infuriating for people who write in more than one language.

vinoff commented 3 months ago

I just did the following:

So ya, this keyboard seems broken to me. Am I the only one, really?

ghost commented 3 months ago

If you have learn words enabled you have to "use" the words at least 3 times.

  1. Type the word and select it in suggestion, add a space
  2. Tap on that same word and select it in the suggestion and add space again
  3. Tap that word again — at this point the word should be bold in suggestion view — select it again in suggestion.
  4. Now check dictionary and it should be there.

IMO this is not a good way of adding words, and "delete" does not delete: it only blacklists the word, which is enabled again once you type it. I'm not sure, but typing the lowercase form may re-enable the uppercase one.

I personally don't use HeliBoard with any installed dictionary and use User Dictionary Manager (UDM) to add words manually to the personal dictionary, and actually I'm using Gboard more now because the typing has improved drastically compared to HeliBoard. The whole dictionary area for AOSP keyboards is poorly designed IMO, Google did no one any favors there.
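The "use it at least 3 times" behaviour described in the steps above can be modelled as a per-word counter that promotes a word once it reaches a threshold. This is a hypothetical model of the commenter's observation, not confirmed HeliBoard internals. Note that it keys on the exact spelling, which would also explain why "Btw" (auto-capitalized at the start of a message) and "btw" are learned separately.

```python
from collections import Counter

PROMOTION_THRESHOLD = 3  # the "at least 3 times" observed above (assumption)


class LearnedWords:
    """Hypothetical learner: a word is added after `threshold` accepted uses."""

    def __init__(self, threshold: int = PROMOTION_THRESHOLD):
        self.threshold = threshold
        self.uses: Counter = Counter()
        self.dictionary: set[str] = set()

    def record_use(self, word: str) -> bool:
        """Count one accepted use; return True once the word is promoted."""
        self.uses[word] += 1  # exact spelling: "Btw" and "btw" are separate keys
        if self.uses[word] >= self.threshold:
            self.dictionary.add(word)
        return word in self.dictionary


learner = LearnedWords()
for _ in range(3):
    promoted = learner.record_use("btw")
print(promoted, learner.record_use("Btw"))  # True False
```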

vinoff commented 3 months ago

Ya, this is not working for me. I tested this with the word "btw".

Opened a message to myself on WhatsApp and did as you said:

Whenever I type "btw" and add a space, it still corrects to "BTW".

This is simply not working for me. Is there maybe something special with the "BTW" word? Can you please try it yourself?

ghost commented 3 months ago

Does the word ever become bold in the suggestion view? If not then try a clean install.

Also, I think you need to have these enabled:

  1. Personalize suggestions
  2. Add words to personal dictionary

vinoff commented 3 months ago

That second point seemed to help. I managed to make it suggest "Btw". Interestingly, I was sending "Btw " to myself (note the uppercase first letter), because it was the first word of the message. I then tried "fdiasjf btw " to see whether it would start suggesting "btw" instead of "Btw", since it is in the middle of a sentence, but no: it keeps suggesting "Btw", even though I have definitely sent more "btw" than "Btw". Something is definitely wrong here, but I guess I will just restart and train "btw" from scratch.

Thank you for helping.

Helium314 commented 3 months ago

I haven't read all of this, sorry. What really bothers me about this dictionary discussion is that it's quite simple to create and import your own dictionary (which is likely better than the ~10 year old default AOSP dictionary anyway). And yet people would rather spend their time discussing how they are bothered by the default dictionary, instead of taking the original wordlist, kicking out the "bad uppercase words" and compiling it into a dictionary. Since files can be shared, only one person needs to do it.

The original issue was to implement a function to uncapitalize acronyms.

I intend to work on #540 for the next release (though currently there is so much activity in issues alone that I can spend my entire available time without even fully catching up)

ghost commented 3 months ago

@Helium314

What really bothers me about this dictionary discussion is that it's quite simple to create and import your own dictionary (which is likely better than the ~10 year old default AOSP dictionary anyway).

This isn't really correct. Yes, you can import words into the personal dictionary using User Dictionary Manager (UDM) or Multiling O Keyboard, but there are issues with both methods — you chose to make the user dictionary word "weight" 250 for every word! You chose not to have a way to import/export such a dictionary with the word weight and shortcut... This isn't good for building your own dictionary, and using Multiling O Keyboard to import with a better word weight is highly problematic.

And expecting users to compile a binary? Really? Some of us don't even have a PC or easy access to one that we could do such.

I've done all this stuff to build a personal dictionary in HeliBoard, and I've given up and returned to Gboard. It's a hassle, and I tried to help with the dictionaries and you blew me off.

You and the other devs of this project keep going down paths of "making things more complicated" or flat-out breaking things that don't need to be touched (not spell checking numbers and other characters, thus breaking gesture typing), instead of finding simple long-term solutions that take the work off the developers. Take your current implementation of the layouts: great on the "simple" part, then just dropped back into complication for functional keys 🤷🏼‍♂️, as if Multiling O Keyboard hadn't been giving everyone a good example of doing this for over ten years 🤦🏻‍♂️. Now you've created something that just needs developers to do the work, instead of stripping things back so users could contribute.

I'm not saying any of this to bag on you, or anyone. It's your project, do what you like; just realize your choices play a part in why it's going the way it is. So "it's taking up too much time" is because you are choosing that path.

Helium314 commented 3 months ago

Yes, you can import words into the personal dictionary using User Dictionary Manager (UDM)

This is not about the personal dictionary.

And expecting users to compile a binary? Really?

Yes. You would need one user to compile and upload it, and it certainly doesn't have to be me who compiles it. Judging by PRs in OpenBoard and the dictionary repo, I'm not the only one who can use the dicttool. Demanding specifically that I compile your dictionaries is not something I'm happy with. With all the activity in this repository, I don't even have time to read everything when I also want to get things done. So I really want to focus on stuff where I can't be replaced by someone else running java -jar dicttool_aosp.jar makedict....

then just dropped back into complication for functional keys

No one forces you to adjust the layout, and if you don't then it's as simple as before. There is no developer needed to do the work, it's just a json file.

"it's taking up too much time" is because you are choosing that path.

Sure, only allowing simple custom layouts, and refusing to add other features over OpenBoard, would result in only bugfixing to be necessary. But I prefer giving more possibilities to the users instead, and I think this is a big reason for the amount of discussions and PRs.

ghost commented 3 months ago

Totally missing the point! Multiling O Keyboard allows you to adjust EVERYTHING of every layout, easily and simply. You're not in the same ballpark with your implementation, though you could have one-upped it: close, but no cigar.

No one was expecting you to compile their dictionaries. I was doing research to help YOUR project with the dictionaries, so helping me a little would have helped you: dictionary maintenance would be easier, typing would be improved, etc. But you don't want help! Hey, that's cool! It's your show, do what you like with it; just be honest with yourself and clear about what others are saying or doing. You're taking a somewhat extreme position and not really understanding what someone is saying.

No big! It's just a keyboard.

netizeni commented 3 months ago

As this thread is active, I will ask here rather than opening a new issue. Is there any way to tell HeliBoard not to type a capitalized word in the middle of a sentence while gesture typing? Here is a simplified example from my language: "This is How it's done". Obviously I don't want to delete the word "How", as there are cases when a sentence starts with it, but I also don't want to get it in the middle of a sentence.

There are multiple words like this.
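One way the requested behaviour could work: in mid-sentence, lowercase a title-case candidate when its lowercase form is also a known word, so "How" becomes "how" while a proper noun like "Paris" (with no lowercase entry) is left alone. This is a hypothetical sketch; `choose_casing` and the `lowercase_words` set are illustrative, not an existing HeliBoard option.

```python
def choose_casing(candidate: str, at_sentence_start: bool, lowercase_words: set[str]) -> str:
    """Pick the casing to commit for a gesture-typed candidate (hypothetical rule).

    At the start of a sentence the candidate is left alone (auto-capitalization
    handles it). Mid-sentence, a title-case candidate such as "How" is lowered
    if its lowercase form is a known word, so proper nouns like "Paris" stay.
    """
    if at_sentence_start:
        return candidate
    is_title = candidate[:1].isupper() and candidate[1:].islower()
    if is_title and candidate.lower() in lowercase_words:
        return candidate.lower()
    return candidate


known = {"how", "this", "is"}
print(choose_casing("How", at_sentence_start=False, lowercase_words=known))  # how
print(choose_casing("How", at_sentence_start=True, lowercase_words=known))   # How
```

Single letters such as "I" are not treated as title case here, so they are never lowered by this rule.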

Also, is it somehow possible to change a word's weight? For some reason, while gesture typing, it often suggests a word from the personal dictionary that I typed 4-5 times about a year ago, and the keyboard still types it even though I have corrected it numerous times to the word from the standard dictionary. I don't want to delete the learned word; I just want it to be suggested less often in favour of other similar words. Here I'm talking about the word's weight in the non-personal dictionary.
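The "typed 4-5 times a year ago but still wins" problem is what a time-decayed weight would address: each past use counts less as it ages. The sketch below uses exponential decay with a hypothetical 90-day half-life; it illustrates the idea and does not describe how HeliBoard currently weights learned words.

```python
HALF_LIFE_DAYS = 90.0  # hypothetical: a use counts half as much after 90 days


def decayed_score(use_ages_days: list[float], half_life: float = HALF_LIFE_DAYS) -> float:
    """Sum of past uses, each weighted by 0.5 ** (age / half_life)."""
    return sum(0.5 ** (age / half_life) for age in use_ages_days)


old_word = decayed_score([365.0] * 5)  # typed 5 times about a year ago
fresh_word = decayed_score([0.0])      # typed once today
print(old_word < fresh_word)  # True
```

With this weighting, five uses from a year ago score about 0.3, below a single use today, so a recently corrected word would overtake the stale one without deleting it.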