Helium314 / HeliBoard

Customizable and privacy-conscious open-source keyboard
Apache License 2.0

Detect CamelCase as separate words? #978

Open Helium314 opened 1 month ago

Helium314 commented 1 month ago

Currently HeliBoard detects a continuous sequence of letters and specific characters like ' as a single word, with some special cases like URLs or emails. In #960, the addition of CamelCase modes was proposed, and it's worth considering what should be seen as a word here. It might be useful to not treat CamelCase as a single word (the current behavior), but to detect Camel and Case separately via the lowercase/uppercase boundary.

I don't have a particular opinion on this, but would such behavior be wanted (and why / why not)?

(this is also related to the more general question of what should be a "word"; there are some issues where people expect numbers to count as part of a word)
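For illustration, the lowercase/uppercase boundary idea could be sketched roughly like this. This is a language-neutral Python sketch, not HeliBoard code; the function name `split_camel_case` is hypothetical, and it assumes the token has already been isolated as a single "word" by the existing logic.

```python
def split_camel_case(token: str) -> list[str]:
    """Split a token at each lowercase-to-uppercase boundary (hypothetical helper)."""
    if not token:
        return []
    parts = []
    start = 0
    for i in range(1, len(token)):
        # A boundary is a lowercase letter immediately followed by an uppercase one.
        if token[i - 1].islower() and token[i].isupper():
            parts.append(token[start:i])
            start = i
    parts.append(token[start:])
    return parts


print(split_camel_case("CamelCase"))   # ['Camel', 'Case']
print(split_camel_case("HTTPServer"))  # ['HTTPServer']
```

Note that this simple rule leaves acronym runs such as HTTPServer unsplit, and scripts without letter case are never split at all, so it would only affect cased scripts.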

devycarol commented 1 month ago

In the general case, I don't think so. If it has its own mode, then perhaps. The logic would be pretty tricky though.

I'm trying to allow non-composition characters to be word-selected for the delete sliders I'm working on, and even that's been pretty complex.

I see two cases shaping up: text editing and word composition. For something like numbers, you probably want to include them in a "selection unit" for the former and exclude them from the latter.

And non-Latin languages need to be kept in mind as well.
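A rough sketch of how the two cases might differ, assuming a hypothetical `WordContext` switch (none of these names come from HeliBoard). Digits count as part of the unit when editing but not when composing, and `str.isalpha()` is used so that non-Latin letters are accepted too.

```python
from enum import Enum


class WordContext(Enum):
    EDITING = "editing"      # selecting / deleting text
    COMPOSING = "composing"  # building the word used for suggestions


def is_word_char(ch: str, context: WordContext) -> bool:
    # Letters of any script and the apostrophe always belong to a word.
    if ch.isalpha() or ch == "'":
        return True
    # Digits only count as part of the unit when editing text.
    if ch.isdigit():
        return context is WordContext.EDITING
    return False


def trailing_unit(text: str, context: WordContext) -> str:
    """Return the run of word characters that ends at the end of text."""
    start = len(text)
    while start > 0 and is_word_char(text[start - 1], context):
        start -= 1
    return text[start:]


print(trailing_unit("room 42b", WordContext.EDITING))    # 42b
print(trailing_unit("room 42b", WordContext.COMPOSING))  # b
```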

devycarol commented 1 month ago

(and that's not even touching synergy with selection by double-tapping text. is that governed by the IME or lower in the system?)

Helium314 commented 1 month ago

> If it has its own mode

That might be a pretty useful thing (not just related to word-delete-swipe modes). So users could choose e.g. whether to treat numbers as part of a word. Though I'm not sure I want the extra complexity that people will likely ask for once we add this (e.g. treat numbers as part of a word, but not when saving to typed history, or not when at the start, or... endless possibilities).

> is that governed by the IME or lower in the system

This is coming from the system.

devycarol commented 1 month ago

The way it's coming together on my end is that I'm not touching "get word range at cursor" at all and instead duplicating that logic into a separate interface that adds support for selecting non-word characters. It's in StringUtils.
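To make the shape of that approach concrete, here is a minimal sketch. It is not the actual StringUtils code: both function names are hypothetical, and `word_range_at_cursor` is only a simplified stand-in for the existing "get word range at cursor" logic. The second function leaves the first untouched and only adds a fallback for runs of non-word characters.

```python
def _is_word(ch: str) -> bool:
    return ch.isalpha() or ch == "'"


def word_range_at_cursor(text: str, cursor: int) -> tuple[int, int]:
    """Simplified stand-in for the existing logic: the word touching the cursor."""
    start = cursor
    while start > 0 and _is_word(text[start - 1]):
        start -= 1
    end = cursor
    while end < len(text) and _is_word(text[end]):
        end += 1
    return start, end


def selection_range_at_cursor(text: str, cursor: int) -> tuple[int, int]:
    """Separate entry point: fall back to a run of non-word characters."""
    start, end = word_range_at_cursor(text, cursor)
    if start != end:
        return start, end  # a word touches the cursor, keep the existing result

    def is_symbol(ch: str) -> bool:
        # Anything that is neither whitespace nor a word character.
        return not ch.isspace() and not _is_word(ch)

    while start > 0 and is_symbol(text[start - 1]):
        start -= 1
    while end < len(text) and is_symbol(text[end]):
        end += 1
    return start, end


text = "wait... what"
print(selection_range_at_cursor(text, 2))  # (0, 4)
print(selection_range_at_cursor(text, 6))  # (4, 7)
```

Keeping the fallback in its own function means word composition can keep calling the original lookup unchanged, while editing gestures call the extended one.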