Helium314 / HeliBoard

Customizable and privacy-conscious open-source keyboard
Apache License 2.0

Adding Glide Typing To Heliboard Without libjni_latinimegoogle.so #668

Open the-eclectic-dyslexic opened 5 months ago

the-eclectic-dyslexic commented 5 months ago

I finally have a little bit of time to write down what I have been doing on this front, so I wanted to open an issue here to keep track of the progress I have made, just in case anyone wants to give me a hand, as my time is super limited at the moment and I blow chunks at juggling tasks. Also, it should mean my work here doesn't go to waste if I fall off the edge of the world or something, haha!

So, first off, I will put a list of things I have yet to figure out, so I can get moving on this. Unfortunately, most of these have nothing to do with the algorithm that needs to be implemented, but rather my unfamiliarity with the Heliboard code base and the JNI overall. If I already knew about these things, I might have already made a working implementation to build on.

Stuff I know I still need to do before I could bring gesture typing to Heliboard without the google library as a hack:

```cpp
static const JNINativeMethod sMethods[] = {
    { const_cast<char *>("setDicTraverseSessionNative"),
      const_cast<char *>("(Ljava/lang/String;J)J"),
      reinterpret_cast<void *>(latinime_setDicTraverseSession) },
    { const_cast<char *>("initDicTraverseSessionNative"),
      const_cast<char *>("(JJ[II)V"),
      reinterpret_cast<void *>(latinime_initDicTraverseSession) },
    { const_cast<char *>("releaseDicTraverseSessionNative"),
      const_cast<char *>("(J)V"),
      reinterpret_cast<void *>(latinime_releaseDicTraverseSession) }
};
```



That last point is more important than it may appear at first, because doing this correctly is necessary to avoid breaking compatibility with the google library's interface. Maintaining that compatibility is important, at least until the open implementation is as good or better than the google library one. I will touch on this more in a bit.

Now, onto the stuff I do know something about...
I am sure some people reading this would already know some of the following, but I didn't know any of it... so it would be nice if it was documented here! I will also write proper docs for this stuff and do a PR at some point, unless someone beats me to it. It will likely take me a while.

I would put money on what is inside that google library. It is not just a gesture typing library, as anyone who understands the dynamic loading of the google library already knows. I am almost completely certain it is actually a closed source fork of the AOSP keyboard JNI code, similar to how chrome is a closed source fork of chromium. I believe it was abandoned when google keyboard was superseded by gboard, and hasn't actually changed in any appreciable way since. I looked inside gboard's apk. There was no google latinime library in there from what I can remember. It is possible that gboard just relies on this library being in the system though. I don't know. Regardless, most of the code in the google library is likely exactly what is inside Heliboard. There are some other changes that aren't just the glide typing implementation, which someone else can probably speak to, but those aren't super important yet and I don't know what they are in practice. The main thing this means is that any change to the JNI code is totally useless, or worse, if we want to allow users to load the google latinime and assume users will do that until our gesture typing is better. Anything we do to change this interface is going to mean extra work to handle exceptions arising from differences in the two libraries, unless it is done with strict adherence to the interface offered by the google library. In my ideal world, I would write some unit tests that feed both libraries the exact same function calls, and compare the outputs to know exactly what is different. That is a pretty big ordeal though, as I would need a bunch of simulated user data, and I would need to build mock environments around those tests; I don't think it is absolutely necessary, but it would be the "right" way to go about this.

In lieu of doing it the "right" way, we should at least know how the interfaces are different. I have run a comparison of the two libraries, and output the results into compare.csv. The point is that it will allow me to cross-reference all the functions offered inside the google library with the Java/Kotlin code, and ensure that heliboard provides proper implementations of all of those JNI calls. I have not done this yet, and I don't know when I will be able to get to it, but I do plan to do this legwork. I suspect all we need to do is implement a single additional function that is used to request suggestions based on an input gesture. I don't know this for sure though, hence why I plan to check and report back to this thread.

[compare.csv](https://github.com/Helium314/HeliBoard/files/14928929/compare.csv)

The short version of what that csv says is that the google JNI library's interface is almost a strict superset of Heliboard's JNI library interface. This lines up with what I claimed about the google latinime being a fork of the AOSP code. This means that if we don't stray from that common interface, where it matters, people outside of this project could extract the shared object file out of heliboard and use it in place of the google library for the various community maintained packagings of GApps, such as MindTheGApps. This would give LineageOS and other Android forks that use the AOSP keyboard a fully open keyboard by default, if they packaged the heliboard JNI instead of the google one.

The long version is that there are only 7 functions found inside the Heliboard latinime library that are not in the google latinime library. I don't think any of them are directly referenced in any Java/Kotlin code, but I haven't searched yet. They probably don't matter for the interface definition, as they are likely helper functions that have been added by contributors over the years since the code was inherited from the AOSP. Of the remaining 814 functions in the comparison, 111 are unique to the google library. Most have names that suggest they are unique to gesture typing, and give a tiny window into how they implemented gesture typing. Some of the functions that exist in only one library or the other are also likely generated from internal use of stdlib templates. Unfortunately, my analysis of the shared objects ends there. It should be possible to see what functions call what other functions through further static analysis, but I don't think that is necessary. It is also not something I have done before, and I don't know much about the tools that can assist with it.

For anyone that cares how I know what functions are in the google latinime, because I had no idea how to do this before, I included some scripts in jni_compare.zip. Simply cd into the extracted directory, run jni_lib_extractor.sh, and pick the architecture so it downloads the appropriate shared object files. Then run compare.sh and you should be able to replicate my results, assuming you have all the dependencies (python, awk, c++filt, readelf, and grep). You can also use jni_lib_extractor.sh if you are looking to extract the google library for gesture typing in heliboard, but are a bit confused by the process of acquiring it... and don't trust someone sending you a random shared object file, but you do trust the maintainers of MindTheGApps.

[jni_compare.zip](https://github.com/Helium314/HeliBoard/files/14928936/jni_compare.zip)

Now, comes the stuff I am most familiar with. I haven't implemented glide typing before, but I have worked on something similar. I worked on optimizing keyboard layouts for glide typing, to reduce the number of inherent errors in glide typing, based on user vocabulary. This basically involved implementing half-baked glide typing, to compare the gestures of words against one another.

This is all to say, I am somewhat familiar with literature around gesture typing, at least as far as it pertained to the work I was doing in the past on this. Odds are pretty decent that the method the google library uses for gesture recognition, as part of glide typing, is a modified version of SHARK2. One of the inventors of it, Shumin Zhai, has been associated with Google for a long time. It is likely, based on the timing and the naming of functions in the google library, that this paper is all we need to get very similar results. There is probably more recent literature in this domain, but I don't know any of it. Sorry! The bright side is though, once we have SHARK2 implemented, we can swap in other gesture recognition implementations as long as they can fit inside the same interface... or we can just iterate on what we have until it feels right! The thing about it is, while the algorithm is 20 years old, it does work pretty well, so I am not all that concerned. The more important part is how you score the results the gesture recognition returns to you, based on context. The original paper, and it appears the google library, simply use bigrams for context, but I think it is perfectly possible to do better than that once we have something to work with. I would put some money down, betting that whoever wrote the stuff in the google library at the very least talked to Shumin Zhai about how to do it. :-)

https://www.researchgate.net/publication/228875756_SHARK2A_large_Vocabulary_shorthand_writing_system_for_pen-based_computers

So, it has been a little while since I have had to implement anything related to SHARK2, but what I can tell you is there are a few important points.

All characters on the board can and should be represented as points, not as collision boxes, like they might often be in other forms of touch screen keyboard input. This allows us to create ideal versions of gestures to compare against, by simply playing connect the dots. These gestures to compare against are referred to as templates. How we actually compare the input gesture against the templates will be explained in a little bit.
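
To make the "connect the dots" idea a bit more concrete, here is a minimal C++ sketch of how a template could be built from key centres. The `Point`, `KeyboardGeometry`, and `buildTemplate` names are made up for illustration; they are not HeliBoard's actual data structures.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// A key centre in keyboard coordinates (hypothetical representation).
struct Point {
    float x;
    float y;
};

// Hypothetical geometry: one centre point per base-layer character.
using KeyboardGeometry = std::unordered_map<char, Point>;

// Build the ideal gesture template for a word by "connecting the dots"
// through the centres of its keys, in order.
std::vector<Point> buildTemplate(const std::string& word,
                                 const KeyboardGeometry& keys) {
    std::vector<Point> path;
    path.reserve(word.size());
    for (char c : word) {
        auto it = keys.find(c);
        if (it != keys.end()) {
            path.push_back(it->second);
        }
    }
    return path;
}
```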

Some words will have the same gesture template no matter what you do, and there isn't really a clear way of disambiguating them, aside from letting the user draw exaggerated circular paths at the double letters. A lot of users won't want to do this, because the whole point is that glide typing is supposed to be faster than the alternative. Instead, we should infer based on context, by leveraging other suggestion methods. The idea is you take the suggestions given based on the gesture, and compare their likelihood of occurring based on the context of the previous few words. It is not too dissimilar to the way Heliboard already suggests the next word to you outside of gesture typing. As an example of what I am talking about, "feel", "fell", and "fel" are all linked to the same gesture, the one described by creating a path through the keys "fel" in order. The only way to disambiguate these is with context. Additionally, characters that aren't directly represented on the base layer of the keyboard need to be simplified, which will create yet more ambiguous templates. For instance, "it's" and "its" have the same template. So do "résumé" and "resume".
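
As a rough illustration of the simplification idea (not how the google library or HeliBoard actually does it), here is a sketch that folds a word down to the base-layer characters its template would pass through, and groups words whose simplified forms collide. The tiny diacritic table and the choice to collapse doubled letters are assumptions made for the example; a real implementation would need proper Unicode handling.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Fold a word down to the base-layer characters its gesture template would
// pass through. Illustrative only: lowercases ASCII, drops apostrophes,
// collapses doubled letters, and strips a couple of hard-coded diacritics.
std::string simplifyForTemplate(const std::string& word) {
    static const std::unordered_map<std::string, char> kFold = {
        {"é", 'e'}, {"è", 'e'}, {"á", 'a'}, {"ö", 'o'},  // far from complete
    };
    std::string out;
    auto append = [&out](char c) {
        // Collapse doubled letters, since "fell" and "fel" trace the same path.
        if (out.empty() || out.back() != c) out.push_back(c);
    };
    for (size_t i = 0; i < word.size();) {
        if (i + 1 < word.size()) {
            auto it = kFold.find(word.substr(i, 2));   // two-byte UTF-8 sequences
            if (it != kFold.end()) { append(it->second); i += 2; continue; }
        }
        char c = word[i++];
        if (c == '\'') continue;                       // "it's" -> "its"
        if (c >= 'A' && c <= 'Z') c = c - 'A' + 'a';   // lowercase ASCII
        append(c);
    }
    return out;
}

// Group dictionary words by their simplified form, so one matched template can
// be mapped back to every word that shares it, e.g. "resume" -> {"resume", "résumé"}.
std::unordered_map<std::string, std::vector<std::string>>
groupBySimplifiedForm(const std::vector<std::string>& dictionary) {
    std::unordered_map<std::string, std::vector<std::string>> groups;
    for (const auto& word : dictionary) {
        groups[simplifyForTemplate(word)].push_back(word);
    }
    return groups;
}
```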

Reducing the search space is a must. You don't want to compare your input gesture against every single word in the dictionary. Ideally, all the words in the dictionary would be organized into buckets labelled by the shared first and last character of the words in that bucket. Then we would perform collision checks between the points that represent the keys and some shapes, likely circles, positioned at the first and last points in the input gesture. This allows us to narrow our search space down to a much more manageable list. It is also possible to simply filter the dictionary for words that match the pattern we are looking for, but it is slower than having them pre-sorted into buckets. An example of how this would work is illustrated below. 

![gesture_reduce](https://github.com/Helium314/HeliBoard/assets/112126910/7d5f9ee9-31bd-410e-92d6-90f1cbb37b31)

Here we are looking at the gesture for the word "watch" in QWERTY. We have performed a collision check at the beginning and the end. The result is we know the word we are looking for should start with one of (q,w,e,a,s). We also know it should end with one of (y,u,g,h,j,v,b,n). Then all we would have to do is search through the buckets labeled accordingly, reducing the search space down to 5x8=40 different buckets. This is opposed to looking in all 26x26=676 buckets. This is not going to be a 94% reduction in search space, as the bucket counts might suggest, because words are not evenly distributed among the buckets. It does help a huge amount though! Sometimes you will get an outlandish reduction in the search space this way, and sometimes you only get a big reduction. The keyboard layout actually plays a role in how much this will speed up the evaluation of common gestures. The size of these collision shapes is variable. There is probably a suggestion in the paper, but nothing stops us from tweaking these. It is going to be a matter of balancing speed against the danger of pruning away the correct answer. It is very possible to make the size of these collision shapes depend on user settings... assuming the interface allows for this. It is also possible to experiment with different shapes, based on assumptions about how users usually make motor errors while glide typing. That probably isn't necessary, but it is worth thinking about. Every time you reduce the search space, without removing the correct gesture from the search space, you have a chance to improve the suggestions returned... and it goes faster!
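
A minimal sketch of this endpoint pruning, reusing the hypothetical `Point`/`KeyboardGeometry` types from the earlier sketch (re-declared so the snippet stands alone). This version does the simpler "filter a flat word list" variant; the bucket variant would index by the (first, last) pair instead. The radius is a tuning knob, not a value taken from the paper or the google library.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

struct Point { float x; float y; };
using KeyboardGeometry = std::unordered_map<char, Point>;

static float squaredDistance(const Point& a, const Point& b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Collect every key whose centre lies inside a circle of `radius` around `p`.
std::vector<char> keysNear(const Point& p, float radius,
                           const KeyboardGeometry& keys) {
    std::vector<char> hits;
    for (const auto& [c, centre] : keys) {
        if (squaredDistance(p, centre) <= radius * radius) hits.push_back(c);
    }
    return hits;
}

// Keep only candidate words whose first and last letters fall inside the
// circles around the gesture's first and last points (the "watch" example:
// first in {q,w,e,a,s}, last in {y,u,g,h,j,v,b,n}).
std::vector<std::string> pruneByEndpoints(const std::vector<std::string>& words,
                                          const std::vector<Point>& gesture,
                                          const KeyboardGeometry& keys,
                                          float radius) {
    if (gesture.empty()) return {};
    std::vector<char> starts = keysNear(gesture.front(), radius, keys);
    std::vector<char> ends = keysNear(gesture.back(), radius, keys);
    std::vector<std::string> kept;
    for (const auto& w : words) {
        if (w.empty()) continue;
        bool startOk = std::find(starts.begin(), starts.end(), w.front()) != starts.end();
        bool endOk = std::find(ends.begin(), ends.end(), w.back()) != ends.end();
        if (startOk && endOk) kept.push_back(w);
    }
    return kept;
}
```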

We might not have much choice between buckets and filtering the dictionary, depending on what we have to work with for the interface. I would, however, assume there is a way to filter the dictionary pretty quickly, as the original paper did not use the bucket method, and simply filtered the dictionary, which was stored in a linked list.

In the absolute ideal scenario, if we did presort the word list into buckets, we would probably want to presort the simplified words, not the raw words. We would also want a map for reversing the process on all the suggestions at the end of the gesture comparisons. We don't want to return the simplified words, we want to return the actual words they correspond to. This also may result in returning more suggestions than expected by the function caller, but never fewer than they expected. The only way to end up with fewer than the caller expects is if we narrow our search space down to a word list smaller than the number of suggestions expected. With regards to my ideal scenario, I highly doubt iterating over the simplified words and then decoding after the comparisons are done is the way it is done in the google library; it might be a good stretch goal for heliboard, as an optimized way of generating glide typing suggestions. The sizable downside is it would likely require breaking compatibility with the google library. I haven't seen what the real world performance difference is in practice, but not having to iterate over the entire dictionary is always good. I know doing this made my code for optimizing keyboard layouts go way, way faster, but that code didn't need to run in real time, so performance measurements are a bit different. It was the sort of thing you set to run on your plugged in laptop and come back to the next day. It shaved a couple hours off the run time to not iterate over the dictionary, but that may not be noticeable from the perspective of response time provided to the user.
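
Sticking with hypothetical types, the bucket idea above might look something like this: entries keyed by the first and last character of the simplified word, each keeping the original spellings so the simplification can be reversed when suggestions are returned. This is only a sketch of the data layout, not of HeliBoard's actual dictionary structures.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical bucket index: simplified words grouped by (first, last) letter,
// each carrying the original spellings so suggestions can be "un-simplified".
struct BucketEntry {
    std::string simplified;            // e.g. "resume"
    std::vector<std::string> words;    // e.g. {"resume", "résumé"}
};

struct BucketKeyHash {
    size_t operator()(const std::pair<char, char>& k) const {
        return static_cast<size_t>(k.first) * 131 + static_cast<size_t>(k.second);
    }
};

using BucketIndex =
    std::unordered_map<std::pair<char, char>, std::vector<BucketEntry>, BucketKeyHash>;

// Build the index from a map of simplified form -> original words,
// e.g. the output of groupBySimplifiedForm from the earlier sketch.
BucketIndex buildBucketIndex(
        const std::unordered_map<std::string, std::vector<std::string>>& groups) {
    BucketIndex index;
    for (const auto& [simplified, words] : groups) {
        if (simplified.empty()) continue;
        auto key = std::make_pair(simplified.front(), simplified.back());
        index[key].push_back(BucketEntry{simplified, words});
    }
    return index;
}
```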

Now that we have pruned our search space, we need to compare the user input against the templates for our possible suggestions. There are two scores we need to care about: shape scores and location scores. The **smaller** these scores are, the better our match is considered to be. When looking at each potential suggestion, we score it, and keep it in a priority queue with a max capacity of the number of suggestions we want. It is then possible to return this queue as our set of ordered suggestions. Interface permitting, we can also return the shape and location scores (or a single combined score) along with the results to the caller. This allows them to use the scores as part of how they weigh the suggestions.
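
One way to keep only the best N candidates while scanning, sketched with the C++ standard library; the `Candidate` type and the scoring itself are placeholders. Because lower scores are better, the heap is ordered so the worst kept candidate sits on top and is the first to be evicted.

```cpp
#include <algorithm>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct Candidate {
    std::string word;
    float score;   // combined shape + location score; lower is better
};

// Keep the N lowest-scoring candidates seen so far.
class TopCandidates {
public:
    explicit TopCandidates(size_t capacity) : capacity_(capacity) {}

    void offer(Candidate c) {
        if (heap_.size() < capacity_) {
            heap_.push(std::move(c));
        } else if (c.score < heap_.top().score) {
            heap_.pop();               // evict the current worst candidate
            heap_.push(std::move(c));
        }
    }

    // Drain into a vector ordered best-first (smallest score first).
    std::vector<Candidate> sortedResults() {
        std::vector<Candidate> out;
        while (!heap_.empty()) {
            out.push_back(heap_.top());
            heap_.pop();
        }
        std::reverse(out.begin(), out.end());
        return out;
    }

private:
    struct WorstOnTop {
        bool operator()(const Candidate& a, const Candidate& b) const {
            return a.score < b.score;  // max-heap by score: worst on top
        }
    };
    size_t capacity_;
    std::priority_queue<Candidate, std::vector<Candidate>, WorstOnTop> heap_;
};
```

Draining the queue at the end gives the suggestions back in best-first order, which is roughly the shape of result the caller would want.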

Shape scores are really straightforward. The first step is we sample the user input uniformly along the path it describes. Then we take the same number of samples, uniformly, of the gesture template for the word we want to compare against. (Remember the gesture template is where we play connect the dots with the keys that make up the word we care about.) Next we want to normalize the gesture and the template we are comparing against. We do this by scaling the template so that its longest dimension, for its bounding box, is the same as the longest dimension of the user input's bounding box. Then we translate the template by some delta so that the centre of its bounding box, the centroid, is in the same location as the centroid of the user input. (In theory, we actually should scale both the template and the user input to some predefined size, but it is also possible to just scale the final score appropriately so that it is properly normalized instead of moving everything everywhere.) Then we use the Pythagorean theorem to get the Euclidean distance between each pair of comparable points. Once we have all those distances, we take an average of the distances measured, and tada you have a shape score! An example of comparing some fake user input against the gesture for the word "watch" is below.

![gesture_compare](https://github.com/Helium314/HeliBoard/assets/112126910/81905fec-4557-487a-a950-4a51ec37724d)

The red path is the template, and the blue path would be the user input. The black line segments are representations of the euclidean distance measurements between each pair of sampled points. I have ignored the normalization step in the diagram, because I am not sure how to draw it. If this becomes something people are confused by, I am happy to give a shot at drawing it though. All you really need to know about normalizing is, we need the gestures in the same place when comparing them, and they should be the same size. Our previous pruning step actually comes in to save us here, because if we weren't pruning the search space, it is possible we could have an identical template somewhere else completely on the keyboard, and once we normalized to our template, they would look the same even though we didn't gesture anywhere near that location!
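
Pulling the shape-score steps together, here is a rough C++ sketch using the same hypothetical `Point` type: resample both paths to the same number of points, scale the template so the longest side of its bounding box matches the input's, translate it so the bounding-box centres coincide, then average the pointwise Euclidean distances. The details (especially the resampling) are my own reading of the paper, not code from any of the libraries discussed here.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

struct Point { float x; float y; };

// Resample a polyline into `n` points spaced uniformly along its arc length.
std::vector<Point> resample(const std::vector<Point>& path, size_t n) {
    std::vector<Point> out;
    if (path.empty() || n == 0) return out;
    if (n == 1) return {path.front()};
    float total = 0.0f;
    for (size_t i = 1; i < path.size(); ++i)
        total += std::hypot(path[i].x - path[i - 1].x, path[i].y - path[i - 1].y);
    if (total == 0.0f) return std::vector<Point>(n, path.front());
    const float step = total / static_cast<float>(n - 1);
    out.push_back(path.front());
    float carried = 0.0f;  // arc length walked since the last emitted sample
    for (size_t i = 1; i < path.size() && out.size() < n; ++i) {
        Point prev = path[i - 1], cur = path[i];
        float seg = std::hypot(cur.x - prev.x, cur.y - prev.y);
        while (carried + seg >= step && out.size() < n) {
            float t = (step - carried) / seg;
            prev = {prev.x + t * (cur.x - prev.x), prev.y + t * (cur.y - prev.y)};
            out.push_back(prev);
            seg = std::hypot(cur.x - prev.x, cur.y - prev.y);
            carried = 0.0f;
        }
        carried += seg;
    }
    while (out.size() < n) out.push_back(path.back());  // guard against rounding
    return out;
}

struct Box { float minX, minY, maxX, maxY; };

static Box boundsOf(const std::vector<Point>& pts) {
    Box b{pts[0].x, pts[0].y, pts[0].x, pts[0].y};
    for (const Point& p : pts) {
        b.minX = std::min(b.minX, p.x); b.maxX = std::max(b.maxX, p.x);
        b.minY = std::min(b.minY, p.y); b.maxY = std::max(b.maxY, p.y);
    }
    return b;
}

// Shape score: mean pointwise distance after the template is scaled and
// translated onto the user input. Lower is better. The default sample count
// is just a placeholder.
float shapeScore(const std::vector<Point>& input,
                 const std::vector<Point>& templ,
                 size_t samples = 100) {
    if (input.empty() || templ.empty() || samples == 0)
        return std::numeric_limits<float>::max();
    std::vector<Point> in = resample(input, samples);
    std::vector<Point> tp = resample(templ, samples);
    Box bi = boundsOf(in), bt = boundsOf(tp);
    float inputSide = std::max(bi.maxX - bi.minX, bi.maxY - bi.minY);
    float templSide = std::max(bt.maxX - bt.minX, bt.maxY - bt.minY);
    float scale = templSide > 0.0f ? inputSide / templSide : 1.0f;
    // Scale about the template's bounding-box centre, then move that centre
    // onto the input's bounding-box centre.
    Point ci{(bi.minX + bi.maxX) / 2.0f, (bi.minY + bi.maxY) / 2.0f};
    Point ct{(bt.minX + bt.maxX) / 2.0f, (bt.minY + bt.maxY) / 2.0f};
    float sum = 0.0f;
    for (size_t i = 0; i < samples; ++i) {
        float tx = (tp[i].x - ct.x) * scale + ci.x;
        float ty = (tp[i].y - ct.y) * scale + ci.y;
        sum += std::hypot(in[i].x - tx, in[i].y - ty);
    }
    return sum / static_cast<float>(samples);
}
```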

The number of samples we take is determined by how fast versus how accurate we want the gesture comparisons. 100 samples is a good place to start, but this too can be a user setting, if the interface allows. On some proprietary keyboards this was/is done with a slider with one end being "accurate" and the other being "fast". I remember someone suggested to me there is also the possibility of using some fancy continuous math to avoid sampling the path and get a more accurate measure. However, applying the Pythagorean theorem 100 times is way more convenient, and I don't remember the specifics of the continuous approach, only the basic idea. That method might also be prone to the same problems the authors of the paper found with elastic matching. It is better to just do the dumber thing for now; it works.

With location scores, we do something a little different. The idea here is, if the user does an excellent job of giving precise input, we should boost the closest match. I never needed to use these in my past work, because if I remember correctly, they were too expensive to consider when trying to score a keyboard layout. So... I am a bit hazy on these. I also suck at reading mathematical notation, so I am winging it a bit here as I read the paper while writing this.

It looks like we sample the user input uniformly again, but this time we are trying to ensure that no points of the user input fall outside of an area defined by some distance to the template path. In the paper they used one key width as this distance cut-off. We don't actually want to normalize the input to compare favourably to the template this time, because the whole point of this is to reward precise inputs. The analogy here is a driving test: if you stay on the road the whole time, you don't receive any penalty to your test score. However, if you ever go off the road, you get penalized immediately. I think the trick here is to take every pair of key centers that make up the line segments in the unsampled template, and then check if the user input samples fall within a certain distance of any of the line segments. If the current point doesn't fall inside the area we want, add the distance between that sample point and the corresponding point in the template samples to a running tally. It looks like they do it a little differently, but I think my way would be faster? This looks like it would be O(N^2)... for every point in the sampled user input they are performing Euclidean distance measurements against every sample point in the template. It is really expensive compared to the shape score. We also need to keep in mind that we are comparing this input against multiple templates. I will probably have to think about this some more to make sure I understand well enough to reduce the complexity the way I am thinking. It should be possible to perform no more than 3-9 measurements per user input sample, instead of 100. That is if we even need location score; it may not be necessary to have location scores at all to start, based on what I am reading in the following sections. I don't know how to draw this yet, and I am tired, so I won't for now. Ask if you want this and I will draw an example.
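
To make the "stay on the road" idea concrete, here is a sketch of one possible reading of the above (my own interpretation, not necessarily what the paper or the google library does): for each input sample, find its distance to the nearest segment of the unsampled template, and only add a penalty for the part of that distance beyond the threshold, which the paper puts at roughly one key width.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Point { float x; float y; };

// Distance from point p to the line segment ab.
static float pointToSegment(const Point& p, const Point& a, const Point& b) {
    const float abx = b.x - a.x, aby = b.y - a.y;
    const float lenSq = abx * abx + aby * aby;
    float t = 0.0f;
    if (lenSq > 0.0f) {
        t = ((p.x - a.x) * abx + (p.y - a.y) * aby) / lenSq;
        t = std::clamp(t, 0.0f, 1.0f);
    }
    const float cx = a.x + t * abx, cy = a.y + t * aby;
    return std::hypot(p.x - cx, p.y - cy);
}

// Location score variant: penalise only the input samples that stray further
// than `threshold` (roughly one key width) from the template's polyline.
// Lower is better; a gesture that stays "on the road" scores 0.
float locationScore(const std::vector<Point>& inputSamples,
                    const std::vector<Point>& templateKeys,
                    float threshold) {
    if (inputSamples.empty() || templateKeys.empty()) return 0.0f;
    float sum = 0.0f;
    for (const Point& p : inputSamples) {
        float nearest = std::hypot(p.x - templateKeys[0].x, p.y - templateKeys[0].y);
        for (size_t i = 1; i < templateKeys.size(); ++i) {
            nearest = std::min(nearest,
                               pointToSegment(p, templateKeys[i - 1], templateKeys[i]));
        }
        if (nearest > threshold) sum += nearest - threshold;  // only the off-road part counts
    }
    return sum / static_cast<float>(inputSamples.size());
}
```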

It looks like the paper presents two options here. We can either try to force the two different scores into a single final score by adjusting the location score, such that it is more directly comparable to the shape score, or we can weight dynamically between them. The dynamic weighting looks at whether the gesture is being performed above or below some threshold speed. Now, given that I didn't use location scores, I didn't look at dynamic weighting methods at all. The math involved in this is the stuff I find the most difficult to understand; I may get this entirely wrong. It looks like they have a special function, based on a long history of lab data from previous studies, that assumes how long it should take for the average user to glide type a specific word's gesture. How fast the gesture is drawn determines how much to weight each input model. Forcing the two scores into a single score statically is possible, but might not give great results without a lot of tweaking, and may never give as good results as dynamic weighting. It looks like they actually trained some Bayesian model to find good weights for their location scores... for every word, per keyboard layout. That is certainly a way to do it, but I don't think we will be doing that. They had the benefit of working with one keyboard layout and only having to support English.
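
For completeness, the "force it into one score" option in its simplest form is just a fixed linear blend. The weight here is a made-up starting point that would need tuning; the dynamic speed-based weighting from the paper is not attempted in this sketch.

```cpp
// Naive static blend of the two scores; `locationWeight` is a tuning knob,
// not a value from the paper. Lower combined scores are still better.
float combinedScore(float shapeScore, float locationScore,
                    float locationWeight = 0.5f) {
    return (1.0f - locationWeight) * shapeScore + locationWeight * locationScore;
}
```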

So, with all of that in mind. Let's get into related reading!

There are some github projects that implement gesture typing. A few of them are explicitly billed as implementations of SHARK2. They look like they might be university assignments? Each spins up a web server and allows you to glide type in a browser using a mouse. Unfortunately the webpages they provide aren't very mobile friendly, so you can't really test whether they work well with thumb input. Some of them are god awful, but I did find one that was good enough that I was sold, and I will probably use it to figure out how to weight the shape vs location scores when we get to that point.

https://github.com/varun93/shark2 
https://github.com/aferruzza/SHARK2-Gesture-Typing <- the good one, if I remember correctly
https://github.com/Trisha11r/Decode_Gesture_SHARK2_algo

None of these have licenses attached to them, though, so copying directly is a no-no. 

Helium also directed me at an implementation of gesture typing inside an OpenBoard fork, by a user named wordmage. That is also not licensed, because it is based on some 7 year old code by a user named shijieKika which is of course... not licensed! I have suspicions about what the older repo might be, but I won't go on about them because I am not certain my thoughts are founded one way or the other. In a total shot in the dark, I opened an issue asking the author to add a license. I expect nothing to come of that, so don't count on being able to copy any code from that repo. It is probably okay to get ideas about how to use the data structures in Heliboard, while doing gesture matching, from these two projects though. We can also reference at least one of the python projects to get a different perspective on the same thing that is going on inside the paper.

https://github.com/wordmage/openboard
https://github.com/shijieKika/GoogleKeyboardV7

I will state, I think even if we did directly copy the code from the wordmage repo or the shijieKika repo, which we definitely should not unless it magically gets licensed, it may not even do what we want. The wordmage repo has worse gesture typing than the google library does, and it doesn't seem to follow the correct interface based on a silly test I tried where I pulled out its internal library and tried to import it into heliboard in place of the google library. (edit: Helium confirmed it does follow the same interface, as long as a commit that messed up some of the names is reverted https://github.com/wordmage/openboard/commit/70e59e88ec3d4ac5a50190ceb199765a5ad0078c). I also haven't been able to get the shijieKika repo to compile at all yet. However, to be fair, I haven't tried that hard.

The only option I know of that contains gesture typing that is licensed in a way we can use directly is florisboard. It was added to an older version with commit https://github.com/florisboard/florisboard/commit/c0f90a7ea4cc59f671e174bd08226e3c7872d156, I think? I haven't looked it over yet, add it to the pile of todos.

Okay, that is all I have for now. Info dump done. I will check back when I have time to continue my quest. Let me know if anyone wants clarification on anything I wrote, or wants to help me on my fool's errand. :-)

the-eclectic-dyslexic commented 5 months ago

I read all of what I wrote again. There were some mistakes. I'll probably continue to update the original comment any time it becomes clear to me that I made a mistake, or something was unclear.

Helium314 commented 5 months ago

Wow, thanks for this huge amount of information :heart: Definitely take your time for this... I also realized that currently I need to work more slowly on HeliBoard in favor of some other stuff.

> no documentation in the code

Even in the java part this is happening too often, and the sometimes very convoluted style also doesn't help figuring out what is going on...

> Maintaining that compatibility is important, at least until the open implementation is as good or better than the google library one

Would it be possible to have a separate interface only for your approach? Making sure it's not called (from java side) when the Google library isn't loaded should be simple.

> That is a pretty big ordeal though, as I would need a bunch of simulated user data, and I would need to build mock environments around those tests

Maybe we could just record user data using a special build? Only drawback I see here is that for longer words I may change my swiping a bit when I see the preview words... which would likely differ between libraries.

> buckets

In the CSV I see a lot of (Patricia)Trie functions, that's maybe sort of related to dictionary structure, and maybe also with the buckets you're speaking of?

> using that other library from wordmage's fork

The issue is some different naming. I remember quite a while ago I compiled wordmage's version with some changes, so I could load the resulting library like it's done now with the google one. Will dig around in my backups...

Helium314 commented 5 months ago

> The issue is some different naming. I remember quite a while ago I compiled wordmage's version with some changes, so I could load the resulting library like it's done now with the google one. Will dig around in my backups...

I think when building wordmage's version you need to revert https://github.com/wordmage/openboard/commit/70e59e88ec3d4ac5a50190ceb199765a5ad0078c so the native library refers to com_android_inputmethod instead of org_dslul_openboard_inputmethod

the-eclectic-dyslexic commented 5 months ago

> Wow, thanks for this huge amount of information :heart:

Hopefully some of it is useful information!

> Definitely take your time for this... I also realized that currently I need to work more slowly on HeliBoard in favor of some other stuff.

Definitely do that. Heliboard, at least for me, is in a good place for my daily use. Let it ride for a bit is what I think. You have to have a life outside of open source work, especially if you aren't paid a livable wage for it.

> Even in the java part this is happening too often, and the sometimes very convoluted style also doesn't help figuring out what is going on...

Ya, I think this is likely Heliboard's biggest hurdle going into the long term. It's hard to contribute to something so opaque. I honestly think the best way for people to contribute right now is to read, document, and sometimes refactor poorly understood code. That's just my opinion though; maybe others feel differently. I'm hoping to do this for the data structures I need to interact with...

> Would it be possible to have a separate interface only for your approach?

Absolutely. The biggest downside is that breaking the interface makes it harder for other AOSP forks to share their changes with Heliboard, but it can be done. On the other hand, breaking the interface is likely to be necessary at some point anyway, for maintainability, when removing unused and undocumented code if and when it gets written around or replaced.

> Maybe we could just record user data using a special build? Only drawback I see here is that for longer words I may change my swiping a bit when I see the preview words... which would likely differ between libraries.

We could! I think I still would need to build the mock environments for the tests though, and that seems very daunting to me right now with how little I understand the code base. It would definitely be extremely valuable, but I'm honestly not sure where to start at the moment. I'll ponder this as I read code and try to think of the lowest barrier to entry for this. Building tests, even if only for the code I write, would definitely improve the speed of iteration. It would be so stress free to just have a couple hundred gestures with expected suggestions, and an expected top suggestion, and be able to check how well we are doing without even having to run heliboard. It'll likely involve feeding each test a dictionary, a layout, and a gesture. A single test structure could then be repeated over a massive set of input combinations.

This wouldn't yet cover testing how the natural language model ranks the words the gesture recognition returns, though. I'll have to think about how to test that without obfuscating the results the gesture recognition returns. So far the language model has been at the bottom of my list, as I have just assumed we could jerry rig something up to the other suggestion methods in heliboard. I'm thinking there has to be a way to hook up to the next word prediction and feed it a list of possible next words to rank, then create a combined score from the shape & location scores and the next word scores. This would likely be similar to the bigram model used in the paper, but I'm willing to bet the next word prediction in Heliboard goes more than two words deep. I've had it replicate entire second halves of sentences I've said before, just by tapping the middle suggestion.

> In the CSV I see a lot of (Patricia)Trie functions, that's maybe sort of related to dictionary structure, and maybe also with the buckets you're speaking of?

They could be, I'll see what I dig up. They might also be where the n-grams are stored for next word predictions. They could also be related to partially typed word completion suggestions.

> The issue is some different naming. I remember quite a while ago I compiled wordmage's version with some changes, so I could load the resulting library like it's done now with the google one. Will dig around in my backups...

Oh! I vaguely remember reading some commits where the JNI code kept having namespaces changed, both in the wordmage fork and in other forks. That's helpful thanks! I forgot about that completely when writing. I'll endeavour to update that later.

Charles7z commented 4 months ago

Just as an FYI though you all probably know this, FlorisBoard had a very functional gesture typing but took it out to redo with other nlp stuff.

Maybe it would be of help to look at 🤷‍♂️

the-eclectic-dyslexic commented 4 months ago

> Just as an FYI though you all probably know this, FlorisBoard had a very functional gesture typing but took it out to redo with other nlp stuff.
>
> Maybe it would be of help to look at 🤷‍♂️

Thanks @Charles7z, I'm already in contact with them! Their gesture typing code is also an undocumented mystery, and I'm looking to help them reimplement it to work with their NLP. (Ya, I'm a silly person. I actually approached them before I even discovered heliboard...)

Which reminds me @Helium314, that's one of the things that is slowing down my work here, but in the long run it should be a good thing. I'm going to be writing the same algorithm twice, and getting the benefit of testing there before I even really start here. I got to dictate the interface there exactly, so it shouldn't be as big an ordeal as this. This means that even if we don't find a good way to test here, we should at least get the benefits of the testing I'll do in florisboard, as long as I implement it the same way with the same weights. We have a solid plan over there, so don't worry about collecting gesture data yet. The maintainer of florisboard has a plan for how I can generate some, at least with a mouse or tablet on desktop.

Charles7z commented 4 months ago

> I'm looking to help them reimplement it to work with their NLP.

@the-eclectic-dyslexic

That's great to hear the code will be used by more than one project.

Charles7z commented 4 months ago

@the-eclectic-dyslexic

I forgot to mention, though again it's probably something you've already read, a study that was done on combining the T9 prediction, or SureType (T9+), keyboard with gesture typing:

https://www.researchgate.net/publication/369063681_Enhancing_Older_Adults'_Gesture_Typing_Experience_Using_the_T9_Keyboard_on_Small_Touchscreen_Devices

Just food for thought, not that you don't have enough, but it is always better to have ideas beforehand than after.

Swype was doing this as well but apparently shelved the whole keyboard thing.

the-eclectic-dyslexic commented 4 months ago

> That's great to hear the code will be used by more than one project.

It likely won't be the exact same code. The data structures are different, and florisboard wants to go rusty. It'll be the same algorithm as far as I can make it though... if we want to make things interesting we could try adding the crate I'll be making for florisboard as the thing heliboard's JNI code delegates to. That would complicate building heliboard, but it would mean every time one project improves gesture typing both projects get to use it. That would be very cool, but I know nothing about the build headaches that might be involved. It also locks in breaking with the google library. But that might be okay! We can cross these bridges when we get to them. First order of business is for me to actually implement this for florisboard, then we can decide what to do.

Charles7z commented 4 months ago

> but it would mean every time one project improves gesture typing both projects get to use it.

That's certainly a big issue right now with open source keyboards. There are a lot of them, but nothing is really being pooled together to build the foundation of autocorrect, next word suggestion, prediction, and gesture typing, and therefore the big tech companies keep users tied to their keyboards because they're the only ones able to put in the manpower and pull in the data. So, IMHO, anything that helps at the base level for the whole ecosystem is a big PLUS! Because we all know how open source goes: many forks & many projects.

Helium314 commented 4 months ago

> That would complicate building heliboard

I think it would still be worth trying. But the biggest potential issue I see is that a different dictionary format might be necessary for that glide typing library.

> That's certainly a big issue right now with open source keyboards

TBH I wouldn't be surprised if it was an issue with closed source keyboards too... As far as I know some code has found its way into multiple AOSP keyboard forks (LineageOS, OpenBoard, Indic Keyboard, maybe more) in the past.

Charles7z commented 2 months ago

@the-eclectic-dyslexic

Just thought I would mention, just in case you didn't notice, that FUTO Keyboard has glide typing working.

the-eclectic-dyslexic commented 2 months ago

@Charles7z thanks, I'll check it out! I haven't had much time to work on this, between life getting hectic and trying to manage a flare-up of my chronic pain. I'm happy to have a lead for when I tackle this though!

edit: on a quick glance, it looks like they filled in the holes in the JNI library. That might be pretty easy to port into heliboard if it can be determined that the code actually originated in futo instead of one of the places aforementioned where it wasn't licenced. I will take a look at the code, and compare the interfaces to check. At the absolute minimum, if the futo implementation checks out, it'll create the means to close this issue (though maybe not achieve exactly what I was setting out to do, but that's okay!)

Helium314 commented 2 months ago

> That might be pretty easy to port into heliboard if it can be determined that the code actually originated in futo instead of one of the places aforementioned where it wasn't licenced

I don't think this is correct: the licence basically says you're allowed to look and compile, but I don't think you're allowed to modify it. It's probably not compatible with this project's GPL 3.0 licence.

> At the absolute minimum, if the futo implementation checks out, it'll create the means to close this issue

It would certainly allow for more flexibility when there is no need to stay compatible with the google library.

the-eclectic-dyslexic commented 2 months ago

> licence

Ah yep, I went and assumed before I checked the licence. Thanks!

devycarol commented 2 months ago

I haven't read this thread in full, but a fun simple idea I have that could be done in the meantime is "spider typing." Basically, every character key would work as sliding input. Dragging from one to another would type those two characters. You could chain more together by hovering keys for a configurable duration—either tied to long-press timeout or its own config.

This would be nice to have alongside glide typing even, as you'd be able to gesture out things like numbers, spaces, and punctuation with perfect precision. Maybe you could have an option to temporarily fall back to spider typing if you wanted to gesture non-dictionary gibberish or something. Something like a toolbar key.

vgambier commented 1 month ago

I'll add that another FOSS keyboard with glide typing (and not just English) is AnySoftKeyboard, which uses the Apache license.