Lundez / JavaSymSpell

SymSpell v6.4ish ported to Java 8. Will be a module in my Master Thesis.
MIT License
22 stars 13 forks source link

Good for Android App? #4

Closed trans closed 6 years ago

trans commented 6 years ago

Would this library be a good choice for use in an Android App? Or is it too memory intensive?

And if yes, any guidance? One problem that occurs to me is that it has to be possible to load the dictionary from an android asset file. So a function that takes an InputStream instead of a file name might be necessary.

Lundez commented 6 years ago

Hello!

I'd say that it definitely is a possibility to use within an Android App. A text-file is not that heavy and the code-base is pretty small and as WolfGarbe argues it is incredibly fast! Do not that you should only create and load the dictionary once per onCreate of the activity. Dagger would be helpful using the Singleton pattern.

As I've not created a way to import this via Gradle I'd say it's really easy to implement the InputStream-function by yourself, maybe you could create a pull request even. :)

I don't know how you're thinking about implementing this but Android do provide a Spell Checker by default. Using an Edit Text you've got it for free, Edit Text even provides autocomplete using: android:inputType=TYPE_TEXT_FLAG_AUTO_COMPLETE.

I hope this helps you somewhat?

trans commented 6 years ago

Great! I will give it a try. My use case is for a keyboard app (input method). And I'd be happy to submit a pull request for anything I find necessary to get it to work with Android. Thanks.

trans commented 6 years ago

Ran into an issue. I made a jar and added it to my app. Compilation spat out an error:

Invoke-customs are only supported starting with Android O (--min-api 26)
Message{kind=ERROR, text=Invoke-customs are only supported starting with Android O (--min-api 26), sources=[Unknown source file], tool name=Optional.of(D8)}

What are "customs"?

trans commented 6 years ago

Well I think I figured the above by adding to build.gradle:

android {
    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }

Fixed the first problem, but then I got another error:

E/AndroidRuntime: FATAL EXCEPTION: main
      Process: tabcomputing.shelkey, PID: 22435
      java.lang.NoSuchMethodError: No static method parseUnsignedLong(Ljava/lang/String;)J in class Ljava/lang/Long; or its super classes (declaration of 'java.lang.Long' appears in /system/framework/core-oj.jar)
          at SymSpell.SymSpell.loadDictionary(SymSpell.java:217)
          at SymSpell.SymSpell.loadDictionary(SymSpell.java:197)
          at tabcomputing.shelkey.InputService.loadDictionary(InputService.java:236)
          at tabcomputing.shelkey.InputService.onCreate(InputService.java:79)
          at android.app.ActivityThread.handleCreateService(ActivityThread.java:3242)
          at android.app.ActivityThread.-wrap5(ActivityThread.java)
          at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1594)
          at android.os.Handler.dispatchMessage(Handler.java:102)
          at android.os.Looper.loop(Looper.java:154)
          at android.app.ActivityThread.main(ActivityThread.java:6247)
          at java.lang.reflect.Method.invoke(Native Method)
          at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:872)
          at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:762)

Note the line numbers are going to be off from master because I am using my modified branch (https://github.com/trans/JavaSymSpell/tree/android).

trans commented 6 years ago

Got through that issue by changing the line in question to:

count = Long.parseLong(lineParts[countIndex]); 

But then hit the doozy:

E/AndroidRuntime: FATAL EXCEPTION: main
      Process: tabcomputing.shelkey, PID: 25673
      java.lang.OutOfMemoryError: Failed to allocate a 16396 byte allocation with 20349344 free bytes and 19MB until OOM; failed due to fragmentation (required continguous free 20480 bytes where largest contiguous free 16384 bytes)
          at SymSpell.ChunkArray.add(ChunkArray.java:46)
          at SymSpell.SuggestionStage.add(SuggestionStage.java:78)
          at SymSpell.SymSpell.lambda$createDictionaryEntry$0$SymSpell(SymSpell.java:145)
          at SymSpell.SymSpell$$Lambda$0.accept(Unknown Source)
          at java.lang.Iterable.forEach(Iterable.java:75)
          at SymSpell.SymSpell.createDictionaryEntry(SymSpell.java:145)
          at SymSpell.SymSpell.loadDictionary(SymSpell.java:219)
          at SymSpell.SymSpell.loadDictionary(SymSpell.java:197)
          at tabcomputing.shelkey.InputService.loadDictionary(InputService.java:236)
          at tabcomputing.shelkey.InputService.onCreate(InputService.java:79)
          at android.app.ActivityThread.handleCreateService(ActivityThread.java:3242)
          at android.app.ActivityThread.-wrap5(ActivityThread.java)
          at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1594)
          at android.os.Handler.dispatchMessage(Handler.java:102)
          at android.os.Looper.loop(Looper.java:154)
          at android.app.ActivityThread.main(ActivityThread.java:6247)
          at java.lang.reflect.Method.invoke(Native Method)
          at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:872)
          at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:762)

I was trying to load the "frequency_dictionary_en_82_765.txt" dictionary, and it took a long time and finally put out the above error.

trans commented 6 years ago

Got past the memory problem using Manifest entry

<application
   android:largeHeap="true">

So it's working. But it takes far too long to load the dictionary. Around 10 seconds. For a keyboard app I'm limited to about 2 seconds.

Lundez commented 6 years ago

Hi!

Good job solving everything that was thrown at you.

The bottleneck of this spell correction is that it takes time to create the dictionary while building it. A possibility would be to save the finished dictionary somehow (in Python one can Pickle things I know) and then load that. Another possibility is to use a server, although bad reception would be a huge problem.

I'm sorry if this library don't fit your needs but it's the drawback of using it (load-time). Read more about this in Wolf Garbes blog post.

trans commented 6 years ago

I looked at the code for places it could be sped up. There are a few. Not using String.split(regexp) and using indexof instead shaved off a second or two. Using int instead of long probably would help too. There are other places as well, but I think in the end I am going to use a trie solution.

Would you still like my InputStream pull request?

Lundez commented 6 years ago

You could create a request and I'll look into it.

Okey, trie solutions are powerful too. I guess it would be good with the prefix.

What are you gonna use the spell corrector for if I might ask?

trans commented 6 years ago

Submitted pull request.

I may have to roll my own. I already have one actually but it needs some serious refactoring.

What I might do though, and I will explore this first, is take JaveSymSpell, and see if I can get the load speed up and instead of storing the delete terms in a Hash, store them in a Trie or similar tree structure (maybe just a multi-dimension SparseArray). This will make lookup slower -- O(log n) instead of O(1) -- but it should solve the heavy memory problem. And actually with some intelligent lookup code the difference in lookup speed won't be even that much because 1) no hash to compute, and 2) don't re-traverse terms with the same prefix.

What do you think?

Lundez commented 6 years ago

Thanks, I'll check it out when I have got more time!

Hmm.. It's a sound theory. Try it out and create some kind of speed-test. But it feels like I only understand partly of what you really want to do and thereby it's harder to really grasp where most of the weight should be.

trans commented 6 years ago

Well, I am sad to report that this SymSpell design just isn't cut out for something that needs fast load times. (Amazing isn't it? That as fast as our computers are today it is still not enough). I whittled it all the way down to loading the dictionary and generated deletes into two arrays (one for each) -- just about as fast as it can get, and it was still far too slow (2x improvement through staging and ChunkArray still wouldn't be nearly enough.)

BTW, I have tried serialization libs before too, e.g. Kryo, and they aren't much better. In fact sometimes they are slower.

Bottom line... as cool as the idea of SymSpell is, it's just too much data. While I might try to use incremental loading (small dictionary first, followed by full dictionary in the background), I have had better success using traditional deletions/transpositions/substitutions/additions approach. The focus there is on speeding up lookup (loading is around 2 secs). Thankfully there are reasonable strategies for improving lookup speeds. So I am going to have to go back to that approach and just finish cleaning up my old library.

Thanks for your time. And maybe this information will be useful to others in the future. If anyone does figure out a super fast way to load the data I'd sure be interested.

Lundez commented 6 years ago

Okey! I'm still confused about what your app is trying to achieve and why you can't load the dictionary in the background as it starts and then use the already loaded library. It should be fast enough once the dictionary is loaded once.

Well, good luck with your application and however you solve the problem, I believe in you. :)