SeventhM / hackerskeyboard

Hacker's Keyboard
https://play.google.com/store/apps/details?id=org.pocketworkstation.pckeyboard
Apache License 2.0

Project cannot read old dictionaries #4

Closed SeventhM closed 5 months ago

SeventhM commented 6 months ago

Currently I've been double checking everything to figure out how to get suggestions working on this fork. So far I've narrowed the issue down to one of a few possible problems.

In later versions of Android, it seems Google changed when a keyboard is expected to request that prediction candidates be shown (see here). As noted by the linked issue and by https://github.com/klausw/hackerskeyboard/issues/901, the suggested fix is simple: call setCandidatesViewShown at the end of onCreateCandidatesView and onStartInput. However, Hacker's Keyboard has some limitations with this approach:

  1. As noted in the Google issue tracker entry linked above, some devs noticed that this causes problems with the back button in certain apps. These issues also appear in Hacker's Keyboard. It seems that in some apps (notably apps based on the Chromium project, including many browsers and even apps such as Discord), the keyboard is often called in places where no text is selected. This will hide the keyboard at the bottom of the screen, but not dismiss it, so it stays active. Often there is no way to interact with the keyboard in this state (I've even seen the keyboard just crash in some cases). Some do recommend workarounds for such cases (such as getting rid of the keyboard ourselves with requestHideSelf(0)). However, it would be nice to not need any of that.
  2. The good news is Hacker's Keyboard already has a built-in way to avoid this problem. Our version of setCandidatesViewShown actually does quite a bit more than it appears to. Notably, it only shows candidates if we meet the following criteria:

    • We request the candidates to be shown. Same as the initial function
    • We determine the keyboard should be shown here
    • Prediction is turned on
    • The inputView itself isn't null. This check seems to be a bug and seems like it should be part of the next check. As such, this fork actually will go ahead and remove this check
    • We actually care about whether the inputView is showing. By default, we do care, and so we check if it's showing. If we don't care, then we pretend this check succeeded

    If all of those checks are good, then we first do some setup of the candidates strip, then call super.setCandidatesViewShown. If these checks fail, then we actually go out of our way to delete the candidates strip before calling super. In theory, this seems to fix all of the issues as we should only have a candidates strip if the keyboard is showing or if we explicitly shouldn't care. In practice, it seemed to not work, leading to some trying to bypass this function by calling super directly. What's going on?

  3. I'll keep it short, since this seems to be a bit of a red herring, as I'll get to. But the check that seems to fail is the one for whether prediction is turned on in the first place. Even if all of the settings say predictions should be on, sometimes it just doesn't report that prediction is on. This may be because my fork is struggling to actually read dictionaries, however.
  4. It turns out that my attempts to read the dictionary (whether on a device where I needed to go digging around for a link to the dictionary, or on a device that I can confirm still had a legitimate copy of the old dictionary) have all failed. While it can open the dictionary just fine, it reports the dictionary as only having a couple of words at most, which does not satisfy the check for a dictionary (which has an arbitrary word minimum so as not to count test dictionaries and the like). I haven't made a function yet to print out these words, but I'd imagine logcat would show these words as pretty corrupted. Something for me to check later. As such, HK is not allowing me to use its features for the candidates strip, even when the strip is shown by brute force, since it doesn't believe I have a dictionary to use the strip with.

I am aware that it should still show the list of symbols at the top of the candidates strip. I haven't fully figured out why it won't work for that either.
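
The criteria listed above can be sketched as a single boolean gate. This is only an illustration of the described logic, not the fork's actual code: the class name, method name, and parameter names are all hypothetical, and the null check on the inputView is left out, matching the stated plan to remove it.

```java
public class CandidateGate {
    /**
     * Decides whether the candidates strip should be shown.
     * All names here are hypothetical stand-ins for the checks described in the issue.
     */
    public static boolean shouldShowCandidates(boolean requested,
                                               boolean keyboardShouldShow,
                                               boolean predictionOn,
                                               boolean needsInputViewShown,
                                               boolean inputViewShown) {
        // If we do not care whether the input view is showing,
        // pretend that check succeeded.
        boolean inputViewOk = !needsInputViewShown || inputViewShown;
        return requested && keyboardShouldShow && predictionOn && inputViewOk;
    }

    public static void main(String[] args) {
        // Everything passes.
        System.out.println(shouldShowCandidates(true, true, true, true, true));   // true
        // Prediction is not reported as on: the strip is suppressed.
        System.out.println(shouldShowCandidates(true, true, false, true, true));  // false
        // Input view not showing, but we explicitly do not care.
        System.out.println(shouldShowCandidates(true, true, true, false, false)); // true
    }
}
```

Written this way, it is easy to see why a dictionary that is rejected can hide the strip even when everything else is fine: if prediction never reports as on, the whole gate fails.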

TPS commented 6 months ago

From https://github.com/klausw/hackerskeyboard/issues/901#issuecomment-2087869198

This would be a lot easier if I had a dictionary that worked.

Would having the original HK dictionaries help? Some are still on GPlay, I use another (which WFM), but (I assume) @KlausW might have the complete set someplace? 🙇🏾‍♂️

SeventhM commented 6 months ago

Having the source code, or at least a source on how the main.dict file in the APK I do have was created, would be helpful. I suspect having the original dictionaries wouldn't change anything without knowing the intended structure of the dictionaries themselves.

klausw commented 6 months ago

For building the dictionaries, I had found an old mailing list comment, quoted in https://github.com/klausw/hackerskeyboard/issues/939#issuecomment-1995383954

I've finally dug up the dictionary building tool, turns out it was already compiled as part of my Android AOSP directory. Just for future reference, the tool's source is in packages/inputmethods/LatinIME/tools/, and the binary is in out/host/linux-x86/bin/makedict.

makedict -s Dicts-src/dict-src-uk.xml -d Dicts/main-uk.dict

For a sample input XML file, see https://github.com/klausw/hackerskeyboard/blob/287291e300a930fc6283eac875bf72c6347c3882/dictionaries/sample.xml#L2

<!-- This is a sample wordlist that can be converted to a binary dictionary
     for use by the Latin IME.
     The format of the word list is a flat list of word entries.
     Each entry has a frequency between 255 and 0.
     Highest frequency words get more weight in the prediction algorithm.
     You can capitalize words that must always be capitalized, such as "January".
     You can have a capitalized and a non-capitalized word as separate entries,
     such as "robin" and "Robin".
-->
<wordlist>
  <w f="255">this</w>
  <w f="255">is</w>
  <w f="128">sample</w>
  <w f="1">wordlist</w>
</wordlist>

The f= number is the relative word frequency, from 255 (most frequent) to 0 (least frequent).
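
As a rough illustration of the wordlist format above, here is a minimal Java sketch that reads the `<w f="...">` entries into a map and rejects frequencies outside 0 to 255. It is not part of makedict; the class and method names are hypothetical, and it only uses the JDK's built-in XML parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class WordlistReader {
    /** Parses a wordlist XML string into word -> frequency, keeping file order. */
    public static Map<String, Integer> readWordlist(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not a readable wordlist", e);
        }
        NodeList entries = doc.getElementsByTagName("w");
        Map<String, Integer> words = new LinkedHashMap<>();
        for (int i = 0; i < entries.getLength(); i++) {
            Element w = (Element) entries.item(i);
            int freq = Integer.parseInt(w.getAttribute("f"));
            // The sample file documents frequencies from 255 (most frequent) to 0.
            if (freq < 0 || freq > 255) {
                throw new IllegalArgumentException("frequency out of range: " + freq);
            }
            words.put(w.getTextContent().trim(), freq);
        }
        return words;
    }

    public static void main(String[] args) {
        String xml = "<wordlist>"
                + "<w f=\"255\">this</w>"
                + "<w f=\"255\">is</w>"
                + "<w f=\"128\">sample</w>"
                + "<w f=\"1\">wordlist</w>"
                + "</wordlist>";
        System.out.println(readWordlist(xml));
        // prints {this=255, is=255, sample=128, wordlist=1}
    }
}
```

A reader like this could be handy for sanity-checking a converted word list before feeding it to makedict, since a bad frequency value is caught early.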

klausw commented 6 months ago

I'm having a hard time finding the exact version of makedict that I used back then - I have files datestamped around Nov 14 2011, and it's probably approximately this: https://android.googlesource.com/platform/packages/inputmethods/LatinIME/+/refs/heads/ics-plus-aosp/tools/makedict/

I still have the shell script and jar file locally, let me know if you want me to send you a copy.

SeventhM commented 6 months ago

I still have the shell script and jar file locally, let me know if you want me to send you a copy.

That would be great, thanks

For a sample input XML file, see

Yeah, I passed by that when I was looking around, and once I move the dictionary into the project itself, somewhere around there is probably where I'd place dictionaries unless I run into other issues. Still, given the intermediate step of converting to a dict file, I figured knowing that wouldn't be enough.

klausw commented 6 months ago

makedict uploaded here: https://drive.google.com/drive/folders/1tHl2989lk7DrobuTQonkmVfQqPsl1uq6?usp=sharing

The wrapper shell script doesn't seem necessary - a simple java -jar makedict.jar shows the usage message:

Usage: makedict [-s <unigrams.xml> [-b <bigrams.xml>] | -s <binary input>]  [-d <binary output>] [-x <xml output>] [-2]

  Converts a source dictionary file to one or several outputs.
  Source can be an XML file, with an optional XML bigrams file, or a
  binary dictionary file.
  Both binary and XML outputs are supported. Both can be output at
  the same time but outputting several files of the same type is not
  supported.
Exception in thread "main" java.lang.RuntimeException: No input file specified

SeventhM commented 6 months ago

Ok, going through that code. Interesting stuff: some of it is obviously not up to date with current Java practices, and some of it is just plain weird (there's a missing function in there with fully written Javadoc, there are a lot of null == foo style checks when foo == null is more standard, etc.). At best that will help me brute force my way through, and at the least it will help me line up the dictionary code.

That said, there are some aspects of it that I wish they did better. E.g., I can't tell if the dictionary is just broken or you didn't use this version. I say that because it doesn't recognize the dictionary as a binary dictionary. And instead of figuring this out via an error or by looking at the file extension or something, it figures it out by... not having a matching first 2 bytes and never checking anything else in the file. They don't even provide a separate argument for binary files.
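
To make the "first 2 bytes" detection concrete, here is a minimal sketch of that kind of check. The thread does not state the actual magic value or byte order makedict uses, so both are assumptions: the expected value is passed in as a parameter, the bytes are read big-endian, and the names are hypothetical.

```java
public class MagicCheck {
    /**
     * Returns true if the first two bytes of the file match the expected
     * 16-bit magic value. Big-endian order is an assumption for illustration.
     */
    public static boolean startsWithMagic(byte[] fileBytes, int expectedMagic) {
        if (fileBytes.length < 2) {
            return false; // too short to carry a header at all
        }
        int firstTwo = ((fileBytes[0] & 0xFF) << 8) | (fileBytes[1] & 0xFF);
        return firstTwo == (expectedMagic & 0xFFFF);
    }

    public static void main(String[] args) {
        // 0x1234 is a placeholder, not the real dictionary magic number.
        byte[] sample = {0x12, 0x34, 0x00};
        System.out.println(startsWithMagic(sample, 0x1234)); // true
        System.out.println(startsWithMagic(sample, 0x4321)); // false
    }
}
```

A check this narrow explains the behavior described: a file whose header differs in any way is silently treated as "not binary", with no further inspection of the contents and no dedicated error.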

I got the whole thing converted to Kotlin, so I'm this close to seeing how to plug it into Gradle and forcibly have a fixed version of the script that at least tries (and probably fails at) brute force checking the file. But this is probably not worth my time. Now... where can I find a good word list... I hope converting Anysoft's into an XML file isn't too annoying.

TPS commented 6 months ago

Now… where can I find a good word list.…

@SeventhM https://github.com/kkrypt0nn/wordlists ? 🙇🏾‍♂️

SeventhM commented 5 months ago

Closing as of https://github.com/SeventhM/hackerskeyboard/commit/ecaf6ee1eb2561217a18d6ddae9c8f509ba7062

Main observations I've been making:

  1. I have not done a deep dive into the dictionary yet to update the wordlist. That said, now that I have the code, focusing on that has become low priority.
  2. I was strolling through the codebase trying to find where the example dictionary was and ran into this file. I wonder if this is the example dictionary. Replacing it with the one from the app I have does make my logs say I have a functioning dictionary, which implies this may be the case. If so, it means my fork simply isn't seeing the dictionaries of other apps. I'm not sure if that's a security change that's happened or a side effect of a code change. Still needs looking into, though I'll move that to a separate issue.
  3. It turns out the main reason it wasn't working was a change from calling/overriding onDraw(Canvas) to using a custom function to call draw(Canvas). That change was originally made because of a combination of two things: Studio reporting that neither onDraw nor draw accepts a null for the canvas (requiring a custom function), and a mistake on my end, where I took that to mean moving everything to the custom function without still overriding onDraw. This has been fixed with the commit linked above.
  4. It turns out the fix from here is only half correct. There are a few main things I've noticed from my testing:
    1. That fix put the call to setCandidatesViewShown in both onStartInput and onCreateCandidatesView. This is unnecessary. From my testing, you could put the fix in either location and it seems to work. Going by Google's recommendation, I'd say maybe onStartInput is where the fix should be.
    2. The inputView never seems to be shown by the time this function rolls around. We do have a fix for that (calling our internal function directly), but it still seems worth noting.
    3. Whether prediction is on never seems to be set by this point in setting up the keyboard. This means we always end up skipping over the fix unless we call super directly. Considering the internal function exists so we can set up the candidates correctly ahead of time, this seems like an issue. None of this information is set up until onStartInputView is done... which, funnily enough, does call setCandidatesViewShownInternal, but seemingly not when we need it to.

SeventhM commented 5 months ago

https://developer.android.com/training/package-visibility

Turns out, yes, I cannot see the dictionaries of other apps. This is a security change for apps targeting newer versions. I can work around this, but I'd need to find the package names of all of the various dictionaries that were created to do so. I can see why other dictionaries stopped offering it separately now.