biemster / asr

Android offline speech recognition natively on PC
https://hackaday.io/project/164399-android-offline-speech-recognition-natively-on-pc

Reverse engineering using function hooks with frida #2

Open thebabush opened 4 years ago

thebabush commented 4 years ago

Need some help dumping runtime values from the actual Gboard app? It might be useful to have some intermediate wavs/data/results to check against.

I think it shouldn't be too difficult to do using frida.

biemster commented 4 years ago

That would be awesome actually, you think you can extract the data that goes into the neural nets, with a known wav? So just the first stage, where the audio is converted to input for the first encoder? If you have an idea how to do that that would be most appreciated!

thebabush commented 4 years ago

Mmmh, I think doing it with a known wav is going to require more work. The first step is to dump the I/O of the nets themselves and test that we get the same values.

If that works, it means that the only thing left to do is the preprocessing (well, and the FST thing). That's gonna require more time imho, because there's no clear I/O relationship between samples, so it makes sense to do it last.

Anyway, with things like frida, you can basically hook whatever java/native function and view/modify whatever you want. To dump NN I/O, it's "just" a matter of figuring out where the input tensor's buffer is stored and where the output one is... Then you hook the function and dump the buffers on call and return.

I can try to take a look when I have some time.

biemster commented 4 years ago

That would be most helpful. I looked into running Gboard on an emulated Pixel a couple of months back to see the actual dictation in action, but without success. Didn't know about frida yet. Do you have a rooted Pixel, or are you going the emulator way?

thebabush commented 4 years ago

Apart from the fact that it uses javascript, frida is amazing :) check it out.

Btw, I have a rooted Pixel, so it should Just Work™.

biemster commented 4 years ago

The new Recorder app for Pixel 4 devices uses a very similar, but probably updated model. That app might be easier to handle in frida, since it isn't a keyboard? Apk mirror has a download.

thebabush commented 4 years ago

Thanks. I'll look into it when I have some time. Anyway Gboard too shouldn't be that difficult to handle, and I like keeping a fixed target.

thebabush commented 4 years ago

Ok so I've made some progress. I now have a frida script that can dump tensorflow input buffers.

The sizes I'm getting are:

2 1,2
2 1,640
2 1,2
2 1,2
2 1,640
2 1,640
2 1,128

The (1, 2) tensors are almost certainly (99%) for the endpointer/noise level/whatever.
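For anyone checking their own dumps against these numbers, here is a minimal sketch. It assumes each line above reads as a rank followed by comma-separated dimensions, and that the tensors hold 32-bit floats; both are assumptions, and `parse_size_line` and `buffer_bytes` are just illustrative helper names.

```python
from math import prod

# Dump lines as posted above, read as "<rank> <dim>,<dim>,..." (assumed format).
DUMPED_SIZES = """\
2 1,2
2 1,640
2 1,2
2 1,2
2 1,640
2 1,640
2 1,128
"""

BYTES_PER_FLOAT32 = 4  # assumption: the tensors hold 32-bit floats


def parse_size_line(line):
    """Split a line such as '2 1,640' into (rank, shape) and check they agree."""
    rank_text, dims_text = line.split()
    shape = tuple(int(d) for d in dims_text.split(","))
    if int(rank_text) != len(shape):
        raise ValueError(f"rank {rank_text} does not match shape {shape}")
    return int(rank_text), shape


def buffer_bytes(shape):
    """Expected size in bytes of a float32 buffer with the given shape."""
    return prod(shape) * BYTES_PER_FLOAT32


shapes = [parse_size_line(line)[1] for line in DUMPED_SIZES.splitlines()]
for shape in sorted(set(shapes)):
    print(shape, "x", shapes.count(shape), "->", buffer_bytes(shape), "bytes each")
```

A dumped buffer whose length does not match the expected byte count is a quick sign that the hook is reading the wrong pointer or the wrong element type.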

I might have a different model than yours. I'll tell you later the exact version (I disabled the updates now). I might also have hooked only some special case as there is more than one way to reach the prediction stage.

It's not super difficult stuff, but it's tricky to read TF's structures straight from memory ;)

Btw are you still working on the initial filtering?

biemster commented 4 years ago

Awesome progress. The sizes of the buffers I already deduced from the tf models as written in the project logs, good to see you got the same numbers. I'm still working on the initial filtering and audio processing, yes.

thebabush commented 4 years ago

Yeah I wasn't implying it's new info ;) I found them from tf models too. It was just to confirm and update you on my (small) progress.

thebabush commented 4 years ago

Btw, what version of TF do I need to use to import the models? Your script on the endpointer gives me: "Encountered unresolved custom op: Relu1. Node number 2 (Relu1) failed to prepare."

thebabush commented 4 years ago

For future readers, pip install tensorflow==1.13.1.

thebabush commented 4 years ago

[Image: Figure_1]

:)

rawdata *= 0.000030518 (aka, rawdata /= 32767.547021429975).

Sample rate 16k in this example.
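As a rough sketch of that scaling step: the snippet below assumes the raw data is mono, 16-bit signed little-endian PCM (the usual WAV layout), which the comment above does not state explicitly. `normalize_pcm16` is a hypothetical helper name, and only the scale factor comes from the thread.

```python
import struct

SCALE = 0.000030518  # factor quoted above; very close to 1 / 32768


def normalize_pcm16(raw_bytes):
    """Convert 16-bit signed little-endian PCM bytes to floats in roughly [-1, 1]."""
    return [sample * SCALE for (sample,) in struct.iter_unpack("<h", raw_bytes)]


# Four example samples: silence, half scale, and the two int16 extremes.
pcm = struct.pack("<4h", 0, 16384, -32768, 32767)
print(normalize_pcm16(pcm))
```

Because the factor is rounded, the most negative sample lands a hair below -1.0; whether that matters for the endpointer is not something this thread settles.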

biemster commented 4 years ago

Wow!! All you needed to do was normalize the raw data? Pfff how I missed that. So if the endpointer is working now, it should be a small step to feed the encoder with the right data. I'm looking forward to your further progress!

biemster commented 4 years ago

@pannous thanks for joining the discussion! What you propose is exactly what happens, except the sym_prob should not go into enc0. enc0 and enc1 together are the blue encoder from the graph, and the green Pred. Network is called 'dec' in the code. You can see this on line 100 in the dictation.py.

biemster commented 4 years ago

@thebabush you having any luck lately? I've been exploring a different angle, with great help from @theafien on hackaday. We are trying to load the jni library in java, so we have an easier environment to debug it, but not much luck yet. On the frida path, what would be of great help to me is the frequency of the calls to the different models, and the actual flow of info. Like is the whole chain of models kicked off by the endpointer, or are there also other triggers? Is the enc0 model run twice as often as enc1? Is the joint run as often as the enc1? Are any of the models run sometimes with the same inputs? or with half of the values equal to the previous run? Any such info is already very helpful, as it gives a lot of info how to connect the inputs with the outputs.
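To make those questions concrete, this is a sketch of the kind of summary that could be computed once a hook produces an ordered log of calls. The log format, a list of `(model_name, input_values)` pairs, is purely an assumption for illustration, as are the example values.

```python
from collections import Counter, defaultdict


def shifted_overlap(prev, cur):
    """Largest k such that the last k values of prev equal the first k of cur."""
    for k in range(min(len(prev), len(cur)), 0, -1):
        if tuple(prev[-k:]) == tuple(cur[:k]):
            return k
    return 0


def summarize_calls(call_log):
    """call_log: list of (model_name, input_values) in the order the calls happened."""
    counts = Counter(name for name, _ in call_log)
    identical = Counter()        # calls whose input equals the previous input of that model
    overlaps = defaultdict(list)  # per model: shifted overlap with the previous call
    previous = {}
    for name, values in call_log:
        if name in previous:
            overlaps[name].append(shifted_overlap(previous[name], values))
            if tuple(previous[name]) == tuple(values):
                identical[name] += 1
        previous[name] = values
    return counts, identical, dict(overlaps)


# Made-up log: enc0 sees a sliding window, enc1 is called half as often.
log = [
    ("enc0", (1, 2, 3, 4)),
    ("enc0", (3, 4, 5, 6)),
    ("enc1", (9, 9)),
    ("enc0", (3, 4, 5, 6)),
    ("enc0", (5, 6, 7, 8)),
    ("enc1", (9, 9)),
]
counts, identical, overlaps = summarize_calls(log)
print("calls per model:", dict(counts))
print("enc0/enc1 ratio:", counts["enc0"] / counts["enc1"])
print("repeated inputs:", dict(identical))
print("shifted overlap:", overlaps)
```

The call ratio answers the "twice as often" question, and an overlap of half the input length would point to a sliding window over the previous frame.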

thebabush commented 4 years ago

Unfortunately I haven't had much time to work on this. Maybe in January I'll be able to find some time but I don't know. Anyway, these are a couple of frida scripts you can use as a base if you want to give it a try. Just download the arm server from their github, push it to the phone, disable SELinux and you are good to go. My scripts refer to the original gboard apk (if you need it I can try to find the APK and give you its hash). Anyway, I modified them along the way so they are not super useful as is.

frida.tar.gz https://github.com/biemster/asr/files/3922247/frida.tar.gz

theafien commented 4 years ago

@thebabush what Android version are you using?

thebabush commented 4 years ago

I updated since then, but it was some version of Android 9. Why?

theafien commented 4 years ago

@thebabush, does frida work fine on Android 9?

thebabush commented 4 years ago

Yup, I'm positive about that. I see they are working on fixing issues with Android 10... But you can always use it in an emulator. Actually, I would recommend that.

theafien commented 4 years ago

@thebabush I think frida doesn't work on newer versions of Android. I'll test it later. Do you have the bytes of the nativeInit and nativeRun parameters from libgoogle_speech_jni.so? Can you hook these functions?

thebabush commented 4 years ago

For run (actually, Invoke), yep, it's there in my script. Unfortunately it seems I lost my idb file T.T so unless you use the exact same library version I was using (the first Gboard model; I should have the APK somewhere), the script won't be of much use as is.

Why do you need the bytes anyway? For the shapes? They are described in the config file. The contents are just a bunch of floats.
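As an illustration of "just a bunch of floats", a dumped buffer can be turned back into rows with the standard library alone. This sketch assumes little-endian float32 values in row-major order and a 2-D shape like the ones seen in this thread; `decode_tensor` is a hypothetical helper, not something from the scripts.

```python
import struct


def decode_tensor(raw_bytes, shape):
    """Interpret raw_bytes as little-endian float32 values in row-major order.

    Only 2-D shapes are handled, since those are the ones seen in this thread.
    """
    rows, cols = shape
    count = rows * cols
    if len(raw_bytes) != 4 * count:
        raise ValueError(f"expected {4 * count} bytes for shape {shape}, got {len(raw_bytes)}")
    flat = struct.unpack(f"<{count}f", raw_bytes)
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]


# Round trip with values that float32 represents exactly.
raw = struct.pack("<4f", 0.5, -1.0, 2.0, 0.25)
print(decode_tensor(raw, (2, 2)))
```

The size check is there on purpose: a length mismatch usually means the shape from the config file does not belong to the buffer that was dumped.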

theafien commented 4 years ago

@thebabush, the parameters are protobufs, and the config files don't include the .proto. I only need the bytes so I can emulate the calls without knowing the real parameters, mainly the nativeRun data.

thebabush commented 4 years ago

What real parameters? The inputs/outputs are just tensors and the whole library runs asynchronously so you can't easily match outputs with inputs to other blocks.

Honestly I think our best shot at emulating greco3 is to build something similar module by module and parse directly the configuration shipped with the apks / models.

theafien commented 4 years ago

@thebabush, in my project I can already receive recognizer events with the text, but it doesn't have the same behavior throughout the audio stream.

thebabush commented 4 years ago

Is your project pushed somewhere?

theafien commented 4 years ago

@thebabush I've only shared it with biemster. Do you have a Hackaday account?

thebabush commented 4 years ago

babush

vinnitu commented 4 years ago

Have you seen https://github.com/kaldi-asr/kaldi ?

I read http://www.opennet.ru/opennews/art.shtml?num=52171 and downloaded an APK with an offline model for Russian. And it works.

theafien commented 4 years ago

Victor, with the Google model?

biemster commented 4 years ago

Thanks for your interest @mepihin819 ! Progress is a bit slow atm, but we are all still working on it.

biemster commented 4 years ago

Two distinct paths are being tried atm:

  1. already giving some results: import the .so lib from the Recorder app in custom Java code on Android
  2. most portable: completely reverse engineer the TensorFlow models to import them in a custom TensorFlow application

@mepihin819 which is closest to your expertise?

JudeAshly commented 4 years ago

Found a link, thought it might be helpful:

https://github.com/noahchalifour/rnnt-speech-recognition

biemster commented 4 years ago

nice! ^^ looks promising

nlphacker commented 4 years ago

Interesting project.

For calling the JNI from Java, did you guys figure out the function parameters?

It appears to me that it needs to initialize the resource manager first and then init the recognizer.

AbstractRecognizer:

private native int nativeCancel(long j);
private native long nativeConstruct();
private native void nativeDelete(long j);
private native int nativeInitFromProto(long j, long j2, byte[] bArr);
private native byte[] nativeRun(long j, byte[] bArr);

ResourceManager:

public long f5799a = nativeConstruct();
private native long nativeConstruct();
private native void nativeDelete(long j);
private native int nativeInitFromProto(long j, byte[] bArr, String[] strArr);

nlphacker commented 4 years ago

It seems these have been internal APIs for a while.

https://github.com/zhuowei/Xenologer-src-glasshangouts/blob/master/smali/com/google/speech/recognizer/AbstractRecognizer.smali

nlphacker commented 4 years ago

Perhaps the Recorder app needs some debugging to find out the parameters.

nlphacker commented 4 years ago

@biemster the portable approach sounds good.

I think I have an idea to help intercept the data during recognition.

Create a mocked TensorFlow Lite .so that dumps the input parameters, passes them to the real TensorFlow Lite .so for processing, and then dumps the output.

You can then use this mocked .so in the Recorder app to intercept all the data passed in and out during the whole recognition session.

Have you tried such an idea?
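The interposition idea can be sketched in a few lines. This is only a Python analogue of the pattern: a real mock would be a native library exporting the same symbols as TensorFlow Lite, and `RecordingProxy`, `FakeInterpreter` and its `invoke` method are hypothetical stand-ins, not the actual API.

```python
import copy


class RecordingProxy:
    """Forwards every method call to the real object and records inputs and outputs."""

    def __init__(self, real, log):
        self._real = real
        self._log = log

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself.
        target = getattr(self._real, name)
        if not callable(target):
            return target

        def wrapper(*args, **kwargs):
            # Copy the inputs before the call in case the real code modifies them.
            entry = {"call": name, "args": copy.deepcopy(args), "kwargs": copy.deepcopy(kwargs)}
            result = target(*args, **kwargs)
            entry["result"] = copy.deepcopy(result)
            self._log.append(entry)
            return result

        return wrapper


class FakeInterpreter:
    """Stand-in for the real library (hypothetical); it just doubles every input value."""

    def invoke(self, values):
        return [2 * v for v in values]


log = []
interpreter = RecordingProxy(FakeInterpreter(), log)
print(interpreter.invoke([1.0, 2.0]))
print(log)
```

The caller keeps using the same interface and gets the same results, while the log accumulates one entry per call in the order the app made them.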

biemster commented 4 years ago

@nlphacker: @theafien made a lot of progress in this direction, including finding function parameters for the native functions that work.

nlphacker commented 4 years ago

Is there a sample that works well?

biemster commented 4 years ago

@theafien wrote it and had it working on his phone, but I'm not quite there yet.

nlphacker commented 4 years ago

@theafien could you share?

theafien commented 4 years ago

@nlphacker of course. Do you have a Hackaday account? Add me (theafien) and I'll send you the project.

nlphacker commented 4 years ago

@theafien I added you on Hackaday. Can you share it from there?