thebabush opened this issue 4 years ago
That would be awesome actually, you think you can extract the data that goes into the neural nets, with a known wav? So just the first stage, where the audio is converted to input for the first encoder? If you have an idea how to do that that would be most appreciated!
Mmmh, I think doing it with a known wav is going to require more work. The first step is to dump the I/O of the nets themselves and test that we get the same values.
If that works, it means that the only thing left to do is the preprocessing (well, and the FST thing). That's gonna require more time imho, because there's no clear I/O relationship between samples, so it makes sense to do it last.
Anyway, with things like frida, you can basically hook whatever java/native function and view/modify whatever you want. To dump NN I/O, it's "just" a matter of figuring out where the input tensor's buffer is stored and where the output one is... Then you hook the function and dump the buffers on call and return.
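Just to illustrate the dumping step: once you know the tensor's data pointer and element count, reading it is a matter of grabbing `n * 4` bytes and unpacking them as little-endian float32, which is the layout TFLite uses for float tensors. A minimal sketch of that unpacking in Python (the function name is mine; in a frida script the raw bytes would come from something like `Memory.readByteArray`):

```python
import struct

def dump_float32_buffer(raw: bytes) -> list:
    """Interpret a raw byte buffer as little-endian float32 values,
    the in-memory layout of a TFLite float tensor."""
    n = len(raw) // 4
    return list(struct.unpack("<%df" % n, raw[:n * 4]))

# Example: a tiny (1, 2) tensor packed into 8 bytes.
raw = struct.pack("<2f", 0.5, 0.25)
print(dump_float32_buffer(raw))  # -> [0.5, 0.25]
```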
I can try to take a look when I have some time.
That would be most helpful. I looked into running Gboard on an emulated Pixel a couple of months back to see the actual dictation in action, but without success. I didn't know about frida yet. Do you have a rooted Pixel, or are you going the emulator way?
Apart from the fact that it uses javascript, frida is amazing :) check it out.
Btw, I have a rooted Pixel, so it should Just Work™.
The new Recorder app for Pixel 4 devices uses a very similar, but probably updated, model. That app might be easier to handle in frida, since it isn't a keyboard? APKMirror has a download.
Thanks. I'll look into it when I have some time. Anyway Gboard too shouldn't be that difficult to handle, and I like keeping a fixed target.
Ok so I've made some progress. I now have a frida script that can dump tensorflow input buffers.
The sizes I'm getting (rank, then shape) are:
2 1,2
2 1,640
2 1,2
2 1,2
2 1,640
2 1,640
2 1,128
The (1, 2) one is 99% for the endpointer/noise level/whatever.
I might have a different model than yours. I'll tell you later the exact version (I disabled the updates now). I might also have hooked only some special case as there is more than one way to reach the prediction stage.
It's not super difficult stuff, but it's tricky to read TF's structures straight from memory ;)
Btw are you still working on the initial filtering?
Awesome progress. The sizes of the buffers I already deduced from the tf models as written in the project logs, good to see you got the same numbers. I'm still working on the initial filtering and audio processing, yes.
Yeah I wasn't implying it's new info ;) I found them from tf models too. It was just to confirm and update you on my (small) progress.
Btw, what version of TF do I need to use to import the models? Your script on the endpointer gives me "Encountered unresolved custom op: Relu1. Node number 2 (Relu1) failed to prepare."
For future readers, pip install tensorflow==1.13.1
:)
rawdata *= 0.000030518
(aka rawdata /= 32767.547021429975).
Sample rate is 16 kHz in this example.
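For reference, that scale factor is just the usual 16-bit PCM to float conversion: int16 samples span roughly [-32768, 32767], so multiplying by ~1/32767.5 maps them into the ~[-1.0, 1.0] range the models expect. A minimal sketch (pure Python, no numpy):

```python
def normalize_pcm16(samples):
    """Scale raw 16-bit PCM samples into the ~[-1.0, 1.0] float range;
    0.000030518 ~= 1 / 32767.547, the constant from the thread."""
    return [s * 0.000030518 for s in samples]

print(normalize_pcm16([0, 16384, 32767]))  # last value is close to 1.0
```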
Wow!! All you needed to do was normalize the raw data? Pfff, how did I miss that. So if the endpointer is working now, it should be a small step to feed the encoder the right data. I'm looking forward to your further progress!
@pannous thanks for joining the discussion! What you propose is exactly what happens, except the sym_prob should not go into enc0. enc0 and enc1 together are the blue encoder from the graph, and the green Pred. Network is called 'dec' in the code. You can see this on line 100 of dictation.py.
@thebabush have you had any luck lately? I've been exploring a different angle, with great help from @theafien on hackaday. We are trying to load the jni library in Java, so we have an easier environment to debug it, but not much luck yet. On the frida path, what would be of great help to me is the frequency of the calls to the different models, and the actual flow of info. Like: is the whole chain of models kicked off by the endpointer, or are there also other triggers? Is the enc0 model run twice as often as enc1? Is the joint run as often as enc1? Are any of the models sometimes run with the same inputs, or with half of the values equal to the previous run? Any such info is already very helpful, as it gives a lot of info on how to connect the inputs with the outputs.
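The call-frequency questions could be answered without full buffer dumps: wrap each hooked model invocation in a counter and compare the totals after a dictation session. A hypothetical sketch of the bookkeeping (model names taken from this thread; the lambdas stand in for the real invocations a frida onEnter callback would count):

```python
from collections import Counter

calls = Counter()

def counted(name, fn):
    """Wrap a model's invoke function so every call bumps a per-model counter."""
    def wrapper(*args, **kwargs):
        calls[name] += 1
        return fn(*args, **kwargs)
    return wrapper

# Dummy stand-ins for the real model invocations.
enc0 = counted("enc0", lambda x: x)
enc1 = counted("enc1", lambda x: x)

for frame in range(4):
    enc0(frame)
    if frame % 2 == 1:   # pretend enc1 runs only every second frame
        enc1(frame)

print(calls)  # Counter({'enc0': 4, 'enc1': 2})
```

Comparing such counters across models would directly answer whether enc0 runs twice as often as enc1, and whether the joint runs as often as enc1.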
Unfortunately I haven't had much time to work on this. Maybe in January I'll be able to find some time but I don't know. Anyway, these are a couple of frida scripts you can use as a base if you want to give it a try. Just download the arm server from their github, push it to the phone, disable SELinux and you are good to go. My scripts refer to the original gboard apk (if you need it I can try to find the APK and give you its hash). Anyway, I modified them along the way so they are not super useful as is.
frida.tar.gz https://github.com/biemster/asr/files/3922247/frida.tar.gz
@thebabush which Android version are you using?
I updated since then, but it was some version of Android 9. Why?
@thebabush, does frida work fine on Android 9?
Yup, I'm positive about that. I see they are working on fixing issues with Android 10... But you can always use it in an emulator. Actually, I would recommend that.
@thebabush I think frida doesn't work on newer versions of Android; I'll test it later. Do you have the bytes of the nativeInit and nativeRun parameters from libgoogle_speech_jni.so? Can you hook those functions?
For run (actually, Invoke), yep, it's there in my script. Unfortunately it seems I lost my idb file T.T so unless you use the exact same library version I was using, the offsets won't match (it was the first Gboard model; I should have the APK somewhere).
Why do you need the bytes anyway? For the shapes? They are described in the config file. The contents are just a bunch of floats.
@thebabush, the parameters are protobufs, and the config files don't include the .proto definitions. I only need the bytes so I can emulate it without knowing the real parameters. Mainly the nativeRun data.
What real parameters? The inputs/outputs are just tensors and the whole library runs asynchronously so you can't easily match outputs with inputs to other blocks.
Honestly I think our best shot at emulating greco3 is to build something similar module by module and parse directly the configuration shipped with the apks / models.
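The module-by-module idea could be structured as a small pipeline where each greco3 stage becomes a pluggable callable built from the configuration shipped with the APK. Everything below is hypothetical scaffolding, not the real greco3 API; the identity lambdas stand in for stages that would wrap the actual tflite interpreters:

```python
class Pipeline:
    """Chain independent stages (frontend, endpointer, encoders, decoder)
    so each one can be reverse engineered and validated in isolation."""
    def __init__(self, stages):
        self.stages = stages  # list of (name, callable) pairs

    def run(self, frame):
        trace = {}
        for name, stage in self.stages:
            frame = stage(frame)
            trace[name] = frame  # keep per-stage output to compare against frida dumps
        return frame, trace

# Identity stand-ins for the real stages.
pipe = Pipeline([("frontend", lambda x: x), ("enc0", lambda x: x)])
out, trace = pipe.run([0.0] * 640)
```

The per-stage trace is the point: each module's output can be diffed against buffers dumped from the real app before moving on to the next module.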
@thebabush, in my project I can already receive recognizer events with the text, but it doesn't have the same behavior throughout the audio stream.
Is your project pushed somewhere?
@thebabush I shared it only with biemster. Do you have a hackaday account?
babush
Have you seen https://github.com/kaldi-asr/kaldi ?
I read http://www.opennet.ru/opennews/art.shtml?num=52171 and downloaded the APK with the offline model for the Russian language, and it works.
Victor, with the Google model?
Thanks for your interest @mepihin819 ! Progress is a bit slow atm, but we are all still working on it.
Two distinct paths are being tried atm:
1) already giving some results: import the .so lib from the Recorder app in custom Java code on Android
2) most portable: completely reverse engineer the tensorflow models to import them in a custom tensorflow application
@mepihin819 which is closest to your expertise?
Found a link, thought it might be helpful.
nice! ^^ looks promising
interesting project.
For calling the JNI from Java, did you guys figure out the function parameters? It appears to me it needs to initialize the resource manager and then init the recognizer.
AbstractRecognizer:
private native int nativeCancel(long j);
private native long nativeConstruct();
private native void nativeDelete(long j);
private native int nativeInitFromProto(long j, long j2, byte[] bArr);
private native byte[] nativeRun(long j, byte[] bArr);
ResourceManager:
public long f5799a = nativeConstruct();
private native long nativeConstruct();
private native void nativeDelete(long j);
private native int nativeInitFromProto(long j, byte[] bArr, String[] strArr);
Seems these have been internal APIs for a while. Perhaps we need some debugging on the Recorder app to find out the parameters.
@biemster the portable approach is good. I think I have an idea to help intercept the data during recognition: create a mocked tensorflow lite .so, dump the input parameters, pass them to the real tensorflow lite .so to process, then dump the output result. You can then use this mocked .so in the Recorder app to intercept all the data passed in and out during the whole recognition session. Have you tried such an idea?
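The mocked-.so idea is essentially a pass-through shim: forward every call to the real library unchanged while recording arguments and results. A sketch of the concept in Python (the real thing would be a native library re-exporting the TFLite symbols; the lambda here stands in for the real Invoke):

```python
def intercept(fn, log):
    """Pass-through wrapper: record inputs and outputs, delegate to the real fn."""
    def shim(*args):
        result = fn(*args)          # call into the "real" library
        log.append((args, result))  # dump what went in and what came out
        return result
    return shim

log = []
real_invoke = lambda tensor: [v * 2 for v in tensor]  # stand-in for the real Invoke
invoke = intercept(real_invoke, log)
invoke([1, 2, 3])  # the caller sees normal behavior; log captures the session
```

The key property is that the app under test behaves identically, so an entire recognition session can be replayed offline from the log afterwards.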
@nlphacker: @theafien made a lot of progress in this direction, including finding function parameters for the native functions that work.
Is there a sample that works well?
@theafien wrote it and had it working on his phone, but I'm not quite there yet.
@theafien could you share?
@nlphacker of course. Do you have hackaday? Add me, theafien, and I'll send you the project.
@theafien I added you on hackaday. Can you share it from there?
Need some help dumping runtime values from the actual Gboard app? It might be useful to have some intermediate wavs/data/results to check against.
I think it shouldn't be too difficult to do using frida.