UstadMobile / Codec2-Android

Android library with JNI wrapper for Codec2. Uses Gradle and NDK to cross compile Codec2 (v0.8) into an AAR library for Android
GNU Lesser General Public License v3.0

Codec 2 700C encode #1

Open shlomi-commcrete opened 1 year ago

shlomi-commcrete commented 1 year ago

Hi, I'm trying to encode a stream of input data that I get from AudioRecord into Codec 2 700C.

I'm encoding 320 bytes of speech frames into 4 bytes.
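For reference, the frame sizes can be checked with a little arithmetic. The sketch below is not part of this library. It assumes the upstream Codec2 mode parameters (28 bits per 40 ms frame for 700C, 64 bits per 20 ms frame for 3200) and 8 kHz, 16-bit mono input; the class and method names are made up for illustration.

```java
// Frame-size arithmetic for two Codec2 modes.
// Assumptions: 8 kHz mono input, 16-bit samples, upstream mode parameters.
final class Codec2FrameMath {
    static final int SAMPLE_RATE_HZ = 8000;

    /** PCM samples consumed per frame for a given frame duration. */
    static int samplesPerFrame(int frameMillis) {
        return SAMPLE_RATE_HZ * frameMillis / 1000;
    }

    /** Encoded bytes per frame: the bit count rounded up to whole bytes. */
    static int bytesPerFrame(int bitsPerFrame) {
        return (bitsPerFrame + 7) / 8;
    }

    /** Bit rate in bit/s implied by the bits per frame and the frame duration. */
    static int bitRate(int bitsPerFrame, int frameMillis) {
        return bitsPerFrame * 1000 / frameMillis;
    }

    public static void main(String[] args) {
        // 700C: 28 bits every 40 ms
        System.out.println(samplesPerFrame(40) + " samples -> "
                + bytesPerFrame(28) + " bytes, " + bitRate(28, 40) + " bit/s");
        // 3200: 64 bits every 20 ms
        System.out.println(samplesPerFrame(20) + " samples -> "
                + bytesPerFrame(64) + " bytes, " + bitRate(64, 20) + " bit/s");
    }
}
```

Under these assumptions a 700C frame is 320 samples, which is 640 bytes of 16-bit PCM, packed into 4 bytes.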

However the output isn't similar to the output that https://github.com/drowe67/codec2 gets.

Does anyone else have this problem, or can anyone help me?

snachmsm commented 9 months ago

Any progress with that? I've tried implementing encode and decode using the Codec2 class, without the built-in Codec2Decoder, and no luck... more info in my question on SO

@mikedawson maybe there is a chance for a simple sample, or a Codec2Encoder class to accompany the existing decoder?

mikedawson commented 9 months ago

Sorry it has been a while since I have worked on this as you can see and I haven't been able to maintain this one actively.

I think this might be related to the header. I vaguely remember this issue when testing different sample sizes, but I think it got lost at the time. I remember something like this: everything was fine until the frame size changed (e.g. because of a change in bitrate).

In the samples that were included here (e.g. in the assets), the header and the frame size were the same (or multiples), so the result was missing a frame or two, which was not noticeable.

When the header size and frame size were not multiples of each other, then the frame data is essentially out of alignment, and the result is then garbage output.

Unfortunately, fixing it probably requires digging into the implementation, the connection to the codec2 library, and some understanding of codec2 itself. I can provide hints and advice from what I can remember, and I can check / merge any pull request to fix it, but unfortunately at the moment I don't have the time to resolve it directly. It's something that I might get to work on again in the next months, but not in the short term.

My apologies for the issue and that I can't directly resolve it.

mikedawson commented 9 months ago

It seems like this library was used in codec2_talkie here:

https://github.com/sh123/codec2_talkie/tree/master/libcodec2-android

Maybe the change list there (or the maintainers there) resolved this issue. I think it makes sense to have a libcodec2 library that is its own module, not tied to any particular app.

snachmsm commented 9 months ago

Thanks for the suggestions and the reply.

I don't have any file in my scenario, so I'm not sure the "header case" applies to me... I'm recording and playing using the Android classes AudioTrack and AudioRecord: 16-bit PCM, mono, 8000 Hz, the basics. I'm keeping the encoded data in a char array obtained from recordData, which is passed straight into playData (both methods are in my question on SO).

I've tried to decode my encoded data with your decoder: no luck at all, garbage... It plays the sample file fine, so I suspect that my encoding side is somehow "wrong" rather than the decoding.

BUT: since "my" encoding and "my" decoding cooperate partially (the voice is barely "understandable", far from the sample files), I suspect both of my sides are implemented incorrectly, maybe with the same/shared mistake... Am I losing some data when converting between char/byte/short arrays? Or maybe I'm mixing up or not setting little/big endian...? Literally no clue..

The crucial question for you is: have you ever ENcoded raw audio/PCM to the c2 format using your implementation? You have a decoder and no encoder... maybe you only needed one direction and never tested the other?

I've tried to build your project, but it needs a Linux environment... codec2_talkie won't even start building in my environment ("too new" Java installed), and I'm also far from being an expert in native libraries. So I admit I'm currently using the ready-to-go AAR from Maven (second snapshot). It would take me days to set up an environment/VM, install the proper/old versions of the accompanying software, and finally build and produce an AAR, so I want to be sure that your codec has a flaw (well, half of it isn't working..) and isn't "reusable" in its current form ;)

That's a very good catch that codec2_talkie has some improving commits, I hadn't noticed. If I'm forced to build on my own I will probably use their module. But yours looks so elegant and promising...

Thanks again for the response and insights. I'm looking forward to an answer on whether you can confirm that encoding is/was/should be working fine (and maybe still is, in some software still in use).

mikedawson commented 9 months ago

I think the answer is in the demo code:

https://github.com/UstadMobile/Codec2-Android/blob/master/appcodec2demo/src/main/java/com/ustadmobile/codec2/demo/MainActivity.java

InputStream is = asMgr.open("audio.c2");
is.skip(Codec2.CODEC2_FILE_HEADER_SIZE);
codec2 = new Codec2Decoder(is, Codec2.CODEC2_MODE_3200);

If the header is not skipped or read before using the codec, then reading frames will be out of alignment and the output will be garbled.

Also you must make sure that the mode passed to the decoder matches how the file was encoded.
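A minimal sketch of the alignment point, independent of this library: skip the header in full, then read fixed-size frames. The 7-byte header and 8 bytes per frame in mode 3200 are taken from figures quoted elsewhere in this thread; the class and method names are hypothetical. Note that InputStream.skip may skip fewer bytes than requested, so the sketch loops until the whole header is consumed.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

final class C2FrameReader {
    // Assumed values, not read from the library: 7-byte header, 8 bytes per frame (mode 3200).
    static final int HEADER_SIZE = 7;
    static final int FRAME_BYTES_3200 = 8;

    /** Skips exactly n bytes; a single InputStream.skip call may skip fewer. */
    static void skipFully(InputStream in, int n) throws IOException {
        int remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip made no progress: fall back to reading one byte
                if (in.read() < 0) throw new EOFException("stream ended inside header");
                skipped = 1;
            }
            remaining -= (int) skipped;
        }
    }

    /** Skips the header, then reads whole frames; a trailing partial frame is dropped. */
    static List<byte[]> readFrames(InputStream in, int headerSize, int frameBytes)
            throws IOException {
        skipFully(in, headerSize);
        List<byte[]> frames = new ArrayList<>();
        while (true) {
            byte[] frame = new byte[frameBytes];
            int filled = 0;
            while (filled < frameBytes) {
                int n = in.read(frame, filled, frameBytes - filled);
                if (n < 0) break; // end of stream
                filled += n;
            }
            if (filled < frameBytes) break;
            frames.add(frame);
        }
        return frames;
    }

    public static void main(String[] args) throws IOException {
        byte[] file = new byte[HEADER_SIZE + 2 * FRAME_BYTES_3200];
        List<byte[]> frames =
                readFrames(new ByteArrayInputStream(file), HEADER_SIZE, FRAME_BYTES_3200);
        System.out.println(frames.size() + " frames read");
    }
}
```

Passing 0 as the header size to the same reader shows the failure mode: every frame then starts 7 bytes early, so each one mixes bytes from two neighbouring frames.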

I think I encoded the files using the compiled version of c2 on the command line. My use case for this (education videos in limited bandwidth environments) at the time only required playback on Android.

snachmsm commented 9 months ago

Your Codec2Decoder works fine with audio.c2, but doesn't work with my recordData() result. Since the encode method needs a char buffer (it always produces 8 bytes/"bits" from 160 samples, I've verified that), I end up with a huge char array for a few seconds of audio, which I'm joining into a String and then turning into a byteInputStream (that's a Kotlin feature/extension function, not available in Java)

val myInputStream = encodedBufferSum.joinToString { "" }.byteInputStream()
codec2 = Codec2Decoder(myInputStream, Codec2.CODEC2_MODE_3200) // no "new" needed in Kotlin

The rest of the code is exactly the same, and my data is a mess... Should I also remove the header from my raw output? It's a fixed 7, so should I remove 7 bytes from my char array? That would in fact "shuffle" the chars, as each one has 2 bytes and I would be removing an odd number...

I'd planned to use 700C, but for compatibility (with your sample code and audio files) I'm testing with hardcoded CODEC2_MODE_3200 in both directions. I've tried other, lower modes (well, not with the file, obviously) with the same results: my decoder's output is somewhat understandable but very poor, and your decoder produces only glitches

mikedawson commented 9 months ago

I think the reason char is used in these cases is because it provides a 16bit unsigned value (that seemed to be the norm for audio stuff on Android). It shouldn't be considered as an actual char or something that can be turned into a string.

My recommendation would be as follows:

1) Test the decoder against codec2 files generated by the official codec2. Codec2 is available as an Ubuntu package (which should also be accessible via the Windows Subsystem for Linux if needed). That will tell you if the decoder is working as expected. Now that I have seen the sample, I am fairly confident that the decoder was working with various bitrates; it just needs to be told which bitrate (CODEC2_MODE) to use.

2) If step 1 works as expected, then you know the issue is with your encoder. Don't use joinToString. Look at the decoder and see how it is handling char arrays. Whether you use Java or Kotlin shouldn't matter (all my recent code is Kotlin, but this isn't recent).

snachmsm commented 9 months ago

I've adjusted my player to your files, so that side is fine. Now both decoders work properly with files and I'm trying to adjust the encoder.

I've noticed that every encoded char in the buffer is <= 255, so it can easily be converted to a byte: a 2-byte char becomes a 1-byte byte. After this change (replacing joinToString), I get barely understandable audio with both decoders.
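The conversion described here can be sketched as follows. This is an illustration with hypothetical names, not code from the library: it copies the low 8 bits of each encoded char into a byte array, and for comparison shows why a round trip through a String changes the data (Kotlin's byteInputStream() encodes as UTF-8 by default, and any char from 0x80 to 0xFF becomes two bytes in UTF-8).

```java
import java.nio.charset.StandardCharsets;

final class EncodedFramePacker {
    /** Copies the first `length` chars into bytes, one byte per char. */
    static byte[] toBytes(char[] encoded, int length) {
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            if (encoded[i] > 0xFF) {
                // Relies on the observation in this thread that values stay <= 255
                throw new IllegalArgumentException("value does not fit in a byte at index " + i);
            }
            out[i] = (byte) encoded[i];
        }
        return out;
    }

    /** For comparison only: what a String round trip with UTF-8 encoding produces. */
    static byte[] viaUtf8String(char[] encoded) {
        return new String(encoded).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        char[] frame = {0x00, 0x7F, 0x80, 0xFF};
        System.out.println("direct copy: " + toBytes(frame, frame.length).length + " bytes");
        System.out.println("via UTF-8:   " + viaUtf8String(frame).length + " bytes");
    }
}
```

The direct copy keeps one byte per encoded value, which is what a frame-oriented decoder needs; the String path changes the length whenever a value is 0x80 or above.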

And the real question is: what happens on the JNI side in the encode method? The decode below is pretty straightforward...

Given the comment // Downsampling to F/2, it looks like you are trying to downsample the input, so should it be 16 kHz? Even so, I suspect that the following line, short v = (short) jbuf[i * 2];, just drops the "odd" samples instead of cutting the "higher freqs" in every short (2 bytes). I suspect this line should simply be short v = (short) jbuf[i] (or no for loop at all), with the method fed 8 kHz samples (same as the output)
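To make the effect of that indexing concrete, here is a small Java simulation. It is only a sketch: the loop is reconstructed from the single quoted line, not copied from the JNI source, and the names are hypothetical. With jbuf[i * 2] the encoder consumes twice as many input samples as it keeps, so 8 kHz input loses every second sample before Codec2 sees it.

```java
import java.util.Arrays;

final class DecimationDemo {
    /** Mirrors the indexing jbuf[i * 2]: keeps only the even-indexed samples. */
    static short[] takeEveryOther(short[] input, int frameSamples) {
        short[] out = new short[frameSamples];
        for (int i = 0; i < frameSamples; i++) {
            out[i] = input[i * 2]; // needs input.length >= 2 * frameSamples
        }
        return out;
    }

    /** Mirrors the proposed jbuf[i]: passes the samples through unchanged. */
    static short[] takeDirect(short[] input, int frameSamples) {
        return Arrays.copyOf(input, frameSamples);
    }

    public static void main(String[] args) {
        short[] pcm = {10, 11, 12, 13, 14, 15, 16, 17};
        System.out.println(Arrays.toString(takeEveryOther(pcm, 4))); // [10, 12, 14, 16]
        System.out.println(Arrays.toString(takeDirect(pcm, 4)));     // [10, 11, 12, 13]
    }
}
```

Dropping samples this way halves the sample rate without any low-pass filtering. That is only a crude downsampler even for 16 kHz input, and it discards half of the signal when the input is already at 8 kHz.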

edit: looks like codec2_talkie found this issue and fixed only on own side without contribution... SOURCE

mikedawson commented 9 months ago

So does that mean the decoder is working with any file that you encode using c2enc on the command line? That's good to hear.

On the encode: I think this comes from the library that I forked it from. I didn't work on the encoding side. I think Android generally uses 16 kHz, so the normal approach was to downsample by dropping odd samples.

You could try backporting the change from codec2_talkie and see if that resolves the issue.

It seemed to me like the native build approach in codec2_talkie was not using the recommended approach. I tried to structure the Gradle, C files etc as per the recommended structure. I found that made the cross compilation (e.g. for arm 64) relatively painless.

snachmsm commented 9 months ago

I can confirm that replacing

short v = (short) jbuf[i * 2];

with

short v = (short) jbuf[i];

fixes the issue. I won't contribute, as I don't have Linux for building and testing your project. I tried VMs, but had problems even running Ubuntu, too much work... It's just my side project for fun; I just didn't believe that we can pack speech into 700 bits per second, even 3200 is an awesome result. And yes, we can :)

How do I know that this is a real fix? Because I'm producing an 8 kHz sample array (yes, possible on Android), then copying it into a new short array with an additional 0x00 after every sample. These padding values are then dropped on the JNI side

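for (i in 0 until framesNum) {
    val frameBuffer: ShortArray = recorderBuffer.sliceArray(i * samples until (i + 1) * samples)
    // Interleave a zero after every real sample: sample0, 0, sample1, 0, ...
    val bugAround = ShortArray(frameBuffer.size * 2)
    frameBuffer.forEachIndexed { j, sample ->
        bugAround[j * 2] = sample // one short = one sample; odd slots stay 0
    }
    // The JNI side reads jbuf[i * 2], so the padding is dropped again there
    Codec2.encode(c2instance, bugAround /* frameBuffer */, encodedBuffer)
}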

bugAround contains shorts in the order sample0, 0x00, sample1, 0x00, sample2, 0x00... etc., twice the size of the original recorded array. You will do the "reverse job" by dropping these additional 0x00s. For now I'm using the ready-to-go 64-bit AAR, so I have to stick to this workaround
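The same workaround can be checked in isolation. The Java sketch below uses hypothetical names and a stand-in for the native loop (an assumption based on the quoted jbuf[i * 2] line). It pads a frame with zeros and confirms that dropping the odd-indexed values gives back the original samples.

```java
import java.util.Arrays;

final class ZeroPadWorkaround {
    /** Returns sample0, 0, sample1, 0, ... (twice the input length). */
    static short[] padWithZeros(short[] frame) {
        short[] padded = new short[frame.length * 2];
        for (int i = 0; i < frame.length; i++) {
            padded[i * 2] = frame[i]; // odd slots keep their default value 0
        }
        return padded;
    }

    /** Stand-in for the native side: keeps only the even-indexed values. */
    static short[] dropOdd(short[] padded) {
        short[] out = new short[padded.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = padded[i * 2];
        }
        return out;
    }

    public static void main(String[] args) {
        short[] frame = {100, -200, 300, -400};
        short[] roundTrip = dropOdd(padWithZeros(frame));
        System.out.println(Arrays.equals(frame, roundTrip)); // true
    }
}
```

The padding doubles the size of the buffer passed through JNI for each frame, which is the price of using the prebuilt AAR unmodified.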