atomic14 / ICS-43434-breakout-board

Sample code for the ICS-43434 breakout board and the ESP32
Creative Commons Zero v1.0 Universal

Request Directional Mic #5

Open StuartIanNaylor opened 3 years ago

StuartIanNaylor commented 3 years ago

Hey @cgreening, still being a noob and not exactly sure where to start, I thought I would ask you again. :)

Looking at https://invensense.tdk.com/wp-content/uploads/2015/02/Microphone-Array-Beamforming.pdf and a stereo pair of INMP441 MEMS mics: the board is 14mm, so the closest we can get is a 14mm mic spacing with those boards.

That gives a null frequency of 12kHz with a 2-sample delay @ 48kHz.

Not great, as I'd probably like to make that a 16kHz null, but an 11mm spacing is a bit hard; it will do just for a test. My noobness has me wondering how, in the code, the front mic is summed with an inverted and delayed signal from the rear microphone.
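
Back-of-envelope numbers, assuming c ≈ 343 m/s and the cardioid condition where the internal delay matches the acoustic delay across the gap (the arithmetic is mine, not from the app note):

#include <cstdio>

int main()
{
    const float c = 343.0f;       // speed of sound in m/s (assumed)
    const float spacing = 0.014f; // 14mm mic spacing
    // acoustic delay across the gap: ~40.8us, i.e. ~2 samples @ 48kHz
    float acousticDelay = spacing / c;
    // first on-axis null of a delay-and-subtract pair with matched delays
    float nullFreq = 1.0f / (2.0f * acousticDelay);
    printf("null frequency: %.0f Hz\n", nullFreq); // ~12250 Hz
    // spacing needed to push the null up to 16kHz: c / (2 * 16000) ~= 10.7mm
    printf("spacing for 16kHz null: %.1f mm\n", 1000.0f * c / (2.0f * 16000.0f));
    return 0;
}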

void I2SSampler::processI2SData(uint8_t *i2sData, size_t bytes_read)
{
    int32_t *samples = (int32_t *)i2sData;
    for (int i = 0; i < bytes_read / 4; i++)
    {
        // you may need to vary the >> 11 to fit your volume - ideally we'd have some kind of AGC here
        addSample(samples[i] >> 11);
    }
}

It's just 2 samples @ 14mm, but I guess in stereo the I2S data comes L/R...L/R; rather than L/R, think of them as F/R (front/rear). Start with rear_delay1 & rear_delay2 = zero. Each step, rear_delay2 is inverted and added to the current front sample (would you store this in a new raw stream?), then rear_delay1 is moved to rear_delay2 and the current rear sample is moved to rear_delay1, as in the sketch below.
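
Something like this rough sketch is what I mean, assuming stereo F/R interleaving; m_rearDelay1/m_rearDelay2 are hypothetical members added to I2SSampler (initialised to zero) to hold the delay line between calls:

void I2SSampler::processI2SData(uint8_t *i2sData, size_t bytes_read)
{
    int32_t *samples = (int32_t *)i2sData;
    for (int i = 0; i < bytes_read / 4; i += 2)
    {
        int32_t front = samples[i];    // front mic
        int32_t rear = samples[i + 1]; // rear mic
        // invert the rear sample from two samples ago and add it to the
        // current front sample, i.e. front + (-rear_delay2)
        int32_t beamformed = front - m_rearDelay2;
        // shift the delay line: delay1 -> delay2, current rear -> delay1
        m_rearDelay2 = m_rearDelay1;
        m_rearDelay1 = rear;
        addSample(beamformed >> 11);
    }
}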

I have been looking at the ADF and it's still Greek to me, but in the pipeline all that's then needed is to use the EQ to flatten the response, as in Figure 12 ("Frequency Response of an Endfire Beamformer at Different Incident Angles").

You might be in a unique position to offer boards with maybe 2/3 mics on an 11mm spacing :) But just checking: with 2x mics @ 14mm, does the downsampling to 16kHz cut everything out after the null notch?

Apologies for hounding you about this, but I'm just really interested in how much load this would entail, and whether you could run it in conjunction with the KWS part of your Alexa project on different cores.

cgreening commented 3 years ago

Hey Stuart, I'll have to read and digest the PDF. If it's simply a case of subtracting the right from the left channel then that shouldn't add much load at all.

The code change would be fairly trivial - you'd need to change the setup of I2S to have both left and right channels, and there might be a bit of work to handle underflow and overflow when doing the maths - maybe do the calculations in 64 bits and then clip if needed.

void I2SSampler::processI2SData(uint8_t *i2sData, size_t bytes_read)
{
    int32_t *samples = (int32_t *)i2sData;
    for (int i = 0; i < bytes_read / 4; i+=2)
    {
        int32_t difference = samples[i] - samples[i+1];
        // you may need to vary the >> 11 to fit your volume - ideally we'd have some kind of AGC here
        addSample(difference >> 11);
    }
}
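
Something like this if you widen to 64 bits and clip - a rough, untested sketch; the channel_format line is the I2S setup change to capture both channels (legacy ESP-IDF I2S driver):

// in the I2S configuration, capture both left and right channels
i2s_config.channel_format = I2S_CHANNEL_FMT_RIGHT_LEFT;

void I2SSampler::processI2SData(uint8_t *i2sData, size_t bytes_read)
{
    int32_t *samples = (int32_t *)i2sData;
    for (int i = 0; i < bytes_read / 4; i += 2)
    {
        // widen to 64 bits so the subtraction can't wrap around
        int64_t difference = (int64_t)samples[i] - (int64_t)samples[i + 1];
        // clip back into the 32-bit range before scaling
        if (difference > INT32_MAX) difference = INT32_MAX;
        if (difference < INT32_MIN) difference = INT32_MIN;
        addSample((int32_t)(difference >> 11));
    }
}
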
StuartIanNaylor commented 3 years ago

Yeah, I thought maybe keep it at 24-bit at first, as that has loads of headroom; I presume it's held in 32 bits, so maths-wise it's OK.

I thought it might make another great project idea, separate from this. Also it wasn't a joke about boards, as directional MEMS could be quite popular, maybe - but let's check how it works out.

It's invert-one, so -rear_delay2 + current_front; that will give a directional mic, but the response does need to be flattened by the EQ at 6dB per octave as you go down from the null frequency. I guess you could run the EQ pipeline before or after, and it's just a question of which is easier? So there is the EQ load - dunno how much that adds, but at least it's already done in the ADF.
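
A one-pole low-pass might even do the flattening instead of the full EQ - just a sketch, on the theory that a first-order low-pass falls at 6dB/octave and so would cancel the +6dB/octave rise of the delay-and-subtract output; the coefficient and makeup gain are guesses to tune, not values from the app note:

class TiltCompensator
{
    float m_state = 0.0f;
public:
    int32_t process(int32_t beamformed)
    {
        const float alpha = 0.05f; // sets the corner frequency (placeholder)
        m_state += alpha * ((float)beamformed - m_state);
        return (int32_t)(m_state * 8.0f); // makeup gain, tune to taste
    }
};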

There is also the resample pipeline, where the 48kHz stream can be dropped to 16kHz. I mention it as that would need to come after the EQ, since 16k is not a supported EQ rate.

I was always confused by addSample(difference >> 11); as my noob C made me think that is an 11-bit rotate, so we lose 5 LSBs. (It's actually an arithmetic shift rather than a rotate.) That's no problem, but I just wondered why not addSample(difference >> 16)? With the originally 24-bit source that is an extremely heavy 'compressor' style effect - even >> 16 loses 8 LSBs.

The 'compressor' effect might be a bonus, but I've been wondering if the resample pipeline should be used? It also shows that noise-gate / compressor parameters could be added at that point with minimal effect, as they are very much part of what is already being done. A compressor is very much just losing LSBs, and a noise gate is just omitting signals under a threshold completely - I think, with my limited knowledge of audio engineering :) See the toy sketch below.
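
i.e. something like this toy sketch of that description (not a real compressor); the threshold and shift are placeholders:

int32_t gateAndCompress(int32_t sample)
{
    const int32_t kGateThreshold = 1 << 14; // placeholder, tune to the noise floor
    // noise gate: omit anything under the threshold completely
    if (sample > -kGateThreshold && sample < kGateThreshold)
        return 0;
    // 'compressor': just lose the LSBs by shifting harder
    return sample >> 16;
}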

But please do have a read of https://invensense.tdk.com/wp-content/uploads/2015/02/Microphone-Array-Beamforming.pdf, as a simple 2/3 mic beamformer could make a really interesting project that could be of much interest. A board with 3 mics on a 7mm spacing @ 48kHz could be really interesting, as point-at-source is a viable type of far-field use.

I can not work out if the resampler, when converting to mono, sums the input channels. If it does, maybe you could take the current stereo input stream and build a raw stream that places the inverted rear_delay2 at the current front position but in the rear channel; then it's just a matter of calling the resampler to mono 16kHz.

So the pipeline would be EQ -> RAW -> Resample?

Maybe creating the mono raw stream whilst invert-summing would mean much less load, as the EQ could then run on a mono stream.

So the pipeline would be RAW -> EQ -> Resample(16k).
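
In ADF terms I guess that second option would look roughly like this - just a sketch pieced together from the docs, so the element configs are illustrative (the RAW/invert-sum step would sit between the reader and the EQ):

audio_pipeline_cfg_t pipeline_cfg = DEFAULT_AUDIO_PIPELINE_CONFIG();
audio_pipeline_handle_t pipeline = audio_pipeline_init(&pipeline_cfg);

// reader: stereo 48kHz from the mics
i2s_stream_cfg_t i2s_cfg = I2S_STREAM_CFG_DEFAULT();
i2s_cfg.type = AUDIO_STREAM_READER;
audio_element_handle_t i2s_reader = i2s_stream_init(&i2s_cfg);

// EQ at 48kHz mono, before the downsample (16k is not a supported EQ rate)
equalizer_cfg_t eq_cfg = DEFAULT_EQUALIZER_CONFIG();
eq_cfg.samplerate = 48000;
eq_cfg.channel = 1;
audio_element_handle_t eq = equalizer_init(&eq_cfg);

// resample 48kHz -> 16kHz mono at the end of the pipeline
rsp_filter_cfg_t rsp_cfg = DEFAULT_RESAMPLE_FILTER_CONFIG();
rsp_cfg.src_rate = 48000;
rsp_cfg.src_ch = 1;
rsp_cfg.dest_rate = 16000;
rsp_cfg.dest_ch = 1;
audio_element_handle_t resampler = rsp_filter_init(&rsp_cfg);

audio_pipeline_register(pipeline, i2s_reader, "i2s");
audio_pipeline_register(pipeline, eq, "eq");
audio_pipeline_register(pipeline, resampler, "resample");
const char *link_tag[3] = {"i2s", "eq", "resample"};
audio_pipeline_link(pipeline, &link_tag[0], 3);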

But yeah, really interested in what the load would be and what the ESP32 could do. I'm thinking it's the flattening via EQ that will cause the most load, and since the 16kHz voice SR rate is not a supported EQ sampling rate, the resample goes at the end of the pipeline. https://docs.espressif.com/projects/esp-adf/en/latest/api-reference/audio-processing/equalizer.html

cgreening commented 3 years ago

I was having a browse around - have you come across these boards?

https://www.espressif.com/sites/default/files/documentation/esp32_hardware_design_guidelines_en.pdf

5.1.2 ESP32-LyraTD-MSC Audio Development Board

StuartIanNaylor commented 3 years ago

Yeah, but expensive (£45) and large - compare to a WROVER 8MB for £6 - and even the SD card reader is essentially bloat, even if cool to have (on the WROVER with the SD).

The chip it's based on is really interesting though: https://www.microchipdirect.com/product/ZL38063LDG1 - at £4 it's relatively cheap. It can work with 2/3 mics and you can parallel it with another for 4/6 mic configs. The ESP32-LyraTD-MSC has gone all-in with 6 mics, which is questionable, as the 2/3 config is probably a better fit. We should be able to make a relatively cheap board and load up the firmware from an ESP32, and that SD card might come in handy.

Also for Pi users, as Microchip have a dev board and Linux drivers, which surprised me, as the Pi has a screaming need for a mic/DAC beamformer. You could have a HAT with 2/3 mics and a daughter-card add-on to make that 4/6, and it wouldn't be vastly more expensive than a Pi3A+; in terms of AI proliferation it has a big market. It's surprised me that Pi haven't done something themselves, but they are quite canny that way and doggedly stick to just the Pi.

I have a feeling it's the devkit costs of the ZL38063, plus the fact that it requires a custom kernel, which will cause a whole load of S***.

I have mixed emotions about those all-in-one devkits, as the designer's choice of mic placement, form factor, connectors and so on is totally constricted. I have no idea why an all-in-one is provided and why they didn't do a 2/3 mic audio board with I2S and SPI/I2C that has the mics on snap-offs/cut-outs if you so choose.

The ZL38063 looks really interesting, but I have tried some commercial beamformers that work exceptionally well in industrial distributed noise fields; in domestic scenarios with a singular predominant noise, especially TV voice, they struggle, and music media can completely bamboozle them when it is predominant against the far field.

I think the cost of multiple distributed KWS directional mics is comparable, and in domestic situations it partitions the high-end DSP by simply using the nearest and best KW-hit stream. I am a big fan of open source and have been screaming crazily for quite a while now that we should stop trying to copy the impossible commercial-style product that processes locally in single all-in-one units, when we can do something better: a system akin to pro-conference distributed mic arrays at low cost, feeding the advantages of a singular HAL-style central private ASR intent processor that can scale from embedded to high-end GPU compute units. In a distributed mic array you negate far-field problems, as the nearest is always picked. If we are to copy commercial then basically we make KW microphones, but for some reason many open-source projects try to make what end up as bad all-in-ones, whilst most commercial processing is in the cloud. Currently we can make exceptionally good KWS mics with some form of indicator for about £10 - or could, if I can twist your arm to create the endfire beamformer repo :)

Another thing I found out recently is that the SPH0645LM4H-B mics are 18-bit but sample up to 64kHz, which with endfires makes higher orders more possible.

StuartIanNaylor commented 3 years ago

PS Chris, are you asking about the ESP32-LyraTD-MSC because it states it is supported?

As usual, the predefined HAL code meant to make things easier often confuses or makes things more difficult. It's possible to define a custom HAL, as basically you are just creating a template of I2S pins & SPI/I2C control parameters - see the sketch below.
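
At the plain IDF level the template really is just pin numbers, e.g. (placeholder GPIOs, legacy I2S driver):

#include "driver/i2s.h"

i2s_pin_config_t pins = {
    .bck_io_num = 26,                  // bit clock (placeholder GPIO)
    .ws_io_num = 25,                   // word select (placeholder GPIO)
    .data_out_num = I2S_PIN_NO_CHANGE, // capture only, no output
    .data_in_num = 33,                 // mic data in (placeholder GPIO)
};
i2s_set_pin(I2S_NUM_0, &pins);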

Being a noob, I am not sure how you hack your own, or whether with the pipeline we will still have to play with an unpopulated I2S output interface. But this might make more sense to you, as you can override the defined ADF templates and create custom ones.

https://github.com/espressif/esp-adf/issues/327 https://esp32.com/viewtopic.php?t=13698

I will also have a look for a 3rd-party equalizer, as it would be far better to run @ 16kHz mono after the beamforming has taken place.

https://esp32.com/viewtopic.php?t=13532

I am not sure about the equalizer - maybe it is using the codecs; I can't tell, as it's not open source.

https://github.com/espressif/esp-adf-libs/blob/914f260647f123c18869f6693aeb79f803608e11/esp_codec/filter_resample.c

It's like https://github.com/phkehl/esp32-a1s-audio_hal, which is a fork for the Audio Kit boards that are fairly cheap. I still can not work out if the EQ is closed-source software or embedded into the codec - but why add forks if so? So maybe software.

The MSC for beamforming is definitely hardware, and I'm still trying to work out if https://github.com/espressif/esp-sr/tree/ and MASE are software or not - I think, again, it's just closed source.

I have been trying to set up the IDF with Eclipse or VS Code and have not been doing very well :)