espressif / esp-skainet

Espressif intelligent voice assistant

Doubt regarding esp32s3-eye mic input (AIS-1531) #130

Open PrathamG opened 3 months ago

PrathamG commented 3 months ago

Hi! I have a question about the bsp_get_feed_data function for the esp32s3-eye board in the hardware driver.

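From this line: tmp_buff[i] = tmp_buff[i] >> 14; // bits 32:8 are the valid bits, bits 8:0 are the low 8 bits and are all 0; the AFE input is 16-bit voice data, and bits 29:13 are taken in order to amplify the voice signal.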

I assume that:

  1. Bits 0-7 are discarded because they are always zero (padding).
  2. Bits 8-13 are discarded as noise.
  3. Bits 14-29 remain in the word as our audio reading.
  4. Bits 30-31 end up in the upper half of the shifted word (bits 16-17) and are effectively discarded.
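The bit bookkeeping in the list above can be checked with a small C sketch. It assumes a raw 32-bit I2S word and that only the low 16 bits of the shifted word are read as the 16-bit sample; the helper names are hypothetical and not part of the driver.

```c
#include <stdint.h>

/* Hypothetical helper: shift a raw 32-bit I2S word the same way as the quoted
 * line (>> 14) and keep only the low 16 bits, i.e. the part that would be
 * read as the 16-bit sample. A logical shift on uint32_t is used here because
 * only bit positions matter, not the sign. */
static uint16_t low16_after_shift(uint32_t raw_word)
{
    return (uint16_t)((raw_word >> 14) & 0xFFFFu);
}

/* Returns 1 if original bit `bit` (0..31) lands in the low 16 bits. */
static int bit_survives(unsigned bit)
{
    return low16_after_shift(UINT32_C(1) << bit) != 0;
}
```

Looping `bit` over 0..31 shows that exactly bits 14-29 survive, matching the list. Note that the quoted comment says 29:13, while a shift by 14 actually keeps bits 29:14.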

So my question is: assuming that the I2S input is 24-bit two's complement, left-justified, wouldn't bit 31 contain the MSB (the sign bit) of the audio sample? If we just discard it, wouldn't that affect the sign of the sample? Sorry if I misunderstood something, thanks!
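To make the left-justified assumption concrete: a 24-bit two's-complement sample placed in the top 24 bits of a 32-bit slot has its sign bit at bit 31. The sketch below assumes that layout (low 8 bits zero padding); the function names are illustrative only.

```c
#include <stdint.h>

/* Pack a 24-bit two's-complement sample (range -8388608..8388607) into a
 * 32-bit slot, left-justified: sample bits 23:0 go to word bits 31:8 and
 * bits 7:0 are zero padding. Done on uint32_t to avoid shifting negatives. */
static uint32_t pack_left_justified_24(int32_t sample24)
{
    return ((uint32_t)sample24 & 0x00FFFFFFu) << 8;
}

/* Bit 31 of the left-justified word is the sample's sign bit. */
static int sign_bit(uint32_t word)
{
    return (int)(word >> 31);
}
```

For example, packing -1 gives 0xFFFFFF00 with bit 31 set, while packing 8388607 gives 0x7FFFFF00 with bit 31 clear.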

BlueSkyB commented 3 months ago

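Because the data type is int32_t, the compiler (GCC, as used by ESP-IDF) performs an arithmetic right shift, so the sign bit is not lost. (Strictly, ISO C leaves right-shifting a negative value implementation-defined; GCC defines it as sign extension.)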
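A quick way to see the sign extension described here. This sketch assumes GCC or Clang, where `>>` on a negative signed integer is an arithmetic shift; the helper name is hypothetical.

```c
#include <stdint.h>

/* Mirrors the quoted line: shift a signed 32-bit word right by 14.
 * With GCC/Clang this sign-extends, so a negative input stays negative
 * and the result lies in the 18-bit range -131072..131071. */
static int32_t shift14(int32_t word)
{
    return word >> 14;
}
```

`shift14(INT32_MIN)` gives -131072 rather than a large positive number, which is why the sign survives this step. The open question in the thread is what happens to those 18 significant bits afterwards.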

PrathamG commented 3 months ago

Yes, but after the right shift is complete, the top bits of the resulting value still end up being discarded, right? Once the shift removes the padding and the noise, 18 significant bits of the original 32-bit word remain. Only the lower 16 of these are used further, and the top 2, including the sign bit, are discarded. Wouldn't this affect the sign of the resulting value?
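The concern can be shown numerically: after `>> 14` the value spans 18 signed bits, and keeping only the low 16 bits of anything outside -32768..32767 changes the value, sometimes including its sign. The sketch assumes the consumer reads just the low 16 bits; `low16_as_int16` is a hypothetical name.

```c
#include <stdint.h>

/* Reinterpret the low 16 bits of a 32-bit value as a two's-complement int16,
 * which is what a consumer reading only the low half-word would see.
 * Done explicitly so the result does not rely on implementation-defined
 * narrowing conversions. */
static int16_t low16_as_int16(int32_t v)
{
    uint16_t bits = (uint16_t)((uint32_t)v & 0xFFFFu);
    return (bits & 0x8000u) ? (int16_t)(bits - 65536) : (int16_t)bits;
}
```

In-range values pass through unchanged, but a loud positive sample of 40000 becomes -25536 and -40000 becomes 25536, so a near-clipping input flips sign instead of saturating.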

BlueSkyB commented 3 months ago

After the right shift, the MSB is not discarded: the result is assigned back to the same variable, which is of type int32_t. You may need to learn more about pointers and right-shift operations in C.

PrathamG commented 3 months ago

Yes, the result is still stored in the int32_t variable. But the top 2 bits are discarded when it's passed to the AFE. For the esp-eye, the AFE examples are initialized with one mic channel and one reference channel, which means that only the first (low) 16 bits of each 32-bit word are used as microphone data by AFE applications, and the upper half is discarded (or used as the AEC reference).
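The memory layout described here can be sketched as follows. It assumes a little-endian target (the ESP32-S3 is little-endian) and that the AFE consumes the buffer as interleaved int16 samples, mic first and reference second; `split_word` is a hypothetical name, not part of the esp-skainet API.

```c
#include <stdint.h>
#include <string.h>

/* View one 32-bit word as the two int16 channels a 2-channel consumer would
 * read from the same memory. memcpy avoids strict-aliasing problems. On a
 * little-endian CPU the low half-word comes first in memory. */
static void split_word(int32_t word, int16_t *first, int16_t *second)
{
    int16_t halves[2];
    memcpy(halves, &word, sizeof halves);
    *first = halves[0];   /* low 16 bits on little-endian: mic sample   */
    *second = halves[1];  /* high 16 bits on little-endian: "reference" */
}
```

With a shifted value of 40000, the first channel reads -25536 while the second holds 0, i.e. the bits that were cut off from the mic sample.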

PrathamG commented 3 months ago

Wouldn't this make more sense instead:

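tmp_buff[i] = tmp_buff[i] >> 14;  // arithmetic shift on int32_t, sign preserved
// saturate to the 16-bit range (INT16_MAX from <stdint.h>) so that keeping only
// the low 16 bits cannot wrap a loud sample around to the opposite sign
tmp_buff[i] = (tmp_buff[i] > INT16_MAX) ? INT16_MAX : (tmp_buff[i] < -INT16_MAX) ? -INT16_MAX : tmp_buff[i];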
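A self-contained version of the proposed shift-then-saturate step, written as a per-sample helper so it can be tested on its own. The name `i2s_word_to_sample` is hypothetical; it assumes the same `>> 14` scaling as the original line and an arithmetic right shift (GCC/Clang).

```c
#include <stdint.h>

/* Scale a raw 32-bit I2S word the same way as the original code (>> 14),
 * then clamp to [-INT16_MAX, INT16_MAX] as in the proposal, so that keeping
 * only the low 16 bits can no longer wrap a loud sample to the other sign. */
static int16_t i2s_word_to_sample(int32_t raw_word)
{
    int32_t v = raw_word >> 14;       /* arithmetic shift: sign preserved */
    if (v > INT16_MAX)  v = INT16_MAX;
    if (v < -INT16_MAX) v = -INT16_MAX;
    return (int16_t)v;                /* now always in range */
}
```

Clamping keeps the extra 4x gain of `>> 14` and turns overflow into clipping rather than sign flips. An alternative would be `>> 16`, whose result always fits in int16, at the cost of that 4x (about 12 dB) of gain.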