downmix causing 6 dB gain (AUD-5200)

ftab commented 8 months ago

Environment

Audio development kit: custom board
[Required] Module or chip used: ESP32-WROVER-E
[Required] IDF version: slightly tweaked fork of v5.1.2 with adf patch applied and a minor fix for task stack in SPIRAM (https://github.com/radiosound-com/esp-idf/tree/v5.1.2-fix-task-stack-in-spiram)
[Required] ADF version: slightly tweaked fork of master somewhere between 2.5 and 2.6 (https://github.com/radiosound-com/esp-adf/tree/radiosound-modded-v2.5-66-g49e63cd5-idf-5.1.2)
Build system: idf.py
Compiler version: esp-12.2.0_20230208
Operating system: Linux
Using an IDE?: No
Power supply: external

Problem Description

I have added downmix to my pipeline to allow beep files stored in flash to be played alongside music.

The downmix element appears to introduce 6 dB of gain regardless of the gain values in the source info array, even in bypass mode. Both source streams come out at about twice the amplitude they should have.

Expected Behavior

The downmixer should apply 0 dB of gain by default; the audio level of the music should remain the same whether or not downmix is in the pipeline.

Actual Behavior

6 dB of gain, as measured by playing a 1 kHz tone file on both versions of the project and measuring with a sound meter. I may be able to work around it by adding a -6 dB gain stage, but that costs some resolution, which would adversely affect the product's performance, so I'd prefer not to.

Steps to Reproduce

None yet. I am investigating the esp-adf examples so that I can reproduce this on an ESP32-LyraT, but at the moment I am having trouble even playing a simple 1 kHz sine MP3 file from my SD card (playback is choppy or at the wrong rate).

ftab commented 7 months ago

My current workaround is to right-shift each incoming audio sample by 1 bit. That restores the expected level, so the downmix output samples must be doubled compared to the input for some reason.
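
For reference, a minimal sketch of that kind of workaround on a buffer of 16-bit PCM samples. The helper name `pcm16_halve` is hypothetical and not part of ESP-ADF; it only illustrates the 1-bit shift and why it costs resolution.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an ESP-ADF API): halve every 16-bit PCM sample
 * in place, i.e. attenuate by roughly 6 dB before the mixer doubles it. */
void pcm16_halve(int16_t *samples, size_t count)
{
    assert(samples != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        /* Right-shifting a negative signed value is implementation-defined
         * in C; GCC and Clang perform an arithmetic (sign-extending) shift. */
        samples[i] = (int16_t)(samples[i] >> 1);
    }
}
```

The least significant bit of every sample is discarded, so neighbouring input values such as 2 and 3 both become 1. That is the loss of resolution mentioned above.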

Still can't get the LyraT to play a simple sine wave mp3 off the SD card for some reason (either in the downmix example or in the pipeline_sdcard_mp3_control example).

TempoTian commented 7 months ago

Please provide your downmix settings. It seems that the input is 2 channels and the output is 1 channel; with this configuration the output level will increase by 6 dB. If the left and right channels play different content, it will sound and behave normally.
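
To illustrate where 6 dB can come from when two channels are folded into one: summing two identical channels doubles every sample, and a factor of 2 in amplitude is 20*log10(2), about 6.02 dB. The sketch below is a host-side illustration only, not the ESP-ADF downmix implementation; the function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit intermediate result to the 16-bit PCM range. */
int16_t clamp16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Fold interleaved stereo (L0 R0 L1 R1 ...) to mono by plain summation.
 * Identical L and R content comes out doubled (about +6 dB). */
void stereo_to_mono_sum(const int16_t *in, int16_t *out, size_t frames)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < frames; i++) {
        out[i] = clamp16((int32_t)in[2 * i] + (int32_t)in[2 * i + 1]);
    }
}

/* Fold to mono by averaging, so identical L and R content keeps its level. */
void stereo_to_mono_avg(const int16_t *in, int16_t *out, size_t frames)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < frames; i++) {
        out[i] = (int16_t)(((int32_t)in[2 * i] + (int32_t)in[2 * i + 1]) / 2);
    }
}
```

With identical channels, the summing version doubles each sample and can clip near full scale, while the averaging version returns the original level.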

ftab commented 7 months ago

Here are the downmixer parts of my code put together in the order they are called.

    ESP_LOGI(TAG, "[4.0] Create pipeline_mix pipeline");
    audio_pipeline_cfg_t pipeline_cfg = DEFAULT_AUDIO_PIPELINE_CONFIG();
    pipeline_mix = audio_pipeline_init(&pipeline_cfg);

    ESP_LOGI(TAG, "[4.1] Create down-mixer element");
    downmix_cfg_t downmix_cfg = DEFAULT_DOWNMIX_CONFIG();
    downmix_cfg.downmix_info.source_num = 2;
    downmixer = downmix_init(&downmix_cfg);

    /* Set both streams to never time out */
    downmix_set_input_rb_timeout(downmixer, 0, 0);
    downmix_set_input_rb_timeout(downmixer, 0, 1);

    esp_downmix_input_info_t source_information[2] = {0};
    esp_downmix_input_info_t source_info_stream = {
        .samplerate = 44100, /* TODO: handle 48khz? */
        .channel = 2,
        .bits_num = 16,
        .gain = {0, -10},
        .transit_time = 50,
    };
    source_information[0] = source_info_stream;

    esp_downmix_input_info_t source_info_tone = {
        .samplerate = 44100,
        .channel = 2,
        .bits_num = 16,
        .gain = {0, 0},
        .transit_time = 500,
    };
    source_information[1] = source_info_tone;
    source_info_init(downmixer, source_information);
    audio_pipeline_register(pipeline_mix, downmixer, "mixer");

    ESP_LOGI(TAG, "[7] Link all pipeline elements together");

    audio_pipeline_link(pipeline_mix, (const char*[]){"mixer", "dsp_x", "i2s_w"}, 3);

    ringbuf_handle_t rb_bt = audio_element_get_input_ringbuf(bt_raw);
    downmix_set_input_rb(downmixer, rb_bt, 0);

    ringbuf_handle_t rb_tone = audio_element_get_input_ringbuf(tone_raw);
    downmix_set_input_rb(downmixer, rb_tone, 1);

    downmix_set_output_type(downmixer, ESP_DOWNMIX_OUTPUT_TYPE_TWO_CHANNEL);

    ESP_LOGI(TAG, "[8.3] Start audio_pipeline");
    audio_pipeline_run(pipeline_bt);
    audio_pipeline_run(pipeline_mix);
    downmix_set_work_mode(downmixer, ESP_DOWNMIX_WORK_MODE_BYPASS);

The total pipeline looks like this:

        bt -> resampler -> raw
                             \
                               ---> mixer -> dsp -> i2s
                             /
    fatfs reader -> mp3 -> raw

The Bluetooth player sets the timeout of its stream (index 0) to 200 while playing and to 0 while suspended (I just guessed a number and threw it in there).

Later when the tone file is played:

            audio_element_set_uri(tone_reader, filename);
            audio_pipeline_run(pipeline_tone);
            downmix_set_work_mode(downmixer, ESP_DOWNMIX_WORK_MODE_SWITCH_ON);
            downmix_set_input_rb_timeout(downmixer, 500, 1);

            /* Wait for the tone to finish playing */
            ESP_LOGI(TAG, "Waiting for pipeline finish event");
            ulTaskNotifyTake(pdFALSE, portMAX_DELAY);
            ESP_LOGI(TAG, "Finished");

            downmix_set_work_mode(downmixer, ESP_DOWNMIX_WORK_MODE_SWITCH_OFF);
            downmix_set_input_rb_timeout(downmixer, 0, 1);
            audio_pipeline_stop(pipeline_tone);
            audio_pipeline_wait_for_stop(pipeline_tone);
            audio_pipeline_terminate(pipeline_tone);
            audio_pipeline_reset_ringbuffer(pipeline_tone);
            audio_pipeline_reset_elements(pipeline_tone);

ftab commented 7 months ago

It appears that, despite my setting ESP_DOWNMIX_OUT_CTX_LEFT_RIGHT and ESP_DOWNMIX_OUTPUT_TYPE_TWO_CHANNEL, the downmixer is mixing the two channels together: I no longer have L/R separation, and all content now comes out of both channels.

ftab commented 7 months ago

Figured it out. I need ESP_DOWNMIX_OUT_CTX_NORMAL, not ESP_DOWNMIX_OUT_CTX_LEFT_RIGHT.
