Wrong output when running dsps_mulc_s16_ functions (DSP-108)

guilhAbreu commented 1 year ago

Environment

I am running this library from the latest arduino-esp32 package release (2.0.9). Since I'm assuming the issue is not directly related to arduino-esp32 package, I'm opening the issue here. I'm sorry if this is a mistake.

I'm compiling the project using the latest arduino-cli release as well.

Development Kit: [none]
Module or chip used: [ESP32S3]
Arduino-cli: [0.33.0]
Operating System: [Linux]
Power Supply: [USB]

Problem Description

I was trying to use the dsps_mulc functions to multiply an array by a constant, as described in its documentation, but I realized that the output for the int16_t version of the function is either [-1,-1,-1,...] or [0,0,0,...]. However, the float version works very well, always providing the expected output.

Example:

#include <esp_dsp.h>

const int16_t input[4] = {5,2,8,9};
int16_t output[4];
const int16_t myConstant = -5;
esp_err_t dsp_error;

dsp_error = dsps_mulc_s16_ae32(input, output, 4, myConstant, 1, 1);
//same happens with the ansi version;

Expected Behavior

dsp_error = 0; Output: [-25, -10, -40, -45]

Actual Behavior

dsp_error = 0; Output: [-1, -1, -1, -1]

Steps to reproduce

Compiling with the following options and executing the above piece of code should reproduce the bug.

_id=esp32s3 bootloader.tool=esptool_py bootloader.tool.default=esptool_py build.arch=ESP32 build.board=ESP32S3_DEV build.boot=qio build.boot_freq=80m build.bootloader_addr=0x0 build.cdc_on_boot=1 build.code_debug=4 build.copy_jtag_files=0 build.core=esp32 build.core.path= build.custom_bootloader=bootloader build.custom_partitions=partitions build.defines= build.dfu_on_boot=0 build.event_core=-DARDUINO_EVENT_RUNNING_CORE=1 build.extra_flags=-DESP32 -DCORE_DEBUG_LEVEL=4 -DARDUINO_RUNNING_CORE=1 -DARDUINO_EVENT_RUNNING_CORE=1 -DARDUINO_USB_MODE=1 -DARDUINO_USB_CDC_ON_BOOT=1 -DARDUINO_USB_MSC_ON_BOOT=0 -DARDUINO_USB_DFU_ON_BOOT=0 build.extra_flags.esp32=-DARDUINO_USB_CDC_ON_BOOT=0 build.extra_flags.esp32c3=-DARDUINO_USB_MODE=1 -DARDUINO_USB_CDC_ON_BOOT=1 build.extra_flags.esp32s2=-DARDUINO_USB_MODE=0 -DARDUINO_USB_CDC_ON_BOOT=1 -DARDUINO_USB_MSC_ON_BOOT=0 -DARDUINO_USB_DFU_ON_BOOT=0 build.extra_flags.esp32s3=-DARDUINO_USB_MODE=1 -DARDUINO_USB_CDC_ON_BOOT=1 -DARDUINO_USB_MSC_ON_BOOT=0 -DARDUINO_USB_DFU_ON_BOOT=0 build.extra_libs= build.f_cpu=240000000L build.flash_freq=80m build.flash_mode=dio build.flash_size=8MB build.fqbn=esp32:esp32:esp32s3:PartitionScheme=8MBCustom,FlashMode=qio,EraseFlash=all,CDCOnBoot=cdc,UploadMode=cdc build.library_discovery_phase=0 build.loop_core=-DARDUINO_RUNNING_CORE=1 build.mcu=esp32s3 build.memory_type=qio_qspi build.msc_on_boot=0 build.openocdscript=esp32s3-builtin.cfg build.openocdscript.esp32=esp32-wrover-kit-3.3v.cfg build.openocdscript.esp32c3=esp32c3-builtin.cfg build.openocdscript.esp32s2=esp32s2-kaluga-1.cfg build.openocdscript.esp32s3=esp32s3-builtin.cfg build.opt.name=build_opt.h build.opt.path= build.partitions=8Mpartition build.path= build.project_name=test_bench.ino build.psram_type=qspi build.source.path= build.system.path= build.tarch=xtensa build.target=esp32s3 build.usb_mode=1 build.variant=esp32s3 build.variant.path=

The code was compiled using the following command line: arduino-cli compile -b esp32:esp32:esp32s3 --board-options "PartitionScheme=8MBCustom,FlashMode=qio,EraseFlash=all,CDCOnBoot=cdc,UploadMode=cdc" --build-property "compiler.optimization_flags=-O2" --build-property "build.flash_size=8MB" --build-property "build.partitions=8Mpartition" --build-property "build.code_debug=4" --output-dir source/build/bpi-board/output --log-format text --log-file source/build/temp/build.log --log-level warn --verbose

Other items

elf-file.zip

peter-marcisovsky commented 1 year ago

Hello, the problem here is data format. In the esp-dsp repository, all the _s16_ functions use Q15 fixed point data format. In your code, you are feeding the function with incorrect inputs, which causes an output overflow, and this is the reason why you are getting -1 output for your input.

Peter

guilhAbreu commented 1 year ago

@peter-marcisovsky Wouldn't be the case to implement the shift parameter as it is implemented for dsps_add_s16?

guilhAbreu commented 1 year ago

@peter-marcisovsky Thank you for your time, and I'm sorry to continue to bother you. I took a look at both ANSI and ae32 implementations, and simply just removing the shift instruction solved the problem. It would be very helpful if the shift parameter could be implemented.

For those who are facing the same issue. I'm talking about lines 41 and 24 of the ae32 and ANSI implementation, respectively. I'm appending here the custom implementations that solve the problem for me.

#include "dsps_mulc_platform.h"
#if (dsps_mulc_s16_ae32_enabled == 1)

    .text
    .align  4
    .global dsps_mulc_s16_ae32_custom
    .type   dsps_mulc_s16_ae32_custom,@function

dsps_mulc_s16_ae32_custom: 
// input    - a2
// output   - a3
// len      - a4
// C        - a5
// step_in  - a6
// step_out - a7
// shift    - stack (a8)

    entry   a1, 16

    l32i.n  a8, a1,  16     // Load shift to the a8 register
    ssr     a8              // sar = a8

    srli    a4, a4, 1   // a4 = a4>>1
    slli    a6, a6, 2   // a6 - step_in<<3, because we load two inputs per loop
    slli    a7, a7, 1   // a7 - step_out<<2

    addi    a6, a6, -4;
    addi    a2, a2, -4;

    ldinc m0, a2

    loopnez a4, loop_end_mulc_f32_ae32
        add.n       a2, a2, a6     // input+=step_input;
        mul.DA.LL   m0, a5 
        rsr a8, acchi
        rsr a9, acclo
        src a8, a8, a9  // Here result in a8
        s16i    a8, a3, 0   // store result to the putput        
        // rsr a9, acclo
        // s16i a9, a3, 0   // store result to the putput        
        add.n   a3, a3, a7     // output+=step_out;
        mul.DA.HL   m0, a5         

        rsr a8, acchi
        rsr a9, acclo
        ldinc       m0,   a2               // load next data
        src a10, a8, a9  // Here result in a8
        s16i    a10, a3, 0   // store result to the putput
        // // rsr a9, acclo
        // // s16i  a9, a3, 0   // store result to the putput        
        add.n   a3, a3, a7  // output+=step_out;
loop_end_mulc_f32_ae32:
    movi.n  a2, 0 // return status ESP_OK
    retw.n

#endif // dsps_mulc_s16_ae32_enabled

esp_err_t dsps_mulc_s16_ansi_custom(const int16_t *input, int16_t *output, int len, int16_t C, int step_in, int step_out, int shift)
{
    if (NULL == input) return ESP_ERR_DSP_PARAM_OUTOFRANGE;
    if (NULL == output) return ESP_ERR_DSP_PARAM_OUTOFRANGE;

    for (int i = 0 ; i < len ; i++) {
        int32_t acc = (int32_t)input[i * step_in] * (int32_t)C;
        output[i * step_out] = (int16_t)(acc>>shift);
    }
    return ESP_OK;
}

espressif / esp-dsp