espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.28k stars 7.2k forks source link

GCC does not seem to be optimising floating point code very well (IDFGH-11992) #13064

Open Barabas5532 opened 7 months ago

Barabas5532 commented 7 months ago

Answers checklist.

General issue report

Compile a function doings some floating point math:

void lfo_process(float *samples,
                 size_t sample_count,
                 float *lfo,
                 float *lfo_increment)
{
    float lfo_increment_tmp = *lfo_increment;
    float lfo_tmp = *lfo;
    for(int i = 0; i < sample_count; i++)
    {
        lfo_tmp += lfo_increment_tmp;

        if(lfo_tmp > 0.5f)
        {
            lfo_tmp = 0.5f;
            lfo_increment_tmp = -lfo_increment_tmp;
        }

        if(lfo_tmp < -0.5f)
        {
            lfo_tmp = -0.5f;
            lfo_increment_tmp = -lfo_increment_tmp;
        }
    }

    *lfo = lfo_tmp;
    *lfo_increment = lfo_increment_tmp;
}

The generated code has inefficient stuff like loading from main memory on every iteration rather than keeping things in registers:

        lfo_tmp += lfo_increment_tmp;
4008b0a8:       010713          lsi     f1, a7, 4
4008b0ab:       000703          lsi     f0, a7, 0
4008b0ae:       0a0100          add.s   f0, f1, f0
4008b0b1:       014703          ssi     f0, a7, 4

In an attempt to guide the compiler to a better solution, use the gcc register feature to place the temporaries in a specific floating point register:

    register float lfo_increment_tmp asm("f15") = *lfo_increment;
    register float lfo_tmp asm("f14") = *lfo;

Then the generated code is still not any faster in a benchmark, the loads are replaced by shuffling the value from f15 and f1, then doing the addition on f1, then moving the value back to f15. It is then immediately moved from f15 back to f0 for a comparison.

        lfo_tmp += lfo_increment_tmp;
4008b0a8:       fa1f00          mov.s   f1, f15
4008b0ab:       fa0e00          mov.s   f0, f14
4008b0ae:       0a0100          add.s   f0, f1, f0
4008b0b1:       faf000          mov.s   f15, f0

        if(lfo_tmp > 0.5f)
4008b0b4:       fa0f00          mov.s   f0, f15

It would be great if the compiler could do a better job here, so that we don't have to write so much code in assembly for DSP functions.

0xjakob commented 7 months ago

@Barabas5532 Could you please specify the ESP-IDF version, the chip and gcc version which you are using? Furthermore, what is the compiler call used to generate this code (you can use idf.py -v build to see all compiler calls)?

Lapshin commented 7 months ago

@Barabas5532, I see that the assembly you provided was generated without optimizations.

Please try compiling it with the -O1/-O2 flags to enable optimizations. (https://godbolt.org/z/enzdaa6rf)

Barabas5532 commented 7 months ago

Thanks for the hints on the optimisation flags. I see what went wrong now. When my component is being compiled, some compile flags that I though would be set up by esp-idf are missing.

I'm using a "Modern CMake" style for all of my components so that they can be cross platform to desktop for testing. This is documented in the manual.

It looks like the documentation is wrong though. In tools/cmake/component.cmake, a lot of magic happens to set up compile flags. This is all skipped for components defined using modern cmake.

Here are the compiler invocations for cJSON using the original 5.1.2 release, and then after modification as described in the docs. I've text replaced the actual path to esp-idf with $IDF_PATH so it's easier to read:

Compiler invocation for esp-idf v5.1.2 release ``` ccache /home/barabas/.espressif/tools/xtensa-esp32-elf/esp-12.2.0_20230208/xtensa-esp32-elf/bin/xtensa-esp32-elf-gcc -DESP_PLATFORM -DIDF_VER=\"v5.1.2\" -DSOC_MMU_PAGE_SIZE=CONFIG_MMU_PAGE_SIZE -D_GNU_SOURCE -D_POSIX_READER_WRITER_LOCKS -I$IDF_PATH/examples/protocols/http_server/restful_server/build/config -I$IDF_PATH/components/json/cJSON -I$IDF_PATH/components/newlib/platform_include -I$IDF_PATH/components/freertos/FreeRTOS-Kernel/include -I$IDF_PATH/components/freertos/FreeRTOS-Kernel/portable/xtensa/include -I$IDF_PATH/components/freertos/esp_additions/include/freertos -I$IDF_PATH/components/freertos/esp_additions/include -I$IDF_PATH/components/freertos/esp_additions/arch/xtensa/include -I$IDF_PATH/components/esp_hw_support/include -I$IDF_PATH/components/esp_hw_support/include/soc -I$IDF_PATH/components/esp_hw_support/include/soc/esp32 -I$IDF_PATH/components/esp_hw_support/port/esp32/. -I$IDF_PATH/components/esp_hw_support/port/esp32/private_include -I$IDF_PATH/components/heap/include -I$IDF_PATH/components/log/include -I$IDF_PATH/components/soc/include -I$IDF_PATH/components/soc/esp32 -I$IDF_PATH/components/soc/esp32/include -I$IDF_PATH/components/hal/esp32/include -I$IDF_PATH/components/hal/include -I$IDF_PATH/components/hal/platform_port/include -I$IDF_PATH/components/esp_rom/include -I$IDF_PATH/components/esp_rom/include/esp32 -I$IDF_PATH/components/esp_rom/esp32 -I$IDF_PATH/components/esp_common/include -I$IDF_PATH/components/esp_system/include -I$IDF_PATH/components/esp_system/port/soc -I$IDF_PATH/components/esp_system/port/include/private -I$IDF_PATH/components/xtensa/include -I$IDF_PATH/components/xtensa/esp32/include -I$IDF_PATH/components/lwip/include -I$IDF_PATH/components/lwip/include/apps -I$IDF_PATH/components/lwip/include/apps/sntp -I$IDF_PATH/components/lwip/lwip/src/include -I$IDF_PATH/components/lwip/port/include -I$IDF_PATH/components/lwip/port/freertos/include -I$IDF_PATH/components/lwip/port/esp32xx/include -I$IDF_PATH/components/lwip/port/esp32xx/include/arch -mlongcalls -Wno-frame-address -fdiagnostics-color=always -ffunction-sections -fdata-sections -Wall -Werror=all -Wno-error=unused-function -Wno-error=unused-variable -Wno-error=unused-but-set-variable -Wno-error=deprecated-declarations -Wextra -Wno-unused-parameter -Wno-sign-compare -Wno-enum-conversion -gdwarf-4 -ggdb -Og -fmacro-prefix-map=$IDF_PATH/examples/protocols/http_server/restful_server=. -fmacro-prefix-map=$IDF_PATH=/IDF -fstrict-volatile-bitfields -fno-jump-tables -fno-tree-switch-conversion -DconfigENABLE_FREERTOS_DEBUG_OCDAWARE=1 -std=gnu17 -Wno-old-style-declaration -MD -MT esp-idf/json/CMakeFiles/__idf_json.dir/cJSON/cJSON_Utils.c.obj -MF esp-idf/json/CMakeFiles/__idf_json.dir/cJSON/cJSON_Utils.c.obj.d -o esp-idf/json/CMakeFiles/__idf_json.dir/cJSON/cJSON_Utils.c.obj -c $IDF_PATH/components/json/cJSON/cJSON_Utils.c ```
Compiler invocation for custom version with cJSON defined as in the "pure cmake" docs ``` /home/barabas/.espressif/tools/xtensa-esp32-elf/esp-12.2.0_20230208/xtensa-esp32-elf/bin/xtensa-esp32-elf-gcc -I$IDF_PATH/examples/protocols/http_server/restful_server/modern_json/cJSON -mlongcalls -Wno-frame-address -fdiagnostics-color=always -o modern_json/CMakeFiles/modern_json.dir/cJSON/cJSON.c.obj -c $IDF_PATH/examples/protocols/http_server/restful_server/modern_json/cJSON/cJSON.c ```

I have pushed the modified esp-idf here. Did I make a mistake setting up the pure cmake component? It is missing a lot of compile flags, including the optimisation level, and -fno-jump-tables, which is a workaround to some compiler bugs if I understand correctly: https://github.com/espressif/esp-idf/issues/1552#issuecomment-883600363.

Barabas5532 commented 7 months ago

https://github.com/espressif/esp-idf/issues/10579 is related, but they never tried to work around the errors using add_subdirectory instead of putting the pure cmake files in the components folder. I guess I was not supposed to use the cmake API this way with esp-idf?

igrr commented 7 months ago

@Barabas5532 You can link pure CMake library targets to the application the way your example project does. However, since these targets are not added by the IDF build system, we don't add any compilation flags to such targets, except for the mandatory flags specified in the toolchain file.

If you describe the behavior you expect (both for a chip target and Linux target) I can try to recommend a way to achieve it.

Barabas5532 commented 7 months ago

I'm designing most of my components to have no esp-idf dependencies by using abstraction layers on top of esp-idf. I can then build the components for desktop for testing using mock or fake implementation of the abstractions. The main feature of the project is audio processing for digital guitar effects, and I want to be able to extract the core features (audio processing, parameters and a web API to control the parameters) and run it in a digital audio workstation without any hardware. You can think of like an interactive integration test running on the host.

So the goal for the build system is to allow building these components into a firmware based on ESP-IDF, or for a unit test based on some test framework, or into a VST plugin based on JUCE. This should all be pretty straight forward using "Modern CMake".

For the desktop platform build, the components should look like modern cmake components, and there should be no references to esp-idf.

For the ESP32 build, I would like the components to have the same default compiler flags as components defined when using idf_component_register. I assume all of these are there for a reason. At the very least, the optimisation level should be propagated from menuconfig. Perhaps -fno-exceptions and -fno-rtti as well, I'm not so familiar with the rest. So maybe just having a way to tell esp-idf to define the compile flags globally would be enough? e.g. by setting CMAKE_LANG_FLAGS or putting the default flags in the toolchain file.

I'm already linking a common compiler flags component to all of my components like this, maybe I can add the missing flags there when ESP_PLATFORM is set? It is based on this idea from JUCE: https://github.com/juce-framework/JUCE/blob/d054f0d14dcac387aebda44ce5d792b5e7a625b3/extras/Build/CMake/JUCEHelperTargets.cmake#L24. This is not the nicest solution, it is easy to forget to link to this target and still end up with missing flags.

I see some others have gone for making two parallel build setups for conventional cmake and esp-idf. I also started out with something like this, but I would like to avoid duplicating the configuration:

if(ESP_PLATFORM)
  idf_component_register(...)
else()
  add_library(...)
endif()