ARM-software / CMSIS-DSP

CMSIS-DSP embedded compute library for Cortex-M and Cortex-A
https://arm-software.github.io/CMSIS-DSP
Apache License 2.0
CMSIS-DSP fills flash with tables not in actual use #173

Closed ACleverDisguise closed 2 months ago

ACleverDisguise commented 2 months ago

I'm working with a chip that has very limited space: 32KB Flash, 4KB SRAM. When doing FFT on q15_t types, I can't finish linking because of excess constant data. I started with 256 samples, then tried 128 and even 64, with identical results.

The .map file shows very clearly why:

twiddleCoef_1024_q15         0x8e6c   0xc00  Data  Gb  CommonTables.o [3]
twiddleCoef_128_q15          0xc6fc   0x180  Data  Gb  CommonTables.o [3]
twiddleCoef_16_q15           0xcba8    0x30  Data  Gb  CommonTables.o [3]
twiddleCoef_2048_q15         0x4f80  0x1800  Data  Gb  CommonTables.o [3]
twiddleCoef_256_q15          0xb2f4   0x300  Data  Gb  CommonTables.o [3]
twiddleCoef_32_q15           0xcb0c    0x60  Data  Gb  CommonTables.o [3]
twiddleCoef_4096_q15            0x0  0x3000  Data  Gb  CommonTables.o [3]
twiddleCoef_512_q15          0xa22c   0x600  Data  Gb  CommonTables.o [3]
twiddleCoef_64_q15           0xc95c    0xc0  Data  Gb  CommonTables.o [3]

A single one of these twiddle coefficient tables (twiddleCoef_4096_q15, 0x3000 bytes) eats 12KB of Flash by itself. CommonTables.o in aggregate takes up 40KB. Of 32.

    CommonTables.o                    40'664
    ComplexMathFunctions.o       64
    FastMathFunctions.o         384
    TransformFunctions.o      5'994
    -------------------------------------------------
    Total:                    6'442   40'664
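The twiddle table sizes in the map file follow a simple pattern, and a short calculation makes the cost of each FFT length explicit. The sketch below assumes, based only on the sizes listed above (0x30 bytes for 16 points up to 0x3000 bytes for 4096 points), that each q15 twiddle table holds 3N/2 16-bit entries for an N-point FFT. The function names are hypothetical helpers, not part of CMSIS-DSP.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes used by the q15 twiddle table of an N-point complex FFT.
 * Assumption inferred from the map file: 3N/2 entries of 16 bits each. */
size_t twiddle_q15_bytes(size_t fft_len)
{
    return (3u * fft_len / 2u) * sizeof(int16_t);
}

/* Total bytes for every power-of-two length from min_len to max_len. */
size_t twiddle_q15_total_bytes(size_t min_len, size_t max_len)
{
    size_t total = 0;
    for (size_t n = min_len; n <= max_len; n *= 2u) {
        total += twiddle_q15_bytes(n);
    }
    return total;
}
```

Under that assumption, twiddle_q15_bytes(4096) gives 12288 (0x3000) and twiddle_q15_bytes(64) gives 192 (0xc0), both matching the map entries, and the nine twiddle tables from 16 to 4096 points add up to 24528 bytes before any bit-reversal tables are counted.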

Looking into CommonTables.c (and its included files) I don't see any way to pare back the contents. Although I never use more than 256 samples in FFT operations, my map file is full of tables for sizes as high as 4096. For example:

armBitRevIndexTable_fixed_1024
                             0x9a6c   0x7c0  Data  Gb  CommonTables.o [3]
armBitRevIndexTable_fixed_128
                             0xc87c    0xe0  Data  Gb  CommonTables.o [3]
armBitRevIndexTable_fixed_16
                             0xce20    0x18  Data  Gb  CommonTables.o [3]
armBitRevIndexTable_fixed_2048
                             0x7eec   0xf80  Data  Gb  CommonTables.o [3]
armBitRevIndexTable_fixed_256
                             0xc51c   0x1e0  Data  Gb  CommonTables.o [3]
armBitRevIndexTable_fixed_32
                             0xcbd8    0x30  Data  Gb  CommonTables.o [3]
armBitRevIndexTable_fixed_4096
                             0x3000  0x1f80  Data  Gb  CommonTables.o [3]
armBitRevIndexTable_fixed_512
                             0xaf34   0x3c0  Data  Gb  CommonTables.o [3]
armBitRevIndexTable_fixed_64
                             0xca9c    0x70  Data  Gb  CommonTables.o [3]

And there doesn't appear to be any way to turn this all off.

What am I missing? (Or, rather, perhaps more accurately, what is the documentation missing?)

christophe0606 commented 2 months ago

@ACleverDisguise You need to avoid the generic initialization functions such as arm_cfft_init_f32 and instead use the dedicated versions for the FFT lengths you need, for instance arm_cfft_init_512_f32.

There are similar functions for other data types, and other transforms (rfft ...)

That way, the linker can work out which tables are not used, and they won't be included in the final build.

Those functions are documented in the doxygen generated documentation.

There is also a new section in the README : https://github.com/ARM-software/CMSIS-DSP?tab=readme-ov-file#code-size

(The required compiler-dependent build and link options must also be used, such as -ffunction-sections, -fdata-sections, -Wl,--gc-sections ...)
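To illustrate why the choice of init function matters, here is a minimal self-contained sketch. It is not the CMSIS-DSP implementation: the tables, the fft_instance struct and the fft_init_* functions are hypothetical stand-ins, and the table sizes are only borrowed from the map file above. It shows the general mechanism: an init that picks its table from a run-time length must reference every table, while a sized init references exactly one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the library's constant tables. */
const int16_t table_64[96]     = {0};
const int16_t table_4096[6144] = {0};

typedef struct {
    uint16_t       fft_len;
    const int16_t *twiddle;
} fft_instance;

/* Generic init: the length is only known at run time, so this function
 * references every table and the linker has to keep all of them. */
int fft_init_generic(fft_instance *s, uint16_t fft_len)
{
    switch (fft_len) {
    case 64:   s->twiddle = table_64;   break;
    case 4096: s->twiddle = table_4096; break;
    default:   return -1;               /* unsupported length */
    }
    s->fft_len = fft_len;
    return 0;
}

/* Sized init: references a single table. If nothing calls the generic
 * init, section garbage collection can drop table_4096. */
int fft_init_64(fft_instance *s)
{
    s->fft_len = 64;
    s->twiddle = table_64;
    return 0;
}
```

In this sketch, one call to fft_init_generic anywhere in the linked program is enough to keep both tables alive, regardless of the length actually passed in.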

ACleverDisguise commented 2 months ago

I'm calling arm_cfft_init_256_q15().

It's still bringing in every size of every static variable.

christophe0606 commented 2 months ago

What compiler are you using, and what are the command-line options for compiling and linking?

ACleverDisguise commented 2 months ago

IAR and I have no idea: the IDE does its level best to conceal that from me. I'll go digging to see what command line it comes up with.

ACleverDisguise commented 2 months ago

Ah, never mind. A third-party (as in another coworker's) file had a call to the general init, and I wasn't expecting it to have any calls to the math lib at all, not to mention FFT.

Stripping that out and replacing it with the appropriately-sized init (64 in this case) solved our problem. Thanks, @christophe0606!

ACleverDisguise commented 2 months ago

Problem solved, but this really needs to be documented in the actual library docs, not buried in a README. Like perhaps arm_cfft_init_<type>()'s documentation should mention the sized variants?

Alternatively, initialization could be a macro call parameterized by type and size, like this: ARM_CFFT_INIT(&my_init_struct, q15, 256), which would assemble the call arm_cfft_init_256_q15(&my_init_struct).

A macro I've (loosely) tested that does this is below:

#define ARM_CFFT_INIT_IMPL(S, T, N) arm_cfft_init ##N ##T(S)
#define ARM_CFFT_INIT_PREP(S, T, N) ARM_CFFT_INIT_IMPL(S, _ ##T, _ ##N)
#define ARM_CFFT_INIT(S, T, N) ARM_CFFT_INIT_PREP(S, T, N)

This permits me to write code like this:

#define DATA_TYPE q15
#define SAMPLE_SIZE 256
ARM_CFFT_INIT(&my_struct, DATA_TYPE, SAMPLE_SIZE);

And if I decide I want to expand or contract my sample size, I change it in one place and all uses are updated. Similarly if I decide I want to move to f32, I change it in one place and it gets updated everywhere.
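The macro can be tried out without the library. In the sketch below the three macros are copied unchanged from above, while demo_instance, demo_setup and the two arm_cfft_init_*_q15 functions are hypothetical stubs that only borrow the library's naming scheme so the pasted name has something to resolve to. The real functions take a different instance type and return an arm_status.

```c
/* Macros from the thread, unchanged. */
#define ARM_CFFT_INIT_IMPL(S, T, N) arm_cfft_init ##N ##T(S)
#define ARM_CFFT_INIT_PREP(S, T, N) ARM_CFFT_INIT_IMPL(S, _ ##T, _ ##N)
#define ARM_CFFT_INIT(S, T, N) ARM_CFFT_INIT_PREP(S, T, N)

/* Hypothetical stand-in for the library's instance struct. */
typedef struct { int fft_len; } demo_instance;

/* Stubs named like the sized init functions (not the real signatures). */
int arm_cfft_init_64_q15(demo_instance *s)  { s->fft_len = 64;  return 0; }
int arm_cfft_init_256_q15(demo_instance *s) { s->fft_len = 256; return 0; }

#define DATA_TYPE   q15
#define SAMPLE_SIZE 256

/* Expands to arm_cfft_init_256_q15(s). */
int demo_setup(demo_instance *s)
{
    return ARM_CFFT_INIT(s, DATA_TYPE, SAMPLE_SIZE);
}
```

The outer ARM_CFFT_INIT layer is what makes DATA_TYPE and SAMPLE_SIZE usable as arguments: macro arguments are expanded before substitution only when they are not operands of ##, so the extra level turns them into q15 and 256 before the inner macros paste the tokens together. Changing SAMPLE_SIZE to a length with no matching function produces a compile or link error instead of silently pulling in the generic init.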

christophe0606 commented 2 months ago

@ACleverDisguise We will upgrade the documentation so that it is more visible. The macro is a good suggestion. Thank you.

ACleverDisguise commented 2 months ago

I've now written about a dozen such macros for the features I use. They've really helped in rapid changes while testing and prototyping. I'm sure if I spent actual time on it, I'd make macros that sucked slightly less. ;)