DanielT / a2ltool

A tool to edit, merge and update a2l files
Apache License 2.0

Performance of `insert_many` #18

Closed oleid closed 8 months ago

oleid commented 8 months ago

When inserting into an empty a2l file, a2ltool takes a very long time to perform the task.

I recorded a flamegraph using cargo flamegraph:

cargo flamegraph  --bin=a2ltool -o baseline.svg --  \
    a2ltool --create --elffile path/to/filename.out  --characteristic-regex "Settings.*"  --output test.a2l

Please find the SVG file attached. Interactive viewing works fine in e.g. Firefox:

[Attached flamegraph: baseline]

Apparently, a lot of time is spent inside the hash table and on creating strings. Because of the string creation, I wondered whether jemalloc would have any impact. However, its flamegraph looks very similar, and I stopped a2ltool after more than 60 minutes, so I cannot say whether it would have been any faster.

[Attached flamegraph: baseline-jemalloc]

DanielT commented 8 months ago

[Note: I saw and removed a stray debug print in insert_characteristic_sym, which I accidentally checked in for 1.6.0 - oops. The following measurement was done after fixing that]

This is a very surprising result; I frequently use a2ltool to insert lots of items, and it is usually almost instant. I just tried inserting ".*" from a fairly large ELF file (~130 MB). With a release build this took 2 seconds and inserted about 180k items. With the debug build, the same command completed in 11 seconds.

The only explanation I can think of for your case is combinatorial explosion. In order to compare every item against the regex, the name string of every item must be constructed. If you have nested arrays, this can very quickly result in a huge number of items, e.g. Var[1000][1000][1000] would result in 1 billion strings, one for each combination of [x][y][z].

My assumption was that the usual embedded controller would run out of memory long before this becomes a problem. After all, you would need at least 1 GB of memory just to store Var[1000][1000][1000], even with one-byte elements.
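
To illustrate the growth described above, here is a minimal sketch. It is not taken from a2ltool: `expand_names` and `element_count` are hypothetical helpers that only show how one name string per index combination adds up.

```rust
/// Hypothetical helper (not a2ltool code): expand a variable with array
/// dimensions into one name per element, e.g. ("Var", [2, 3]) gives
/// "Var[0][0]", "Var[0][1]", ..., "Var[1][2]".
fn expand_names(base: &str, dims: &[usize]) -> Vec<String> {
    let mut names = vec![base.to_string()];
    for &dim in dims {
        // Every existing name gets `dim` children, so the list grows by a
        // factor of `dim` for each array dimension.
        let mut next = Vec::with_capacity(names.len() * dim);
        for name in &names {
            for idx in 0..dim {
                next.push(format!("{name}[{idx}]"));
            }
        }
        names = next;
    }
    names
}

/// Number of names that would be generated, without building any strings.
fn element_count(dims: &[usize]) -> usize {
    dims.iter().product()
}

fn main() {
    let names = expand_names("Var", &[2, 3]);
    println!("{} names, last = {}", names.len(), names.last().unwrap());
    // Only count the large case; building a billion strings is the problem.
    println!("{} names for [1000][1000][1000]", element_count(&[1000, 1000, 1000]));
}
```

Each name has to exist as a string before it can be tested against the regex, so the work scales with the product of all dimensions rather than with the number of variables.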

oleid commented 8 months ago

Thanks for your input. My ELF file is indeed a smaller one from a microcontroller, so maybe I hit an endless loop. I'll report back when I find the time to debug this further.

oleid commented 8 months ago

It seems I have a variable that is an array whose length cannot be extracted. As a result, the corresponding type info is missing in the iterator, which leads to an infinite loop. I created a pull request which fixes the issue.
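
For readers curious how a missing array length can turn into an endless loop, here is a purely illustrative sketch. It is not the code from a2ltool or from the pull request: `IndexIter` is a hypothetical iterator, and using `None` for a length that could not be extracted is an assumption made for the example. The point is only that an index iterator should finish immediately when a dimension is unknown, rather than keep stepping.

```rust
/// Hypothetical odometer-style iterator over all index combinations of an
/// array (not a2ltool code). `None` stands for an unknown dimension length.
struct IndexIter {
    dims: Vec<usize>,
    current: Vec<usize>,
    done: bool,
}

impl IndexIter {
    fn new(dims: &[Option<usize>]) -> Self {
        // Guard: if any length is unknown or zero there is nothing to
        // iterate, so the iterator starts out finished.
        let known: Option<Vec<usize>> = dims.iter().copied().collect();
        match known {
            Some(d) if d.iter().all(|&n| n > 0) => IndexIter {
                current: vec![0; d.len()],
                dims: d,
                done: false,
            },
            _ => IndexIter { dims: Vec::new(), current: Vec::new(), done: true },
        }
    }
}

impl Iterator for IndexIter {
    type Item = Vec<usize>;

    fn next(&mut self) -> Option<Vec<usize>> {
        if self.done {
            return None;
        }
        let item = self.current.clone();
        // Advance like an odometer: bump the last index and carry leftwards.
        let mut pos = self.dims.len();
        loop {
            if pos == 0 {
                // Carried past the first dimension: all combinations visited.
                self.done = true;
                break;
            }
            pos -= 1;
            self.current[pos] += 1;
            if self.current[pos] < self.dims[pos] {
                break;
            }
            self.current[pos] = 0;
        }
        Some(item)
    }
}

fn main() {
    // Fully known dimensions: 2 * 3 index combinations.
    println!("{}", IndexIter::new(&[Some(2), Some(3)]).count()); // prints 6
    // One unknown dimension: the iterator terminates without yielding.
    println!("{}", IndexIter::new(&[Some(2), None]).count()); // prints 0
}
```

Whether an unknown length should produce no elements, a single element, or an error is a design choice; the sketch picks "no elements" only to keep the termination condition obvious.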