danielhrisca / asammdf

Fast Python reader and editor for ASAM MDF / MF4 (Measurement Data Format) files
GNU Lesser General Public License v3.0

Performance benchmark #954

Open ratal opened 7 months ago

ratal commented 7 months ago

Hi Daniel, I would be curious to see how the following compares in your benchmark environment: https://github.com/ratal/mdfr

danielhrisca commented 7 months ago

Hello Aymeric, I will have a look in the next few days.

danielhrisca commented 6 months ago

Benchmark environment

Files used for benchmark:

| Open file | Time [ms] | RAM [MB] |
|---|---|---|
| asammdf 7.4.0.dev9 mdfv3 | 358 | 221 |
| mdfr 0.4.1 mdfv3 | 250 | 202 |
| asammdf 7.4.0.dev9 mdfv4 | 455 | 234 |
| mdfr 0.4.1 mdfv4 | 225 | 247 |

| Save file | Time [ms] | RAM [MB] |
|---|---|---|
| asammdf 7.4.0.dev9 mdfv3 | 361 | 381 |
| mdfr 0.4.1 mdfv3 | 275 | 336 |
| asammdf 7.4.0.dev9 mdfv4 | 898 | 400 |
| mdfr 0.4.1 mdfv4 | 126 | 328 |

| Get all channels (36424 calls) | Time [ms] | RAM [MB] |
|---|---|---|
| asammdf 7.4.0.dev9 mdfv3 | 1923 | 383 |
| mdfr 0.4.1 mdfv3 | 0 | 209 |
| asammdf 7.4.0.dev9 mdfv4 | 3934 | 399 |
| mdfr 0.4.1 mdfv4 | 0 | 256 |

danielhrisca commented 6 months ago

I guess that in mdfr all the data is loaded into RAM when the file is opened.

ratal commented 6 months ago

Thanks for investigating, Daniel. The API is different from mdfreader's. What you measured might be metadata parsing only? To load data into memory, you need to call load_channels_data_in_memory(channel_name) or load_all_channels_data_in_memory(). By my estimate, performance should be similar to or worse than asammdf; there is room for improvement, as it is not really optimised yet. For instance, the choice of arrow2 and polars is not fully settled yet. Also, I think the performance gains should come in the long term from processing with polars: the target use case is geared more towards big data.
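
To illustrate why the "open" and "get" figures are hard to compare when one reader defers data loading, here is a minimal sketch of a timing and memory harness. It is not the benchmark script used above, whose method is not shown in this thread. `measure`, `LazyReader`, `load_all` and `get` are hypothetical stand-ins, not the mdfr or asammdf API, and `tracemalloc` only counts Python-level allocations, so its numbers would not match a process-level RAM reading.

```python
import time
import tracemalloc


def measure(fn):
    """Run fn() once; return (result, elapsed_ms, peak_mb).

    peak_mb covers only allocations traced by tracemalloc,
    not the resident memory of the whole process.
    """
    tracemalloc.start()
    start = time.perf_counter()
    result = fn()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed_ms, peak_bytes / 1e6


class LazyReader:
    """Hypothetical reader that parses only metadata when opened."""

    def __init__(self, n_channels, samples):
        # "Opening" builds the channel list but allocates no sample data.
        self.names = [f"ch{i}" for i in range(n_channels)]
        self._samples = samples
        self._data = {}

    def load_all(self):
        # Stand-in for an explicit "load everything into memory" call:
        # 8 bytes per sample, as for float64 values.
        for name in self.names:
            self._data[name] = bytearray(self._samples * 8)
        return self

    def get(self, name):
        # Cheap lookup once the data has been loaded.
        return self._data[name]


def open_only():
    return LazyReader(100, 10_000)


def open_and_load():
    return LazyReader(100, 10_000).load_all()


_, t_open, mb_open = measure(open_only)
reader, t_load, mb_load = measure(open_and_load)
print(f"open only:     {t_open:.2f} ms, {mb_open:.2f} MB")
print(f"open and load: {t_load:.2f} ms, {mb_load:.2f} MB")
```

With a reader of this shape, a fair comparison would time the explicit load step together with opening the file, since that is where the cost of reading the samples ends up.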