Performance benchmark

ratal commented 7 months ago

Hi Daniel, I would be curious to see a comparison, run in your benchmark environment, with the following: https://github.com/ratal/mdfr

danielhrisca commented 7 months ago

Hello Aymeric, I will have a look in the next few days.

danielhrisca commented 6 months ago

Benchmark environment

- Python 3.10.8 (tags/v3.10.8:aaaf517, Oct 11 2022, 16:50:30) [MSC v.1933 64 bit (AMD64)]
- Windows-10-10.0.22621-SP0
- AMD64 Family 25 Model 80 Stepping 0, AuthenticAMD
- numpy 1.23.1
- 15 GB installed RAM
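The environment lines above look like the output of Python's standard `sys` and `platform` modules. Below is a minimal sketch of how such a report could be collected, assuming those modules were the source (the thread does not show the actual benchmark script). Installed RAM is left out because the standard library has no portable call for it.

```python
import platform
import sys
from importlib import metadata


def environment_report() -> list[str]:
    """Collect interpreter, OS, CPU and numpy details, one item per line."""
    lines = [
        sys.version,          # on the thread's machine: "3.10.8 (tags/v3.10.8:aaaf517, ...)"
        platform.platform(),  # on the thread's machine: "Windows-10-10.0.22621-SP0"
        platform.processor(), # on Windows this is the CPU identifier string; may be "" elsewhere
    ]
    try:
        lines.append(f"numpy {metadata.version('numpy')}")
    except metadata.PackageNotFoundError:
        lines.append("numpy not installed")
    return lines


if __name__ == "__main__":
    print("\n".join(environment_report()))
```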

Files used for the benchmark:

- mdf version 3.10
  - 167 MB file size
  - 183 groups
  - 36424 channels
- mdf version 4.00
  - 183 MB file size
  - 183 groups
  - 36424 channels

Open file	Time [ms]	RAM [MB]
asammdf 7.4.0.dev9 mdfv3	358	221
mdfr 0.4.1 mdfv3	250	202
asammdf 7.4.0.dev9 mdfv4	455	234
mdfr 0.4.1 mdfv4	225	247

Save file	Time [ms]	RAM [MB]
asammdf 7.4.0.dev9 mdfv3	361	381
mdfr 0.4.1 mdfv3	275	336
asammdf 7.4.0.dev9 mdfv4	898	400
mdfr 0.4.1 mdfv4	126	328

Get all channels (36424 calls)	Time [ms]	RAM [MB]
asammdf 7.4.0.dev9 mdfv3	1923	383
mdfr 0.4.1 mdfv3	0	209
asammdf 7.4.0.dev9 mdfv4	3934	399
mdfr 0.4.1 mdfv4	0	256
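Each row above pairs a wall-clock time with a RAM figure. A minimal, hypothetical harness for producing such a pair is sketched below, using `time.perf_counter` for elapsed time and `tracemalloc` for peak memory. Note that `tracemalloc` only counts allocations made through Python's allocator, so it is not the same quantity as process RAM, which the tables most likely report. The `open_file` function is a placeholder workload standing in for a real call such as opening an MDF file. The `per_call_us` helper turns a batch time into an average per call: for example, 1923 ms over 36424 calls is roughly 52.8 µs per call for asammdf on the mdfv3 file.

```python
import time
import tracemalloc
from typing import Any, Callable


def measure(func: Callable[[], Any]) -> tuple[float, float, Any]:
    """Run func once; return (elapsed ms, peak traced memory in MiB, result)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    _, peak_bytes = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    return elapsed_ms, peak_bytes / 2**20, result


def per_call_us(total_ms: float, calls: int) -> float:
    """Average time per call in microseconds for a batch of `calls` calls."""
    return total_ms * 1000.0 / calls


def open_file() -> list[bytes]:
    # Hypothetical stand-in workload: allocate about 8 MiB in 1 KiB blocks.
    return [bytes(1024) for _ in range(8192)]


if __name__ == "__main__":
    ms, mib, blocks = measure(open_file)
    print(f"open_file: {ms:.1f} ms, peak {mib:.1f} MiB, {len(blocks)} blocks")
    print(f"asammdf mdfv3 get: {per_call_us(1923, 36424):.1f} us per call")
```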

danielhrisca commented 6 months ago

I guess that mdfr loads all the data into RAM when the file is opened.

ratal commented 6 months ago

Thanks for investigating, Daniel. The API is different from mdfreader's. Could it be that what you measured covers only metadata parsing? To load data into memory, you need to call load_channels_data_in_memory(channel_name) or load_all_channels_data_in_memory(). By my estimates, performance should be similar to or worse than asammdf; there is room for improvement, as it is not really optimised yet. For instance, the choice of arrow2 and polars is not fully committed to yet. Also, I think that in the long term the performance gains should come from processing with polars: the target use case is, again, more oriented towards big data.
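The distinction raised here, parsing only metadata when the file is opened versus loading channel samples on demand, can be illustrated with a toy class. The sketch below is hypothetical: `LazyFile`, `load_channel`, `load_all` and the fake `_read_samples` reader are illustrative names, only loosely modeled on the mdfr calls mentioned above, and are not the mdfr or asammdf API. It shows why a "get all channels" step can cost almost nothing if no samples are read, while the real cost appears when data is actually loaded.

```python
from dataclasses import dataclass, field


@dataclass
class LazyFile:
    """Toy model of a measurement file: metadata up front, samples on demand."""
    channel_names: list[str]
    _cache: dict[str, list[float]] = field(default_factory=dict)
    reads: int = 0  # counts how many times raw samples were actually read

    def _read_samples(self, name: str) -> list[float]:
        # Hypothetical reader: a real implementation would decode data blocks from disk.
        self.reads += 1
        return [float(len(name))] * 3

    def load_channel(self, name: str) -> list[float]:
        """Load one channel's samples into memory once, then serve them from the cache."""
        if name not in self._cache:
            self._cache[name] = self._read_samples(name)
        return self._cache[name]

    def load_all(self) -> None:
        """Eagerly load every channel that is not cached yet."""
        for name in self.channel_names:
            self.load_channel(name)


f = LazyFile(channel_names=["speed", "rpm"])
print(f.reads)          # 0: creating the object read no samples
f.load_channel("rpm")
f.load_channel("rpm")   # second call is served from the cache
print(f.reads)          # 1
f.load_all()            # only "speed" still needs reading
print(f.reads)          # 2
```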

danielhrisca / asammdf

Performance benchmark #954