danielhrisca / asammdf

Fast Python reader and editor for ASAM MDF / MF4 (Measurement Data Format) files
GNU Lesser General Public License v3.0

MemoryError when opening large mf4 file on Intel CPU but not on AMD CPU #941

Closed: haydersaad closed this issue 10 months ago

haydersaad commented 11 months ago

Python version

python=3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:01:55) [MSC v.1900 32 bit (Intel)]
os=Windows-10-10.0.17763-SP0
numpy=1.21.6
asammdf=5.21.0

Code

MDF version

MDF4.10

Code snippet

import logging
import re

from asammdf import MDF

# directory, CPUS, FOLDER_NAME_BY_CPU and enum are defined elsewhere in the script
with MDF(directory, memory="low") as mdf:
    logging.info("Split by CPU sets")
    all_channels = list(mdf.channels_db)

    re_filter = re.compile("test/.*")
    test_signals = list(filter(re_filter.match, all_channels))

    # split by CPU
    for cpu_set in CPUS:
        logging.info("Creating split file with these cpus: %s", str(cpu_set))
        channels = []
        name = "r" + enum
        for cpu in cpu_set:
            if cpu == "test":
                channels += test_signals
            name = name + "_" + cpu
        output_path = directory.parent / FOLDER_NAME_BY_CPU / name
        new_mf4 = mdf.filter(list(channels))
        new_mf4.save(output_path, overwrite=True)
        new_mf4.close()

Traceback

Exception in thread spliting test_file_2023:
Traceback (most recent call last):
  File "C:\Python37\lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "C:\Python37\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\workspace\MDF_Conversion\db_mdfconvert\lib\site-packages\mf4_tools\split_megaputer_mf4s.py", line 45, in split_mf4
    with MDF(directory, memory="low") as mdf:
  File "C:\workspace\MDF_Conversion\db_mdfconvert\lib\site-packages\asammdf\mdf.py", line 121, in __init__
    self._mdf = MDF4(name, **kwargs)
  File "C:\workspace\MDF_Conversion\db_mdfconvert\lib\site-packages\asammdf\blocks\mdf_v4.py", line 387, in __init__
    self._read(mapped=False)
  File "C:\workspace\MDF_Conversion\db_mdfconvert\lib\site-packages\asammdf\blocks\mdf_v4.py", line 759, in _read
    self._sort()
  File "C:\workspace\MDF_Conversion\db_mdfconvert\lib\site-packages\asammdf\blocks\mdf_v4.py", line 9387, in _sort
    new_data = rem + read(dtblock_size)
MemoryError

Description

I am trying to split a large mf4 file (>15 GB) into smaller mf4 files, each containing specific signals selected by a filter (for example, one filter selects all signals with "test" in their name). My code works perfectly on an AMD machine but keeps raising the error above on an Intel machine. The AMD machine uses an AMD EPYC 7763 64-core processor (2.45 GHz), whereas the Intel machine uses an Intel(R) Xeon(R) Platinum 8268 (2.90 GHz). I cannot see any other differences between the machines; both have 16 GB of RAM and more than 500 GB of disk space.

Would you have any idea why I'm getting a memory error in this specific case? I'm happy to provide more information if it would help.

Thank you very much for your time!

haydersaad commented 10 months ago

My apologies, the PC that kept getting the memory error was running 32-bit Python. I switched to 64-bit Python and everything works as expected.
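
For anyone who lands here with the same MemoryError: the "32 bit" in the Python version string at the top of this issue is the giveaway. A 32-bit process can address only a few GB regardless of how much RAM is installed, so the CPU vendor was likely a coincidence. Below is a minimal sketch of a check that could run before opening a large file. The helper names interpreter_bits and is_64bit_python are made up for illustration and are not part of asammdf.

```python
import platform
import struct
import sys


def interpreter_bits() -> int:
    """Return the pointer width of the running interpreter in bits."""
    # struct.calcsize("P") is the size of a C pointer in bytes.
    return struct.calcsize("P") * 8


def is_64bit_python() -> bool:
    """Return True when the running interpreter is a 64-bit build."""
    # sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
    return sys.maxsize > 2**32


if __name__ == "__main__":
    print(f"{interpreter_bits()}-bit Python on {platform.machine()}")
    if not is_64bit_python():
        print("Warning: 32-bit interpreter, multi-GB files may raise MemoryError")
```

Running this on both machines before comparing hardware would have shown the difference right away. Note that platform.machine() reports the hardware architecture, not the interpreter build, so a 32-bit Python on a 64-bit machine can still report AMD64 there.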