danielhrisca / asammdf

Fast Python reader and editor for ASAM MDF / MF4 (Measurement Data Format) files
GNU Lesser General Public License v3.0

dl_data data block addresses in saved MDF file corrupted by .select() method if .select() is called prior to .save() #1033

Open DT-one opened 3 weeks ago

DT-one commented 3 weeks ago

Hi

I am using asammdf in a project where it serves as a bit of a datastore.

I am having a strange issue where the MDF object, and subsequently the saved MDF file, are corrupted by the .select() method.

My workflow is as follows: I generate an MDF object from data read in from a few large CSV files, then pass that MDF object into another function that reads some signals out of it, performs some processing, and appends new signals to the same MDF object, which is then returned from the function. After this process an MDF file is saved, and I have noticed the file is corrupt when reading it in MATLAB. Vector MDF Validator flags a number of dl_data[x] link fields in the DataList block whose addresses are zero (null pointers). (I believe the file needs to be above the 4 MB split limit for the DataList block to exist.)
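For anyone without Vector MDF Validator, the zero dl_data links described above could probably be spotted with a small stdlib-only check. This is only a rough sketch and not part of asammdf: the names read_links and null_dl_data_links are made up for illustration, and it assumes the MDF 4 layout as I understand it (a 24-byte block header holding id, reserved bytes, length and link count; the HD block at offset 64; dg_data as the third DG link; dl_dl_next as the first DL link, followed by the dl_data links).

```python
import struct

# MDF 4 block header: 4-byte id, 4 reserved bytes, block length, link count
HEADER = struct.Struct("<4s4xQQ")


def read_links(buf, addr):
    """Return (block_id, links) for the block that starts at addr."""
    block_id, _length, link_count = HEADER.unpack_from(buf, addr)
    links = struct.unpack_from(f"<{link_count}Q", buf, addr + HEADER.size)
    return block_id, links


def null_dl_data_links(buf):
    """Walk HD -> DG -> DL chains; return (dl_address, index) for zero dl_data links."""
    problems = []
    _, hd_links = read_links(buf, 64)      # HD block follows the 64-byte ID block
    dg_addr = hd_links[0]                  # hd_dg_first
    while dg_addr:
        _, dg_links = read_links(buf, dg_addr)
        dl_addr = dg_links[2]              # dg_data
        # An HL block (compressed data) just points at the first DL block
        if dl_addr and read_links(buf, dl_addr)[0] == b"##HL":
            dl_addr = read_links(buf, dl_addr)[1][0]   # hl_dl_first
        while dl_addr:
            block_id, links = read_links(buf, dl_addr)
            if block_id != b"##DL":
                break                      # single data block, no list to check
            for index, link in enumerate(links[1:]):   # links[0] is dl_dl_next
                if link == 0:
                    problems.append((dl_addr, index))
            dl_addr = links[0]             # dl_dl_next
        dg_addr = dg_links[0]              # dg_dg_next
    return problems


# Usage (hypothetical; reads the whole file into memory, mmap would also work):
# with open("Test_NoPreSave_ERROR.mf4", "rb") as f:
#     print(null_dl_data_links(f.read()))
```

An empty list would mean that every dl_data link reachable from the data groups is non-zero. The sketch does not follow the dl_data links themselves, so it says nothing about whether the blocks they point to are valid.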

Through extensive testing this afternoon I have isolated the issue to the MDF.select() method. If it is called before the file is saved, the resulting file is corrupted. I have a few environments for different purposes, and this happens with every version of asammdf I have installed and on every Python version. I also updated to the latest version in one environment to confirm that.

Interestingly, if the file is saved before MDF.select() is called, the corruption does not occur. I only found this out while writing the demonstration script.

I hope we can find what this issue is as it causes file corruption, was difficult to find/debug and comes after relatively normal usage of the library.

No crash occurs nor are there any warnings.

Also interesting is that MDF.get() does not produce this corruption. (In my code I need to use .select() to recover conversion rule information.)

The code snippet creates 3 files (that are too big to attach, even when compressed):
'Test_Presave_NoError.mf4' - File saved prior to calling MDF.select()
'Test_PostSave_NoError.mf4' - File saved after calling MDF.select(), but also after an earlier MDF.save()
'Test_NoPreSave_ERROR.mf4' - File saved after calling MDF.select(), with no prior MDF.save()

The error is only visible in the last file.

[Attached images: Test_PreSave_No_Error, Test_PostSave_No_Error, Test_NoPreSave_ERROR]

Requested console output

('python=3.12.0 | packaged by Anaconda, Inc. | (main, Oct  2 2023, 17:20:38) '
 '[MSC v.1916 64 bit (AMD64)]')
'os=Windows-10-10.0.19045-SP0'
'numpy=1.26.2'
ldf is not supported
xls is not supported
xlsx is not supported
yaml is not supported
'asammdf=7.4.2'

MDF version

 4.10

Code snippet

# -*- coding: utf-8 -*-
"""
Created on Wed Jun  5 18:11:41 2024

@author: DT_one
"""

import numpy as np
from asammdf import MDF, Signal

mdf = MDF(version='4.10')

sigs = [Signal(samples = np.random.rand(600000),
               timestamps = np.arange(0,600000),
               name = f'Signal_{x}') for x in range(1,11)]
mdf.append(sigs,
           common_timebase = True)

sigs = [Signal(samples = np.random.rand(500000),
               timestamps = np.arange(0,500000),
               name = f'Signal_{x}') for x in range(11,21)]
mdf.append(sigs,
           common_timebase = True)

sigs = [Signal(samples = np.random.rand(500000),
               timestamps = np.arange(0,500000),
               name = f'Signal_{x}') for x in range(21,31)]
mdf.append(sigs,
           common_timebase = True)

mdf.save('Test_Presave_NoError.mf4',
          overwrite = True)

loc1 = mdf.whereis('Signal_15')
mdf.select([(None,) + loc1[0]])

mdf.save('Test_PostSave_NoError.mf4',
         overwrite = True)

mdf = MDF(version='4.10')

sigs = [Signal(samples = np.random.rand(600000),
               timestamps = np.arange(0,600000),
               name = f'Signal_{x}') for x in range(1,11)]
mdf.append(sigs,
           common_timebase = True)

sigs = [Signal(samples = np.random.rand(500000),
               timestamps = np.arange(0,500000),
               name = f'Signal_{x}') for x in range(11,21)]
mdf.append(sigs,
           common_timebase = True)

sigs = [Signal(samples = np.random.rand(500000),
               timestamps = np.arange(0,500000),
               name = f'Signal_{x}') for x in range(21,31)]
mdf.append(sigs,
           common_timebase = True)

# No save before select() this time; this is what triggers the corruption.
# mdf.save('Test_Presave_NoError.mf4',
#           overwrite = True)

loc1 = mdf.whereis('Signal_15')
mdf.select([(None,) + loc1[0]])

mdf.save('Test_NoPreSave_ERROR.mf4',
         overwrite = True)

I would also quickly add that asammdf is a great package. Many thanks to all involved.

danielhrisca commented 2 weeks ago

@DT-one please try the development branch code

DT-one commented 1 week ago

Sorry, I will test this. I am just having trouble finding time at the moment. Having read the commit, I am confident it addresses the issue.