danielhrisca / asammdf

Fast Python reader and editor for ASAM MDF / MF4 (Measurement Data Format) files
GNU Lesser General Public License v3.0
648 stars 226 forks source link

mdf_v4 `extract` function fails on specific CAN signals with `DataList` signal data block #464

Closed Jamezo97 closed 3 years ago

Jamezo97 commented 3 years ago

Python version

python=3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 14:00:49) [MSC v.1915 ' '64 bit (AMD64)] os=Windows-10-10.0.17763-SP0 numpy=1.18.1 asammdf=6.1.2

Code

MDF version

4.10

Code snippet

import asammdf
asammdf.MDF("MyFile.mf4") # I'll explain below

Traceback

It's always a different error for each measurement, at the moment we're observing

Exception has occurred: IndexError
boolean index did not match indexed array along dimension 0; dimension is 9387 but corresponding boolean dimension is 14999
File "C:\Users\james\.conda\envs\asammdf36\Lib\site-packages\asammdf\blocks\mdf_v4.py", line 9978, in _process_can_logging
  bus_data_bytes = data_bytes[idx_]
File "C:\Users\james\.conda\envs\asammdf36\Lib\site-packages\asammdf\blocks\mdf_v4.py", line 9829, in _process_bus_logging
  self._process_can_logging(index, group)
File "C:\Users\james\.conda\envs\asammdf36\Lib\site-packages\asammdf\blocks\mdf_v4.py", line 779, in _read
  self._process_bus_logging()
File "C:\Users\james\.conda\envs\asammdf36\Lib\site-packages\asammdf\blocks\mdf_v4.py", line 394, in __init__
  self._read(mapped=True)
File "C:\Users\james\.conda\envs\asammdf36\Lib\site-packages\asammdf\mdf.py", line 144, in __init__
  self._mdf = MDF4(name, channels=channels, **kwargs)
File "C:\p\scripts\test_asammdf.py", line 7, in <module>
  mdf = asammdf.MDF("MyFile.mf4")

However we've also had memory allocation related crashes in the past as Numpy attempts to allocate too much memory to fit an extremely large matrix. We tracked down where this issue occurs, but it appears to be a symptom of a larger issue. More detail below.

Description

Sorry, this is a long one.

We're using Vector CANape 17 to record vehicle signals for later playback and analysis. This include internal ECU signals over XCP and CAN bus logging.

The issue above, amongst others, always occurs during the _process_bus_logging method. In the past we've commented this out to get around it and the measurement seems fine otherwise.

We tracked the issue, well at least one of the symptoms of the underlying issue, down to the extract function. I've cleaned it up a bit so I can explain:

def extract(signal_data, is_byte_array):
    size = len(signal_data)
    values = []
    pos = 0

    while pos < size:
        (str_size,) = UINT32_uf(signal_data, pos)
        pos = pos + 4 + str_size
        values.append(signal_data[pos - str_size : pos])

    if is_byte_array:
        values = array(values)
        values = values.view(dtype=f"({values.itemsize},)u1")
    else:
        values = array(values)

    return values

Usually, the UINT32_uf function returns a size of 64. However sometimes when extracting CAN signals, this returns a pseudo random extremely large number. For this version of ASAMMDF, this causes the loop to finish early, as pos > size due to str_size being so large. Because it exit early, the resultant array has a size of 9387 along the first dimension instead of 14999, and causes the issue above.

In earlier versions of ASAMMDF, the loop would finish, and a numpy array was created, but instead of being 64 x num_samples in size, the very last, extremely large sample, caused it to try and allocate a, for example, 64 x 123456789 array. Python would then crash.

I've looked into the data it's trying to unpack, and turns out it's a block identifier! For the current example I'm looking at, it's ##SD

Using the Vector MDF Validator I can see that the signal_data bytes object came from a DataList block, which contains 2 SignalData blocks.

Github won't let me upload images at the moment, so I exported an XML view of the DataList block

<node name="Data List" type="DL4">
    <properties>
        <property description="Start Address">0x5798CF0</property>
        <property description="Block Identification">##DL</property>
        <property description="Reserved">6787016</property>
        <property description="Block Length">72</property>
        <property description="Number of links in link section">3</property>
        <property description="Address of next Data List block">0x0000</property>
        <property description="Address of data block (list element)">0x73E1A58</property>
        <property description="Address of data block (list element)">0xCFE9618</property>
        <property description="Flags">0</property>
        <property description="Reserved"><![CDATA[]]></property>
        <property description="Number of data blocks referenced by list">2</property>
        <property description="Offset">0</property>
        <property description="Offset">638248</property>
    </properties>
    <nodes>
        <node name="Signal Data" type="SD4">
            <properties>
                <property description="Start Address">0x73E1A58</property>
                <property description="Block Identification">##SD</property>
                <property description="Reserved">6787016</property>
                <property description="Block Length">638272</property>
                <property description="Number of links in link section">0</property>
                <property description="Data Section"><![CDATA[[bytes of data section]]]></property>
            </properties>
        <nodes />
        </node>
            <node name="Signal Data" type="SD4">
                <properties>
                    <property description="Start Address">0xCFE9618</property>
                    <property description="Block Identification">##SD</property>
                    <property description="Reserved">6787016</property>
                    <property description="Block Length">381708</property>
                    <property description="Number of links in link section">0</property>
                    <property description="Data Section"><![CDATA[[bytes of data section]]]></property>
                </properties>
            <nodes />
        </node>
    </nodes>
</node>

I know this signal has 14999 samples. Each sample has a 4 byte header, and 64 bytes of data. 14999 * (64+4) = 1019932 bytes.

But, according to the MF4, we have 638272 + 381708 = 1019980 bytes of data. Taking the difference, 1019980 - 1019932 = 48, exactly 2 * 24, i.e. exactly 2 Block headers.

So it seems, either 1) The MF4 file is being written incorrectly by CANape, the 'block length' value should be 24 bytes smaller 2) ASAMMDF is reading these ##SG blocks incorrectly and it should be truncating the last 24 bytes

I made a quick modification, in v4_blocks.py.

# Change from this
self.data = stream[address : address + self.block_len]

# To this
self.data = stream[address : address + self.block_len - COMMON_SIZE]

After making this change, the MF4 file loaded successfully. I would make a PR however I'm extremely apprehensive as I'm sure this is going to break a great number of other things.

Hence, here we are. Any opinions or thoughts on this issue? I don't have access to the MDF v4 specification so I can't look up exactly how this DataList block is supposed to work, but if anyone has it and can confirm what's going on here, it'd be much appreciated.

I'll try to reduce the MF4 file down in size a bit and remove sensitive data from it so I can share it here. But that's going to have to be a job for tomorrow.

danielhrisca commented 3 years ago

OMFG this bug has been there for ages, I can't believe I haven't bump into it so far!

I've pushed a fix: we were reading 24 bytes past the actual block size