danielhrisca / asammdf

Fast Python reader and editor for ASAM MDF / MF4 (Measurement Data Format) files
GNU Lesser General Public License v3.0

extract_bus_logging and concatenate, resulting in exception "different channel groups count" #670

Open LaurentBerder opened 2 years ago

LaurentBerder commented 2 years ago

Python version

python=3.8.10 (default, Feb 23 2022, 16:08:11) [GCC 7.5.0]
os=Linux-3.10.0-862.11.6.el7.x86_64-x86_64-with-glibc2.2.5
numpy=1.21.0
asammdf=7.0.2

Code

MDF version

4.11

Code snippet

from asammdf import MDF

mf4_local_files = ['can/00.mf4', 'can/01.mf4', 'can/02.mf4']
dbc = {'CAN': ['dbc/00.dbc', 'dbc/01.dbc']}
for mf4 in range(len(mf4_local_files)):
    # read the file
    mdf = MDF(mf4_local_files[mf4])
    # Convert with DBC
    globals()[f"decoded_{mf4}"] = mdf.extract_bus_logging(dbc)
    # Concatenate decoded files together
    if mf4 == 0:
        mdf_scaled = MDF.concatenate([globals()[f"decoded_{mf4}"]])
    else:
        mdf_scaled = MDF.concatenate([mdf_scaled, globals()[f"decoded_{mf4}"]])

Traceback

MdfException: internal structure of file <__new__.mf4> is different; different channel groups count

Description

The error happens between the first and second MF4 files. Indeed, when I look at the details of the decoded files, the channel counts differ: len(decoded_0.channels_db.keys()) is 215, while len(decoded_1.channels_db.keys()) is 211.
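For reference, a minimal way to reproduce that comparison (a sketch; decoded_0 and decoded_1 are the variables from the snippet above, and channels_db is treated as a plain mapping of channel names):

names_0 = set(decoded_0.channels_db)
names_1 = set(decoded_1.channels_db)
print(len(names_0), len(names_1))  # 215 vs. 211
print(sorted(names_0 - names_1))   # channels present in the first file only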

I never used to have any problems reading these files, because I performed the steps in a different order. What I used to do was the following:

mdf = MDF.concatenate(mf4_local_files)
mdf_scaled = mdf.extract_bus_logging(dbc)

That would return 215 channels, with potentially missing data for the files that did not originally contain them.

I changed the order of the steps because this process would crash when reading large MF4 files (several 2 GB files), and I figured that filtering down to only the channels I use would solve the memory issue (which it did, but it introduced this new problem).

How could I avoid the exception?

danielhrisca commented 2 years ago

I would rather save each decoded file to a temporary folder and feed the file names to the concatenate method

LaurentBerder commented 2 years ago

Thanks for coming back to me.

You mean like the following?

import os
from glob import glob

for mf4 in range(len(mf4_local_files)):
    # read the file
    mdf = MDF(mf4_local_files[mf4])
    # Convert with DBC
    scaled = mdf.extract_bus_logging(dbc)
    # Save temporary scaled file
    scaled.save(os.path.join(os.getcwd(), "tmp", "scaled", str(mf4) + ".mf4"))

# Concatenate all decoded files
mdf_scaled = MDF.concatenate(sorted(f for f in
                                    glob(os.path.join(os.getcwd(), "tmp/scaled/*"))
                                    if f.endswith(".mf4")))

I still get the same exception MdfException: internal structure of file <.../tmp/scaled/1.mf4> is different; different channel groups count
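One way to narrow the mismatch down is to compare the channel-group layout of the saved files, since concatenate requires identical groups in the same order. A sketch (the tmp/scaled paths are the ones from the snippet above, and acq_name is assumed to hold the decoded message name for MDF v4 files):

import os
from glob import glob

from asammdf import MDF

for path in sorted(glob(os.path.join(os.getcwd(), "tmp/scaled/*.mf4"))):
    with MDF(path) as mdf:
        print(path, "->", len(mdf.groups), "channel groups")
        for i, group in enumerate(mdf.groups):
            # each group typically corresponds to one decoded CAN message
            print("  group", i, ":", group.channel_group.acq_name,
                  "-", len(group.channels), "channels")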

LaurentBerder commented 2 years ago

I tried going through dataframes:

import pandas as pd

extract = pd.DataFrame()

for mf4 in range(len(mf4_local_files)):
    # Read file
    mdf = MDF(sorted(mf4_local_files)[mf4])
    # Convert with DBC
    scaled = mdf.extract_bus_logging(dbc)
    # Extract to dataframe, only keep required columns and add timestamp column
    df = scaled.to_dataframe(reduce_memory_usage=True, ignore_value2text_conversions=True)
    df = df[[c for c in df.columns if c in columns_to_keep]]
    df['timestamp'] = df.index
    df = df.reset_index(drop=True)
    df['source_file'] = mf4_local_files[mf4].split('/')[-1]
    # Concatenate dataframes
    extract = pd.concat([extract, df], ignore_index=True)

But it doesn't feel native, it's pretty slow, and I notice the timestamps are not continuous (they start over from zero for each file?).
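As a side note, the restart at zero matches to_dataframe's time_from_zero argument, which defaults to True. A minimal variation of the call above (a sketch, assuming the logged files carry meaningful start offsets; otherwise the offsets still need to be accumulated manually, as below):

df = scaled.to_dataframe(reduce_memory_usage=True,
                         ignore_value2text_conversions=True,
                         time_from_zero=False)  # keep original timestamps instead of rebasing to 0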

LaurentBerder commented 2 years ago

I corrected the issue with the non-continuous timestamps by accumulating them file after file:

extract = pd.DataFrame()

for mf4 in range(len(mf4_local_files)):
    # Read raw file
    mdf = MDF(sorted(mf4_local_files)[mf4])
    # Convert it with DBC
    scaled = mdf.extract_bus_logging(dbc)
    # Extract to dataframe, only keep required columns and add timestamp column
    df = scaled.to_dataframe(reduce_memory_usage=True, ignore_value2text_conversions=True)
    df = df[[c for c in df.columns if c in columns_to_keep]]
    # Cumulate timestamps of previous files
    if len(extract.index) == 0:
        df['timestamp'] = df.index
    else:
        df['timestamp'] = df.index + extract['timestamp'].iloc[-1]
    df = df.reset_index(drop=True)
    df['source_file'] = mf4_local_files[mf4].split('/')[-1]
    # Concatenate dataframes
    extract = pd.concat([extract, df], ignore_index=True)

But still, would there be another way to do this that's more pythonic and more native to asammdf?
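One asammdf-native possibility (a sketch, not verified against these files): decode each file, reduce all of them to the channel set they have in common with MDF.filter, and only then concatenate, so every file ends up with the same structure:

from asammdf import MDF

# Decode each raw file with the DBCs
decoded = [MDF(path).extract_bus_logging(dbc) for path in sorted(mf4_local_files)]

# Keep only the channels that exist in every decoded file
common = set(decoded[0].channels_db)
for mdf in decoded[1:]:
    common &= set(mdf.channels_db)

# Filter each file down to the common set, then concatenate
filtered = [mdf.filter(sorted(common)) for mdf in decoded]
mdf_scaled = MDF.concatenate(filtered)

Whether the filtered files end up with identical group layouts would still have to be confirmed on real data.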