LaurentBerder opened 2 years ago
I would rather save each decoded file to a temporary folder and feed the file names to the concatenate method
Thanks for coming back to me.
You mean like the following?
import os
from glob import glob
from asammdf import MDF

for mf4 in range(len(mf4_local_files)):
    # Read the raw file
    mdf = MDF(mf4_local_files[mf4])
    # Convert with DBC
    scaled = mdf.extract_bus_logging(dbc)
    # Save temporary scaled file
    scaled.save(os.path.join(os.getcwd(), "tmp", "scaled", str(mf4) + ".mf4"))

# Concatenate all decoded files
mdf_scaled = MDF.concatenate(
    sorted(f for f in glob(os.getcwd() + "/tmp/scaled/*") if ".mf4" in f)
)
I still get the same exception:
MdfException: internal structure of file <.../tmp/scaled/1.mf4> is different; different channel groups count
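One workaround that might avoid the mismatch would be to force a common structure before concatenating: take the intersection of channel names across all decoded files and filter each file down to it. A minimal sketch, assuming the temporary files saved above (filter() selects channels by name; even then, the group layout is not guaranteed to match in every case):

from functools import reduce
from glob import glob
from asammdf import MDF

scaled_files = sorted(glob(os.getcwd() + "/tmp/scaled/*.mf4"))

# Channel names present in every decoded file
common = reduce(
    set.intersection,
    (set(MDF(f).channels_db) for f in scaled_files),
)

# Restrict each file to the common channels before concatenating
filtered = [MDF(f).filter(sorted(common)) for f in scaled_files]
mdf_scaled = MDF.concatenate(filtered)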
I tried going through dataframes:
import os
import pandas as pd
from asammdf import MDF

files = sorted(mf4_local_files)
extract = pd.DataFrame()
for mf4 in range(len(files)):
    # Read file
    mdf = MDF(files[mf4])
    # Convert with DBC
    scaled = mdf.extract_bus_logging(dbc)
    # Extract to dataframe, only keep required columns and add timestamp column
    df = scaled.to_dataframe(reduce_memory_usage=True, ignore_value2text_conversions=True)
    df = df[[c for c in df.columns if c in columns_to_keep]]
    df['timestamp'] = df.index
    df = df.reset_index(drop=True)
    df['source_file'] = os.path.basename(files[mf4])
    # Concatenate dataframes (DataFrame.append was removed in pandas 2.x)
    extract = pd.concat([extract, df], ignore_index=True)
But it doesn't feel native, it's pretty slow, and I notice the timestamps are not continuous (they start over from zero at each file?).
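Part of the slowness is probably the concatenation inside the loop, which copies the whole accumulated frame on every iteration. Collecting the per-file frames in a list and concatenating once at the end is the usual fix, something like:

frames = []
for path in sorted(mf4_local_files):
    # Decode each raw file and keep only the required columns
    scaled = MDF(path).extract_bus_logging(dbc)
    df = scaled.to_dataframe(reduce_memory_usage=True, ignore_value2text_conversions=True)
    frames.append(df[[c for c in df.columns if c in columns_to_keep]])

# Single concatenation instead of one per file
extract = pd.concat(frames, ignore_index=True)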
I corrected the non-continuous timestamps by accumulating them file after file:
files = sorted(mf4_local_files)
extract = pd.DataFrame()
for mf4 in range(len(files)):
    # Read raw file
    mdf = MDF(files[mf4])
    # Convert it with DBC
    scaled = mdf.extract_bus_logging(dbc)
    # Extract to dataframe, only keep required columns and add timestamp column
    df = scaled.to_dataframe(reduce_memory_usage=True, ignore_value2text_conversions=True)
    df = df[[c for c in df.columns if c in columns_to_keep]]
    # Offset this file's timestamps by the last timestamp seen so far
    if len(extract.index) == 0:
        df['timestamp'] = df.index
    else:
        df['timestamp'] = df.index + extract['timestamp'].iloc[-1]
    df = df.reset_index(drop=True)
    df['source_file'] = os.path.basename(files[mf4])
    # Concatenate dataframes
    extract = pd.concat([extract, df], ignore_index=True)
But still, would there be another way to do this that's more pythonic and more native to asammdf?
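One more option that might be closer to asammdf itself, if I read the to_dataframe() signature right: it accepts time_as_date=True, which turns the index into absolute datetimes based on each measurement's start time, so the manual offsetting should become unnecessary. A sketch, assuming the loggers record correct start times:

frames = []
for path in sorted(mf4_local_files):
    scaled = MDF(path).extract_bus_logging(dbc)
    # time_as_date=True anchors the index to the measurement start time,
    # so timestamps from consecutive files no longer restart at zero
    df = scaled.to_dataframe(
        reduce_memory_usage=True,
        ignore_value2text_conversions=True,
        time_as_date=True,
    )
    df = df[[c for c in df.columns if c in columns_to_keep]]
    df['timestamp'] = df.index
    df['source_file'] = os.path.basename(path)
    frames.append(df.reset_index(drop=True))

extract = pd.concat(frames, ignore_index=True)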
Python version

MDF version
4.11

Code snippet

Traceback
MdfException: internal structure of file <__new__.mf4> is different; different channel groups count
Description
The error happens between the first and second MF4 files. Indeed, when I look at the details of the decoded files, the number of channels is not the same:

len(decoded_0.channels_db.keys()): 215
len(decoded_1.channels_db.keys()): 211

I used to not have any problems reading these files, because I didn't follow the steps in the same order. What I used to do was concatenate the raw files first and run the DBC conversion once on the result, roughly:
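# Reconstructed sketch of the earlier order of steps (not the original code):
# concatenate the raw logs first, then decode with the DBC in one go
mdf_all = MDF.concatenate(sorted(mf4_local_files))
scaled = mdf_all.extract_bus_logging(dbc)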
That would return 215 channels, with potential missing data in them for files that did not originally contain them.
I have changed the order of the steps because this process would crash when reading large MF4 files (several 2 GB files), and I figured that filtering to only the channels I use would solve the memory issue (which it did, but it introduced this new problem).
How could I find a way to avoid the Exception?