danielhrisca / asammdf

Fast Python reader and editor for ASAM MDF / MF4 (Measurement Data Format) files
GNU Lesser General Public License v3.0

to_dataframe export fails due to duplicate axis/signal #394

Closed: jmtatsch closed this issue 3 years ago

jmtatsch commented 4 years ago

Python version

python=3.7.5 (tags/v3.7.5:5c02a39a0b, Oct 15 2019, 00:11:34) [MSC v.1916 64 bit (AMD64)]
os=Windows-10-10.0.17763-SP0
asammdf=5.20.6
numpy=1.19.0

Code

MDF version

3.20

Code snippet

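from asammdf import MDF

mdf_obj = MDF("measurement.mdf")  # placeholder path for the CANape recording
df = mdf_obj.to_dataframe(use_interpolation=False)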

Traceback

Traceback (most recent call last):
  File "C:/Users/jmtatsch/Workspace/damage_prediction/prepare_data.py", line 12, in <module>
    analyze_measurement(measurement_path)
  File "C:\Users\jmtatsch\Workspace\damage_prediction\data_analysis.py", line 53, in analyze_measurement
    df = mdf_obj.to_dataframe(use_interpolation=False)
  File "C:\Users\jmtatsch\AppData\Roaming\Python\Python37\site-packages\asammdf\mdf.py", line 3352, in to_dataframe
    df[channel_name] = pd.Series(sig.samples, index=sig.timestamps)
  File "C:\Users\jmtatsch\AppData\Roaming\Python\Python37\site-packages\pandas\core\frame.py", line 2938, in __setitem__
    self._set_item(key, value)
  File "C:\Users\jmtatsch\AppData\Roaming\Python\Python37\site-packages\pandas\core\frame.py", line 3000, in _set_item
    value = self._sanitize_column(key, value)
  File "C:\Users\jmtatsch\AppData\Roaming\Python\Python37\site-packages\pandas\core\frame.py", line 3613, in _sanitize_column
    value = reindexer(value)
  File "C:\Users\jmtatsch\AppData\Roaming\Python\Python37\site-packages\pandas\core\frame.py", line 3604, in reindexer
    raise err
  File "C:\Users\jmtatsch\AppData\Roaming\Python\Python37\site-packages\pandas\core\frame.py", line 3599, in reindexer
    value = value.reindex(self.index)._values
  File "C:\Users\jmtatsch\AppData\Roaming\Python\Python37\site-packages\pandas\core\series.py", line 4030, in reindex
    return super().reindex(index=index, **kwargs)
  File "C:\Users\jmtatsch\AppData\Roaming\Python\Python37\site-packages\pandas\core\generic.py", line 4544, in reindex
    axes, level, limit, tolerance, method, fill_value, copy
  File "C:\Users\jmtatsch\AppData\Roaming\Python\Python37\site-packages\pandas\core\generic.py", line 4567, in _reindex_axes
    allow_dups=False,
  File "C:\Users\jmtatsch\AppData\Roaming\Python\Python37\site-packages\pandas\core\generic.py", line 4613, in _reindex_with_indexers
    copy=copy,
  File "C:\Users\jmtatsch\AppData\Roaming\Python\Python37\site-packages\pandas\core\internals\managers.py", line 1251, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)
  File "C:\Users\jmtatsch\AppData\Roaming\Python\Python37\site-packages\pandas\core\indexes\base.py", line 3099, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis

Description

The problematic file was recorded with CANape. I checked each channel group for duplicate signal names and found none. Across the data groups there are several signals named 't', but that is allowed, right? Any ideas where this error might come from?

danielhrisca commented 4 years ago

See https://stackoverflow.com/questions/27236275/what-does-valueerror-cannot-reindex-from-a-duplicate-axis-mean

Is there a problem with the timestamps?
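To make the linked explanation concrete: the failing line in the traceback assigns `pd.Series(sig.samples, index=sig.timestamps)` into a DataFrame that already has an index, so pandas must reindex the new Series onto that index, and reindexing is not possible when a timestamp occurs twice. The minimal sketch below uses made-up values and plain pandas (no asammdf) to reproduce the same `ValueError`; the exact wording of the message varies between pandas versions.

```python
import pandas as pd

# Frame whose index is already set to a unique time base (made-up values)
df = pd.DataFrame(index=[0.0, 0.1, 0.2])

# A channel whose own time base repeats 0.1
sig = pd.Series([1, 2, 3], index=[0.0, 0.1, 0.1])

try:
    # pandas has to reindex `sig` onto df.index, which fails on the duplicate label
    df["sig"] = sig
    error = None
except ValueError as exc:
    error = exc

print(repr(error))
```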

jmtatsch commented 4 years ago

It seems one signal does indeed have a duplicate float timestamp. Can the to_dataframe method be made more robust against this?
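One way to find the affected signal is to look for repeated values in each signal's `timestamps` array (the attribute used in the traceback above). A minimal NumPy sketch; `duplicate_timestamps` is a hypothetical helper, not part of asammdf:

```python
import numpy as np

def duplicate_timestamps(timestamps):
    """Return the timestamp values that occur more than once (sorted ascending)."""
    values, counts = np.unique(np.asarray(timestamps), return_counts=True)
    return values[counts > 1]

print(duplicate_timestamps([0.0, 0.1, 0.1, 0.2, 0.3, 0.3]))
```

Applying this to every channel (for example while iterating with `MDF.iter_channels()`, assuming your asammdf version provides it) would point to the offending signal. pandas compares index labels exactly, so only bit-identical float timestamps count as duplicates here.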

danielhrisca commented 4 years ago

@jmtatsch please try the development branch

danielhrisca commented 4 years ago

ping @jmtatsch

jmtatsch commented 4 years ago

Sorry for taking so long; I really appreciate your excellent work on this library.

Today I finally managed to check out the development branch and tested again:

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    df = mdf.to_dataframe(use_interpolation=False)
  File "/home/tatsch/.local/lib/python3.8/site-packages/asammdf-5.22.1.dev21-py3.8-linux-x86_64.egg/asammdf/mdf.py", line 3494, in to_dataframe
    df = pd.DataFrame.from_dict(df)
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/frame.py", line 1309, in from_dict
    return cls(data, index=index, columns=columns, dtype=dtype)
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/frame.py", line 468, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/internals/construction.py", line 283, in init_dict
    return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/internals/construction.py", line 78, in arrays_to_mgr
    index = extract_index(arrays)
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/internals/construction.py", line 397, in extract_index
    raise ValueError("arrays must all be same length")
ValueError: arrays must all be same length
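For context on this second error: `pd.DataFrame.from_dict` aligns columns on their index only when they are `Series`. Plain NumPy arrays carry no timestamps, so pandas requires them all to have the same length. A small sketch with made-up data (not asammdf's actual code) contrasting the two cases:

```python
import numpy as np
import pandas as pd

# Two hypothetical channels sampled on different time bases
t_fast = np.array([0.0, 0.1, 0.2, 0.3])
t_slow = np.array([0.0, 0.2])

# Plain arrays have no time information, so unequal lengths are rejected
try:
    pd.DataFrame.from_dict({"fast": np.ones(4), "slow": np.ones(2)})
    raised = False
except ValueError:
    raised = True

# Series carry their timestamps as the index; pandas aligns them on the union
df = pd.DataFrame.from_dict({
    "fast": pd.Series(np.ones(4), index=t_fast),
    "slow": pd.Series(np.ones(2), index=t_slow),
})
print(df)  # "slow" is NaN at 0.1 and 0.3
```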

danielhrisca commented 4 years ago

@jmtatsch please check the new code in the development branch

jmtatsch commented 4 years ago

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    df = mdf.to_dataframe(use_interpolation=False)
  File "/home/tatsch/.local/lib/python3.8/site-packages/asammdf-5.23.0.dev26-py3.8-linux-x86_64.egg/asammdf/mdf.py", line 3495, in to_dataframe
    df[name] = pd.Series(
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/series.py", line 221, in __init__
    data = SingleBlockManager.from_array(data, index)
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/internals/managers.py", line 1569, in from_array
    block = make_block(array, placement=slice(0, len(index)), ndim=1)
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/internals/blocks.py", line 2719, in make_block
    return klass(values, ndim=ndim, placement=placement)
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/internals/blocks.py", line 2375, in __init__
    super().__init__(values, ndim=ndim, placement=placement)
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/internals/blocks.py", line 124, in __init__
    self.ndim = self._check_ndim(values, ndim)
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/internals/blocks.py", line 159, in _check_ndim
    raise ValueError(
ValueError: Wrong number of dimensions. values.ndim != ndim [3 != 1]
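This third error suggests that one channel has multi-dimensional samples (`values.ndim` is 3, so each sample is itself a 2-D array, as in an array channel), whereas a `pd.Series` column must be 1-D. One possible workaround, sketched below with a hypothetical helper `expand_array_channel`, splits such a channel into one column per element; the `name[i, j]` column naming is an assumption for illustration, not necessarily asammdf's convention.

```python
import numpy as np
import pandas as pd

def expand_array_channel(name, samples, timestamps):
    """Turn samples of shape (n, d1, d2, ...) into n rows with one column per element.

    Hypothetical helper; 1-D samples are returned as a single column.
    """
    samples = np.asarray(samples)
    if samples.ndim == 1:
        return pd.DataFrame({name: samples}, index=timestamps)
    # Flatten every per-sample array in C order: (n, d1, d2, ...) -> (n, d1*d2*...)
    flat = samples.reshape(len(samples), -1)
    # np.ndindex walks the element positions in the same C order as reshape
    columns = [
        f"{name}[{', '.join(str(i) for i in idx)}]"
        for idx in np.ndindex(*samples.shape[1:])
    ]
    return pd.DataFrame(flat, index=timestamps, columns=columns)

# Example: 3 samples, each a 2x2 matrix, so samples.ndim == 3
ts = np.array([0.0, 0.1, 0.2])
frame = expand_array_channel("Matrix", np.arange(12).reshape(3, 2, 2), ts)
print(frame)
```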

danielhrisca commented 4 years ago

@jmtatsch again, please check the new code in the development branch

jmtatsch commented 4 years ago

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    df = mdf.to_dataframe(use_interpolation=False)
  File "/home/tatsch/.local/lib/python3.8/site-packages/asammdf-5.23.0.dev28-py3.8-linux-x86_64.egg/asammdf/mdf.py", line 3513, in to_dataframe
    df = pd.DataFrame.from_dict(df)
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/frame.py", line 1309, in from_dict
    return cls(data, index=index, columns=columns, dtype=dtype)
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/frame.py", line 468, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/internals/construction.py", line 283, in init_dict
    return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/internals/construction.py", line 83, in arrays_to_mgr
    arrays = _homogenize(arrays, index, dtype)
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/internals/construction.py", line 340, in _homogenize
    val = val.reindex(index, copy=False)
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/series.py", line 4399, in reindex
    return super().reindex(index=index, **kwargs)
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/generic.py", line 4458, in reindex
    return self._reindex_axes(
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/generic.py", line 4478, in _reindex_axes
    obj = obj._reindex_with_indexers(
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/generic.py", line 4521, in _reindex_with_indexers
    new_data = new_data.reindex_indexer(
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/internals/managers.py", line 1276, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)
  File "/home/tatsch/.local/lib/python3.8/site-packages/pandas-1.1.2-py3.8-linux-x86_64.egg/pandas/core/indexes/base.py", line 3285, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
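This last traceback is the original duplicate-timestamp problem surfacing inside `from_dict`: with `Series` columns, pandas builds a common index and reindexes every column onto it, which fails for a column whose own index repeats a timestamp. A minimal sketch of one way to avoid it is to drop repeated timestamps per channel before building the frame. `drop_duplicate_timestamps` and the data are hypothetical, and keeping the first sample is an arbitrary choice; this is not necessarily how asammdf fixed it.

```python
import pandas as pd

def drop_duplicate_timestamps(series: pd.Series, keep: str = "first") -> pd.Series:
    """Return `series` without repeated index labels (timestamps)."""
    return series[~series.index.duplicated(keep=keep)]

channels = {
    "speed": pd.Series([10.0, 11.0, 12.0], index=[0.0, 0.1, 0.2]),
    # hypothetical channel whose time base repeats 0.1
    "torque": pd.Series([1.0, 2.0, 3.0], index=[0.0, 0.1, 0.1]),
}

# Deduplicate each channel, then let pandas align them on the union of timestamps
df = pd.DataFrame.from_dict(
    {name: drop_duplicate_timestamps(s) for name, s in channels.items()}
)
print(df)
```

Keeping the first sample silently discards data; `keep="last"` or aggregating the duplicates (for example `series.groupby(level=0).mean()`) are alternatives, depending on what the repeated samples mean.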

danielhrisca commented 4 years ago

@jmtatsch hopefully this is the final fix; please check again.

jmtatsch commented 3 years ago

Yes, no more errors now. Thank you very much for developing and supporting this marvelous piece of open software.