brainhack-school2020 / MarkNelson86_EEGRecogBIDS

Creative Commons Zero v1.0 Universal

unpacking nested matlab structures (.mat files) in python #2

Closed MarkNelson86 closed 4 years ago

MarkNelson86 commented 4 years ago

@PeerHerholz @rmarkello So it's a 3-level nested structure: 1000 subjects, 49 trials, 5 electrodes. I'm just trying to unpack it into a pandas DataFrame. Any advice?

S =
  1x1000 struct array with fields:
    ID
    Novels
    BFail

S(1).Novels
ans =
  1x49 struct array with fields:
    Type
    Trln
    Time_since_tar
    Trls_since_tar
    Time_since_odd
    Trls_since_odd
    Elecs

S(1).Novels(1).Elecs
ans =
  1x5 struct array with fields:
    enam
    enum
    data

MarkNelson86 commented 4 years ago

This worked for extracting the ID field from the main structure:

ID_Vec = [np.array2string(x) for x in S['S'][0][:]['ID']] # IDs in list as strings!
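
For anyone following along without the file, here is a minimal sketch of why that indexing works. It assumes loadmat returns the struct as a NumPy structured array of shape (1, N) with object-typed fields. The array below is a hand-built stand-in with made-up IDs, not the real data.

```python
import numpy as np

# Stand-in for loadmat(fname)['S'] (hypothetical values, not the real file):
# a 1x3 struct array whose three fields hold arbitrary Python objects.
fields = [("ID", object), ("Novels", object), ("BFail", object)]
S = np.empty((1, 3), dtype=fields)
for i in range(3):
    # Real files usually hold small arrays in each cell; plain strings
    # keep the example short.
    S["ID"][0, i] = f"sub-{i + 1:03d}"

# Field names are available directly from the dtype.
field_names = S.dtype.names

# S[0] drops the leading singleton dimension; ['ID'] then selects one field
# across all subjects.
ids = [str(x) for x in S[0]["ID"]]
print(field_names, ids)
```

Indexing by field name first (S['ID']) and by position second gives the same values, so the order is mostly a matter of taste.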

rmarkello commented 4 years ago

Hey, @MarkNelson86! Did @cat-boucher end up sharing her code?

If not, a few functions that might help:

import numpy as np


def coerce_void(value):
    """
    Converts `value` to `value.dtype`

    Parameters
    ----------
    value : array_like

    Returns
    -------
    value : dtype
        `Value` coerced to `dtype`
    """

    if np.squeeze(value).ndim == 0:
        return value.dtype.type(value.squeeze())
    else:
        return np.squeeze(value)

def get_labels(fields):
    """
    Helper function to get .mat struct keys from `fields`

    Parameters
    ----------
    fields : dict_like

    Returns
    -------
    labels : list
        Struct keys
    """
    labels = [k for k, v in sorted(fields.items(),
                                   key=lambda x: x[-1][-1])]
    return labels

For example, you could use some of these on the loaded S object:

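from scipy.io import loadmat

# fname is the path to the .mat file
data = loadmat(fname)['S']

# convert data structure to a normal dictionary using field names as keys
labels = get_labels(data.dtype.fields)
# data has shape (1, 1000), so index by field name rather than enumerating rows
data = {label: data[label] for label in labels}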

Now data will be a dictionary with keys ['ID', 'Novels', 'BFail']!

If you know one of the values is array-like (like 'ID'), you should be able to use the following syntax to coerce it to an array:

data['ID'] = coerce_void(data['ID'])

Hope that helps! Let me know if you have any questions about this. It's tough to give really specific help without being able to play around with a data file, but this will hopefully get you on track to a nested dictionary + array that can be readily converted into a dataframe using, e.g., pandas.DataFrame.from_dict().
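
As a rough sketch of that last step (untested against the real file), the three levels could be flattened into one row per electrode. The field names (ID, Novels, Type, Elecs, enam, enum, data) come from the struct listing above. The helper names flatten_struct and load_dataframe, the "trial" index column, and the demo values are made up for illustration. The loader assumes loadmat's squeeze_me=True and struct_as_record=False options, which expose fields as attributes. Note that a struct array with a single element would be squeezed to a scalar and would need wrapping before iteration.

```python
from types import SimpleNamespace


def flatten_struct(subjects):
    """Return one dict per (subject, trial, electrode) combination."""
    rows = []
    for subj in subjects:
        for trial_idx, trial in enumerate(subj.Novels):
            for elec in trial.Elecs:
                rows.append({
                    "ID": subj.ID,
                    "trial": trial_idx,   # hypothetical 0-based trial counter
                    "Type": trial.Type,
                    "enam": elec.enam,
                    "enum": elec.enum,
                    "data": elec.data,    # kept as-is, one array per row
                })
    return rows


def load_dataframe(fname):
    """Hypothetical loader: .mat file -> DataFrame with a MultiIndex."""
    # Imported here so flatten_struct can be tried without scipy or pandas.
    import pandas as pd
    from scipy.io import loadmat

    S = loadmat(fname, squeeze_me=True, struct_as_record=False)["S"]
    return pd.DataFrame(flatten_struct(S)).set_index(["ID", "trial", "enam"])


# Tiny stand-in with attribute access, in place of the real struct.
def _elec(name, number):
    return SimpleNamespace(enam=name, enum=number, data=[0.0, 0.0])


def _trial(kind):
    return SimpleNamespace(Type=kind, Elecs=[_elec("Fz", 1), _elec("Cz", 2)])


demo = [SimpleNamespace(ID=f"sub-{i}", Novels=[_trial("a"), _trial("b")])
        for i in (1, 2)]
rows = flatten_struct(demo)
print(len(rows), rows[0])
```

The remaining trial-level fields (Trln, Time_since_tar, and so on) could be added to each row the same way as Type.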

MarkNelson86 commented 4 years ago

@rmarkello This is great! Thanks.

@cat-boucher did share her code which was also really helpful.

cat-boucher commented 4 years ago

I'm also going to repackage that code this weekend for general use (any nested MATLAB structure to a MultiIndex pandas DataFrame). It's going to be a fun open source project!