CIMH-Clinical-Psychology / EMO_REACT

Extract windows efficiently #5

Open skjerns opened 1 year ago

skjerns commented 1 year ago

I looked into my code base, and it seems that the sklearn library has changed, so the function is now hidden as _extract_patches. Back then I also wrote my own function from scratch, not using sklearn, as I needed more flexibility.

Nevertheless, this is how you could use the _extract_patches function to extract windows:

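# private sklearn helper, so it may change between sklearn versions
from sklearn.feature_extraction.image import _extract_patches


def extract_windows(arr, wlen, step, axis=-1):
    """
    Extract sliding windows along one axis of an array.

    Parameters
    ----------
    arr : np.ndarray
        input array of arbitrary dimensionality
    wlen : int
        window length in sample points
    step : int
        steps in sample points between window starts
    axis : int, optional
        Along which axis to extract, e.g. -1 if the time dimension is the
        last dimension. The default is -1.

    Returns
    -------
    windows : np.ndarray
        extracted windows. first dimension is the number of windows
    """
    # a patch spans the full array in every dimension except `axis`
    patch_shape = list(arr.shape)
    patch_shape[axis] = wlen
    windows = _extract_patches(arr, patch_shape, extraction_step=step)
    # drop only the singleton patch-index axes; a bare squeeze() would also
    # drop real data axes of length 1 (e.g. a single epoch)
    axis = axis % arr.ndim
    windows = windows.squeeze(axis=tuple(i for i in range(arr.ndim) if i != axis))
    # arrays are views, so no changing of values allowed for safety
    windows.setflags(write=False)
    return windows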

I have tested it and it seems to work. However, no guarantee, I just wrote it quickly. The arrays are created as views: if you change a value in window 0 and window 1 overlaps with it (e.g. timepoint 2 is part of both window 0 and window 1), the value also changes in window 1. Therefore the array is set to write=False. This might cause problems; if it does, you need to use another function or copy the arrays once.
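To illustrate the view behaviour described above without depending on the private sklearn helper, here is a minimal sketch built on NumPy's public sliding_window_view (NumPy 1.20 or newer). This is a swapped-in technique, not the function from this thread; the name extract_windows_numpy and the small demo array are made up for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def extract_windows_numpy(arr, wlen, step, axis=-1):
    """Hypothetical stand-in: read-only sliding windows along `axis`."""
    axis = axis % arr.ndim  # normalise negative axis values
    # one window per start position; the window samples land on a new last axis
    windows = sliding_window_view(arr, wlen, axis=axis)
    # keep only every `step`-th window start
    index = [slice(None)] * windows.ndim
    index[axis] = slice(None, None, step)
    windows = windows[tuple(index)]
    # window index goes first, samples go back to where `axis` was
    windows = np.moveaxis(windows, axis, 0)
    return np.moveaxis(windows, -1, axis + 1)


arr = np.arange(12.0).reshape(2, 6)
w = extract_windows_numpy(arr, wlen=4, step=2)

print(w.shape)                   # (n_windows, n_rows, wlen)
print(np.shares_memory(w, arr))  # windows are views onto arr
print(w.flags.writeable)         # views are read-only by default

# arr[0, 2] is visible in both windows because they overlap
print(w[0, 0, 2], w[1, 0, 0])

# copy once if the windows need to be modified independently
w_copy = w.copy()
w_copy[0, 0, 2] = -1.0
```

The copy costs memory proportional to the number of windows times the window length, so it is only worth making when the windows really have to be written to.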

skjerns commented 1 year ago

btw this is what ChatGPT came up with:

import numpy as np


def extract_time_windows(eeg_data, window_size, step_size):
    num_trials, num_electrodes, num_samples = eeg_data.shape

    # Calculate the number of windows that will fit in the EEG data
    num_windows = int(np.floor((num_samples - window_size) / step_size) + 1)

    # Create an empty array to store the extracted windows
    extracted_windows = np.zeros((num_trials, num_electrodes, num_windows, window_size))

    # Loop through each trial and electrode
    for trial in range(num_trials):
        for electrode in range(num_electrodes):
            # Extract windows for the current trial and electrode
            for i in range(num_windows):
                start_idx = i * step_size
                end_idx = start_idx + window_size
                extracted_windows[trial, electrode, i, :] = eeg_data[trial, electrode, start_idx:end_idx]

    return extracted_windows.swapaxes(0, 2).swapaxes(1,2)

When run on the same input, the two functions return the same result. Maybe use the one from ChatGPT, as long as there are no performance issues. But you will already see that the second function (the ChatGPT one) takes significantly longer.

array = np.random.rand(96, 64, 301)
wlen = 10
step = 1
w1 = extract_windows(array, wlen, step)
w2 = extract_time_windows(array, wlen, step)

np.testing.assert_array_equal(w1, w2)

esmondo commented 1 year ago

The function with extract_patches is the fastest. Timings for the window extraction only (not the decoding):

Elapsed time of X_timewindows1: 0.06 seconds (my func)
Elapsed time of X_timewindows2: 0.00 seconds (extract_patches)
Elapsed time of X_timewindows3: 1.20 seconds (chatGPT)

All the functions return a 4D array (n_windows, n_epochs, n_channels, n_samples).
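A comparison like this could be reproduced roughly as follows. This is only a sketch: windows_loop and windows_view are hypothetical stand-ins (a copying Python loop and a NumPy strided view), not the three functions timed above, and the measured times will differ between machines.

```python
import timeit

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def windows_loop(data, wlen, step):
    """Copy each window in a Python loop (slow reference)."""
    n_windows = (data.shape[-1] - wlen) // step + 1
    out = np.empty((n_windows, *data.shape[:-1], wlen), dtype=data.dtype)
    for i in range(n_windows):
        out[i] = data[..., i * step : i * step + wlen]
    return out


def windows_view(data, wlen, step):
    """Strided read-only view along the last axis, no copying."""
    w = sliding_window_view(data, wlen, axis=-1)[..., ::step, :]
    return np.moveaxis(w, -2, 0)


# same input size as in the test above: (n_epochs, n_channels, n_samples)
data = np.random.rand(96, 64, 301)

for func in (windows_loop, windows_view):
    # best of three single runs, to reduce noise from other processes
    seconds = min(timeit.repeat(lambda: func(data, 10, 1), number=1, repeat=3))
    print(f"{func.__name__}: {seconds:.4f} s, shape {func(data, 10, 1).shape}")
```

Both stand-ins return windows in the (n_windows, n_epochs, n_channels, n_samples) layout, with n_windows = (n_samples - wlen) // step + 1.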