KamitaniLab / bdpy

Python package for brain decoding analysis (BrainDecoderToolbox2 data format, machine learning analysis, functional MRI)
MIT License

Idea for improving the speed of Features().get() #73

Closed: ganow closed this 9 months ago

ganow commented 10 months ago


Currently, when the number of features to load is large (i.e., many stimuli), the following code takes a long time:

```python
from bdpy.dataform import Features

feature_name = ...
stimulus_name = ...
features_store = Features("/path/to/features")
features = features_store.get(feature_name, label=stimulus_name)  # suppose len(stimulus_name) is large
```

This is because `dataform.Features` loads the .mat files (one per stimulus) sequentially.


We want to speed up data loading when many stimuli are requested.


We can use `multiprocessing` to load the files in parallel. For example:

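```python
from multiprocessing import Pool

import hdf5storage
import numpy as np
import scipy.io as sio


def _load_data(path):
    # Defined at module level so Pool can pickle it for the worker processes.
    try:
        return sio.loadmat(path)['feat']
    except NotImplementedError:
        # scipy.io.loadmat cannot read MATLAB v7.3 (HDF5-based) files.
        return hdf5storage.loadmat(path)['feat']


class Features:
    # n_parallel is a hypothetical new argument; None lets Pool use all available CPUs.
    def get(self, layer=None, label=None, n_parallel=None):
        path_iterator = (self.__feature_file_table[layer][lb] for lb in label)  # label: list of stimulus names
        with Pool(processes=n_parallel) as pool:
            # pool.map preserves input order, so rows stay aligned with `label`.
            features = np.concatenate(pool.map(_load_data, path_iterator), axis=0)
        return features
```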

Quick experiment

The original implementation took ~20 seconds to load 128 stimuli in my environment. The proposed implementation took ~3 seconds when n_parallel=16.
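A rough back-of-the-envelope reading of these numbers, assuming the per-file load time is roughly constant across stimuli ($N = 128$ files, $p = 16$ workers):

```math
\begin{aligned}
t_{\text{file}} &\approx \frac{T_{\text{seq}}}{N} = \frac{20\ \text{s}}{128} \approx 0.16\ \text{s},
&\qquad
T_{\text{par}}^{\text{ideal}} &\approx \frac{T_{\text{seq}}}{p} = \frac{20\ \text{s}}{16} = 1.25\ \text{s}, \\
S &= \frac{T_{\text{seq}}}{T_{\text{par}}} \approx \frac{20\ \text{s}}{3\ \text{s}} \approx 6.7,
&\qquad
E &= \frac{S}{p} \approx \frac{6.7}{16} \approx 0.42.
\end{aligned}
```

So the measured speedup is well below the ideal 16x. Plausible contributors (not measured here) are worker startup, sending the loaded arrays back to the parent process, and contention on the disk.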