gelijergensen / PermutationImportance

Python package for computing the importance of variables in a model through permutation selection

Datasets larger than 2GB cannot be pickled #74

Open gelijergensen opened 5 years ago

gelijergensen commented 5 years ago

Multiprocessing with datasets larger than 2GB is harder because they are too large to be safely pickled and unpickled when sent to the worker processes. Since we have already determined that we cannot avoid duplicating the data for each process, we will need to use shared memory (sadly!)
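For reference, a minimal sketch of the shared-memory approach using `multiprocessing.shared_memory` (Python 3.8+). The array contents and the `worker` function here are illustrative, not part of this package:

```python
import numpy as np
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory


def worker(shm_name, shape, dtype):
    # Re-attach to the existing block; only the small (name, shape, dtype)
    # tuple crosses the process boundary, never the data itself
    shm = SharedMemory(name=shm_name)
    data = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    print("worker sees mean:", data.mean())
    shm.close()  # detach without destroying the block


if __name__ == "__main__":
    data = np.random.rand(1000, 100)  # stand-in for a >2GB dataset
    shm = SharedMemory(create=True, size=data.nbytes)
    shared = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    shared[:] = data  # one copy into shared memory, visible to all workers

    p = Process(target=worker, args=(shm.name, data.shape, data.dtype))
    p.start()
    p.join()

    shm.close()
    shm.unlink()  # free the block once every process has detached
```

Only the small `(name, shape, dtype)` reference gets pickled, so the 2GB limit never applies to the data itself.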

The SelectionStrategy base object is probably a good place to handle this. It can construct the datasets on the fly, place them into shared memory, and pass on only a reference to the data. There is no need to try to handle Windows and Linux differently, because all of the selection methods are "greedy".
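A hypothetical sketch of the two helpers SelectionStrategy might use to do this; the names `to_shared` and `from_shared` are assumptions for illustration, not the package's real API:

```python
import numpy as np
from multiprocessing.shared_memory import SharedMemory


def to_shared(array):
    """Copy an array into shared memory, returning the block and a reference."""
    shm = SharedMemory(create=True, size=array.nbytes)
    view = np.ndarray(array.shape, dtype=array.dtype, buffer=shm.buf)
    view[:] = array
    # The (name, shape, dtype) reference is tiny and cheap to pickle,
    # so it can be passed to workers in place of the array itself
    return shm, (shm.name, array.shape, array.dtype)


def from_shared(ref):
    """Re-open a shared array inside a worker from its reference."""
    name, shape, dtype = ref
    shm = SharedMemory(name=name)
    # Keep shm alive alongside the view; the buffer is freed once every
    # process has closed it and the creator has called unlink()
    return shm, np.ndarray(shape, dtype=dtype, buffer=shm.buf)
```

The strategy would call `to_shared` on each dataset it constructs and hand the reference to the workers, which rebuild a view with `from_shared` instead of unpickling a copy.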