Multiprocessing with datasets larger than 2 GB is more difficult because they are too large to be safely pickled/unpickled when handed to worker processes. Since we have already determined that we cannot get around needing the full dataset in every process, we will need to use shared memory (sadly!)
The `SelectionStrategy` base object is probably a good place to handle this. We can construct the datasets on the fly, put them into shared memory, and pass each worker a reference to the data. There is no need to try to handle Windows/Linux differently because all methods are "greedy".
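A minimal sketch of what I have in mind, using Python's `multiprocessing.shared_memory` (stdlib since 3.8). The names `_worker` and `run` are hypothetical, and the per-chunk sum is just a stand-in for whatever the real strategy computes; the point is that only the block's name, shape, and dtype are pickled, never the data itself:

```python
import numpy as np
from multiprocessing import Pool
from multiprocessing.shared_memory import SharedMemory

def _worker(args):
    # Re-attach to the shared block by name; only the small metadata
    # tuple crossed the process boundary, not the array.
    shm_name, shape, dtype, start, stop = args
    shm = SharedMemory(name=shm_name)
    try:
        data = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        return float(data[start:stop].sum())  # stand-in for the real work
    finally:
        shm.close()

def run(dataset: np.ndarray, n_workers: int = 4):
    # Copy the dataset into shared memory once; workers get a reference.
    shm = SharedMemory(create=True, size=dataset.nbytes)
    try:
        shared = np.ndarray(dataset.shape, dtype=dataset.dtype, buffer=shm.buf)
        shared[:] = dataset
        bounds = np.linspace(0, len(dataset), n_workers + 1, dtype=int)
        tasks = [(shm.name, dataset.shape, dataset.dtype, lo, hi)
                 for lo, hi in zip(bounds[:-1], bounds[1:])]
        with Pool(n_workers) as pool:
            return pool.map(_worker, tasks)
    finally:
        shm.close()
        shm.unlink()  # free the block once all workers are done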