JelleAalbers / npshmex

ProcessPoolExecutor that transfers numpy arrays without pickle overhead
BSD 3-Clause "New" or "Revised" License

support python 3.8 shared memory? #1

Open dizcza opened 4 years ago

dizcza commented 4 years ago

It's not an issue, but a question. Python 3.8 introduces the SharedMemory object, which is similar to what you're using, https://gitlab.com/tenzing/shared-array. But since it's native, I prefer using the built-in functionality rather than a third-party package.

Do you plan to support SharedMemory? It'd be very handy, because the Python developers didn't bother to write examples of how to use their new feature with a process pool executor. Instead, they just posted a clumsy "low-level" multiprocessing example, which is never as attractive as using executors.
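For concreteness, here is a minimal sketch of the kind of example I mean, using only the standard library (the function and variable names are just illustrative):

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import shared_memory


def total(shm_name, shape, dtype):
    # Re-attach to the named block inside the worker; wrapping it in an
    # ndarray does not copy the underlying data.
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        return float(arr.sum())
    finally:
        shm.close()


if __name__ == "__main__":
    data = np.arange(1_000_000, dtype=np.float64)

    # Copy the array into a named shared-memory block once, in the parent.
    shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
    np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)[:] = data

    with ProcessPoolExecutor() as ex:
        # Only the block's name and the array metadata are pickled,
        # not the array contents.
        print(ex.submit(total, shm.name, data.shape, data.dtype).result())

    shm.close()
    shm.unlink()  # release the block once every process is done with it
```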

JelleAalbers commented 4 years ago

Hi @dizcza, thanks for pointing this out! Great to hear Python 3.8 has shared memory support built in, and I completely agree on using the standard library when possible.

I'm currently using npshmex in a medium-size project (https://github.com/AxFoundation/strax) that is stuck at Python 3.6 for the moment. Python 3.8 is the future, but it will be a while before it can become a hard requirement for many projects (even TensorFlow does not support it yet).

One option would be to have SharedMemory as an alternate backend for npshmex, if it is available. I'd be happy to look into this once our project moves to 3.8, or accept a pull request for it if you or someone else wants to have a look at it sooner.
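Roughly, the backend selection could look something like this (an illustrative sketch, not actual npshmex code):

```python
# Prefer the standard-library backend on Python >= 3.8,
# fall back to the third-party SharedArray package otherwise.
try:
    from multiprocessing import shared_memory
    HAVE_STDLIB_SHM = True
except ImportError:
    import SharedArray  # https://gitlab.com/tenzing/shared-array
    HAVE_STDLIB_SHM = False
```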

maxnoe commented 3 years ago

I think with Python 3.8, the standard process pool would work as well as this project, due to the new zero-copy pickle protocol 5:

https://www.python.org/dev/peps/pep-0574/

https://github.com/numpy/numpy/issues/11161

JelleAalbers commented 3 years ago

Hi @maxnoe, thanks for the heads-up! I timed the readme's example on my laptop (wall time, averaged over 100 loops of the last two lines):

| Python | numpy  | concurrent.futures | npshmex  |
|--------|--------|--------------------|----------|
| 3.6.12 | 1.19.2 | 0.83 sec           | 0.21 sec |
| 3.8.5  | 1.19.2 | 0.71 sec           | 0.20 sec |
| 3.9.0  | 1.19.4 | 0.71 sec           | 0.20 sec |

So concurrent.futures.ProcessPoolExecutor got slightly faster, but npshmex is considerably faster still. Maybe I'm missing some setting or flag to ensure numpy arrays use the new pickle protocol?
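For reference, the comparison was roughly of this shape (an illustrative sketch rather than the readme verbatim; it assumes npshmex's drop-in `ProcessPoolExecutor` with the usual submit/result/shutdown interface):

```python
import time
import numpy as np
from concurrent.futures import ProcessPoolExecutor as StdExecutor
from npshmex import ProcessPoolExecutor as ShmExecutor


def pass_back(x):
    # The worker just returns the array, so the timing is dominated by
    # moving the data to the worker process and back.
    return x


if __name__ == "__main__":
    big_data = np.ones(int(2e7))  # ~160 MB of float64

    for label, Executor in [("concurrent.futures", StdExecutor), ("npshmex", ShmExecutor)]:
        ex = Executor()
        ex.submit(pass_back, big_data).result()  # warm up the worker pool
        t0 = time.perf_counter()
        for _ in range(10):
            ex.submit(pass_back, big_data).result()
        print(f"{label}: {(time.perf_counter() - t0) / 10:.2f} s per round trip")
        ex.shutdown()
```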

maxnoe commented 3 years ago

To avoid all copies, you have to pass buffers along to pickle.dumps and pickle.loads, which the process pool probably does not do by itself.
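Roughly like this (a minimal sketch, not what the process pool does internally):

```python
import pickle
import numpy as np

arr = np.arange(1_000_000, dtype=np.float64)

# With protocol 5, large buffers are handed to buffer_callback instead of
# being copied into the pickle stream; only small metadata is pickled.
buffers = []
payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

# The consumer supplies the same buffers out of band. If the transport can
# share those buffers (e.g. via shared memory), the array data never has to
# travel through the pickle stream itself.
restored = pickle.loads(payload, buffers=buffers)
assert np.array_equal(arr, restored)
```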

But maybe you can make it work with a custom pool and SharedMemory? That would remove the third-party dependency.

Skylion007 commented 3 years ago

@maxnoe You could also just build a wrapper around the connections, similar to what PyTorch does, to default to pickle protocol 5. You could also override other functions as necessary using a similar proxy class. We switched to pickle protocol 5 as the default with a similar hack (including a backport): https://github.com/facebookresearch/habitat-lab/pull/582
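Roughly along these lines (just a sketch; the actual PyTorch and habitat-lab implementations differ):

```python
import pickle
from multiprocessing.connection import Connection


class Pickle5Connection:
    """Proxy that makes a multiprocessing Connection use pickle protocol 5."""

    def __init__(self, conn: Connection):
        self._conn = conn

    def send(self, obj):
        # Serialize with protocol 5 and ship the raw bytes over the pipe.
        self._conn.send_bytes(pickle.dumps(obj, protocol=5))

    def recv(self):
        return pickle.loads(self._conn.recv_bytes())

    def __getattr__(self, name):
        # Forward everything else (poll, fileno, close, ...) to the wrapped connection.
        return getattr(self._conn, name)
```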