caporaso-lab / sourcetracker2

SourceTracker2
BSD 3-Clause "New" or "Revised" License

Possibly look into the Python builtin `multiprocessing` for parallelization? #92

Closed by jakereps 5 years ago

jakereps commented 6 years ago

Ran into a dependency issue on a plane recently: I didn't have ipyparallel installed (development environment), so I couldn't use parallelization. Unless something hard-depends on ipyparallel, it seems it'd make sense to swap it for native Python parallelization through the `multiprocessing` package.

I could kick up a benchmark to see if there are any gains or losses between the two.

johnchase commented 6 years ago

@jakereps Have you looked into this? No worries if not, but I might take a look at it.

jakereps commented 6 years ago

I was just starting to toy with it yesterday. Other than running the Client-less gibbs itself in `multiprocessing.Pool`, I haven't played with it.
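For anyone following along, here is a minimal sketch of the general pattern, assuming each sink can be processed independently. `toy_gibbs` and `run_sinks` are hypothetical names made up for illustration; `toy_gibbs` is a toy stand-in that just assigns sequences to sources at random, not the sampler in sourcetracker2.

```python
import random
from multiprocessing import Pool


def toy_gibbs(task):
    """Hypothetical stand-in for one sink's sampling run (not sourcetracker2 code)."""
    sink_id, n_sequences, n_sources, seed = task
    rng = random.Random(seed)  # per-task seed keeps each result reproducible
    counts = [0] * n_sources
    for _ in range(n_sequences):
        # Assign each sequence to a random source; assumes n_sequences > 0.
        counts[rng.randrange(n_sources)] += 1
    return sink_id, [c / n_sequences for c in counts]


def run_sinks(tasks, jobs=1):
    """Run every task, serially when jobs == 1, otherwise in a process pool."""
    if jobs == 1:
        return [toy_gibbs(t) for t in tasks]
    with Pool(processes=jobs) as pool:
        # Pool.map preserves input order, so results line up with tasks.
        return pool.map(toy_gibbs, tasks)
```

Something like `run_sinks(tasks, jobs=4)` would then replace the Client-based dispatch. The worker has to be a module-level function so it can be pickled, and on platforms that spawn rather than fork worker processes the call needs to sit under an `if __name__ == "__main__":` guard. The pool belongs to the calling process, so there is no separate cluster to start or shut down.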

wdwvt1 commented 6 years ago

I think it would be very cool to eliminate this. I have seen cases where ipyparallel doesn't work well (not killing the client, etc.).

nick-youngblut commented 5 years ago

ipyparallel is also a pain when running multiple instances of sourcetracker2 in parallel (e.g., for testing the effects of differing parameters in a systematic way). Each separate run of sourcetracker2 calls `subprocess.Popen('ipcluster start -n %s --quiet' % jobs, shell=True)`, so the controller is started more than once. This causes sourcetracker2 to error and die. Instead, I have to use only 1 job per sourcetracker2 run, which results in runs taking >80 hours in some instances.