dabeaz / curio

run_in_process() is unsound (maybe) #165

Closed. dabeaz closed this issue 6 years ago.

dabeaz commented 7 years ago

This is not so much a bug, but just random thinking about the run_in_process() function. At the moment, you can use this to offload a CPU-intensive operation to a subprocess. However, there's a part of me that feels that the whole implementation is rather flawed.
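
For reference, the current usage looks roughly like this (crunch is just a stand-in name for some CPU-bound function):

import curio

def crunch(n):
    # Stand-in CPU-bound work
    return sum(i * i for i in range(n))

async def main():
    # The computation runs in a separate process, so the curio kernel
    # stays responsive while it grinds away
    result = await curio.run_in_process(crunch, 10**7)
    print(result)

curio.run(main)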

It's not so much a bug in Curio, but it pertains to the greater problem of having control over how it works. Submitting a job to a subprocess involves making a process fork on Unix. Given the kind of state sitting behind the Curio kernel (thread pools, kernel, signal handling, locking, etc.) there are all sorts of tricky questions that arise when this migrates over to the child process. It's well known that combining fork() with threads is a good way to make your head completely explode. And then there's the whole interaction of the submitted work with the rest of the application itself. Basically, it only works if the CPU-intensive work is totally isolated and side-effect free.

All things being equal, I'm sort of wondering if Curio ought to promote a more disciplined distributed-computing approach for CPU-intensive work. For example, launching truly independent interpreters in a more controlled way, perhaps relying more on explicit message passing between interpreters.

There's been a tiny bit of work related to this in the curio/channel.py file. For example, you can use it to set up connections between processes. It's not fully fleshed out at this point, though.
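
As a rough sketch of that message-passing direction (using the Channel API from curio/channel.py as documented; details may have shifted since):

# channel_demo.py -- run "python channel_demo.py serve" in one terminal,
# then "python channel_demo.py" in another
import sys
from curio import Channel, run

ADDR = ('localhost', 30000)

async def producer(ch):
    c = await ch.accept(authkey=b'peekaboo')    # wait for a peer
    for i in range(10):
        await c.send(i)                         # values travel as pickles
    await c.send(None)                          # sentinel: done

async def consumer(ch):
    c = await ch.connect(authkey=b'peekaboo')
    while True:
        msg = await c.recv()
        if msg is None:
            break
        print('Got', msg)

if __name__ == '__main__':
    ch = Channel(ADDR)
    run(producer(ch) if 'serve' in sys.argv else consumer(ch))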

Anyways, just a random thought. It's something I want to spend more time thinking about.

Zaharid commented 7 years ago

I am trying to write a scientific "framework" that uses curio and run_in_process. The idea is that the user requests some results (say, some plots), and the necessary steps (e.g. downloading some required inputs, calculating something using some C extensions, actually producing the plot and writing it to disk) are encoded as a graph; each step is executed in a separate process as soon as its dependencies are resolved and there are workers available.

However, using curio with run_in_process currently results in only a marginal improvement over a sequential, single-process implementation. I can imagine several problems. I realize many of these are possibly not in the scope of curio, but maybe I can still get some tips :).

def fib(n):
    if n <= 2:
        return 1
    else:
        return fib(n-1) + fib(n-2)

def fiblist(n):
    # Note: the list is repeated fib(n) times, so the result gets large fast
    return [fib(i) for i in range(n)] * fib(n)

def plot_fiblist(fiblist):
    import matplotlib.pyplot as plt
    plt.plot(fiblist)
    plt.savefig("fig.pdf")

# Inside some coroutine. (create_worker/execute sketch a wished-for API,
# not something curio provides today.)
worker = await curio.create_worker()
# Executes fiblist(50) remotely and keeps the result stored in the worker
remote_fiblist = await worker.execute(fiblist, 50)
# The same worker reuses the stored result without serializing it anywhere
await worker.execute(plot_fiblist, remote_fiblist)
# We can also fetch the result back to the parent when we want it
result = await remote_fiblist.get()
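
For contrast, a sketch of the same pipeline with the existing run_in_process(), reusing the fiblist/plot_fiblist defined above: every intermediate result is pickled back to the parent and then pickled out again to the next worker, which is exactly the round trip a remote-result API would avoid.

import curio

async def pipeline():
    # fiblist(25) runs in one worker process; its large result is
    # shipped back to the parent...
    data = await curio.run_in_process(fiblist, 25)
    # ...and shipped out again to whichever process draws the plot
    await curio.run_in_process(plot_fiblist, data)

curio.run(pipeline)
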
njsmith commented 7 years ago

@dabeaz: are you aware of the multiprocessing module's set_start_method("spawn") option? It enables the Windows-style "spawn each child from scratch" strategy everywhere, instead of fork(). (Another bonus: it works on Windows.) The downside is that you become more limited in the kinds of objects you can pass to workers.
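
For concreteness, a minimal sketch of that stdlib switch (plain multiprocessing, independent of curio):

import multiprocessing as mp

def work(x):
    return x * x

if __name__ == '__main__':
    # Children start as fresh interpreters rather than fork() clones, so
    # no thread/lock/kernel state leaks into them. Call this once, before
    # creating any pools or processes.
    mp.set_start_method('spawn')
    with mp.Pool(4) as pool:
        print(pool.map(work, range(8)))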

Dask and joblib are some existing projects in this area btw.

imrn commented 7 years ago

If it had the transparency of run_in_process(), or of something like

proc_task = await spawn_in_proc(func())

it would be great. Transferring the proper state to the other interpreter would probably be a challenge; forking probably provides all those shortcuts.

And if you're going down the road of separate Python interpreter processes, sockets, and pickling, perhaps you might also consider running them on different machines.

The implementation of https://github.com/dask/distributed seems too complicated; "Is it rightfully so?" would be a proper question. And one needs to lay Dask code all over one's code. :/ http://distributed.readthedocs.io/en/latest/quickstart.html

If an ultra-compact and completely transparent (no special code visible) curio version were possible, it would be uber something...

dabeaz commented 7 years ago

My gut feeling is that I'd probably want Curio to steer clear of trying to become a framework for "high-performance computing." That's not to say that it couldn't be used as a layer for building something like that, but that should probably be a separate project, whatever it is.

Instead, I'm thinking more generally about the problem of launching a subprocess, communicating with it, and dealing with tricky issues such as cancellation. Although run_in_process() has some of that, I'm not sure it's the final solution. Maybe it's just a first step towards something better.

imrn commented 7 years ago

Sure. It has to be minimal. I too would be happy to see the most compact and 'transparent' solution.

dabeaz commented 6 years ago

Curio was recently modified to use the "spawn" method of multiprocessing. In the big picture, I think external processes should be "clean" interpreters, not fork() clones of the Curio main process.