goerz / gc3pie

Automatically exported from code.google.com/p/gc3pie

Implement Python's "futures" interface? #200

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Should we implement the "future" interface as described in PEP 3148?
(and implemented in the `futures` backport package and the standard-library module `concurrent.futures`)

References:
  * http://www.python.org/dev/peps/pep-3148/
  * http://pypi.python.org/pypi/futures

1) A quick proposal is that each `Task` object implements the "future"
interface, so exposes the `result()`/`cancel()`/`running()`/etc. attributes.  

2) We could also have `GridExecutor` objects based on the `Core` and/or
`Engine` classes, that are capable of running Python futures on the
Grid through GC3Pie.
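As a rough sketch of proposal (1): an adapter could map a task's state onto the PEP 3148 methods. Note that `ToyTask`, its string states, and `FutureTask` below are illustrative stand-ins invented for this sketch, not actual GC3Pie classes:

```python
class ToyTask:
    """Minimal stand-in for a GC3Pie Task: just a mutable state string."""
    def __init__(self):
        self.state = 'NEW'
        self.output = None

class FutureTask:
    """Adapter exposing the PEP 3148 "future" methods on top of a task."""
    def __init__(self, task):
        self._task = task
        self._cancelled = False

    def running(self):
        return (not self._cancelled) and self._task.state == 'RUNNING'

    def cancelled(self):
        return self._cancelled

    def done(self):
        return self._cancelled or self._task.state == 'TERMINATED'

    def cancel(self):
        # Unlike a plain PEP 3148 future, a grid job can be killed even
        # while it is running, so cancel() only fails once the task is done.
        if self.done():
            return False
        self._cancelled = True
        self._task.state = 'TERMINATED'
        return True

    def result(self):
        # A real implementation would block until the task terminates and
        # raise CancelledError on a cancelled task; this sketch does not.
        return self._task.output

t = ToyTask()
f = FutureTask(t)
t.state = 'RUNNING'
print(f.running())   # True
t.state = 'TERMINATED'
t.output = 'done'
print(f.done())      # True
```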

Question: if we provide an "executor" interface, then we should
support running of arbitrary Python code on remote hosts through
GC3Pie.  This might turn out to be really hard.

Original issue reported on code.google.com by riccardo.murri@gmail.com on 29 Jun 2011 at 4:08

GoogleCodeExporter commented 9 years ago

Original comment by riccardo.murri@gmail.com on 17 Aug 2012 at 11:46

GoogleCodeExporter commented 9 years ago

Original comment by riccardo.murri@gmail.com on 18 Jan 2013 at 9:48

GoogleCodeExporter commented 9 years ago
What is the reason for this? Will it make the code easier to read, simpler, 
faster, more robust? Could this solve some issue with the current code?

Original comment by antonio....@gmail.com on 19 Jan 2013 at 3:39

GoogleCodeExporter commented 9 years ago
| What is the reason for this? Will it make the code easier to read, simpler,
| faster, more robust? Could this solve some issue with the current code?

It would make GC3Pie implement a standard Python "interface" for async code.

Original comment by riccardo.murri@gmail.com on 19 Jan 2013 at 4:52

GoogleCodeExporter commented 9 years ago
I see, but my point is: how could GC3Pie code benefit from implementing an 
async interface?

Async code should help parallelize I/O-blocking operations or run multiple 
independent processes. In our code, this happens mainly when:

* checking task statuses,
* uploading/downloading files during submission/retrieval of results.

You can speed up these steps by creating multiple processes, but that 
increases the memory used by GC3Pie, and we have already hit memory problems 
when running many applications in parallel, so I don't know whether we can 
actually benefit from this.

Another problem I see is that the Task interface is currently quite different 
from the one proposed in the futures PEP.

For example, the `cancel()` method will "Attempt to cancel the call. If the 
call is currently being executed then it cannot be cancelled and the method 
will return False, otherwise the call will be cancelled and the method will 
return True."  So the interface assumes that the execution encapsulated in 
the Future cannot be interrupted once started, while in GC3Pie we assume that 
a job can always be cancelled.
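The quoted `cancel()` contract can be observed directly with the stdlib `concurrent.futures` module: a call that is already executing cannot be cancelled, while a still-queued one can. This demo uses a single-worker thread pool so the second submission stays pending:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

started = threading.Event()
release = threading.Event()

def work():
    started.set()      # signal that the call is now executing
    release.wait()     # block until the main thread lets it finish
    return 42

with ThreadPoolExecutor(max_workers=1) as pool:
    running = pool.submit(work)
    started.wait()                    # first call is now executing
    queued = pool.submit(work)        # second call waits for the worker

    could_cancel_running = running.cancel()  # False: executing call
    could_cancel_queued = queued.cancel()    # True: pending call
    release.set()                            # let the running call finish

print(could_cancel_running)   # False
print(could_cancel_queued)    # True
print(running.result())       # 42
```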

But maybe the hardest problem to solve here is the following: how will 
Futures work with persistence? If the CoreExecutor submits some FutureTasks 
and then the script is killed and restarted, how can we re-attach the 
FutureTasks to the CoreExecutor? Is there any way to do it?

Original comment by antonio....@gmail.com on 20 Jan 2013 at 10:21

GoogleCodeExporter commented 9 years ago
You seem to question the need for the "futures" interface, whereas I
do not see how having *another* interface to the GC3Pie functionality
could possibly hurt us...  Maybe we just interpret this assignment in
a different way?

For example:

| my point is: how could GC3Pie code benefit from implementing an
| async interface?

It's not GC3Pie code that benefits from implementing the "futures"
interface.  It's *other people's code* that has another entry point to
GC3Pie, and one that is (going to be) standard in Python, hence
requires less adaptation of mind and code framework.

Think of it this way: currently we only have one "interface" to
GC3Pie, which is patterned around the batch job submission mechanism.
Fine, and familiar to those people who come from a batch processing
background.  Now it's time to expand into another paradigm, namely
allowing for easy use of GC3Pie as a "task manager" in Python
applications.  I think that will be eased by having an API paradigm
that blends better with the "function call" style that's normally used
in Python programming.  The "futures" interface does exactly that: you
call a function, GC3Pie executes the associated Task/Application in
the background, and you get the actual result when it's done.

In other words, I only see the "futures" interface as another way of
writing the main loop; instead of:

  task = Task(...)
  task.attach(core)
  task.submit()
  while task.state != TERMINATING:
    # ...
    task.update_state()

we would have:

  task = Task(...)
  future = executor.submit(task)
  while not future.done():
    # ...

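The second loop is already idiomatic with the stdlib `concurrent.futures` module. Here `ThreadPoolExecutor` stands in for the proposed `GridExecutor` (which does not exist; the name comes from this issue), and `simulate_task` is a dummy for running an Application; the calling code would stay the same:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def simulate_task():
    # Stand-in for running an Application on the grid.
    time.sleep(0.05)
    return 'results retrieved'

# A GridExecutor would slot in here without changing the loop below.
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(simulate_task)
    while not future.done():
        time.sleep(0.01)    # poll, much as a progress() main loop would
    print(future.result())  # results retrieved
```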
So, on to the specific questions.

On the `cancel()` semantics: where's the problem?  If we can always
cancel a running task, do it and return `True`.

On persistence: again, I do not see a problem; the `CoreExecutor`
could reference an existing store, like the `SessionBasedScript` and
`Engine` objects do.  Upon creation, the `CoreExecutor` reloads all
saved state from the store.
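The reload-on-creation idea can be sketched with a toy example; `FileStore` and `ToyExecutor` below are invented stand-ins (a pickle file holding a task-id-to-state dict), not GC3Pie's actual persistence layer:

```python
import os
import pickle
import tempfile

class FileStore:
    """Toy persistent store: a dict of task-id -> state, pickled to disk."""
    def __init__(self, path):
        self.path = path
    def save(self, tasks):
        with open(self.path, 'wb') as f:
            pickle.dump(tasks, f)
    def load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path, 'rb') as f:
            return pickle.load(f)

class ToyExecutor:
    """On creation, re-attach every task found in the store."""
    def __init__(self, store):
        self.store = store
        self.tasks = store.load()   # resume where a killed script left off
    def submit(self, task_id, state='SUBMITTED'):
        self.tasks[task_id] = state
        self.store.save(self.tasks)

path = os.path.join(tempfile.mkdtemp(), 'session.pkl')
first = ToyExecutor(FileStore(path))
first.submit('job1')
# ...script killed and restarted...
second = ToyExecutor(FileStore(path))
print(sorted(second.tasks))   # ['job1']  -- the task survived the restart
```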

Original comment by riccardo.murri@gmail.com on 20 Jan 2013 at 2:37

GoogleCodeExporter commented 9 years ago
This PyPI package implements something along the lines of what is proposed 
here: https://crate.io/packages/clusterfutures/

Original comment by riccardo.murri@gmail.com on 10 Jun 2013 at 9:45