goerz / gc3pie

Automatically exported from code.google.com/p/gc3pie

A way to execute a Python method (from a given class) instead of an executable is needed #323

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
There apparently is no effective way to describe the processing for a Task 
directly in Python. 

Yet Python is often used as a kind of universal high-level glue when describing 
complex scientific data-processing algorithms.

This would be needed e.g. on standard Debian-based installations.

It might be implemented by sending the serialized object on which to call a 
.start() method along with the needed arguments.
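
A minimal sketch of this idea (the `Processor` class, the `task.pkl` file name, and the runner are purely illustrative, not part of GC3Pie; note that unpickling on the remote side requires the class definition to be importable there, a point that comes up again below):

        # submitting side: serialize a task object that knows how to process the data
        import pickle

        class Processor(object):
            """Illustrative processor; any picklable object with a .start() method."""
            def __init__(self, *args):
                self.args = args
            def start(self):
                return sum(self.args)   # stands in for the real computation

        with open('task.pkl', 'wb') as out:
            pickle.dump(Processor(1, 2, 3), out)

        # remote side: a tiny generic runner, shipped as the Application's executable
        #
        #     import pickle
        #     with open('task.pkl', 'rb') as inp:
        #         task = pickle.load(inp)
        #     result = task.start()
        #     with open('result.pkl', 'wb') as out:
        #         pickle.dump(result, out)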

Original issue reported on code.google.com by marco.qu...@gmail.com on 24 Sep 2012 at 12:59

GoogleCodeExporter commented 9 years ago
I'm not sure I understand the request.  How would this be different from:

1. Saving the Python code to be executed into a file `mystuff.py`

2. Using a variation of the following `Application` subclass:

        class MyStuffApp(Application):
          def __init__(self, arguments, **extra_args):
            Application.__init__(
              self,
              executable = './mystuff.py',
              arguments = arguments,
              inputs = [ '/path/to/mystuff.py' ], # + other input files
              outputs = [],
              output_dir = '/tmp',
              **extra_args)
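
Such an application would then be submitted like any other GC3Pie task, roughly as follows (a sketch based on the standard `gc3libs` Engine API; the argument values are placeholders):

        import time
        import gc3libs

        app = MyStuffApp(arguments=['input.dat'])
        engine = gc3libs.create_engine()   # reads the usual GC3Pie configuration file
        engine.add(app)
        while app.execution.state != gc3libs.Run.State.TERMINATED:
            engine.progress()              # submit, monitor, and fetch outputs
            time.sleep(1)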

Original comment by riccardo.murri@gmail.com on 24 Sep 2012 at 1:15

GoogleCodeExporter commented 9 years ago
....or do you mean that you want a more convenient way / syntactic sugar for 
doing exactly that?

Original comment by riccardo.murri@gmail.com on 24 Sep 2012 at 1:18

GoogleCodeExporter commented 9 years ago
Just syntactic sugar, plus module `__file__` transfer and pickle.load()-ing in 
terminated(), perhaps?

I see the need for keeping gc3pie on the "issuing", laptop-like machine only. 
Therefore, minimal processor objects in Python amount to full modules with a 
`__main__`.

My point is: as far as I can see, gc3pie helps one describe processing-chain 
structures in a clean, object-oriented fashion. It would be great if algorithms, 
at least simple ones, could be included in the description together with the 
structure.

Would that be something like:
        class MyStuffApp(Application):
          def __init__(self, arguments, **extra_args):
            Application.__init__(
              self,
              executable = '/usr/bin/python',
              arguments = arguments,
              inputs = [ '.py' ], # + other input files
              outputs = [],
              output_dir = '/tmp',
              **extra_args)

Will be trying right away, thanks a lot
Marco

Original comment by marco.qu...@gmail.com on 24 Sep 2012 at 2:05

GoogleCodeExporter commented 9 years ago
It works; smelly-code red alarm though! :-)
m

Original comment by marco.qu...@gmail.com on 27 Sep 2012 at 3:29


GoogleCodeExporter commented 9 years ago
New version of pyrun_example: it now unpickles every pickle.load()-able entry of 
application.outputs into application.results and leaves all other outputs 
untouched.
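
The attachment itself is not reproduced here; a rough sketch of such a terminated() hook, assuming the behaviour described above (the class name, attribute handling, and error handling are simplified guesses, not the attached code):

        import os
        import pickle

        from gc3libs import Application

        class PyRunApp(Application):
            def terminated(self):
                # called by GC3Pie once the job has finished and outputs were fetched
                self.results = {}
                for remote_name in self.outputs:
                    path = os.path.join(self.output_dir,
                                        os.path.basename(str(remote_name)))
                    try:
                        with open(path, 'rb') as stream:
                            self.results[str(remote_name)] = pickle.load(stream)
                    except (IOError, OSError, EOFError, pickle.UnpicklingError,
                            ImportError, AttributeError):
                        pass  # not a pickled file: leave it untouched as a plain output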

Original comment by marco.qu...@gmail.com on 1 Oct 2012 at 9:39


GoogleCodeExporter commented 9 years ago
Variations on the same theme: use Celery[0] or Gearman[1] to dispatch
pure-Python tasks to worker nodes.  Both Celery and Gearman provide
modules that take care of the serialization and deserialization of
arguments of Python calls, and handle asynchronous remote execution.
We could write a Celery or Gearman backend to execute GC3Pie Tasks
through these task queue systems.
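
For reference, dispatching a pure-Python call through Celery looks roughly like this (the broker URL and the process() function are placeholders; this shows Celery's own API, not a GC3Pie backend):

        # tasks.py, running on each worker node under a Celery daemon,
        # started with something like: celery worker --app=tasks
        from celery import Celery

        app = Celery('tasks', broker='amqp://localhost')

        @app.task
        def process(data):
            return sum(data)   # stands in for the real computation

        # on the submitting machine: arguments are serialized, the call runs remotely
        #     from tasks import process
        #     async_result = process.delay([1, 2, 3])
        #     print(async_result.get(timeout=60))  # needs a result backend configured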

Possible issues:

* Both Celery and Gearman need their own dedicated daemon running on
  the worker nodes.  Again this would not be a problem in your case
  as you would run on a dedicated infrastructure over which you have
  complete control?

* As far as I can understand, the source file containing the functions
  to be executed must already be present on the worker machine, but
  I gather this would not be a problem in your case?

  On the other hand, if we assume that all the relevant source files
  are already deployed remotely, then all we need to do is
  pickle/unpickle the arguments and stage the pickled files, which
  your code already does :-)

[0]: http://celeryproject.org/
[1]: http://www.gearman.org/ (see also: http://www.saltycrane.com/blog/2010/04/notes-using-gearman-with-python/)

Original comment by riccardo.murri@gmail.com on 3 Oct 2012 at 4:43

GoogleCodeExporter commented 9 years ago
More notes about similar solutions "in the wild": Pyro[2] has long been the
standard solution for doing transparent remote calls in Python.  It again
handles automatic pickling/unpickling of objects, which might be less trivial
than it seems, because unpickling arguments may involve access to source files
that are not available on the remote machine? (I need to investigate this
further!)

However, Pyro seems not to allow multiple "servers" (a server is the remote
machine that executes code) to expose the same objects, and the documentation
never mentions load balancing.  Hence, it seems more like a distributed object
broker (e.g., CORBA, COM) than a work-queue system.
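
For comparison, a remote call with Pyro looks roughly like this (a Pyro4-style sketch; the Worker class is made up, and details such as the need for @Pyro4.expose vary between Pyro releases):

        # server side: expose an object and serve requests
        import Pyro4

        class Worker(object):
            def run(self, payload):
                return sum(payload)   # stands in for the real computation

        daemon = Pyro4.Daemon()
        uri = daemon.register(Worker())   # newer Pyro4 releases also want @Pyro4.expose
        print(uri)                        # something like PYRO:obj_...@hostname:port
        daemon.requestLoop()

        # client side: arguments are pickled and the call executes on the server
        #     worker = Pyro4.Proxy(uri)
        #     worker.run([1, 2, 3])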

Original comment by riccardo.murri@gmail.com on 3 Oct 2012 at 4:48

GoogleCodeExporter commented 9 years ago
You are right, it seems that in Pyro you can't just pass any object to the 
server, it needs to have the class definition, because of the way the 
unpickling works.

In principle you could get the file name with the `inspect` module and add those
files to the `inputs` array, but it's quite a fragile and error-prone
solution...
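
For the record, a sketch of that fragile approach (the helper name is made up, and it assumes the class is defined in a plain .py file, not interactively or in a C extension):

        import inspect

        def source_inputs(obj):
            """Return the source file defining obj's class, to be added to `inputs`."""
            path = inspect.getsourcefile(type(obj))
            if path is None:
                raise ValueError("cannot locate the source file for %r" % type(obj))
            return [path]

        # e.g.:  inputs = source_inputs(my_task) + other_input_files
        # ...but base classes and imported helper modules are still not covered.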

Original comment by arcimbo...@gmail.com on 6 Mar 2013 at 4:18