AcademySoftwareFoundation / opencue.io

Source for OpenCue website
https://www.opencue.io
Creative Commons Attribution 4.0 International

Migrate legacy tutorial #117

Open sharifsalah opened 5 years ago

sharifsalah commented 5 years ago

The source code for Cue3 included a tutorial which is now quite out of date. Some of this information could be updated and adapted for a tutorial or tutorials about PyOutline.

Text from the tutorial follows:

PyOutline Development

Introduction

PyOutline is a workflow engine that logically separates large processes or programs into small chunks that can be executed across many systems or CPUs in parallel, most commonly on a render farm. The main role of PyOutline is as a glue library that binds an application like Katana or Maya to our render farm software, Cue3. It can also be used for ad-hoc command execution.

For example, let's say you had the command:

    ginsu -filein blah.#.exr -fit 100x100 -fileout blah-small.#.exr -t 1-10

If you were to execute that command in a shell, Ginsu would load frame 1, resize it, and write out the result. Then it would load frame 2, resize it, and write out the result. This would continue sequentially all the way up to frame 10.

Outline allows you to create a template for that command and convert it into 10 individual Ginsu commands, one for each frame. Each of these ten commands could run on a different host in parallel. These templates are called "Modules", and they are the glue that binds an application or script to the cue.
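As a rough illustration of that expansion (this is not PyOutline's actual implementation), the sketch below turns one templated command into one command per frame. The helper name expand_frames, the choice to substitute the bare frame number for "#", and the idea that "-t" accepts a single frame number are all assumptions made for this example.

```python
def expand_frames(template, first, last):
    """Return one command (a list of arguments) per frame in [first, last].

    Every "#" in the template is replaced with the frame number, and a
    "-t N" pair is appended so each command handles exactly one frame.
    """
    commands = []
    for frame in range(first, last + 1):
        args = [arg.replace("#", str(frame)) for arg in template]
        commands.append(args + ["-t", str(frame)])
    return commands


template = ["ginsu", "-filein", "blah.#.exr", "-fit", "100x100",
            "-fileout", "blah-small.#.exr"]
commands = expand_frames(template, 1, 10)

print(len(commands))           # 10
print(" ".join(commands[0]))   # ginsu -filein blah.1.exr -fit 100x100 -fileout blah-small.1.exr -t 1
```

Each entry in commands is independent of the others, which is what makes it possible to hand them to different hosts.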

Vocabulary

Frame - A frame is a discrete unit of execution on the cue. It represents a shell command or complex operation that will be executed in the same context or python interpreter. The name of a frame always consists of a number and its Layer name. For example: 0001-bty_env
A frame can have one of many distinct states:
  Waiting - The frame is waiting to be assigned a processor from the render farm.
  Running - The frame is being executed on the render farm.
  Succeeded - All elements of the frame completed successfully.
  Dead - The frame failed in some way and was retried the max number of times.
Layer - A layer is a numbered set of frames.
Job - A job is a set of layers.
Module - A python class which integrates an application or operation with the cue. It must inherit one of Outline's base event classes, which are: Layer, Frame, PreProcess, PostProcess, PostFrame, or Composite.
Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel").

Modules

An outline module is a class that implements the logic necessary to integrate a particular application with the render farm software. Some examples of the major modules possible in Outline are:

Katana
Maya
Nuke

There can be outline modules for every major application we run on the render farm, as well as general purpose modules for running arbitrary shell commands.

Environment Variables

To use these, set them in your shell prior to launching a job.

OL_OS = Determines the operating system to use on the cue.
OL_TAG_OVERRIDE = Changes all the tags for all the layers of any job you launch in that shell.
RENDER_TO = Determines the facility to render to.

Running Outline Files with pycuerun

Commandline Options

See pycuerun --help for more information.

Running an Outline File

To utilize your module, write a quick outline file:

    import MyModule
    MyModule("name", arg1="foo", arg2="bar")

Save it as a file and run it with pycuerun, supplying a frame range:

    pycuerun my_module_test.outline 1-5

Attention: If you want to test your outline interactively via the console, you may use the "--backend local" option, e.g.:

    pycuerun my_module_test.outline 1-5 --backend local

Using PyOutline Procedurally

You can create outlines and launch them with no associated outline file. To procedurally launch an instance of your Outline module, you must first import outline.

    import versions
    versions.require("outline", "latest")
    from outline import Outline, cuerun
    import MyModule

Once you have done that, you can build your outline structure and launch it to the cue.

    ol = Outline("my_job")
    ol.add_layer(MyModule("name", arg1="foo", arg2="bar"))
    # The "wait" option causes cuerun.launch to block until the job is complete.
    cuerun.launch(ol, range="1-5", wait=True)
    # Optionally, you can run the job in test mode, which throws an exception if the job fails.
    cuerun.launch(ol, range="1-5", test=True)

In these examples we'll show you how to use each of PyOutline's Layer classes and launch them using the cuerun module.

Layer
Frame
LayerPreProcess
LayerPostProcess
OutlinePostCommand

The Frame Class

The Frame class represents a single frame. All subclasses of Frame always result in a single frame on the cue, regardless of the frame range the outline is launched with. Some subclasses of Frame are PreProcess, PostProcess, PostCommand, and ShellCommand.

    from outline import Outline, cuerun
    from outline.modules.shell import ShellCommand

    # The 1-1000 range is ignored; only frame 1 will hit the cue.
    ol = Outline("frame_test", "1-1000")
    ol.add_layer(ShellCommand("really_long_shell_command", command=["find", "/*", "-name", "*.mp3"]))
    cuerun.launch(ol)

In this case, the job that is created has only a single frame in the "really_long_shell_command" layer because ShellCommand is a subclass of Frame. The frame range the job was launched with (1-1000) is ignored.

The Layer Class

A Layer is a set of frames where the only basic property that changes between them is the frame number. Layers are most often used to generate a sequence of images from a scene file, or perform an operation on a sequence of images that results in a different set of images in parallel.

    from outline import Outline, cuerun
    from outline.modules.shell import Shell

    ol = Outline()
    ol.add_layer(Shell("ginsu_convert", command=["ginsu", "-filein", "blah.#.exr", "-fit", "100x100", "-fileout", "blah-small.#.exr"]))

    cuerun.launch(ol, range="1-100")

In this example, a layer named ginsu_convert is created with 100 frames. Each frame represents a small part of the entire ginsu conversion, 1/100 to be exact, and has the potential to be executed in parallel. In this case, if we put 100 processors on this job, we would have our result up to roughly 100x faster than if we ran the conversion sequentially.
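The "roughly 100x" figure is the ideal case, and a little arithmetic shows where it comes from. The sketch below assumes every frame takes the same time and ignores scheduling and file system overhead; the function name and the 30 seconds per frame are made up for the example.

```python
def ideal_wall_time(frame_count, seconds_per_frame, processors):
    """Best-case wall-clock time when frames are spread evenly over processors."""
    # Ceiling division: the number of "rounds" needed to get through every frame.
    rounds = -(-frame_count // processors)
    return rounds * seconds_per_frame


sequential = ideal_wall_time(100, 30, 1)     # one processor runs all 100 frames
parallel = ideal_wall_time(100, 30, 100)     # one processor per frame
partial = ideal_wall_time(100, 30, 40)       # 40 processors need three rounds

print(sequential, parallel, partial)         # 3000 30 90
```

With fewer processors than frames the speedup is smaller, since some processors have to run more than one frame.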

Module Development

If you have been tasked to write an outline module, this is where to begin.

Efficiency

To ensure your cue pipeline uses as few cue resources as possible, be sure to:

Hint the cue about the memory requirements for extremely low memory layers using the "memory" argument.
Do not assign a full core to utility layers. Utility layers do things like send emails, clean up files, etc. Use the "cores" argument to change the default reserved cores.
Do not have a 1:1 frame to layer ratio. If you're creating a layer for every frame, you are doing it wrong and you will have to fix it.
Don't use dependencies to chain together a bunch of frames that could run in a single frame. If you are doing that, you are doing it wrong.

Test Environment

To develop and test a python module, all you need to do is have your test module located in your python path. This should be true for both the cue and your desktop.

A simple Outline script would be:

    from outline.modules.tutorial import HelloModule
    HelloModule("test")

Anatomy of an Outline Module

An outline module is a python class which extends from Layer or one of Layer's subclasses. Layer is abstract in the sense that you must implement some methods in your subclass to provide application specific behavior. Since the outline will be serialized into a data format before launch, there are some best practices you should follow when designing a module.

    from outline import Layer


    class ExampleModule(Layer):
        """An example outline module that implements all possible abstract functions."""

        def __init__(self, name, **args):
            Layer.__init__(self, name, **args)

        def _after_init(self):
            """
            Executed automatically after the constructor.
            This method exists because the parent outline is not known in the constructor.
            """
            outline = self.get_outline()

        def _after_parented(self):
            """
            Executed automatically after the layer has been parented to another layer.
            This only happens when building composite layers, that is, layers that
            contain other layers.
            """
            parent_layer = self.get_parent()

        def _setup(self):
            """
            Should contain any operations that should be run before the job is launched.
            This is the first time the session becomes available, so it's possible to
            write data into the cue_archive.
            """
            pass

        def _before_execute(self):
            """
            Run before execute. Generally used to create objects that do not serialize to
            pickle properly for job launch.
            """
            pass

        def _execute(self, frames):
            """
            The core module behavior should be implemented here. The frames argument
            contains an array of frames that the current instance is responsible for.
            """
            pass

        def _after_execute(self):
            """
            Run after execute even if execute throws an exception. Used for cleanup and
            implementing extra output checks like checking for black frames or log parsing.
            """
            pass

Constructor

The module constructor is executed upon object creation. In your constructor you should initialize all object specific properties. In general your constructor should be pretty small. Be careful not to set Cue3 or job specific data in your constructor.

A typical constructor would be:

    class Ginsu(Layer):
        def __init__(self, name, input, output, **args):
            Layer.__init__(self, name, **args)
            self.__input = input
            self.__output = output

Your constructor (and factory methods that call the constructor) should not have any side effects whatsoever. For example, don't create a directory, touch a file, etc. It's possible that instances of your module will be created without the intent to ever execute them.
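To make the "no side effects" rule concrete, here is a minimal sketch. The Layer class below is only a stand-in so the example runs without PyOutline installed, and the Resize module with its output_dir argument is hypothetical. The constructor just records the path; the directory is created in _setup(), which only runs when the user actually launches.

```python
import os
import tempfile


class Layer(object):
    """Minimal stand-in for outline.Layer so this example is self-contained."""

    def __init__(self, name, **args):
        self.name = name
        self.args = args


class Resize(Layer):
    """Hypothetical module that writes its results into output_dir."""

    def __init__(self, name, output_dir, **args):
        Layer.__init__(self, name, **args)
        # Only record the path here; do not touch the file system.
        self.output_dir = output_dir

    def _setup(self):
        # Side effects belong here, once the user has chosen to launch.
        if not os.path.isdir(self.output_dir):
            os.makedirs(self.output_dir)


target = os.path.join(tempfile.mkdtemp(), "resized")
layer = Resize("resize", target)
exists_after_init = os.path.isdir(target)     # False: nothing created yet

layer._setup()
exists_after_setup = os.path.isdir(target)    # True: created at launch time

print(exists_after_init, exists_after_setup)  # False True
```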

_after_init(self, ol):

Run after the layer has been fully constructed assuming the layer has become part of an Outline. This method is used to continue module initialization since the overall outline module is not available until after the constructor is run.

_setup()

The _setup() method of every layer in your outline is executed once, and only once, before the outline is launched to the cue. Execution of _setup() is an indication the user wants to run the outline. Typically, you should include any pre-launch operations that need to happen in _setup().

The Session

One feature you have access to while in _setup() is the session. The session is a place for you to store job specific data. For example, if I mkdir a path in _setup(), and then want my frames to have access to this path, I would do:

            self.put_data("outputpath", path_i_just_made)

Later on I could retrieve that data from a completely different process on a different machine.

            path = self.get_data("outputpath")

Sessions can also store files. It's a good idea to copy scene files or any other data that could change while the outline is executing. By default, the name of the file stays the same, but it's also possible to rename the file.

            self.put_file("/path/to/scene.file")

To retrieve the new path of the file, call get_file.

            scene_file_path = self.get_file("scene.file")
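If the idea of a session is new to you, the toy version below may help. It is not how PyOutline stores sessions; FileSession is a hypothetical class that keeps each value as a JSON file in a shared directory, which is enough to show why data stored at launch time can be read back later by a different process that only knows where the session lives. The output path is made up.

```python
import json
import os
import tempfile


class FileSession(object):
    """Hypothetical stand-in for a session: key/value data kept in a directory."""

    def __init__(self, path):
        self.path = path

    def put_data(self, name, value):
        # One small JSON file per key.
        with open(os.path.join(self.path, name + ".json"), "w") as handle:
            json.dump(value, handle)

    def get_data(self, name):
        with open(os.path.join(self.path, name + ".json")) as handle:
            return json.load(handle)


session_dir = tempfile.mkdtemp()

# "Launch side": store a value before the job starts.
FileSession(session_dir).put_data("outputpath", "/shots/ab01/out")

# "Frame side": a separate object that only knows the session directory.
path = FileSession(session_dir).get_data("outputpath")
print(path)   # /shots/ab01/out
```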

_before_execute()

Because your outline structure is pickled onto a network drive so all machines on the render farm can see it, your structure cannot contain un-pickleable objects. If it does, you can solve that problem using the _before_execute() hook.

_before_execute() is run right before _execute() on the render farm, and can be used to create instances of classes that don't pickle. For example, since the io.Image class contains a reference to FileSequence, io.Image objects cannot be instantiated until the frame is on the render farm.

    def _before_execute(self):
        self.add_input(io.Image(self.get_arg("input_file")))

If you run into a problem launching an outline and see errors about yaml not being able to serialize some data, you should use _before_execute() to create those objects.
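The underlying problem is easy to reproduce with nothing but the standard library. In the sketch below a threading.Lock stands in for any object that cannot be serialized, and plain dictionaries stand in for a layer's state; the can_pickle and before_execute helpers are names invented for this example.

```python
import pickle
import threading


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


# State that creates the awkward object up front, as a constructor might.
eager_state = {"input_file": "blah.#.exr", "lock": threading.Lock()}

# State that only remembers the argument and defers object creation.
lazy_state = {"input_file": "blah.#.exr"}


def before_execute(state):
    """Create the unpicklable object only after the state has been restored."""
    state["lock"] = threading.Lock()
    return state


print(can_pickle(eager_state))   # False
print(can_pickle(lazy_state))    # True

# Simulate the trip to the render farm, then finish setting up on arrival.
restored = before_execute(pickle.loads(pickle.dumps(lazy_state)))
```

The same pattern applies whatever the serialization format is: keep only simple arguments in the structure that gets serialized, and build the complicated objects once the frame is running.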

_execute()

The _execute() method for each layer is run on the render farm. Typically, _execute() is run one time for every frame in the layer's frame range, but it may be called fewer times if the frame range is chunked.

Outline will pass in an array of frames that the current _execute() call is responsible for. _execute() should contain the command(s) needed to run all of those frames. This means you will most likely have to implement faux chunking if your application does not support execution of arbitrary frame ranges.
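Faux chunking usually amounts to a loop: if the application can only process one frame per invocation, run it once for each frame you were handed. A minimal sketch, with a made-up fake_render function standing in for the real single-frame operation:

```python
def execute(frames, run_one_frame):
    """Run a single-frame operation once for every frame in the chunk."""
    results = []
    for frame in frames:
        # An exception raised here stops the loop, so the chunk fails as a whole.
        results.append(run_one_frame(frame))
    return results


def fake_render(frame):
    """Stand-in for shelling out to an application that handles one frame."""
    return "rendered frame %d" % frame


results = execute([5, 6, 7], fake_render)
print(results)   # ['rendered frame 5', 'rendered frame 6', 'rendered frame 7']
```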

From within execute(), you still have access to read and write the session. Be forewarned, if the session data is stored on NFS, simultaneous writes to the same session variable from multiple frames that could be running in parallel may cause session corruption. If this is a possibility for you, please reconsider your design. You could do the operation in _setup(), or create a LayerPreProcess that only gets run once or put the operation in its own Frame instance. Do not attempt to use NFS locking or some other synchronization scheme.

_after_execute()

_after_execute() runs after _execute(), even if the _execute() method fails. It will not run if the frame was forcibly killed. You can use _after_execute() to do any kind of post frame checks you want. For example, if the frame succeeded, maybe you want to check that the frame is not all black due to a license error. Or, if the frame failed, you might want to send an email to someone with a stack trace or details about the error.
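In Python terms, the behavior described above is what you get from try/finally. The sketch below is not PyOutline's actual driver code; run_frame and the Recorder class are invented to show the order in which the three hooks would fire when _execute() raises.

```python
def run_frame(layer, frames):
    """Call the hooks in order, running the cleanup hook even on failure."""
    layer._before_execute()
    try:
        layer._execute(frames)
    finally:
        layer._after_execute()


class Recorder(object):
    """Toy layer that records which hooks were called."""

    def __init__(self):
        self.calls = []

    def _before_execute(self):
        self.calls.append("before")

    def _execute(self, frames):
        self.calls.append("execute")
        raise RuntimeError("render failed")

    def _after_execute(self):
        self.calls.append("after")


layer = Recorder()
try:
    run_frame(layer, [1])
except RuntimeError:
    pass  # the failure still propagates after cleanup has run

print(layer.calls)   # ['before', 'execute', 'after']
```

A finally block cannot run if the process is killed outright, which matches the note above about forcibly killed frames.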

_after_parented()

_after_parented() is called when a Layer instance is parented to another Layer instance. When you parent a layer to another, the child layer no longer shows up as an explicit layer in the job. Instead, the child layer's _execute() method is called directly after that of the parent layer. This allows you to compose Layers made from other layers and do any additional setup if an instance of your module is parented to another Layer.

Building Your Own Module

PyOutline makes it simple to develop your own modules that encapsulate the integration logic for your application.

Best Practices

If you shell out within your outline module, use Layer.system(). This handles checking the return value.
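The tutorial does not show what Layer.system() does internally, so the following is only a standard-library illustration of what "checking the return value" buys you. The run_checked helper is a hypothetical name; it uses subprocess.check_call, which raises CalledProcessError when the command exits with a non-zero status.

```python
import subprocess
import sys


def run_checked(command):
    """Run a command and raise CalledProcessError on a non-zero exit status."""
    subprocess.check_call(command)


# A command that succeeds returns normally.
run_checked([sys.executable, "-c", "print('ok')"])

# A command that fails raises, so the frame cannot silently "succeed".
try:
    run_checked([sys.executable, "-c", "import sys; sys.exit(3)"])
    failed = False
except subprocess.CalledProcessError as error:
    failed = True
    exit_status = error.returncode
```

Without a check like this, a crashed application could leave a frame marked as successful even though nothing was written.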

If you need to build a pre-process module, start by subclassing LayerPreProcess. This will handle setting the frame range and dependencies.

HelloModule

In this example, we're going to build the simplest module possible and, in keeping with coder tradition, call it HelloModule. The purpose of HelloModule is to print "Hello Frame Number #" followed by the frame number in the frame log.

    """ Outline Tutorial - HelloModule """
    import time
    from outline import Layer

    class HelloModule(Layer):
        def __init__(self, name, **args):
            Layer.__init__(self, name, **args)

        def _execute(self, frames):
            # Print one greeting per frame in this chunk, pausing between frames.
            for frame in frames:
                print("Hello Frame Number #%d" % frame)
                time.sleep(10)

OK, let's break this down.

HelloModule(Layer)

All module classes must extend from Layer or one of Layer's subclasses. We'll get to those later.

Layer.__init__(self, name, **args)

The Layer class takes a variable list of arguments that can be retrieved via the get_arg(name) method. You can pass in your own values via this list, or some of the standard Layer options, which are:

chunk - The layer chunk size. Default is 1.
range - Hard code a frame range for this layer. Default is None, which means it uses the frame range supplied by cuerun.
tags - Specify the type of processors to use on the farm. Default is ["general","desktop"].
threadable - Is true if the layer can be multi-threaded. Defaults to False.
threads - Minimum number of threads to use for each frame, if the frame is threadable. Defaults to 1.
require - An array of other layers to depend on. Each element can be a reference to another Layer or its name.

_execute(self, frames)

_execute() is the workhorse of your outline script. The code you put in _execute() is run on the render farm. In our simple example, we're just printing "Hello Frame Number" with the frame number, but typically you would shell out to some other application here. The frames argument contains the array of frames this _execute() call is responsible for. Usually this is going to be a single frame, but if you apply chunking, this array may contain more than one frame number.
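To picture what chunking does to the frames argument, consider a layer launched over frames 1-10 with a chunk size of 4. The sketch below assumes chunks are simply consecutive slices of the range (the exact way the cue splits a range is not spelled out in this tutorial), and chunk_frames is a name made up for the example.

```python
def chunk_frames(frames, chunk_size):
    """Split a list of frame numbers into consecutive chunks of chunk_size."""
    return [frames[i:i + chunk_size] for i in range(0, len(frames), chunk_size)]


chunks = chunk_frames(list(range(1, 11)), 4)
print(chunks)   # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]

# With the default chunk size of 1, every call gets a single frame.
singles = chunk_frames(list(range(1, 4)), 1)
print(singles)  # [[1], [2], [3]]
```

Under that assumption, each inner list is what one call would receive as frames, and the last chunk can be shorter than the rest.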

Launching

Now it's time for more code. We're going to write some code that uses our HelloModule.

    #!/bin/env python2.5
    from outline import Outline, cuerun
    from outline.modules.tutorial import HelloModule

    ol = Outline("my_job")
    ol.add_layer(HelloModule("my_layer"))

    cuerun.launch(ol, range="1-10", pause=True)

Save this out to a file and execute it. If the launch is a success, check the cue for your job.

Frame environment

Environment variables available during a render:

CUE_CHUNK - The chunk size of the layer
CUE_FRAME - The frame name, example: "1090-io_bakef3d1_part_4_preprocess"
CUE_FRAME_ID - The frame guid from cue3
CUE_IFRAME - The frame number
CUE_JOB - The job name
CUE_JOB_ID - The job guid from cue3
CUE_LAYER - The layer name
CUE_LAYER_ID - The layer guid from cue3
CUE_LOG_PATH - The path to the file directory
CUE_MEMORY - Amount of memory in KB assigned to the frame
CUE_RANGE - The full frame range of the layer
CUE_SHOT - The shot
CUE_SHOW - The show
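A script running inside a frame can read these with os.environ. The helper below is a small sketch: frame_context is a hypothetical name, the example values are invented, and falling back to a chunk size of 1 when CUE_CHUNK is unset simply mirrors the default chunk size mentioned earlier.

```python
import os


def frame_context(environ=None):
    """Collect a few of the CUE_* variables into a dictionary."""
    environ = os.environ if environ is None else environ
    frame_number = environ.get("CUE_IFRAME")
    return {
        "job": environ.get("CUE_JOB"),
        "layer": environ.get("CUE_LAYER"),
        "frame": environ.get("CUE_FRAME"),
        # Environment values are strings, so convert the numeric ones.
        "frame_number": int(frame_number) if frame_number is not None else None,
        "chunk": int(environ.get("CUE_CHUNK", "1")),
    }


# Invented values, shaped like the variables listed above.
example_env = {
    "CUE_JOB": "my_job",
    "CUE_LAYER": "my_layer",
    "CUE_FRAME": "0001-my_layer",
    "CUE_IFRAME": "1",
    "CUE_CHUNK": "1",
}
context = frame_context(example_env)
print(context["frame"], context["frame_number"])   # 0001-my_layer 1
```

Passing a dictionary instead of the real environment makes the helper easy to try out on a desktop, where the CUE_* variables are not set.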

I've also attached the tutorial for historic reference.

tutorial.html.zip

csunitha commented 4 years ago

curious to know if this issue is still open to work upon. I would be interested in taking it up - but i am a first timer in open source contribution - so would need some help to start with.

sharifsalah commented 4 years ago

This particular tutorial is very outdated, but in general, yes I think documentation and tutorials in particular about PyOutline development would be welcome.

If you're new to open source, consider first becoming familiar with the tools and processes by contributing a simpler and smaller change. For example, you might consider something related to https://github.com/AcademySoftwareFoundation/opencue.io/issues/193, which involves some refactoring of existing content.

csunitha commented 4 years ago

Thank you for your response. I would see if i can add value to that open item.