getavalon / core

The safe post-production pipeline - https://getavalon.github.io/2.0
MIT License

Workareas #162

Open mottosso opened 7 years ago

mottosso commented 7 years ago

Goal

Implement the Rhythm & Hues concept of "workareas".

https://vimeo.com/116364653

Motivation

Currently, every asset is encapsulated by a single top-level directory in which both development and public files reside. That's great, as it means there is never any duplication of an asset anywhere on disk and related material is easily found.

What R&H have done however is take this concept one step further.

They've managed to encapsulate not only the asset, but also the application dependencies of any task - such as rigging their tiger - into a single folder. This folder then contains only things related to rigging the tiger.

*(image)*

└── ProjectFolder
    └── workareas
       ├── 1000
       ├── 2000
       ├── PiPatel
       └── RichardParker
           ├── modeling
           ├── lookdev
           └── rigging
               ├── v01
               └── v02
                   ├── input
                   ├── output
                   ├── nuke
                   ├── houdini
                   └── maya

Things to note

  1. Workareas contain published content
  2. Workareas contain copies (symlinks) of loaded content
  3. Workareas are versioned, just like published assets

In practice this means that all paths used within e.g. Maya are local to the current working directory, and that shipping this directory to another computer is a mere copy/paste. References, textures and caches all point at the local working directory, hence no additional logic, conversion or dependency tracking is necessary.

It also means sending a shot or asset to a farm for rendering or additional processing is dead simple.
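Because everything resolves relative to the workarea root, "shipping" is literally a directory copy. A minimal sketch of that idea, using throwaway hypothetical paths (not Avalon or R&H code):

```python
import shutil
import tempfile
from pathlib import Path

def ship_workarea(workarea, destination):
    """Copy a workarea wholesale; following symlinks (symlinks=False)
    makes the copy self-contained even where links aren't supported."""
    return shutil.copytree(workarea, destination, symlinks=False)

# Demonstration with throwaway directories (names are illustrative)
src = Path(tempfile.mkdtemp()) / "RichardParker" / "rigging" / "v02"
(src / "maya").mkdir(parents=True)
(src / "maya" / "scene.ma").write_text("// references resolve relative to the workarea")

dst = Path(tempfile.mkdtemp()) / "v02"
ship_workarea(src, dst)
```

The scene file arrives with its local-relative references intact; no path conversion step is needed on the receiving end.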


Relationships

*(image)*

Here you can see that assets are actually stored inside the work folder, in the output/ directory. The same goes for assets being used within a work folder. It's all local.

*(image)*

Relationships are then tracked per workarea, as opposed to between individual assets, with the workarea being the top-level component of a project.

*(image)*

Space Preservation

What baffled me at first was how they accomplished this without a huge amount of data duplication. How do they avoid having multiple copies of the same version of their tiger model in each of the task folders that uses it? How do they keep track of updates to this model? This is the genius that enables this level of specificity, and it isn't complicated.

Note that (1) all assets and shots are stored together under workareas/, (2) tasks are stored directly under a given asset (as opposed to under a dedicated work/ and publish/ folder), (3) task areas are versioned and (4) inside each task there is an input/ and output/ directory. This is where things get interesting.

The output/ contains data produced within a given workarea, such as the rig produced in RichardParker/rigging. To avoid the aforementioned problem of data duplication, these outputs are symlinked into another workarea.

Windows Example

set output=%cd%\RichardParker\rigging\output\default\v001
set input=%cd%\1000\animation\input\rigging
mkdir %output% %input%
mklink /J %input%\default %output%
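The same linkage can be sketched cross-platform with Python's standard library; directory names are taken from the batch example above, with a temporary directory standing in for `%cd%`:

```python
import os
import tempfile

# Hypothetical project root standing in for %cd% above
root = tempfile.mkdtemp()
output = os.path.join(root, "RichardParker", "rigging", "output", "default", "v001")
input_ = os.path.join(root, "1000", "animation", "input", "rigging")

os.makedirs(output)
os.makedirs(input_)

# Counterpart of `mklink /J`; note that on Windows, creating a symlink
# may require elevated privileges, which is why a junction is used there.
os.symlink(output, os.path.join(input_, "default"), target_is_directory=True)
```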

Result

├───1000
│   └───animation
│       └───input
│           └───rigging
│               └───default
└───RichardParker
    └───rigging
        └───output
            └───default
                └───v001

Note that v001 is symlinked directly into the name of the subset, default. R&H doesn't allow multiple versions to be used within the same workarea.

These symlinks are used both to preserve disk space and to maintain a physical link between what data goes into a workarea and what comes out of it.

└── output
    └── rigDefault
       ├── v001
       ├── v002
       └── v003
           ├── rigDefault.abc
           ├── rigDefault.skel
           └── rigDefault.ma

The beauty of this system is that all data is now tracked. Anything going into any asset or shot is physically tracked via a filesystem mechanism, and workareas are the sole unit of work. All other benefits of our system remain, such as validating and guaranteeing a level of quality on output, and relieving the artist from working with paths directly.
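Since the link between workareas is physical, the dependencies of a workarea can be recovered straight from the filesystem by reading its input symlinks. A sketch, with the layout per the examples above (the helper name is hypothetical):

```python
import os
import tempfile

def input_dependencies(workarea):
    """Map each symlink under input/ back to the output it points at."""
    deps = {}
    input_dir = os.path.join(workarea, "input")
    for dirpath, dirnames, _ in os.walk(input_dir):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                deps[os.path.relpath(path, input_dir)] = os.readlink(path)
    return deps

# Demonstration with a throwaway workarea
workarea = tempfile.mkdtemp()
source = os.path.join(workarea, "output", "default", "v001")
os.makedirs(source)
link_parent = os.path.join(workarea, "input", "RichardParker", "rigging")
os.makedirs(link_parent)
os.symlink(source, os.path.join(link_parent, "default"), target_is_directory=True)
```

Calling `input_dependencies(workarea)` here yields a single entry, mapping the `RichardParker/rigging/default` subscription to the v001 output it was loaded from.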


Implementation

In order for this to be made possible, a few things need to happen.

  1. Creation, publishing and loading must be externalised in order to facilitate a workflow this different.
  2. The path to any asset involves more information, see below.

The object model remains unaffected, and the Loader operates on it, so loading and publishing assets will also remain unaffected.


Directory Layout

General

Workareas are versioned, with application folders intermixed with input and output folders.

└── ProjectFolder
    └── workareas
       ├── 1000
       ├── 2000
       ├── PiPatel
       └── RichardParker
           ├── modeling
           ├── lookdev
           └── rigging
               ├── v01
               └── v02
                   ├── input
                   ├── output
                   ├── nuke
                   ├── houdini
                   └── maya

Output

Include subset.

└── output
    └── default
       ├── v001
       ├── v002
       └── v003
           ├── default.abc
           ├── default.skel
           └── default.ma

Input

Version mapped directly to subset.

└── input
    └── RichardParker
        └── rigging
            └── default
                ├── default.skel
                └── default.ma


Paths

At the moment, all files are maintained via two compressed directory templates - one for work, and one for publish - that vary per project.

work = "{root}/{project}/f02_prod/{silo}/{asset}/work/{task}/{user}/{app}"
publish = "{root}/{project}/f02_prod/{silo}/{asset}/publish/{subset}/v{version:0>3}/{subset}.{representation}"
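Filling such a template is a plain `str.format` call; a sketch with made-up values (asset and subset names are illustrative):

```python
# The publish template as defined above
publish = ("{root}/{project}/f02_prod/{silo}/{asset}/publish/"
           "{subset}/v{version:0>3}/{subset}.{representation}")

# Hypothetical values; {version:0>3} zero-pads to three digits
path = publish.format(
    root="m:/f01_project",
    project="LifeOfPi",
    silo="assets",
    asset="RichardParker",
    subset="rigDefault",
    version=1,
    representation="ma",
)
print(path)
# m:/f01_project/LifeOfPi/f02_prod/assets/RichardParker/publish/rigDefault/v001/rigDefault.ma
```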

The current available members are documented in the main documentation and included here for completeness.

| Member | Type | Description |
| --- | --- | --- |
| `{app}` | `str` | The current application directory name, defined in the Executable API |
| `{task}` | `str` | Name of the current task |
| `{user}` | `str` | Currently logged on user (provided by `getpass.getuser()`) |
| `{root}` | `str` | Absolute path to root directory, e.g. `m:\f01_project` |
| `{project}` | `str` | Name of current project |
| `{silo}` | `str` | Name of silo, e.g. `assets` |
| `{asset}` | `str` | Name of asset, e.g. `Bruce` |
| `{subset}` | `str` | Name of subset, e.g. `modelDefault` |
| `{version}` | `int` | Number of version, e.g. `1` |
| `{representation}` | `str` | Name of representation, e.g. `ma` |

For an R&H directory structure, we need four additional members along with three additional path templates.

app = "{root}/{project}/workareas/{asset}/{task}/{app}"
input = "{root}/{project}/workareas/{asset}/{task}/input/{input_asset}/{input_task}/{input_subset}/{input_representation}"
output = "{root}/{project}/workareas/{asset}/{task}/output/{asset}/{subset}/{representation}"

| Member | Type | Description |
| --- | --- | --- |
| `{input_asset}` | `str` | The input Asset, not necessarily the current asset |
| `{input_task}` | `str` | The input Task |
| `{input_subset}` | `str` | The input Subset |
| `{input_representation}` | `str` | The input Representation |
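For illustration, the proposed input template filled with hypothetical values; note that the `{input_*}` members describe the loaded asset, not the current one:

```python
# The proposed input template as defined above
input_template = ("{root}/{project}/workareas/{asset}/{task}/input/"
                  "{input_asset}/{input_task}/{input_subset}/{input_representation}")

# Hypothetical values: shot 1000 loads RichardParker's rig
path = input_template.format(
    root="m:/f01_project",
    project="LifeOfPi",
    asset="1000",                 # the current shot
    task="animation",
    input_asset="RichardParker",  # the asset being loaded
    input_task="rigging",
    input_subset="default",
    input_representation="ma",
)
print(path)
# m:/f01_project/LifeOfPi/workareas/1000/animation/input/RichardParker/rigging/default/ma
```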


API

Another benefit to their layout is their workarea API.

| Function | Description |
| --- | --- |
| `setup()` | Build the base-level directory structure common to all workareas. |
| `version()` | Create a new version of the workarea, which typically involves saving the current state of the working files and asset subscriptions, as well as any required provisioning of the new version. |
| `copy()` | Create templates, or clone a production workarea for debugging. This also handles copying relative asset subscriptions, which makes it trivial to set up a workarea on one shot, copy it to another, and have it up and running immediately. This is extremely useful once a workarea is functional. (Looking at you @Stonegrund) |
| `backup()` | Compress a specified version of a workarea and move it to nearline storage. |
| `restore()` | Opposite of the above. |
| `import()` | Sterilise and prep the incoming asset subscriptions for use within the workarea. |
| `register()` | Scan the workarea for files exported from content creation software; when found, register them as assets created from the current version of the workarea. |
| `transfer()` | Transfer the contents of a workarea from one studio location to another. This typically includes asset subscriptions, which trigger additional syncing of those deliverables to ensure remote users have everything necessary to continue working. |
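As a thought experiment, `version()` could be sketched like this; the vNN naming follows the layout above, and everything here is an assumption rather than R&H's actual implementation:

```python
import os
import re
import shutil
import tempfile

def version(workarea):
    """Hypothetical version(): snapshot the latest vNN folder into the next."""
    versions = sorted(v for v in os.listdir(workarea) if re.match(r"v\d+$", v))
    if not versions:
        next_version = "v01"
        os.makedirs(os.path.join(workarea, next_version))
    else:
        latest = versions[-1]
        next_version = "v%02d" % (int(latest[1:]) + 1)
        # symlinks=True keeps asset subscriptions as links, not copies
        shutil.copytree(os.path.join(workarea, latest),
                        os.path.join(workarea, next_version),
                        symlinks=True)
    return next_version

# Demonstration with a throwaway workarea containing v01
workarea = tempfile.mkdtemp()
os.makedirs(os.path.join(workarea, "v01", "maya"))
new = version(workarea)  # "v02"
```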
mottosso commented 7 years ago

Some blue sky thoughts.

The R&H layout requires a few things that are hopefully possible using already-existing tools.

  1. input is symlinked into a workarea on load. This could potentially lead to a bloat of connections if the user loads more than necessary, or loads something by accident. Is this a problem? I'd imagine we could either have a clean-up process run alongside production, or that an abundance of connections isn't an issue to begin with. The one issue I can foresee is space constraints, but that is an optimisation.
  2. Workareas are at some point locked, such as when they have produced deliverables, like a render. R&H did however mention that they allow an artist to work on multiple versions at a time. I'm not sure yet how this fits together with the locking model.
  3. Outputs are made available via the loader, ideally drawn as they are at the moment with no change to the artist.

When making the switch, it's important that the current workflow is maintained. Not only in order to simplify the transition, but also in order to enable opening a previous version of the project for use with the then-current version of the pipeline.

In order to facilitate this completely alternative layout, a few things need to happen.

  1. We need to separate plugins and other workflow related resources from the main pipeline project. That way we should be able to dynamically pick a workflow independently of which version of the pipeline is active.
  2. The members of path templates must be customisable. At the moment they are heavily bound to what members become available from the user's choice in the launcher. Can we use the "type" key on each database member as a key for the template? How deeply have we coupled the object model to it, and how flexible is the object model? Is there anything other than the path templates that depends on the exact hierarchy of the object model? Yes, the loader is fixed to visualise particular types at particular columns. Is it possible to make this dynamic? Have children show as a result of browsing a parent? That way we may be able to gain some flexibility in the hierarchy.
io.insert_one({"type": "asset"})

Here we already run into issues, as the R&H path template requires keys like input_asset and output_asset, both of which are independent of the source asset and sensitive to context - in this case, whether the asset in question is being imported or exported. But I suppose that is fine, as those keys would be created by the loader plugins responsible for getting assets into a workarea and application. The loader is independent of the overall pipeline regardless, and could be responsible for defining its own path keys.
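One hedged sketch of the idea above, deriving template members from documents keyed on their "type" (the document shapes and names are illustrative, not the actual Avalon schema):

```python
# Hypothetical documents, each carrying a "type" as in io.insert_one above
docs = [
    {"type": "project", "name": "LifeOfPi"},
    {"type": "asset", "name": "RichardParker"},
    {"type": "task", "name": "rigging"},
]

# The "type" key doubles as the template member name
members = {doc["type"]: doc["name"] for doc in docs}

template = "{root}/{project}/workareas/{asset}/{task}"
path = template.format(root="m:/f01_project", **members)
print(path)
# m:/f01_project/LifeOfPi/workareas/RichardParker/rigging
```

Under this scheme, adding a new level to the hierarchy would only mean inserting a document with a new type and extending the template, rather than changing any pipeline code.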

This could work, so long as we ensure that anything tied to a particular workflow or directory layout is made externalisable. It'll enable greater use by others, similar to how Pyblish currently enables use by a wide variety of users due to the low coupling it requires.

In that way, we provide a number of path keys via the core pipeline library, such as what an asset, subset, version and representation are, and on top of this enable dynamic generation via loaders and publishing plugins that act independently of the pipeline.

It's interesting how the requirements of an out-of-the-box pipeline mimic internal requirements. To me this is an excellent case study of how the needs of production dictate an alternative directory structure, and how providing enough flexibility would facilitate it.

What is needed for this kind of externalisation to take place? I've already brushed up against it when looking into filling in the config repository. Plugins and loaders are both externalisable, but we ran into issues with default data. The data relevant to Maya required the use of lambdas in order to capture data only available on instance creation. Odds are we'll need to approach default data in some other way, likely via the use of Creators, which is also where we will establish the creation of default hierarchies and attributes in Maya. Yes, this is where I think we should go.

mottosso commented 7 years ago

More blue sky thoughts.