ShuffleAttributes node for renaming existing attributes

GafferHQ / gaffer

Gaffer is a node-based application for lookdev, lighting and automation

http://www.gafferhq.org

BSD 3-Clause "New" or "Revised" License

961 stars 207 forks source link

ShuffleAttributes node for renaming existing attributes #3611

Closed andrewkaufman closed 4 years ago

andrewkaufman commented 4 years ago

Summary

A ShuffleAttributes node should provide the ability to rename several attributes at all locations matching a filter. The data of these attributes should not change, its just a name change; the new attribute should be replacing the original.

User story

What

I want to rename several attributes at various locations in my scene.

Why

We have a use case where animators want to control scene:visible in order to make things disappear in their dailies. They do this by baking the attribute into a scene cache file at specific locations. We want to control if/when this data travels on to downstream departments, as in some cases they are really animating the "performance" of scene:visible for downstream consumption.

When it should not travel downstream, we're going to have them set a different attribute (eg user:preview:visible) instead. With a ShuffleAttributes node we'd be able to rename that to scene:visible only when appropriate (ie. only in Anim renders and not in Lighting renders).

Feature proposal

The ShuffleAttributes node could have a simple From and To column based UI, with a :heavy_plus_sign: to add rows for new attributes. Open to other suggestions though.

Bonus points if it can list existing attributes at the filtered locations in the context menu for easy entry.

johnhaddon commented 4 years ago

It's interesting that you said "RenameAttributes or ShuffleAttributes", as I was wondering about more general renaming using patterns and such (such as prefixing everything with "anim:" or removing an "anim:" prefix). We do have a request for a node to rename scene locations too, and although it's not really fleshed out, it seems likely it'll be pattern based, so I think perhaps "Rename" nodes should be pattern based and "Shuffle" nodes should be simple copy-this-to-that nodes like the Shuffle we already have for channels?

From your description though, it's not quite copy-this-to-that like the channel shuffle, but more move-this-to-that, is in you do want to remove the source attribute as well?

The ShuffleAttributes node could have a simple From and To column based UI, with a heavy_plus_sign to add rows for new attributes. Open to other suggestions though.

That sounds pretty reasonable to me. It'd be nice to look at what we can share with the existing Shuffle node on the API side of things.

andrewkaufman commented 4 years ago

It's interesting that you said "RenameAttributes or ShuffleAttributes"

Yeah, I changed that at the last minute... I wasn't sure if my desire to remove the original was inline with Shuffling...

From your description though, it's not quite copy-this-to-that like the channel shuffle, but more move-this-to-that, is in you do want to remove the source attribute as well?

Yes, I had intended it to be a replacement rather than a copy. Though I guess that's not actually essential for my immediate use case. I suppose I could just use a downstream DeleteAttributes if I really want the original to be removed, but that feels like unnecessary hash/compute churn for such a simple operation...

I wonder if a Mode column could make sense, similar to Mode on a TweaksPlug but with Copy and Rename (and maybe even Prefix) options? Our internal ShufflePrimitiveVariables node does provide both copy and rename functionality.

johnhaddon commented 4 years ago

I wonder if a Mode column could make sense, similar to Mode on a TweaksPlug but with Copy and Rename (and maybe even Prefix) options?

Yeah, sounds good. Tom and I went backwards and forwards a bit on this today, but in the end I think we're both thinking the same as you. So, just to check we're all on the same page :

We call it ShuffleAttributes.
It uses a ShufflePlug thingy that is pretty similar to the existing TweaksPlug. It has string plugs for an input and output name, and an int plug to specify a mode.
The only two modes for now are Copy and Rename.
Open questions :
- How to deal with wildcards/suffixes/prefixes/search+replace. A few options :
  - More modes. Pros : quite easy to use, quite easy to implement. Cons : quite rigid in terms of what transformations are available. Potentially lots of extra child plugs on every ShufflePlug. Continual feature creep.
  - Wildcards and captures on parts of the input name, and string substitutions using the captures in the output name. Pros : more flexible. Cons : less discoverable, less friendly for many users.
  - Wildcards and captures hidden behind a fancy authoring UI. Pros : possibly the best of both worlds. Cons : tricky design and implementation.
- What to do with the existing Shuffle node. It seems it wouldn't be too hard to refactor so it uses the ShufflePlug too, but maybe that's not something you have time for?
- ShufflePrimitiveVariables. I know you have this already at IE, but it'd be great to have a Gaffer version using the same ShufflePlug stuff as ShuffleAttributes. Having another use case with another container type (PrimitiveVariableMap rather than CompoundObject) might help validate the design.

johnhaddon commented 4 years ago

Couple of further thoughts around fancier renaming capabilities :

I'm not particularly keen on capture patterns and substitutions that reference them. We made a conscious decision to use glob-style matching rather than regexes because although it is less powerful, it is much simpler, and more likely to be familiar to our users (through use in the shell). Capture patterns seem to be straying back the wrong way.
I think it's worth considering a ${name} variable that can be used to reference the source name in the destination field. This could potentially be used a couple of ways :
- We could extend the substitution syntax very slightly to allow python-style slicing. So ${name}[3:] could be used to strip a prefix, and myPrefix${name} could be used to add one. I don't want the substitution syntax to become too much of a mini-language of its own, but this seems to add quite a lot of functionality for a small extension, and would be familiar to our python-savvy users.
- The ${name} variable could be an actual context variable, and then users could use an expression to do any kind of fancy renaming they could imagine.

Like I said, I don't think we need to rush into these additional features for a first version, but I wanted to keep the conversation ticking along now its started.

andrewkaufman commented 4 years ago

ShuffleAttributes... uses a ShufflePlug... The only two modes for now are Copy and Rename.

Sounds good.

I think it's worth considering a ${name} variable that can be used to reference the source name in the destination field.

That sounds pretty nice.

We could extend the substitution syntax very slightly to allow python-style slicing.

To clarify, are you suggesting some hardcoded logic in the ShufflePlug, or something new in IECore::StringAlgo, or some new boost::spirit gubbins?

What to do with the existing Shuffle node. It seems it wouldn't be too hard to refactor so it uses the ShufflePlug too, but maybe that's not something you have time for?

@danieldresser was of the opinion that if we're touching the Shuffle node, we should give it a bigger overhaul (eg to support layers better). I'm not sure I have time for all that... But if you think its gonna be simple I could give it a shot.

ShufflePrimitiveVariables. I know you have this already at IE, but it'd be great to have a Gaffer version using the same ShufflePlug stuff as ShuffleAttributes.

Yeah, we've wanted to put that one the outside for a while now. Maybe its the right time. I'd certainly rather take this on than the image shuffle overhaul.

Similarly, we could add ShuffleOptions or ShuffleSets

andrewkaufman commented 4 years ago

Hmm, our internal ShuffleSets node has a 3rd mode: merge (for when the destination set already exists)... not that we need to get into it now, but it brings up a question about ShufflePlug modes. Should we have a registration mechanism so specific nodes can add bespoke modes? Or should it be an enum with specific nodes blacklisting values they don't support?

johnhaddon commented 4 years ago

I think it's worth considering a ${name} variable that can be used to reference the source name in the destination field. That sounds pretty nice.

Ran this by a couple of artists today and they were receptive, but pointed out that it doesn't support search+replace in any way. It seems that a simple SearchAndReplace mode might still be necessary either way if we don't want to force people off into expression land.

To clarify, are you suggesting some hardcoded logic in the ShufflePlug, or something new in IECore::StringAlgo, or some new boost::spirit gubbins?

I was thinking of something new in IECore::StringAlgo. I am worried about pushing that too far in the "mini language" direction though, so if SearchAndReplace is a mode anyway, I'd probably drop the idea.

@danieldresser was of the opinion that if we're touching the Shuffle node, we should give it a bigger overhaul (eg to support layers better). I'm not sure I have time for all that... But if you think its gonna be simple I could give it a shot.

I guess you could shuffle layers with layerName.*, SearchAndReplace, layerName, newLayerName, but I'm not sure if that's something a compositor would enjoy. Perhaps there's even a case for separate ShuffleChannels and ShuffleLayer nodes?

Yeah, we've wanted to put that one the outside for a while now. Maybe its the right time. I'd certainly rather take this on than the image shuffle overhaul.

I'd take a ShufflePrimitiveVariables node in trade for a todo on the Shuffle refactor. I'm not familiar with the IE one, but I'd want the Gaffer one to follow the pattern we're establishing for ShuffleAttributes.

Hmm, our internal ShuffleSets node has a 3rd mode: merge (for when the destination set already exists)... not that we need to get into it now, but it brings up a question about ShufflePlug modes. Should we have a registration mechanism so specific nodes can add bespoke modes? Or should it be an enum with specific nodes blacklisting values they don't support?

Good question. It also makes me realise that the separation of Copy and Rename modes kindof fall apart when you introduce SearchAndReplace or Merge modes. Just because I want to use SearchAndReplace to define my destination, doesn't mean I don't want control over the potential deletion of the source. If we want to keep such mode options open, should we instead have deleteSource BoolPlug?

andrewkaufman commented 4 years ago

I'd take a ShufflePrimitiveVariables node in trade for a todo on the Shuffle refactor.

Great

I'm not familiar with the IE one, but I'd want the Gaffer one to follow the pattern we're establishing for ShuffleAttributes.

Yeah, I'd assumed. Ours is basically the same with a worse UX. We have 2 string plugs "copy" and "rename" and we use a source:destination this:that foo:bar syntax.

Just because I want to use SearchAndReplace to define my destination, doesn't mean I don't want control over the potential deletion of the source. If we want to keep such mode options open, should we instead have deleteSource BoolPlug?

Makes sense, and that is essentially what we have on our ShuffleSets (source name, dest name, merge bool, keepSource bool).

So if Copy and Rename were the only modes we'd planned to implement to start with, but we also know we want a deleteSource bool, now we really only have one mode to start with.... bit odd, no?

johnhaddon commented 4 years ago

So if Copy and Rename were the only modes we'd planned to implement to start with, but we also know we want a deleteSource bool, now we really only have one mode to start with.... bit odd, no?

I think it implies that we no longer have a mode, at least not in this first version. But now I think about it, a hypothetical Merge mode doesn't fit well with a hypothetical SearchAndReplace mode either. I think there are two orthogonal things here 1) a mechanism to define the destination name from the source name (verbatim, search/replace, prefix/suffix, substitutions) 2) a mechanism to define what we do with the names once we've got them (copy one to the other, move one to the other, merge one into the other). And now we're back to a Copy/Rename mode? I dunno any more...

Question about your experience using the merge mode for sets : does it usually occur that you want to either merge for all the source/dest pairs or none of them, or do you want to merge some and not others? I'm wondering if ShufflePlug should deal only with defining source/dest and delete/merge/whatever should be defined by the node itself.

andrewkaufman commented 4 years ago

Question about your experience using the merge mode for sets : does it usually occur that you want to either merge for all the source/dest pairs or none of them, or do you want to merge some and not others?

Unfortunately that node only supports shuffling one set at a time.... Its a utility node we made for the internals of our SceneBuilder, and we didn't expose it in the menus or put much thought into the UX. The SceneBuilder uses it in 2 ways:

once with merging and deleting the original (combining the maya and non-maya light sets into an allLights set)
once without merging and keeping the original (copying a specific LOD set to a standard name).

I suspect if it supported multiple sets, the merge would be for some and not others. For the example above we could have used one node instead of two.

I think there are two orthogonal things here

It might be more than two.... Dan points out that Replace/Merge go together, but deleteSource might be desirable with either of those... 2 modes and a toggle per attribute?

NamingScheme : Verbatim | Search/Replace | Prefix | Suffix | Substitutions
ConflictResolution : Replace | Merge
deleteSource : bool

I'm wondering if ShufflePlug should deal only with defining source/dest and delete/merge/whatever should be defined by the node itself.

Could be. I'd guess for ShuffleAttributes and ShufflePrimitiveVariables we'll have some duplicate code, but not a ton, so maybe it doesn't matter.

johnhaddon commented 4 years ago

This is getting a bit out of hand, and I think Merge is largely to blame. Shuffling is about moving a distinct piece of data from one name to another, not about arbitrary operations that merge two data sources into one. If we want to perform operations between sets, we could do a much better job in a dedicated node, where set expressions would give us far more flexibility than a simple replace/merge. Or we could even consider allowing a set expression on the source side of a ShuffleSets (although that doesn't play well with deleteSource).

andrewkaufman commented 4 years ago

Based on some offline discussion, we're going to drop the notions of merging and search/replace for now, and just focus on the basics. We'll still allow for the ${name} idea John proposed (without slicing for now) to cover the obvious prefix/suffix cases (though maybe calling it ${source} would be better to match the source & deleteSource children of ShufflePlug).

I've sketched out an implementation for ShufflePlug, ShufflesPlug, ShuffleAttributes, and ShufflePrimitiveVariables. Here is a proposed UI for ShuffleAttributes (same applies to the other shuffle nodes):

shuffleUI

For the API, applying shuffles gets a bit fiddly, because we have a handful of similar-but-different data structures. They can all be thought of as key/value pairs, but the types vary (eg CompoundObject, CompoundData, PrimitiveVariableMap) and in some cases the keys and values are stored separately and computed on demand (eg channelNames()/channelData( channel )).

I think we can cover all the anticipated use cases with something like the following on ShufflesPlug:

/// Shuffle application
/// =================

using ShufflesMap = std::unordered_map<std::string, std::string>;
using InternedShufflesMap = std::unordered_map<IECore::InternedString, IECore::InternedString>;

// compute a mapping from sources to targets

ShufflesMap shuffleSources( const std::vector<std::string> &sources ) const;
InternedShufflesMap shuffleSources( const std::vector<IECore::InternedString> &sources ) const;

// convenience methods to apply the shuffle maps for common data structures

template<typename T>
static void applyShuffles( const ShufflesMap &shuffles, std::map<std::string, T> &data );

template<typename T>
static void applyShuffles( const InternedShufflesMap &shuffles, std::map<IECore::InternedString, T> &data );

Maybe I could drop the std::string vs InternedString variants and force InternedString, assuming in the future we'd be using them everywhere anyways?

This does cause a bit of wasted data for several common cases (eg CompoundObject, CompoundData, PrimitiveVariableMap) because we have to construct the extra sources vector... unless there is some modern c++ magic to iterate on just the keys of a map pair?

Alternatively, to aid those cases, we could provide additional convenience methods to apply the shuffles directly without creating the intermediate sources vector and mapping:

template<typename T>
void applyShuffles( std::map<std::string, T> &data ) const;

template<typename T>
void applyShuffles( std::map<IECore::InternedString, T> &data ) const;

andrewkaufman commented 4 years ago

The corresponding client code in ShuffleAttributes would be

    CompoundObjectPtr result = inputAttributes->copy();

    std::vector<InternedString> sources;
    for( auto &kv : result->members() )
    {
        sources.push_back( kv.first );
    }

    auto mapping = shufflesPlug()->shuffleSources( sources );
    ShufflesPlug::applyShuffles<ObjectPtr>( mapping, result->members() );

    return result;

    CompoundObjectPtr result = inputAttributes->copy();
    shufflesPlug()->applyShuffles<ObjectPtr>( result->members() );

    return result;

andrewkaufman commented 4 years ago

After sleeping on it, I'm not sure the ShufflesMap is the right structure to be returning... I'd originally thought nodes like ShuffleChannels and ShuffleSets would need to cache a map between resolved source/target keys (for use in computeChannelData and computeSet respectively), but those nodes are also going to need a vector of resolved targets to return from computeChannelNames and computeSetNames... so maybe we ought to do something like this instead:

// compute a mapping from sources to targets. the returned targets vector will have the same length as the input sources, with any deleted sources represented as empty strings in the targets

std::vector<IECore::InternedString> shuffleSources( const std::vector<IECore::InternedString> &sources ) const;

// convenience methods to apply the shuffles to common data structures

template<typename T>
static void applyShuffles( const std::vector<IECore::InternedString> &sources, const std::vector<IECore::InternedString> &targets, std::map<IECore::InternedString, T> &data );

And then nodes like ShuffleChannels/ShuffleSets can cache the targets vector on a private InternedStringVectorDataPlug and re-fetch the sources from their in plug during computeChannelData/computeSet.

Thoughts?

andrewkaufman commented 4 years ago

Hmm, actually that caveat about empty targets would be rather annoying for computeChannelNames to have to filter out...

johnhaddon commented 4 years ago

using ShufflesMap = std::unordered_map<std::string, std::string>;
using InternedShufflesMap = std::unordered_map<IECore::InternedString, >IECore::InternedString>;
// compute a mapping from sources to targets
ShufflesMap shuffleSources( const std::vector<std::string> &sources ) const;

I've got a variation on this first proposal that I think is worth considering. It's based on a few observations :

It's a pity to have to allocate and deallocate a std::vector<std::string> &sources and a ShufflesMap every time we need to do a shuffle.
As defined, ShufflesMap is specific to the particular list of sources, but it should be possible to make a more general mapping that can work for any list of sources. The Spreadsheet already uses something similarish internally. Because it's usable for all queries, it can be cached on an internal plug.
The mapping from source to target isn't one-to-one. The same source can be shuffled into multiple targets.

This is what I'm thinking :

/// In ShufflesPlug
/// ===============

struct Shuffler
{
    const vector<InternedString> &destinations( InternedString &source, bool &delete );
    private :
        MapPtr m_map;
};

/// Gets a shuffler for the current context. We're free to pull `Shuffler::m_map`
/// from an internal plug if we want to, or we could leave that as a future optimisation.
/// We're not tied to any particular data structure because it's all private.
Shuffler shuffler() const;

/// In ShuffleAttributes::computeAttributes()
/// =========================================

Shuffler shuffler = shufflesPlug()->shuffler();
CompoundObjectPtr result = new CompoundObject;
for( const auto &a : inputAttributes->members() )
{
    bool delete;
    for( const auto &d : shuffler.destinations( a.first, delete ) )
    {
        result->members()[d] = a.second;
    }
    if( !delete )
    {
        result->members()[a.first] = a.second;
    }
}

There are bunch of details of the internal implementation to be fleshed out there, but I think it's workable? The computeAttributes() code is simple enough that I'd be happy to use that without any extra convenience functions specialised for particular types of container. If we really wanted them though, we could always add things like Shuffler::shuffle( CompoundObjectMap &m ).

After sleeping on it, I'm not sure the ShufflesMap is the right structure to be returning...

It's definitely a good point that ShuffleSets and ShuffleChannels don't have the same needs as ShuffleAttributes and ShufflePrimitiveVariables. The key difference is that computeChannel() is starting from a target name, and needing to work back to get a source name. I think storing a mapping tailored to the each particular set of sourceNames is probably worth it in that case. But since these aren't our first use cases, and the Shuffler I proposed gets to hide its implementation , I'd be fairly happy to cross that bridge later. It may well be that it's just another method on the Shuffler to return the reverse mapping. Does that sound reasonable, or do you think we need to flesh it out more to be sure?

andrewkaufman commented 4 years ago

It's a pity to have to allocate and deallocate a std::vector<std::string> &sources and a ShufflesMap every time we need to do a shuffle.

Yeah, I don't like those much either. FWIW, my initial implementation to get going and play with a functional node did not have either of those, it just ran in-place on the CompoundObjectMap. I added the ShufflesMap and input vector in an attempt to make an API that could be workable for the more complex nodes (eg ShuffleChannels, ShuffleSets).

it should be possible to make a more general mapping that can work for any list of sources. The Spreadsheet already uses something similarish internally

Cool, thanks, I'll take a look at that.

The mapping from source to target isn't one-to-one. The same source can be shuffled into multiple targets.

Ah, crud, I missed that case in my tests... I think it was not a problem with my original implementation, only with the updated 'should work for the complex nodes' API...

Shuffler

I can give this a shot. It would be good to flesh out those implementation details though. Dan metioned you have ideas about caching, special cases for non-wildcard / non context variables shuffles, etc. Worth a hangouts chat?

BTW, I noticed you've used destinations rather than targets. Would you like me to rename the ShufflePlug::targetPlug() and update the UI to use destination instead?

The computeAttributes() code is simple enough that I'd be happy to use that without any extra convenience functions specialised for particular types of container.

Its a bit of a shame though, it'll be duplicated almost exactly in at least 4 nodes eventually (ShuffleAttributes, ShufflePrimitiveVariables, ShuffleOptions, ShuffleMetadata).

I'm happy to add Shuffler::shuffle( CompoundObjectMap &m ) and others, but what about Shuffler::shuffle( IECoreScene::PrimitiveVariableMap &m ) in a namespace I can't touch? That's why I went for the templated method above.

Its also harder to write test cases for ShufflesPlug without actually being able to apply the shuffles. Speaking of which, seems like we should bind the Shuffler struct?

good point that ShuffleSets and ShuffleChannels don't have the same needs as ShuffleAttributes and ShufflePrimitiveVariables.... storing a mapping tailored to each particular set of sourceNames is probably worth it in that case. But since these aren't our first use cases... I'd be fairly happy to cross that bridge later... Does that sound reasonable, or do you think we need to flesh it out more to be sure?

I'm in 2 camps here I suppose.

On the one hand, if we're happy to say we don't need to flesh out those more complex nodes now, then I'm not sure we even need the Shuffler now... my first implementation hid it all away behind ShufflesPlug::shuffle( someMapTypeWithStringKeys ) and didn't require the intermediate map/vector. I can push (but not pr) a WIP version of that if you want to see the nitty-gritty.

On the other hand, if we suspect those complex nodes will want to use the Shuffler eventually, then maybe its worth fleshing them out. It'd be a bit silly to make the Shuffler now only to re-write it in a few months.

On the mysterious third hand, I'm under some time pressure, and the basic node I actually need is fully functional, so I'm not going to get a ton of leeway for additional design time... If you're positive the Shuffler is the right way to go, and we can flesh it out fast, I'm all for it. If we want more time to consider, it might be prudent to scrap all this and chuck it in a private repo for now.

johnhaddon commented 4 years ago

The mapping from source to target isn't one-to-one. The same source can be shuffled into multiple targets. Ah, crud, I missed that case in my tests... I think it was not a problem with my original implementation, only with the updated 'should work for the complex nodes' API...

The other potential curveball I didn't mention is that GafferImage::Shuffle has "virtual" sources like __white and __black. I think that could probably work in any of these schemes, and since it's limited to that one node maybe isn't something we should make too central to the design.

Shuffler I can give this a shot. It would be good to flesh out those implementation details though. Dan metioned you have ideas about caching, special cases for non-wildcard / non context variables shuffles, etc.

I think it'd be a lot like that internal spreadsheet map. There'd be a sourceName->destinationNames map for the exact matches, and a list of wildcardedName->destinationNames for the wildcards. I just realised the flaw in the sketch I presented though - if we want to cache the map, it needs to be cached on another plug, which means that the node needs to be responsible for computing it. All doable, but a bit less clean than what I was suggesting.

BTW, I noticed you've used destinations rather than targets. Would you like me to rename the ShufflePlug::targetPlug() and update the UI to use destination instead?

I hadn't done that consciously, but now you mention it, I do think source/destination are a better pair for a shuffle than source/target. Just looking through the existing nodes, target seems to have only been used as "the thing to aim at". GafferImage::Shuffle is using in/out though. We can conform that to source/destination later if we want, but in/out seems worth considering.

I'm happy to add Shuffler::shuffle( CompoundObjectMap &m ) and others, but what about Shuffler::shuffle( IECoreScene::PrimitiveVariableMap &m ) in a namespace I can't touch? That's why I went for the templated method above.

Ah, I hadn't spotted the templating - that makes perfect sense. Can we go one step further though, and template on the container type rather than the value type? Then we could shuffle unordered_map, flat_map, map etc.

if we're happy to say we don't need to flesh out those more complex nodes now, then I'm not sure we even need the Shuffler now... my first implementation hid it all away behind ShufflesPlug::shuffle( someMapTypeWithStringKeys ) and didn't require the intermediate map/vector.

Starting with just a minimal ShufflesPlug::shuffle() sounds reasonable to me. It seems like it wouldn't be hard to provide backwards compatibility for that method if we later decide we want to add a Shuffler and some caching. And even with that method alone, it's still entirely possible to implement ShuffleSets.

If we want more time to consider, it might be prudent to scrap all this and chuck it in a private repo for now.

Having got this far, I think that would be a great pity.

Worth a hangouts chat?

I'm free at 10am Vancouver time if you want to finalise things with a quick chat, but I'd be equally happy to see a PR with just ShufflesPlug::shuffle() in it, if that's your preference.

andrewkaufman commented 4 years ago

As discussed offline, we'll avoid the Shuffler for now and go for the simpler implementation. If we find some production use cases which would benefit from the optimization, we can revisit. It should hopefully be fairly straightforward to provide backward compatibility with the ShufflesPlug::shuffle() method.