feltech commented 1 year ago

What

Document the usage of _CreateIdentifier / _CreateIdentifierForNewAsset during stage composition, and identify how best to map the possible combinations of assetized and non-assetized inputs to an output using OpenAssetIO.

Why

_CreateIdentifier is called frequently during USD stage composition. It is not entirely clear how its behaviour should map to OpenAssetIO concepts, and when and how we must support pass-through to the default resolver.

In particular, _CreateIdentifier accepts two inputs, an assetPath and an anchorAssetPath, either of which can be an entity reference or a file path, and produces one output, which may be either an entity reference or a file path. _CreateIdentifierForNewAsset is similar, but obviously for entities that have not yet been created.

Ultimately we may support querying the manager with both inputs to produce an output entity reference, but for the initial MVP this likely won't be possible, so an appropriate error/fallback mode should be established.

feltech commented 1 year ago

Summary

The problem of how to handle CreateIdentifier is slippery. The idea of OpenAssetIO resolving entity references to "the string you would have had in there before" doesn't quite work, since the unique identifier of an asset in Ar2 can be a composite of two references (paths), i.e. parent and child, where the parent is itself (potentially) a composite, and so on.

Overall, of the Options presented below, I lean toward Option 2. That is, simply don't support composite (i.e. relative) paths via entity references for now, and instead at some point in the future implement an analogous getRelatedReferences workflow.

Options

Option 1

If the location of the entity at assetPath is an absolute path (whether via entity reference or raw file path), then we should return assetPath unmodified. This allows Resolve to further mutate the entity reference and allows GetAssetInfo to query the manager with a meaningful entity reference.

If the location of the entity at assetPath is a relative path and the location of the entity at anchorAssetPath is also a relative path, then we can either:

- Error.
- Return assetPath unmodified, optimistically assuming the subsequent Resolve will remedy the situation.
- Pass through to ArDefaultResolver, on the understanding that the (anchored) path must be found in one of the search paths, rather than being an absolute path. This is a deviation from ArDefaultResolver's standard behaviour.

Otherwise, if the location of the entity at assetPath is a relative path, then we should pass through to the ArDefaultResolver. This has the downside that the subsequent Resolve, OpenAsset and (in particular) GetAssetInfo calls only ever see file paths, so the manager is not involved.

Once getRelatedReferences is available from C++, we can use that mechanism to allow the manager to mutate the assetPath based on anchorAssetPath. The logic described above can be prefixed with this.

Option 2

Return assetPath unmodified if it's an entity reference, otherwise pass through to ArDefaultResolver alongside the (resolved) anchorAssetPath.

This means we do not support assetPaths (or anchorAssetPaths) that resolve to a relative path. However, a future getRelatedReferences workflow could provide an analog.

Option 3

Pass through to the ArDefaultResolver (after resolving assetPath) in all cases.

This has the downside that all subsequent Resolve, OpenAsset and (in particular) GetAssetInfo calls only ever see file paths, so the manager is not involved. It also precludes the mysterious IsContextDependentPath and IsRepositoryPath code paths in the layer registry cache.

It has the benefit of fully supporting the ArDefaultResolver. In particular, there is no issue with anchorAssetPath being a relative path: with this approach the anchorAssetPath will always be an absolute file path.

Background

`ArDefaultResolver` behaviour

Resolve checks if the path exists, either relative to cwd, relative to a search path, or as an absolute path. If it doesn't exist then Resolve returns an empty path. If it does exist then Resolve returns the absolute path to the discovered file.

CreateIdentifier

- If given a file-relative (i.e. starting with ./ or ../) or absolute assetPath, it returns the (anchored) path.
- Otherwise it checks if the anchored path Resolves (i.e. exists on disk at some search path). If so it returns the anchored path, otherwise it returns the (non-anchored) assetPath verbatim.
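The two CreateIdentifier rules above can be condensed into a small model. This is a sketch under assumptions, not the USD source: anchoring is approximated as "join with the anchor's directory and normalise", a blank anchor just normalises the path, and `resolves` is a hypothetical predicate standing in for a successful Resolve.

```python
import posixpath
from typing import Callable


def is_file_relative(path: str) -> bool:
    return path.startswith("./") or path.startswith("../")


def anchor_path(asset_path: str, anchor_asset_path: str) -> str:
    # Assumed anchoring rule: relative to the directory containing the anchor.
    if not anchor_asset_path:
        return posixpath.normpath(asset_path)
    anchor_dir = posixpath.dirname(anchor_asset_path)
    return posixpath.normpath(posixpath.join(anchor_dir, asset_path))


def model_create_identifier(
    asset_path: str,
    anchor_asset_path: str,
    resolves: Callable[[str], bool],  # hypothetical: True if Resolve would find the path
) -> str:
    if not asset_path:
        return asset_path
    anchored = anchor_path(asset_path, anchor_asset_path)
    # Rule 1: file-relative or absolute paths always yield the anchored path.
    if is_file_relative(asset_path) or posixpath.isabs(asset_path):
        return anchored
    # Rule 2: other (search-style) paths are anchored only if the result resolves.
    if resolves(anchored):
        return anchored
    return asset_path
```

The second rule is what makes the output depend on the state of the disk: the same `lib/chair.usd` input can come back anchored or verbatim depending on whether the anchored candidate exists.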

Code paths

There are two key code paths that trigger CreateIdentifier.

The workhorse is the SdfLayer::FindOrOpen code path. This is called by the initial call to Usd.Stage.Open (Python) i.e. UsdStage::Open (C++).

For the initial UsdStage::Open, CreateIdentifier is called with the assetPath set to the path we pass to the Open function verbatim, and a blank anchorAssetPath.

FindOrOpen is also called when loading subLayers and references, but only after an initial SdfComputeAssetPathRelativeToLayer code path.

The SdfComputeAssetPathRelativeToLayer code path is called through different routes for subLayers and references, but with a similar effect.

In this code path only CreateIdentifier is called (i.e. none of the other resolver methods are called). It is given the resolved reference to the parent USD as the anchorAssetPath and the path in the USD file as the assetPath. The result of this CreateIdentifier call is then fed pretty much immediately into SdfLayer::FindOrOpen.

This means the SdfComputeAssetPathRelativeToLayer code path consumes the anchorAssetPath, so that when CreateIdentifier is called for the second time, in the FindOrOpen code path, the assetPath is already resolved (and hence the anchorAssetPath is left as the default, blank).

In FindOrOpen, the result of CreateIdentifier is fed to Resolve and the result of that then fed to OpenAsset. GetAssetInfo is also called, and is passed the results of CreateIdentifier and Resolve as the assetPath and resolvedPath, respectively.
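To make the sequence of resolver calls easier to follow, the two code paths can be mimicked with a recording stub. Everything here is a hypothetical model: `RecordingResolver`, `find_or_open` and `compute_asset_path_relative_to_layer` only imitate the call order described above, the stub's anchoring and identity Resolve are placeholders, and the relative order of GetAssetInfo and OpenAsset is an assumption.

```python
import posixpath


class RecordingResolver:
    """Stub resolver that logs every call it receives."""

    def __init__(self):
        self.calls = []

    def create_identifier(self, asset_path, anchor_asset_path=""):
        self.calls.append(("CreateIdentifier", asset_path, anchor_asset_path))
        if not anchor_asset_path:
            return posixpath.normpath(asset_path)
        anchor_dir = posixpath.dirname(anchor_asset_path)
        return posixpath.normpath(posixpath.join(anchor_dir, asset_path))

    def resolve(self, identifier):
        self.calls.append(("Resolve", identifier))
        return identifier  # placeholder: pretend every identifier resolves to itself

    def get_asset_info(self, asset_path, resolved_path):
        self.calls.append(("GetAssetInfo", asset_path, resolved_path))

    def open_asset(self, resolved_path):
        self.calls.append(("OpenAsset", resolved_path))


def compute_asset_path_relative_to_layer(resolver, layer_resolved_path, asset_path):
    # Only CreateIdentifier is called here, with the parent layer as the anchor.
    return resolver.create_identifier(asset_path, layer_resolved_path)


def find_or_open(resolver, path):
    # The anchor is left blank: any anchoring has already been consumed.
    identifier = resolver.create_identifier(path)
    resolved = resolver.resolve(identifier)
    resolver.get_asset_info(identifier, resolved)
    resolver.open_asset(resolved)
    return resolved


# Open a root layer, then load a sublayer authored as "./layout.usd".
resolver = RecordingResolver()
root_resolved = find_or_open(resolver, "/proj/shot.usd")
sublayer_id = compute_asset_path_relative_to_layer(resolver, root_resolved, "./layout.usd")
find_or_open(resolver, sublayer_id)
```

Inspecting `resolver.calls` afterwards shows CreateIdentifier being called twice for the sublayer: once with the parent layer as the anchor, and once more inside `find_or_open` with the already anchored path and a blank anchor.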

FindOrOpen, as the name suggests, first checks if the layer is available in the Sdf_LayerRegistry cache. This cache is keyed by a concatenation of the CreateIdentifier result and any arguments, to form (confusingly) an "identifier" string. In practice it is a bit more complex (see layerRegistry.cpp) and makes use of the IsContextDependentPath and IsRepositoryPath methods of the resolver, but that is out of scope for this investigation.

feltech commented 1 year ago

After discussion, https://github.com/OpenAssetIO/usdOpenAssetIOResolver/issues/8 updated to reflect Option 2.

OpenAssetIO / usdOpenAssetIOResolver

Investigate CreateIdentifier workflow #11
