Closed BigRoy closed 5 years ago
Think assuming objectSets as containers is fair enough. If this breaks anyone's pipeline, it will encourage a good discussion about it.
@davidlatwe could you give this a test run on your end, including testing whether "nested" containers still work as expected for you? I've tested these cases, including with scrambled namespaces, as I knew that could be a use case on your end (as such I kept the other optimization tests with `cmds.ls` to do the recursive querying). But since it's not an actual production case on our end I was unable to test a production example of that. My expectation is that it should still work, yet better safe than sorry.
Neat. Does this mean we can remove `lsattr` too? Reduce a few lines?
I believe so. I think `lsattr` and `lsattrs` in `colorbleed.maya.lib` are now unused. Will check on Monday.
Note that this ups the minimum required Maya version to 2016, where this iterator was introduced.
Good point. If ever needed we can add a fallback to `cmds.ls(type="objectSet")`.
Did you consider `cmds.ls(type="objectSet")` too?
Sure did. But somehow it still was slower in heavier scenes. As if it still tries to filter some garbage stuff. Will do another check on Monday.
Just tested it on my end with a big set dressing scene: there were 1279 containers and the total node count was 96463. (Nothing broken :smile:)
Before: 5.43400001526 sec
After: 3.57499980927 sec
3.57499980927 sec
Interesting that it's still that slow @davidlatwe.

I wonder what the next bottleneck would be in that case. It must be the parsing of the container that uses `lib.read`. If we allow `parse_container` to do that itself through the API, that can be optimized a lot too, especially because we know which attributes belong to the container itself and can read only those string values. This would potentially even bring your scene to under a second.

Or, is your slowdown because of a custom config call within `ls()`?

Should I look into optimizing this further?
Sorry for the delay, had an extra day off due to the weather here :cloud_with_lightning_and_rain:
I wonder what the next bottleneck would be in that case. It must be the parsing of the container that uses lib.read.
I have tested that if I remove the `parse_container` call, the time drops to less than 1 sec. So yeah, +1 for letting `parse_container` do it itself via the Maya API for the optimization.
Or, is your slowdown because of a custom config call within `ls()`?
No, I have changed the code here so that the custom call is actually returning an empty dict now.
But if `parse_container` is changed to only read the default attributes defined in the schema, then I will need the custom call to get my additional attributes. :relaxed:
But if the parse_container changed to only read the default attributes defined in schema, then I will need the custom call to get my additional attributes.
Haha. So in your case it's beneficial that it includes additional data not defined in the schema. What would be the verdict? Would we want `parse_container` to solely return what is defined as attributes in the schema? @mkolar @tokejepsen @mottosso ?
Option 1)
Leave it as is.
I'm thinking of maybe leaving it as is for now, as grabbing only the relevant attributes might be slow too with the backwards compatibility to `container-1.0`, which has many more required attributes, of which `schema` funnily isn't one.
Option 2)
First get the schema version, then get only those relevant attributes. 👍
We could maybe do a check on the attribute `schema` being present; if not, then we assume schema `container-1.0`. It would make the code quite verbose though:
```python
from maya.api import OpenMaya as om

import avalon.schema


def parse_container(node, validate=True):
    # pseudocode
    fn = om.MFnDependencyNode(node)
    container = dict()

    # Get container attributes
    plug = fn.findPlug("schema", True)
    if not plug:
        # Assume container-1.0
        container["schema"] = "container-1.0"
        attributes = [
            "id",
            "name",
            "author",
            "loader",
            "families",
            "time",
            "subset",
            "asset",
            "representation",
            "version",
            "silo",
            "path",
            "source"
        ]
    else:
        schema = plug.asString()
        if schema == "container-2.0":
            attributes = [
                "schema",
                "id",
                "name",
                "namespace",
                "loader",
                "representation"
            ]
        else:
            # Unknown container schema
            return None

    # Collect the values
    for key in attributes:
        plug = fn.findPlug(key, True)
        if plug:
            container[key] = plug.asString()

    # `objectName` is special, it doesn't refer to an
    # attribute but refers to the node name.
    container["objectName"] = fn.name()

    if validate:
        avalon.schema.validate(container)

    return container
```
Option 3)
Read the schema version from container and directly get the required attributes from the actual schema. For example:
```python
import avalon.schema

schema = "container-2.0"
attributes = avalon.schema._cache[schema + ".json"]["required"]
print(attributes)
```
Which, if we add a `get()` function to `schema.py`, could become:

```python
import avalon.schema

schema = "container-2.0"
attributes = avalon.schema.get(schema)["required"]
print(attributes)
```
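A sketch of what that `get()` helper might look like; the `_cache` layout mirrors what the snippet above reads directly, but the helper's name and signature are an assumption, not existing Avalon API:

```python
# Module-level cache of parsed JSON schemas, keyed by filename, as
# avalon.schema._cache is used above. Contents are stubbed here with
# the container-2.0 required keys from this thread.
_cache = {
    "container-2.0.json": {
        "required": ["schema", "id", "name",
                     "namespace", "loader", "representation"],
    },
}


def get(schema, default=None):
    """Return the cached schema document for an identifier like
    'container-2.0', or `default` when the schema is unknown."""
    return _cache.get(schema + ".json", default)
```

Callers then never touch the private `_cache`, and unknown schema versions are handled in one place.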
So I did a quick speed comparison with the above Option 3 implemented with this code:
```python
from maya.api import OpenMaya as om


def parse_container(container, validate=True):
    import avalon.schema

    sel = om.MSelectionList()
    sel.add(container)
    dep_node = sel.getDependNode(0)
    fn = om.MFnDependencyNode(dep_node)

    data = dict()

    # Get container schema version
    if fn.hasAttribute("schema"):
        plug = fn.findPlug("schema", True)
        schema = plug.asString()
    else:
        # Backwards compatibility
        schema = "container-1.0"

    # Get attributes from schema
    schema_doc = avalon.schema._cache.get(schema.rsplit(":", 1)[-1] + ".json",
                                          None)
    if not schema_doc:
        # Unknown schema version. Ignore..
        return {}

    attributes = schema_doc["required"]

    # Collect the values
    for key in attributes:
        if key == "objectName":
            # `objectName` is special, it doesn't refer to an
            # attribute but refers to the node name.
            data["objectName"] = fn.name()
            continue

        plug = fn.findPlug(key, True)
        if plug:
            data[key] = plug.asString()

    if validate:
        avalon.schema.validate(data)

    return data
```
I timed it in a scene with 12 containers, and did 100 iterations of `ls()` with:
```python
import time

s = time.time()
for x in range(100):
    list(pip.ls())
e = time.time()
print(e - s)
```
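For what it's worth, the same measurement can also be written with `timeit`, which picks an appropriate clock per platform. A small harness sketch; the `ls` callable is passed in rather than imported, so nothing here is Maya-specific:

```python
import timeit


def time_ls(ls_callable, iterations=100):
    """Return total seconds for `iterations` full ls() traversals."""
    return timeit.timeit(lambda: list(ls_callable()), number=iterations)
```

For example `time_ls(pip.ls)` would reproduce the loop above.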
The difference between old and new `parse_container`:
Old: 1.92000007629
New: 1.77500009537
For a small scene it seems the speed difference is negligible. @davidlatwe could you try giving this a go with your heavy scene? With these current stats I must admit the more complex code here definitely is not worth it.
It actually seems that in a very light scene with many containers most of the time is spent on `schema.validate`; disabling that brings it down to 0.19s for 100 iterations on the scene with 12 containers.
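If full `schema.validate` is the hot spot, one middle ground could be a required-keys check only. This is an idea for discussion, not how `avalon.schema` works; the key lists are taken from the two schema versions shown earlier in this thread:

```python
REQUIRED = {
    "container-1.0": ["id", "name", "author", "loader", "families",
                      "time", "subset", "asset", "representation",
                      "version", "silo", "path", "source"],
    "container-2.0": ["schema", "id", "name", "namespace",
                      "loader", "representation"],
}


def quick_validate(container):
    """Raise if a required key for the container's schema is missing."""
    schema = container.get("schema", "container-1.0")
    missing = [key for key in REQUIRED.get(schema, ())
               if key not in container]
    if missing:
        raise ValueError("Missing container keys: %s" % ", ".join(missing))
```

It would catch the "accidental set with the same `.id`" case while skipping the per-key checks of full JSON-schema validation.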
Another speed comparison in a heavier scene with 24 containers and many nodes, running 5 iterations of `ls()`:

**NEW**
- validate=False: 0.28200006485
- validate=True: 0.430000066757

**OLD**
- validate=False: 0.309000015259
- validate=True: 0.477999925613
Parsing the container through the API doesn't seem to be worth it for these scenes with under a hundred containers. I wonder if it helps more in your scene with 1279 containers @davidlatwe, but I think at this stage the more complex code using the API isn't justifiable.
Thanks!

So I have tested with my heavy scene, and:

- `parse_container` via API + validate=True: 2.621 sec
- `parse_container` via API + validate=False: 0.155 sec
- Original `parse_container` + validate=False: 0.713 sec

Looks like disabling validation is the best option here!
Does this mean we can remove lsattr too? Reduce a few lines?
@mottosso technically we could, however it's also exposed in the host's `__init__.py` and as such is technically available in the API. It could be in use by someone, so I'll leave this PR as is and will not remove it. We could leave that for a future PR if we want to clean up.
This PR is ready to be merged. 👍
Actually, could we add a `validate=True` kwarg to `ls()` so we're able to opt out of validation when parsing containers?
What, there's validation somewhere? :O I can't spot it. What kind of validation is happening?
What, there's validation somewhere? :O I can't spot it. What kind of validation is happening?
It's only the schema validation not a Pyblish validation. ;)
Actually, could we add validate=True kwarg to ls() so to be able to opt-out validation when parsing containers ?
Sure. Should I just make it `ls(validate=True)`? Or does it need to be `ls(**kwargs)`?
`ls(validate=True)` would be good :D
ls(validate=True) would be good :D
Exposed the `validate` toggle through `ls()` with https://github.com/getavalon/core/pull/456/commits/f588483c338928bfc21f4ead2649435a19030328
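The toggle itself is just a pass-through to `parse_container`. Roughly, as a stubbed sketch (the node iteration and attribute reading are elided here, this is not the actual host code):

```python
def parse_container(node, validate=True):
    """Stub standing in for the real attribute-reading version."""
    data = {"objectName": node}
    if validate:
        pass  # the real code would run avalon.schema.validate(data) here
    return data


def ls(validate=True, nodes=()):
    """Forward the validate toggle to parse_container, per node."""
    for node in nodes:
        yield parse_container(node, validate=validate)
```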
Of course it's not Pyblish validation?
Would prefer removing the validation altogether... oh, and you already made the change. Please heavily discuss any change to the API. API changes are forever, ever ever. In this case, there must be a clear reason to add `validate=True`; what's the use case? What options have you tried, why didn't those work? When does it happen? Is it a problem or a convenience? Discuss.

Don't change the API unless you discuss discuss discuss.
If you introduce an optional validation and it's enabled by default, does that mean we won't get the speed up benefit by using the GUI tools in Avalon?
Oh, and you already made the change.
Sure. Agree. ;) Maybe I pushed ahead too rapidly with the commit. :) The change wasn't directly meant to be merged but to leave it in a state for the PR discussion to continue, but I guess it does create a bit of "noise" in the commits.
Would prefer removing the validation altogether
@mottosso This could be dangerous if by any chance someone accidentally created a set with the same `.id` where they didn't intend to. I've had it happen only once, and it was due to my own oversight. I guess we can also see that as one's own care and remove the validation altogether. If one wants to validate the container manually then it's trivial to do so with `avalon.schema.validate(container)`. It's quite a speedup for heavy scenes if it's disabled, so I'd be happy to remove it.
If you introduce an optional validation and it's enabled by default, does that mean we won't get the speed up benefit by using the GUI tools in Avalon?
@tokejepsen Correct. The UI would still use the slower version that is mostly noticeable with a very high amount of containers. The original optimization at the beginning of this PR however is there nonetheless. This validation is only another optimization.
The reason for removing the validation altogether currently is purely an optimization for heavy scenes. I guess we can have a little "vote" for removing it altogether? Is there anyone who cannot live with the validation being removed from `ls()`? @mkolar @tokejepsen @davidlatwe
Maybe revert the previous commit (f588483), merge, and open another issue ?
An issue to discuss whether to validate container data entries by default, or remove the validation entirely from `parse_container`. The discussion's result should then be applied to all hosts in the same manner.
Maybe revert the previous commit (f588483), merge, and open another issue ?
I'll just revert the change either way, because the initial consensus seems to be opposing the idea anyway. ;) Then whether we use this PR or another for removing the validation altogether, I don't mind either way. Feel free to merge once I've reverted it, and open a clear issue for the "validations". :)
I'll just revert the change either way
Thanks. It makes me very nervous that you're willing to make changes to the API so lightly. Any change to the API better have a darn good reason for happening. Anything else is for your own configs, not core.
;)
;)
Merging this !
Issue
In large scenes with many nodes the `host.ls()` query for host Maya is slower than it needs to be. I've seen cases where a single query took 3.7 seconds.

What's changed?

The `_ls()` method that was used to list all the relevant Avalon containers in Maya by node name was relatively slow in the way it queried in large scenes. It previously did a full query on all nodes by an `.id` attribute, whereas we should query only over `objectSets` as those are the container nodes in the Maya pipeline. This has now been changed to use the Maya Python API 2.0 to iterate over all dependency nodes of type `maya.api.OpenMaya.MFn.kSet`.

Also, it did that full iteration twice, because it also had to look up the backwards compatibility id `pyblish.mindbender.container`. This has now been changed to detect whether the id value is either of the two directly, as opposed to iterating the scene twice.

Speed comparison
As I was developing an optimized method I've built some prototype functions as a replacement and compared timings in smaller and big scenes. In all cases I've found that this new implementation was much faster.
In my large test scene the difference was huge:
The newer method being faster was also confirmed in a small test scene by @tokejepsen here:
The implementation in this PR is `New 2` in those tests.

Fact: Containers are objectSets?
In the Maya pipeline in Avalon, containers are created with the `containerise` function that is exposed in Avalon core; this creates the `objectSet` that encapsulates the nodes belonging to specific loaded data. As such this implementation assumes that every node we allow to be a `container` is actually of type `set`, because that's what Avalon core provides, plus it's what I've seen (logically) used throughout all available configs.

However, do note that if someone ever relied on a random node in the scene having the `.id` and getting detected as a container, with this change it will not be detected if it's not an `objectSet`.
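For reference, the backwards-compatibility id lookup mentioned in this PR reduces to a single membership test instead of a second scene iteration. A sketch; treating `pyblish.avalon.container` as the current id is an assumption here, since only the legacy `pyblish.mindbender.container` id is named above:

```python
# Legacy id is named in the PR description; the current id is assumed.
CONTAINER_IDS = ("pyblish.avalon.container", "pyblish.mindbender.container")


def is_container_id(value):
    """One membership test instead of two full scene iterations."""
    return value in CONTAINER_IDS
```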