Closed mtreinish closed 3 years ago
After a quick check, it looks like there are about 400 instances of some form of node.op
in the code. I assume changing those to node._op
is preferable to reverting #6199. If the change is preferred, I could take this on.
After a quick check, it looks like there are about 400 instances of some form of
node.op
in the code. I assume changing those tonode._op
is preferable to reverting #6199. If the change is preferred, I could take this on.
If possible, I'd prefer to only access node._op
in the cases where we know node.op
is causing a performance problem. Even then, this might be better fixed by restructuring the interface so access to a private attribute isn't needed, even for performance. (Maybe redefine node.op
to be Optional[Instruction]
that's not None
if and only if node.type == 'op'
, or separate sub-classes for IODAGNode
and OpDagNode
. This still isn't great, but is maybe a step in the right direction.)
That said, I'm a little confused. node.op
as an @property
has existed since DAGNode
was introduced in #1815 . #6199 added an @property
for DAGNode.name
.
That's my fault for a lack of precision in opening the issue. #6199 introduced a regression around DAGNode.name
access, if you look at the benchmark I linked in the issue it's for cx cancellation which will be calling DAGNode.name
for each node in the dag as part of: https://github.com/Qiskit/qiskit-terra/blob/main/qiskit/transpiler/passes/optimization/cx_cancellation.py#L30 which is why we see the regression there in the nightly benchmarks
In the issue I was conflating that with the changes made in #6433 which were about DAGNode.op
and DAGNode.qargs
which were the same root cause function call overhead and were probably potential bottlenecks we've had sitting around for some time and just never noticed before.
Maybe @IvanIsCoding can comment here about his profiling on the collect_2q_blocks pass to show the overhead from the function call.
I agree with @kdk that we probably only want to do that in performance critical parts, but not generally. For example, in the circuit drawers (assuming it's used there, which I think it is) it wouldn't make sense to change anything.
So in #6199, we deprecated the use of passing a name
when instantiating a DAGNode
. We then added the property approach to accessing the name to point it to _op.name. It looks like there are fewer than 100 cases of node.name
used, mostly in the passes
. How about if we change those node.name
's to node._op.name
since that's where it's pointing to anyway.
That's my fault for a lack of precision in opening the issue. #6199 introduced a regression around
DAGNode.name
access, if you look at the benchmark I linked in the issue it's for cx cancellation which will be callingDAGNode.name
for each node in the dag as part of: https://github.com/Qiskit/qiskit-terra/blob/main/qiskit/transpiler/passes/optimization/cx_cancellation.py#L30 which is why we see the regression there in the nightly benchmarksIn the issue I was conflating that with the changes made in #6433 which were about
DAGNode.op
andDAGNode.qargs
which were the same root cause function call overhead and were probably potential bottlenecks we've had sitting around for some time and just never noticed before.Maybe @IvanIsCoding can comment here about his profiling on the collect_2q_blocks pass to show the overhead from the function call.
I agree with @kdk that we probably only want to do that in performance critical parts, but not generally. For example, in the circuit drawers (assuming it's used there, which I think it is) it wouldn't make sense to change anything.
I can jump in an say that in my particular case at #6433, @property
is used just for the setter and not for the getter. Hence, doing node.op
and node._op
are equivalent. This might not be as straightfoward in this case.
The speedup by replacing op
/_op
and qargs
/_qargs
at collect_2q_blocks
comes from:
_qargs
/_op
are heavily optimized because of the use of __slots__
. That is the fastest attribute look up you can do in Python!qargs
/op
are not optimized because there is @property
, hence there is a function call which slows thingsMy advice is to benchmark and see if the gains in th specific case are worth it. In collect_2q_blocks
, they were. Counting appearance of node.name
is a good starter guess, but mind you those don't account if they're in loops that executed many times or if statements that are never reaached.
Out of curiosity, I ran the following through timeit
:
class C():
def __init__(self):
self.foo = 'bar'
@property
def property_foo(self):
return self.foo
@property
def cond_property_foo(self):
if not True or 'true' != 'true':
raise ValueError()
return self.foo
time per call | without slots | with slots |
---|---|---|
c.foo |
35.4 ns ± 1.5 ns | 26 ns ± 0.197 ns |
c.property_foo |
119 ns ± 3.56 ns | 107 ns ± 2.5 ns |
c.cond_property_foo |
159 ns ± 24 ns | 140 ns ± 2.86 ns |
(N.B. Python 3.6 on OSX 10.15 )
At least for this microbenchmark, it seems in general like __slots__
saves 10-15 ns per call, @property
costs ~80 ns, and the condition checking another 30-40 ns per call. If these are attributes that are accessed frequently enough in a loop (and if these numbers are representative), this could add up to the observed regression.
That said, my preference here would be to either:
_
attributes only in the cases where we know there is an active performance concern (with a comment explaining why we're using the private attribute),DAGNode
(either via subclasses for IO
and Op
types, or by allowing .name
,.op
,... to be None
) so that the @property
and explicit type check aren't necessary.That we have a performance concern over convenience code that wraps and re-raises an AttributeError
as a QiskitError
suggests to me that there's room for improvement in the design. In general, it would be good to avoid building interfaces that are fast only if the consumer knows how to hold them in the right way, when we can make interfaces that are intentionally fast (as fast as python can be) by design.
Personally I think the subclass approach makes the most sense. That way we can avoid the property altogether. Having the explicit type
attribute always seemed a bit weird.
Information
What is the current behavior?
After PR #6199 merged a regression was flagged on our nightly benchmarks for certain transpiler passes where we're accessing a dag node's op frequently. For example:
https://qiskit.github.io/qiskit/#passes.PassBenchmarks.time_cx_cancellation?machine=dedicated-benchmarking-softlayer-baremetal&os=Linux%204.15.0-46-generic&ram=16GB&p-n_qubits=5&p-depth=1024&commits=3c979ebd
As was pointed out in #6433 this is caused by python function call overhead from using
@property
especially compared to a slotted attribute access that was there before..Steps to reproduce the problem
Run a transpiler pass or any dagnode operation that frequently uses
dagnode.op
What is the expected behavior?
No performance regression.
Suggested solutions
Either revert the change or change our internal dagnode usage to use the
_op
attribute on access for performance critical code (which is part of what #6433 does for one pass).