Open rht opened 9 months ago
Thanks @rht. This is useful info and squeezing the most performance out of the lightning suite is a top priority!
For the 0.35 release (end of Feb), we're planning to move towards lightning.gpu tapping into our new JVP/VJP pipeline by default. At that point, I'd be curious to compare these numbers again. We'll keep you posted as we make progress on this.
Issue description
Here is a concrete benchmark result for #562, where I found the hybrid Python-C++ code (~40 LOC of Python) in that PR to be ~2x faster than the full C++ code for the adjoint Jacobian. I think this means there could be room for improvement in the C++ implementation.
I'm raising this issue because I have spent time figuring out how to speed up the adjoint code, and I think it would be a missed opportunity if this were not examined. A review of this issue would be appreciated!
The elapsed times to consider are the "elapsed grad final version" and "Elapsed diagonal observable grad only".
Code to reproduce the benchmark result.