PennyLaneAI / catalyst

A JIT compiler for hybrid quantum programs in PennyLane
https://docs.pennylane.ai/projects/catalyst
Apache License 2.0

The speedup using GPU over CPU execution seems unexpectedly low #1189

Open mtarabkhah opened 1 month ago

mtarabkhah commented 1 month ago

Issue description

I am benchmarking quantum circuits using Catalyst on a GPU. However, the speedup over CPU execution seems unexpectedly low.

Platform info: Linux-6.8.0-45-generic-x86_64-with-glibc2.39
Python version: 3.12.4
Numpy version: 1.26.4
Scipy version: 1.12.0
Installed devices:

Source code and tracebacks

I have provided two sample scripts, along with more information on the execution times, in the Catalyst-GPU-QS Repo.

Additional information

Here are some sample execution times for a 26-qubit GHZ circuit:

This results in a 4.66x speedup with the GPU version, which seems relatively low for GPU acceleration.
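For anyone reproducing a figure like this, the speedup is the ratio of CPU to GPU wall-clock time. Below is a minimal sketch of one way to measure it, not the method used in the linked repo. The helper names `time_call` and `speedup` are hypothetical, and the warm-up call is an assumption meant to keep one-time costs (such as JIT compilation on the first call) out of the measurement.

```python
import statistics
import time


def time_call(fn, repeats=5, warmup=1):
    """Return the median wall-clock time of fn() in seconds."""
    # Warm-up runs absorb one-time costs such as JIT compilation.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(cpu_seconds, gpu_seconds):
    """How many times faster the GPU run is than the CPU run."""
    return cpu_seconds / gpu_seconds
```

With two zero-argument callables that run the same circuit on each device (say, hypothetical `run_on_cpu` and `run_on_gpu`), the ratio would be `speedup(time_call(run_on_cpu), time_call(run_on_gpu))`.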

For comparison, running this quantum circuit in Qiskit yielded the following:

P.S. I have not used for loops in the creation of the circuits, as I am using code to automatically generate the circuits based on a provided list of gates. I assume this should not affect the performance.

josh146 commented 1 month ago

Hi @mtarabkhah! Thanks for the benchmarking information.

Catalyst doesn't yet support lightning.gpu, but this is a work in progress and coming shortly. I'm curious how you are benchmarking Catalyst with GPU support?

josh146 commented 1 month ago

> P.S. I have not used for loops in the creation of the circuits, as I am using code to automatically generate the circuits based on a provided list of gates. I assume this should not affect the performance.

Note that Catalyst-compatible for loops (either using qml.for_loop, or qjit(autograph=True)) will in fact lead to an increase in performance, as the circuit will have a compressed representation :)

mtarabkhah commented 1 month ago

Hi @josh146,

Thanks for your reply.

I'm currently using lightning.gpu from PennyLane for the GPU version. Is there another way to use Catalyst for GPU execution?

Could you please review the provided code and suggest ways to improve performance, particularly using Catalyst on GPU?

I also appreciate the comment about "Catalyst-compatible for loops" and will look into that.