QuantumBFS / Yao.jl

Extensible, Efficient Quantum Algorithm Design for Humans.
https://yaoquantum.org

How to run a random circuit in a batched way? #485

Open zipeilee opened 11 months ago

zipeilee commented 11 months ago

I want to run a random unitary circuit over many instances (say 1000) and return the average value, like this:

reg = zero_state(1)
mean([expect(Z, reg |> dispatch!(Rx,:random)) for _ in 1:1000])

However, I want to run it in a batched way. I know Yao has BatchedArrayReg, but I don't know how to apply a different random circuit to each instance of a batched register in a batched way. I tried:

reg = zero_state(1,nbatch=1000)
expect(Z, reg|>dispatch!(Rx,:random))

but this does not seem to work: it just picks one random instance and uses it 1000 times. What is the correct way?
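
The behaviour described above can be illustrated without Yao. The following is a minimal plain-Julia sketch, assuming a simplified picture in which a batched register is a matrix with one column per batch entry; `rx` and `expect_z` are hypothetical helpers written for this illustration, not Yao functions. Drawing the angle once and applying the resulting gate to the whole batch multiplies every column by the same 2x2 matrix, so every batch entry ends up with the same expectation value.

```julia
# Hypothetical helper: matrix of Rx(θ) = exp(-iθX/2). Not part of Yao.
function rx(θ)
    c, s = cos(θ / 2), -im * sin(θ / 2)
    return [c s; s c]
end

# Hypothetical helper: ⟨Z⟩ for each column of a 2 × nbatch state matrix.
expect_z(state) = abs2.(state[1, :]) .- abs2.(state[2, :])

nbatch = 5
state = zeros(ComplexF64, 2, nbatch)
state[1, :] .= 1            # every batch entry starts in |0⟩

θ = 2π * rand()             # the angle is drawn only once ...
same = rx(θ) * state        # ... so the same matrix acts on every column

println(expect_z(same))     # all entries are equal to cos(θ)
```

Getting a different random gate per batch entry therefore needs either one register per sample or a gate that carries one parameter per batch entry.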

GiggleLiu commented 11 months ago

Unfortunately, there is no easy way to do that. You need to copy reg for each sample, because |> changes the state in place.

julia> sum([expect(Z, copy(reg) |> dispatch!(Rx(0),rand()*2π)) for _ in 1:1000])/1000
0.027340815055604813 + 0.0im
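
As a sanity check on the number above, here is a small plain-Julia sketch that does not use Yao. It assumes the standard convention Rx(θ) = exp(-iθX/2), under which Rx(θ)|0⟩ has amplitudes cos(θ/2) and -i sin(θ/2), so ⟨Z⟩ = cos(θ). Averaged over uniformly random angles this tends to 0, with a standard error of roughly 0.022 for 1000 samples, so a value near 0.027 is within the expected fluctuation. The function name `expect_z_after_rx` is made up for this illustration.

```julia
using Statistics

# ⟨Z⟩ after Rx(θ)|0⟩, computed from the amplitudes cos(θ/2) and -i sin(θ/2).
function expect_z_after_rx(θ)
    amp0, amp1 = cos(θ / 2), -im * sin(θ / 2)
    return abs2(amp0) - abs2(amp1)      # equals cos(θ)
end

nsamples = 1000
samples = [expect_z_after_rx(2π * rand()) for _ in 1:nsamples]

println("sample mean    = ", mean(samples))
println("standard error = ", std(samples) / sqrt(nsamples))
```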

zipeilee commented 11 months ago

In fact, I was hoping to use the GPU's parallelism for the batch processing, but this approach does not seem to make good use of it.

GiggleLiu commented 11 months ago

I see. In your case, I would suggest writing a new kernel, since this feature is not supported by Yao yet.

  1. Define a new gate type with batched parameters.
  2. Dispatch the gate to the correct instruct! function. The current single-parameter rotation gate calls into this implementation: https://github.com/QuantumBFS/CuYao.jl/blob/05f365f8f8e49fa2787df50a6e2226f508c94d80/src/instructs.jl#L19 You need to implement a new CUDA kernel (see below); it should not be too difficult if you know CUDA programming.

Hints for rewriting this instruct! method

instruct!(::Val{2}, state::DenseCuVecOrMat, U0::AbstractMatrix, locs::NTuple{M, Int}, clocs::NTuple{C, Int}, cvals::NTuple{C, Int})
  1. Val{2} means the method is for qubits rather than qudits.
  2. state is a vector or matrix that serves as the register storage.
  3. U0 is the gate matrix. In your case, you need to pass a rank-3 tensor in which each batch entry stores its own 2x2 matrix, and each CUDA thread should use the matrix that belongs to its batch entry.
  4. locs holds the locations that the gate acts on. For a single-qubit gate, it should contain only one element.
  5. clocs and cvals should be empty tuples in the absence of control bits.
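
The hints above can be sketched on the CPU in plain Julia before writing any CUDA code. This is only a reference sketch under stated assumptions, not the CuYao implementation: `batched_gate!` and `rx` are hypothetical names, the state is assumed to be a 2^n × nbatch matrix with one column per batch entry, qubit 1 is assumed to be the least significant bit, and there are no control bits (the case where clocs and cvals are empty).

```julia
# CPU reference for a batched single-qubit gate (hypothetical helper, not Yao/CuYao API).
# state: 2^n × nbatch matrix, one column per batch entry
# U:     2 × 2 × nbatch tensor, U[:, :, b] is the gate for batch entry b
# loc:   1-based qubit location, assuming qubit 1 is the least significant bit
function batched_gate!(state::AbstractMatrix, U::AbstractArray{<:Number,3}, loc::Int)
    dim, nbatch = size(state)
    @assert size(U) == (2, 2, nbatch)
    step = 1 << (loc - 1)                 # distance between the two paired amplitudes
    for b in 1:nbatch                     # on a GPU: one thread per (pair, batch entry)
        for i in 0:dim-1
            (i & step) == 0 || continue   # visit each pair once, from its "bit = 0" member
            i0, i1 = i + 1, i + step + 1  # 1-based indices with the target bit 0 and 1
            a0, a1 = state[i0, b], state[i1, b]
            state[i0, b] = U[1, 1, b] * a0 + U[1, 2, b] * a1
            state[i1, b] = U[2, 1, b] * a0 + U[2, 2, b] * a1
        end
    end
    return state
end

# Hypothetical helper: matrix of Rx(θ) = exp(-iθX/2).
function rx(θ)
    c, s = cos(θ / 2), -im * sin(θ / 2)
    return [c s; s c]
end

nbatch = 1000
θs = 2π .* rand(nbatch)                   # one random angle per batch entry
U = zeros(ComplexF64, 2, 2, nbatch)
for b in 1:nbatch
    U[:, :, b] = rx(θs[b])
end

state = zeros(ComplexF64, 2, nbatch)
state[1, :] .= 1                          # |0⟩ in every column
batched_gate!(state, U, 1)

expz = abs2.(state[1, :]) .- abs2.(state[2, :])   # ⟨Z⟩ per batch entry
println(sum(expz) / nbatch)
```

In a CUDA kernel the two loops would presumably be replaced by the thread index, with each thread reading the 2x2 slice U[:, :, b] of its own batch entry; the arithmetic inside the loop body stays the same.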

Please feel free to ask if you run into any issues.

zipeilee commented 11 months ago

Thanks for your patience and guidance! Actually, I need to compute a chain block made of layers of such single-qubit gates and layers of two-qubit gates on many qubits. This is good advice; I will try it.