[RFC] Unitary Synthesis

khalatepradnya commented 8 months ago

Describe the feature

#### Problem

Given an arbitrary, user-provided unitary matrix, synthesize it into a sequence of quantum gates.

#### Expectations

- The user provides an arbitrary unitary matrix as a custom quantum operation.
- The custom operation can be used like any regular CUDA-Q quantum operation.
  - Q: Broadcast (same operation on multiple qubits): Out of scope
- The allowed set of quantum gates for synthesis depends on the backend target.
  - Q: Allow the user to specify the set of allowed gates: Out of scope
- CUDA-Q throws an error if a unitary cannot be synthesized (reasonably).
  - 'reasonably' accounts for a time limit (timeout), a gate count limit (upper threshold), and how close the synthesized "circuit" is to the input unitary (tolerance)
- Parameterized custom operations will be covered in a follow-up RFC.
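
To make the "tolerance" and "gate count limit" criteria concrete, here is a minimal sketch of an acceptance check. It is an illustration only: the names `unitary_distance` and `accept_synthesis`, the distance measure, and the default thresholds are assumptions, not CUDA-Q APIs or the values CUDA-Q uses.

```python
import numpy as np


def unitary_distance(target, synthesized):
    """Distance between two unitaries, ignoring global phase.

    Uses 1 - |Tr(target^dagger @ synthesized)| / dim, which is 0 when the
    two matrices agree up to a global phase. (Assumed metric.)
    """
    target = np.asarray(target, dtype=complex)
    synthesized = np.asarray(synthesized, dtype=complex)
    dim = target.shape[0]
    overlap = np.trace(target.conj().T @ synthesized)
    return 1.0 - abs(overlap) / dim


def accept_synthesis(target, synthesized, gate_count,
                     tolerance=1e-9, max_gates=1000):
    """Hypothetical acceptance rule: close enough and short enough."""
    close_enough = unitary_distance(target, synthesized) <= tolerance
    short_enough = gate_count <= max_gates
    return close_enough and short_enough


H = 1.0 / np.sqrt(2.0) * np.array([[1, 1], [1, -1]])
X = np.array([[0, 1], [1, 0]])

# A global phase does not change the operation, so 1j * H is accepted.
print(accept_synthesis(H, 1j * H, gate_count=3))
# X is a different operation, so it is rejected as a synthesis of H.
print(accept_synthesis(H, X, gate_count=3))
```

A timeout would be enforced around the synthesis routine itself rather than in a check like this one.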

#### User API

- Python

```python
import cudaq
import numpy as np

# Register custom operations from their unitary matrices
cudaq.register_operation("custom_h", 1. / np.sqrt(2.) * np.array([[1, 1], [1, -1]]))
cudaq.register_operation("custom_x", np.array([[0, 1], [1, 0]]))

@cudaq.kernel
def bell():
    qubits = cudaq.qvector(2)
    custom_h(qubits[0])
    custom_x.ctrl(qubits[0], qubits[1])

counts = cudaq.sample(bell)
counts.dump()
```

- C++

```cpp
#include <cudaq.h>
#include <cmath>
#include <complex>
#include <iostream>
#include <vector>

// Macro to specify the custom unitary operation
cudaq_register_operation(custom_h, 1, 0,
    (std::vector<std::vector<std::complex<double>>>{
        {M_SQRT1_2, M_SQRT1_2}, {M_SQRT1_2, -M_SQRT1_2}}));
cudaq_register_operation(custom_x, 1, 0,
    (std::vector<std::vector<std::complex<double>>>{{0, 1}, {1, 0}}));

void custom_operation() __qpu__ {
  cudaq::qvector qubits(2);
  custom_h(qubits[0]);
  custom_x.ctrl(qubits[0], qubits[1]);
}

int main() {
  auto result = cudaq::sample(custom_operation);
  std::cout << result.most_probable() << '\n';
  return 0;
}
```

- The user must provide a valid unitary matrix (CUDA-Q will not check or enforce this requirement).
- Ordering: The user-provided matrix must be in row-major format.
- Endianness: The user-provided matrix is interpreted as big-endian (the convention commonly used in physics textbooks).
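
Since CUDA-Q will not validate the matrix, a user may want to check it before registering it. The sketch below shows one way to do that with NumPy, and illustrates the endianness note under the assumption that "big-endian" means the first qubit is the most significant bit of the basis-state index. The helper name `is_unitary` is hypothetical.

```python
import numpy as np


def is_unitary(matrix, atol=1e-10):
    """Return True if `matrix` is square and U^dagger @ U is the identity."""
    m = np.asarray(matrix, dtype=complex)
    if m.ndim != 2 or m.shape[0] != m.shape[1]:
        return False
    return bool(np.allclose(m.conj().T @ m, np.eye(m.shape[0]), atol=atol))


# NumPy arrays are row-major by default: each inner list is one row.
custom_h = 1.0 / np.sqrt(2.0) * np.array([[1, 1], [1, -1]])
print(is_unitary(custom_h))          # a valid unitary
print(is_unitary([[1, 1], [0, 1]]))  # not unitary

# Big-endian (assumed meaning): X acting on the first of two qubits is
# kron(X, I), so basis state |00> (index 0) is mapped to |10> (index 2).
x = np.array([[0, 1], [1, 0]])
x_on_first = np.kron(x, np.eye(2))
state_00 = np.array([1, 0, 0, 0])
print(int(np.argmax(x_on_first @ state_00)))
```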

#### Constraints
- Size of unitary matrix: limited to 8 qubits, i.e. at most a 256 x 256 matrix (2^8 = 256)
- The custom operation must be defined outside of a quantum kernel (e.g., a call to `register_operation` cannot appear inside a function decorated with `@cudaq.kernel`).
- The tolerance for the synthesized circuit and the gate count limit will be default values determined by CUDA-Q
- The custom operation definition is restricted to `qubit` (`cudaq::qudit<2>`).
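
The size constraint can be checked from the matrix shape alone. The following is a rough sketch of such a check; `num_qubits_for` and `MAX_QUBITS` are hypothetical names used for illustration, not part of CUDA-Q.

```python
import numpy as np

MAX_QUBITS = 8  # limit stated in the constraints above


def num_qubits_for(matrix):
    """Infer the qubit count of a square 2^n x 2^n matrix.

    Raises ValueError if the shape is not 2^n x 2^n or exceeds MAX_QUBITS.
    """
    m = np.asarray(matrix)
    if m.ndim != 2 or m.shape[0] != m.shape[1]:
        raise ValueError("matrix must be square")
    dim = m.shape[0]
    n = dim.bit_length() - 1
    if dim < 2 or (1 << n) != dim:
        raise ValueError("matrix dimension must be a power of two (>= 2)")
    if n > MAX_QUBITS:
        raise ValueError(f"at most {MAX_QUBITS} qubits are supported")
    return n


print(num_qubits_for(np.eye(2)))    # single-qubit operation
print(num_qubits_for(np.eye(256)))  # largest allowed size
```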

#### Workflow 
<img width="492" alt="image" src="https://github.com/NVIDIA/cuda-quantum/assets/148914294/d85a93b0-df46-4d9b-add0-72ee32014265">

- In simulation, no synthesis happens.
- The compiler automatically synthesizes the matrix when targeting hardware.
- Explicit synthesis mechanism (API or command-line argument): out of scope for the first iteration.
- The NVQC target behaves the same as when running locally.

#### Work items / TO-DOs
- [x] Support in simulation for Python
  - [x] Kernel mode
  - [x] Builder mode
  - [x] State vector simulators
  - [x] Tensornet simulators
- [x] Support in simulation for C++ 
  - [x] Library mode
  - [x] MLIR mode
- [x] Add generic synthesis for emulation
- [x] Error handling: Gracefully handle user errors, feature constraints and runtime errors
- [ ] Support synthesis per hardware backend
- [ ] ~~Comprehensive documentation and useful example(s)~~: Covered in issue #2002

schweitzpgi commented 6 months ago

Specifically, I'm not entirely sure what the intended semantics of the following code are.

```cpp
cudaq_register_op("custom_h",
                  {{M_SQRT1_2, M_SQRT1_2}, {M_SQRT1_2, -M_SQRT1_2}});
cudaq_register_op("custom_x", {{0, 1}, {1, 0}});
```

These are calls? Macros? What exactly is being registered? And with what?

These aren't marked as `__qpu__` code, so they will be entirely opaque to the compiler at first blush. Hence, the compiler cannot generate Quake code for them.

schweitzpgi commented 6 months ago

Second-order question: it may be possible for the compiler to take a constant matrix here and generate a gate list (approximation) from those values. Or perhaps this should be generated entirely in the control hardware at QIR time? And what about the synthesis case? If the compiler is going to generate the gate list, it stands to reason that it will need to do so at synthesis time. That affects the IR, which would need to support dynamic matrix specifications that can be instantiated by the synthesizer.
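
For a sense of what "generate a gate list from a constant matrix" can look like in the simplest case, here is a sketch of the textbook ZYZ Euler decomposition of a single-qubit unitary, U = e^{i*alpha} Rz(beta) Ry(gamma) Rz(delta). It is only an illustration of the idea: the thread does not say which algorithm CUDA-Q uses, and the function names `zyz_angles` and `rebuild` are hypothetical.

```python
import numpy as np


def rz(theta):
    """Rotation about Z: diag(e^{-i theta/2}, e^{i theta/2})."""
    return np.array([[np.exp(-0.5j * theta), 0], [0, np.exp(0.5j * theta)]])


def ry(theta):
    """Rotation about Y."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def zyz_angles(u):
    """Return (alpha, beta, gamma, delta) for a 2x2 unitary `u`."""
    u = np.asarray(u, dtype=complex)
    # Strip the global phase so that the remainder has determinant 1.
    alpha = 0.5 * np.angle(np.linalg.det(u))
    v = np.exp(-1j * alpha) * u
    # v = [[cos(g/2) e^{-i(b+d)/2}, ...], [sin(g/2) e^{i(b-d)/2}, ...]]
    gamma = 2.0 * np.arctan2(abs(v[1, 0]), abs(v[0, 0]))
    half_sum = np.angle(v[1, 1])   # (beta + delta) / 2
    half_diff = np.angle(v[1, 0])  # (beta - delta) / 2
    return alpha, half_sum + half_diff, gamma, half_sum - half_diff


def rebuild(alpha, beta, gamma, delta):
    """Multiply the gate sequence back into a matrix for verification."""
    return np.exp(1j * alpha) * rz(beta) @ ry(gamma) @ rz(delta)


custom_h = 1.0 / np.sqrt(2.0) * np.array([[1, 1], [1, -1]])
angles = zyz_angles(custom_h)
print(np.allclose(rebuild(*angles), custom_h))
```

Multi-qubit unitaries need a recursive scheme on top of a single-qubit step like this, and a matrix that is only known at run time would have to be decomposed at that point, which is the IR concern raised above.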

khalatepradnya commented 6 months ago

> These are calls? Macros? What exactly is being registered? And with what?

Macros. Updated the code snippet in the description.

ACE07-Sev commented 5 months ago

Will this PR provide unitary decomposition like what qiskit's transpile does? https://github.com/NVIDIA/cuda-quantum/pull/1781

khalatepradnya commented 4 months ago

> Will this PR provide unitary decomposition like what qiskit's transpile does? #1781

Conceptually, yes. However, the synthesis mechanism and target gateset will be implicit in this iteration.

ACE07-Sev commented 2 months ago

Question. I am trying to compare the depth of cuda-quantum's implementation of QSD (I assume it's QSD) vs Qiskit's implementation. May I ask how I can see the depth of the circuit in terms of U3 and CX gates?

khalatepradnya commented 2 months ago

> Question. I am trying to compare the depth of cuda-quantum's implementation of QSD (I assume it's QSD) vs Qiskit's implementation. May I ask how I can see the depth of the circuit in terms of U3 and CX gates?

Thank you for the question. This feature is not yet available in CUDA-Q. I will update this issue when it becomes available.
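
For reference, the depth metric itself is easy to compute once a gate list is available from any tool. The sketch below assumes a hypothetical representation, a list of `(name, qubits)` tuples in program order; it is not a CUDA-Q API, and obtaining such a list from CUDA-Q is exactly the part that is not available yet.

```python
def circuit_depth(gates):
    """Depth of a circuit given as [(name, (q0, q1, ...)), ...].

    Each gate is placed one layer after the latest layer already used
    by any of the qubits it touches.
    """
    last_layer = {}  # qubit index -> layer of the most recent gate on it
    for _name, qubits in gates:
        layer = 1 + max((last_layer.get(q, 0) for q in qubits), default=0)
        for q in qubits:
            last_layer[q] = layer
    return max(last_layer.values(), default=0)


# Hypothetical U3/CX gate lists, written by hand for illustration.
bell = [("u3", (0,)), ("cx", (0, 1))]
print(circuit_depth(bell))  # 2

layered = [("u3", (0,)), ("u3", (1,)), ("cx", (0, 1)), ("u3", (1,))]
print(circuit_depth(layered))  # 3
```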
