NVIDIA / cuQuantum

Home for cuQuantum Python & NVIDIA cuQuantum SDK C++ samples
https://docs.nvidia.com/cuda/cuquantum/
BSD 3-Clause "New" or "Revised" License

Releasing `qsim_mgpu` source on GitHub instead of only binaries in the Docker container #106

Open basnijholt opened 6 months ago

basnijholt commented 6 months ago

Dear cuQuantum developers,

After long struggles in trying to get qsim to compile with multi-GPU support, I found out that one cannot accomplish this using stock qsim. IMO this wasn't very clear from the documentation.

Ultimately, I found out by running the NVIDIA cuQuantum Appliance Docker container and checking the `git diff --no-index` between the original `qsim_simulator.py` and the `~/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/qsimcirq/qsim_circuit.py` file in the Docker container.

It turns out that this modified version uses a module, `qsim_mgpu`, which appears to be unavailable outside that Docker container.
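For anyone who wants to repeat that comparison without `git`, the check can be approximated in Python with the standard-library `difflib`. This is only a sketch: the helper name `added_import_lines` and the two file contents below are made up for illustration and are not taken from the actual qsim sources.

```python
import difflib


def added_import_lines(original: str, modified: str) -> list[str]:
    """Return import statements present in `modified` but not in `original`."""
    diff = difflib.unified_diff(
        original.splitlines(), modified.splitlines(), lineterm="", n=0
    )
    added = []
    for line in diff:
        # Skip the "+++" file header; keep only genuinely added lines.
        if line.startswith("+") and not line.startswith("+++"):
            text = line[1:].strip()
            if text.startswith(("import ", "from ")):
                added.append(text)
    return added


# Hypothetical file contents, for illustration only.
stock = "import numpy as np\nfrom . import qsim\n"
patched = "import numpy as np\nfrom . import qsim\nfrom . import qsim_mgpu\n"
print(added_import_lines(stock, patched))
```

In practice the two strings would be read from the stock file and from the file shipped in the container.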

Unfortunately, my current software stack cannot integrate Docker containers (we need full control over the entire environment), yet the multi-GPU support offered by qsim_mgpu is essential for our work. We would greatly benefit from the ability to compile qsim_mgpu independently.

Could you please consider releasing the modified qsim code/fork with qsim_mgpu for standalone use? This would be immensely beneficial for us and potentially for others in the community facing similar challenges.

If there are reasons for keeping this code exclusive to the Docker environment, understanding them could help us explore alternative solutions.

Tagging the core maintainers: @leofang @ahehn-nv @mtjrider @Takuma-Yamaguchi

mtjrider commented 4 months ago

> Unfortunately, my current software stack cannot integrate Docker containers (we need full control over the entire environment), yet the multi-GPU support offered by qsim_mgpu is essential for our work. We would greatly benefit from the ability to compile qsim_mgpu independently.

@basnijholt can you clarify your needs/requirements surrounding the following statement?

> ... my current software stack cannot integrate Docker containers (we need full control over the entire environment) ...

The default user within the container is a member of the sudo group. It should be possible to configure the environment however you want (or to build an entirely new image).

From the software section of the NGC landing page:


Default user environment

The default user in the container is `cuquantum` with user ID 1000. The `cuquantum` user is a member of the `sudo` group. By default, executing commands with `sudo` using the `cuquantum` user requires a password which can be obtained by reading the file located at `/home/cuquantum/.README` formatted as `{user}:{password}`.

To acquire new packages, we recommend using `conda install -c conda-forge ...` in the default environment (`cuquantum-23.10`). You may clone this environment and change the name using `conda create --name {new_name} --clone cuquantum-23.10`. This may be useful in isolating your changes from the default environment.

CUDA is available under `/usr/local/cuda`, a symbolic directory managed by `update-alternatives`. To query configuration information, use `update-alternatives --config cuda`.

basnijholt commented 4 months ago

Thanks for your reply @mtjrider!

The main problem is that I would like to install a fully locked environment from a conda-lock.yml file and this is just not possible.

Installing the precise package versions we need requires surgical interventions, such as removing all packages except qsim and any packages it was built against (like numpy). We would then need to maintain a requirements file that hard-pins exactly the numpy and Python versions (and perhaps other hard-linked packages?) used in the cuQuantum Appliance.
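To illustrate what maintaining such pins involves, here is a minimal sketch that records the exact versions found in whichever environment it runs in, using the standard-library `importlib.metadata`. The helper name `pin_lines` and the package list are assumptions for illustration; the real set of packages that qsim is linked against inside the Appliance would have to be determined separately.

```python
import platform
from importlib import metadata


def pin_lines(packages):
    """Return 'name==version' pins for packages installed in this environment."""
    pins = []
    for name in packages:
        try:
            pins.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed here; leave it out rather than guess a version.
            continue
    return pins


# Assumed package list; adjust to whatever qsim is actually built against.
print(f"# python {platform.python_version()}")
print("\n".join(pin_lines(["numpy", "qsimcirq"])))
```

Run inside the container, the output could seed a requirements file, but it would still have to be regenerated by hand for every new Appliance release.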

I just tried to download nvcr.io/nvidia/cuquantum-appliance to count the number of dependencies but the repo is down currently (another reason why separate distribution would be great).

Other reasons why this Docker route is suboptimal:

Finally, another problem is that the Qiskit version that is shipped inside the cuQuantum Appliance 23.10 is broken and does not run noisy simulations correctly.

Verify with the following snippet:

```python
import qiskit
import qiskit_aer

# Build circuit
qc = qiskit.QuantumCircuit(1)
qc.x(0)
qc.measure_all()

# Build noise model
noise_model = qiskit_aer.noise.NoiseModel()
noise_model.add_all_qubit_quantum_error(qiskit_aer.noise.depolarizing_error(1.0, 1), ['x'])

# Simulate circuit with noise
sim = qiskit_aer.AerSimulator(method='statevector', device='GPU')
job = sim.run(qc, shots=1024, noise_model=noise_model)
counts = job.result().get_counts()
print(f"Measured counts {counts}.\nExpected result is a roughly even mixture between 0 and 1.")
```

which, in the cuQuantum container, yields

```
Measured counts {'1': 1024}.
Expected result is a roughly even mixture between 0 and 1.
```
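The "roughly even mixture" expectation can also be turned into an explicit check rather than an eyeball comparison. The sketch below is plain Python and needs no GPU; the helper name `is_roughly_even`, the 10% tolerance, and the second counts dictionary are illustrative assumptions, not measured results.

```python
def is_roughly_even(counts, tolerance=0.1):
    """True if outcomes '0' and '1' each hold about half of all shots."""
    shots = sum(counts.values())
    if shots == 0:
        return False
    return all(
        abs(counts.get(outcome, 0) / shots - 0.5) <= tolerance
        for outcome in ("0", "1")
    )


# Counts reported from the cuQuantum container in this thread.
print(is_roughly_even({"1": 1024}))
# Hypothetical counts of the kind a correct noisy simulation would give.
print(is_roughly_even({"0": 500, "1": 524}))
```

Passing the `counts` dictionary returned by `get_counts()` to this helper gives a pass/fail signal that can be dropped into a regression test.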

All of the issues described above would be easily resolved by publishing NVIDIA's fork of qsim, which would allow installing it like any other package.