docker build doesn't work out of the box

rht commented 1 year ago

Issue description

I did a vanilla clone of the repo and ran docker build . -f ./docker/Dockerfile -t "lightning-gpu-wheels", but the build failed with the following error:

Source code and tracebacks

#0 15.72 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
#0 15.96 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
#0 15.96 -- Looking for pthread_create in pthreads
#0 16.17 -- Looking for pthread_create in pthreads - not found
#0 16.17 -- Looking for pthread_create in pthread
#0 16.41 -- Looking for pthread_create in pthread - found
#0 16.41 -- Found Threads: TRUE  
#0 16.42 -- Found CUDA: /usr/local/cuda (found version "12.2") 
#0 16.43 -- Found CUDAToolkit: /usr/local/cuda/include (found version "12.2.128") 
#0 17.05 -- Could NOT find Python (missing: Python_INCLUDE_DIRS Python_LIBRARIES Development Development.Module Development.Embed) (found version "2.7.5")
#0 17.05 CMake Error at CMakeLists.txt:176 (message):
#0 17.05   
#0 17.05 
#0 17.05   Unable to find cuQuantum SDK installation.  Please ensure it is correctly
#0 17.05   installed and available on path.
#0 17.05 
#0 17.05 
#0 17.05 -- Configuring incomplete, errors occurred!
#0 17.07 /pennylane-lightning-gpu/pyenv3.8/lib/python3.8/site-packages/setuptools/dist.py:463: UserWarning: Normalizing '0.32.0-dev' to '0.32.0.dev0'
#0 17.07   warnings.warn(tmpl.format(**locals()))
#0 17.07 Traceback (most recent call last):
#0 17.07   File "setup.py", line 143, in <module>
#0 17.07     setup(classifiers=classifiers, **(info))
#0 17.07   File "/pennylane-lightning-gpu/pyenv3.8/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
#0 17.07     return distutils.core.setup(**attrs)
#0 17.07   File "/opt/_internal/cpython-3.8.17/lib/python3.8/distutils/core.py", line 148, in setup
#0 17.07     dist.run_commands()
#0 17.07   File "/opt/_internal/cpython-3.8.17/lib/python3.8/distutils/dist.py", line 966, in run_commands
#0 17.07     self.run_command(cmd)
#0 17.07   File "/opt/_internal/cpython-3.8.17/lib/python3.8/distutils/dist.py", line 985, in run_command
#0 17.07     cmd_obj.run()
#0 17.07   File "/pennylane-lightning-gpu/pyenv3.8/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
#0 17.07     _build_ext.run(self)
#0 17.07   File "/opt/_internal/cpython-3.8.17/lib/python3.8/distutils/command/build_ext.py", line 340, in run
#0 17.07     self.build_extensions()
#0 17.07   File "/opt/_internal/cpython-3.8.17/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
#0 17.07     self._build_extensions_serial()
#0 17.07   File "/opt/_internal/cpython-3.8.17/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
#0 17.07     self.build_extension(ext)
#0 17.07   File "setup.py", line 85, in build_extension
#0 17.07     subprocess.check_call(
#0 17.07   File "/opt/_internal/cpython-3.8.17/lib/python3.8/subprocess.py", line 364, in check_call
#0 17.07     raise CalledProcessError(retcode, cmd)
#0 17.07 subprocess.CalledProcessError: Command '['cmake', '/pennylane-lightning-gpu', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/pennylane-lightning-gpu/build/lib.linux-x86_64-3.8/pennylane_lightning_gpu', '-DPYTHON_EXECUTABLE=/pennylane-lightning-gpu/pyenv3.8/bin/python3', '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-GNinja', '-DCMAKE_MAKE_PROGRAM=/pennylane-lightning-gpu/pyenv3.8/bin/ninja', '-DENABLE_OPENMP=OFF', '-DENABLE_CLANG_TIDY=0']' returned non-zero exit status 1.

ikurecic commented 1 year ago

Thanks for the note, @rht! We'll check it out and get back to you soon.

rht commented 1 year ago

I fixed the "Unable to find cuQuantum SDK installation" error by removing --no-deps and running pip install cuquantum instead at https://github.com/PennyLaneAI/pennylane-lightning-gpu/blob/1e129b2e7dbc7d885b16da61b6c5b1a02e45970d/docker/Dockerfile#L18. After that, however, I ran into many compile errors, for example:

#0 101.4 /pennylane-lightning-gpu/pennylane_lightning_gpu/src/algorithms/AdjointDiffGPU.hpp:370:76: error: could not convert ‘{<expression error>, <expression error>, <expression error>, <expression error>, <expression error>}’ from ‘<brace-enclosed initializer list>’ to ‘Pennylane::Pennylane::Algorithms::OpsData<double>’
#0 101.4   370 |         return {ops_name, ops_params, ops_wires, ops_inverses, ops_matrices};
#0 101.4       |                                                                            ^
#0 101.4       |                                                                            |
#0 101.4       |                                                                            <brace-enclosed initializer list>
#0 101.4 /pennylane-lightning-gpu/pennylane_lightning_gpu/src/algorithms/AdjointDiffGPU.hpp: In instantiation of ‘void Pennylane::Pennylane::Algorithms::AdjointJacobianGPU<T>::batchAdjointJacobian(const CFP_t*, int, int) [with T = double; Pennylane::Pennylane::Algorithms::AdjointJacobianGPU<T>::CFP_t = double2]’:
#0 101.4 /pennylane-lightning-gpu/pennylane_lightning_gpu/src/algorithms/AdjointDiffGPU.cpp:5:39:   required from here
#0 101.4 /pennylane-lightning-gpu/pennylane_lightning_gpu/src/algorithms/AdjointDiffGPU.hpp:441:66: error: ‘jac_local’ was not declared in this scope; did you mean ‘dt_local’?

rht commented 1 year ago

I additionally had to specify Python_SITELIB to point to the virtualenv site-packages path.

mlxd commented 1 year ago

Hi @rht

Thanks for posting. We haven't used the Docker build process for some time, as we now run our own custom AMIs through GitHub Actions (https://github.com/PennyLaneAI/pennylane-lightning-gpu/blob/main/.github/workflows/build_wheel_manylinux2014.yml). We will need some time to investigate what changes are needed to get this process working again, but I suspect the issue is a combination of compiler versions, changing dependencies, and updated C++ language features.

rht commented 1 year ago

Yeah, I managed to make it work by consulting the GH Actions yml file. One difference is that manylinux2014 uses Red Hat Toolset 10, whereas the GH Actions file uses g++-11 and gcc-11. Toolset 10 works fine with the Docker image.

My changes in CMakeLists.txt:

- Used find_package(Python3 COMPONENTS Interpreter Development.Module), i.e. Python3 instead of Python, and Development.Module instead of Development.
- Added set(Python_SITELIB /pennylane-lightning-gpu/pyenv3.8/lib/python3.8/site-packages), hardcoded to 3.8 because I wanted a quick result.
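
For anyone following along, the two CMakeLists.txt changes above might look roughly like the fragment below. This is a sketch only: the surrounding CMakeLists.txt is not shown in this thread, so where these lines belong in the file is an assumption. The Development.Module component is only available in CMake 3.18 and later. Requesting Python3 explicitly probably also avoids picking up the system Python 2.7.5 that appears in the original log.

```cmake
# Sketch of the two changes described above; placement within the real
# CMakeLists.txt is an assumption, since the file is not shown in this thread.

# Look for Python 3 explicitly (not the generic "Python" package) and request
# only the Development.Module component, which is what building an extension
# module needs. Requires CMake 3.18 or newer.
find_package(Python3 COMPONENTS Interpreter Development.Module)

# Point Python_SITELIB at the virtualenv's site-packages directory.
# Hardcoded to the Python 3.8 virtualenv inside the Docker image, so it has
# to be adjusted for any other Python version.
set(Python_SITELIB /pennylane-lightning-gpu/pyenv3.8/lib/python3.8/site-packages)
```

The hardcoded path only suits the 3.8 build. A multi-version wheel build would need to derive it from the active virtualenv instead.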

In the Dockerfile, I replaced yum -y install cuda with yum -y install cuda-11-5.

With those changes, everything should work.

CatalinaAlbornoz commented 1 year ago

Hi @rht, I'm glad you managed to make it work! Thank you for sharing your solution here. Please let us know if you encounter any further issues.

PennyLaneAI / pennylane-lightning-gpu

docker build doesn't work out of the box #131
