Closed HuangJiaLian closed 1 year ago
The error seems to indicate that you don't have an OpenCL device driver installed (the simulation code is written in OpenCL). If you have an OpenCL-capable device (a GPU) on the system, the ocl-icd-system package in conda should detect the driver automatically.
That might not work on some supercomputers, though, in which case I have just manually copied the driver file to the conda environment (run while the conda environment is activated):
cp /etc/OpenCL/vendors/nvidia.icd $CONDA_PREFIX'/etc/OpenCL/vendors/'
That's for Nvidia. Depending on what you have on your system, you might have to check what other *.icd files are present in /etc/OpenCL/vendors/.
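To see which driver files would need copying, the check described above can be sketched in Python. This is a minimal sketch using only the standard library; the function names list_icd_files and missing_in_env are made up for illustration, and it assumes the environment keeps its ICD files under $CONDA_PREFIX/etc/OpenCL/vendors, as in the cp command above.

```python
import glob
import os


def list_icd_files(vendors_dir):
    """Return the sorted names of the *.icd files found in vendors_dir."""
    pattern = os.path.join(vendors_dir, "*.icd")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


def missing_in_env(system_dir, env_dir):
    """Return ICD file names present in system_dir but absent from env_dir."""
    return sorted(set(list_icd_files(system_dir)) - set(list_icd_files(env_dir)))


if __name__ == "__main__":
    # Assumed locations: the system-wide vendors directory and the
    # matching directory inside the activated conda environment.
    system_dir = "/etc/OpenCL/vendors"
    env_dir = os.path.join(
        os.environ.get("CONDA_PREFIX", ""), "etc", "OpenCL", "vendors"
    )
    print("system ICD files:", list_icd_files(system_dir))
    print("not yet in the conda environment:", missing_in_env(system_dir, env_dir))
```

Run it on the GPU node with the environment activated; any file it reports as missing is a candidate for the manual copy.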
Thank you for the explanation. I will try to learn more about how to use the GPUs on the Triton HPC cluster to handle this problem later.
I tried logging in to a Tesla V100 GPU node, and some other errors occurred.
(base) [huangj4@login3 huangj4]$ srun -p interactive --gres=gpu:1 --constraint=volta --time=2:00:00 --mem=6000M --pty bash
(base) [huangj4@gpu32 Graph-AFM]$ conda activate graph-afm
(/scratch/work/huangj4/.conda_envs/graph-afm) [huangj4@gpu32 Graph-AFM]$ cd scripts/
(/scratch/work/huangj4/.conda_envs/graph-afm) [huangj4@gpu32 scripts]$ ls
generate_data.py predict_random.py slurm-17989007.out test.py train.py
predict_examples.py slurm-17988316.out submit.sh train_distributed.py
(/scratch/work/huangj4/.conda_envs/graph-afm) [huangj4@gpu32 scripts]$ python generate_data.py
PACKAGE_PATH = /scratch/work/huangj4/Github/Graph-AFM/ProbeParticleModel/pyProbeParticle
CPP_PATH = /scratch/work/huangj4/Github/Graph-AFM/ProbeParticleModel/cpp
OCLEnvironment platform[0] PACKAGE_PATH: /scratch/work/huangj4/Github/Graph-AFM/ProbeParticleModel/pyProbeParticle
i_platform 0
3 errors generated.
Traceback (most recent call last):
File "generate_data.py", line 46, in <module>
oclr.init(env)
File "/scratch/work/huangj4/Github/Graph-AFM/scripts/../ProbeParticleModel/pyProbeParticle/RelaxOpenCL.py", line 52, in init
cl_program = env.loadProgram(env.CL_PATH+"/relax.cl")
File "/scratch/work/huangj4/Github/Graph-AFM/scripts/../ProbeParticleModel/pyProbeParticle/oclUtils.py", line 30, in loadProgram
program = cl.Program(self.ctx, fstr ).build()
File "/scratch/work/huangj4/.conda_envs/graph-afm/lib/python3.8/site-packages/pyopencl/__init__.py", line 534, in build
self._prg, was_cached = self._build_and_catch_errors(
File "/scratch/work/huangj4/.conda_envs/graph-afm/lib/python3.8/site-packages/pyopencl/__init__.py", line 582, in _build_and_catch_errors
raise err
pyopencl._cl.RuntimeError: clBuildProgram failed: BUILD_PROGRAM_FAILURE - clBuildProgram failed: BUILD_PROGRAM_FAILURE - clBuildProgram failed: BUILD_PROGRAM_FAILURE
Build on <pyopencl.Device 'Tesla V100-PCIE-32GB' on 'NVIDIA CUDA' at 0x557be9ed4a10>:
<kernel>:377:42: error: cannot assign to variable 'dpos0_' with const-qualified type 'const float4' (vector of 4 'float' values)
const float4 dpos0_=dpos0; dpos0_.xyz= rotMatT( dpos0_.xyz , tipA.xyz, tipB.xyz, tipC.xyz );
~~~~~~~~~~^
<kernel>:377:18: note: variable 'dpos0_' declared const here
const float4 dpos0_=dpos0; dpos0_.xyz= rotMatT( dpos0_.xyz , tipA.xyz, tipB.xyz, tipC.xyz );
~~~~~~~~~~~~~^~~~~~~~~~~~
<kernel>:404:42: error: cannot assign to variable 'dpos0_' with const-qualified type 'const float4' (vector of 4 'float' values)
const float4 dpos0_=dpos0; dpos0_.xyz= rotMatT( dpos0_.xyz , tipA.xyz, tipB.xyz, tipC.xyz );
~~~~~~~~~~^
<kernel>:404:18: note: variable 'dpos0_' declared const here
const float4 dpos0_=dpos0; dpos0_.xyz= rotMatT( dpos0_.xyz , tipA.xyz, tipB.xyz, tipC.xyz );
~~~~~~~~~~~~~^~~~~~~~~~~~
<kernel>:481:42: error: cannot assign to variable 'dpos0_' with const-qualified type 'const float4' (vector of 4 'float' values)
const float4 dpos0_=dpos0; dpos0_.xyz= rotMatT( dpos0_.xyz , tipA.xyz, tipB.xyz, tipC.xyz );
~~~~~~~~~~^
<kernel>:481:18: note: variable 'dpos0_' declared const here
const float4 dpos0_=dpos0; dpos0_.xyz= rotMatT( dpos0_.xyz , tipA.xyz, tipB.xyz, tipC.xyz );
~~~~~~~~~~~~~^~~~~~~~~~~~
(options: -I /scratch/work/huangj4/.conda_envs/graph-afm/lib/python3.8/site-packages/pyopencl/cl -I/scratch/work/huangj4/Github/Graph-AFM/ProbeParticleModel/cl -I/scratch/work/huangj4/Github/Graph-AFM/ProbeParticleModel/cl)
(source saved as /tmp/tmpejtowm75.cl)
(/scratch/work/huangj4/.conda_envs/graph-afm) [huangj4@gpu32 scripts]$ cp /etc/OpenCL/vendors/nvidia.icd $CONDA_PREFIX'/etc/OpenCL/vendors/'
(/scratch/work/huangj4/.conda_envs/graph-afm) [huangj4@gpu32 scripts]$ ls $CONDA_PREFIX'/etc/OpenCL/vendors/'
nvidia.icd ocl-icd-system
(/scratch/work/huangj4/.conda_envs/graph-afm) [huangj4@gpu32 scripts]$ python generate_data.py
PACKAGE_PATH = /scratch/work/huangj4/Github/Graph-AFM/ProbeParticleModel/pyProbeParticle
CPP_PATH = /scratch/work/huangj4/Github/Graph-AFM/ProbeParticleModel/cpp
OCLEnvironment platform[0] PACKAGE_PATH: /scratch/work/huangj4/Github/Graph-AFM/ProbeParticleModel/pyProbeParticle
i_platform 0
3 errors generated.
Traceback (most recent call last):
File "generate_data.py", line 46, in <module>
oclr.init(env)
File "/scratch/work/huangj4/Github/Graph-AFM/scripts/../ProbeParticleModel/pyProbeParticle/RelaxOpenCL.py", line 52, in init
cl_program = env.loadProgram(env.CL_PATH+"/relax.cl")
File "/scratch/work/huangj4/Github/Graph-AFM/scripts/../ProbeParticleModel/pyProbeParticle/oclUtils.py", line 30, in loadProgram
program = cl.Program(self.ctx, fstr ).build()
File "/scratch/work/huangj4/.conda_envs/graph-afm/lib/python3.8/site-packages/pyopencl/__init__.py", line 534, in build
self._prg, was_cached = self._build_and_catch_errors(
File "/scratch/work/huangj4/.conda_envs/graph-afm/lib/python3.8/site-packages/pyopencl/__init__.py", line 582, in _build_and_catch_errors
raise err
pyopencl._cl.RuntimeError: clBuildProgram failed: BUILD_PROGRAM_FAILURE - clBuildProgram failed: BUILD_PROGRAM_FAILURE - clBuildProgram failed: BUILD_PROGRAM_FAILURE
Build on <pyopencl.Device 'Tesla V100-PCIE-32GB' on 'NVIDIA CUDA' at 0x563a0dcd0fb0>:
<kernel>:377:42: error: cannot assign to variable 'dpos0_' with const-qualified type 'const float4' (vector of 4 'float' values)
const float4 dpos0_=dpos0; dpos0_.xyz= rotMatT( dpos0_.xyz , tipA.xyz, tipB.xyz, tipC.xyz );
~~~~~~~~~~^
<kernel>:377:18: note: variable 'dpos0_' declared const here
const float4 dpos0_=dpos0; dpos0_.xyz= rotMatT( dpos0_.xyz , tipA.xyz, tipB.xyz, tipC.xyz );
~~~~~~~~~~~~~^~~~~~~~~~~~
<kernel>:404:42: error: cannot assign to variable 'dpos0_' with const-qualified type 'const float4' (vector of 4 'float' values)
const float4 dpos0_=dpos0; dpos0_.xyz= rotMatT( dpos0_.xyz , tipA.xyz, tipB.xyz, tipC.xyz );
~~~~~~~~~~^
<kernel>:404:18: note: variable 'dpos0_' declared const here
const float4 dpos0_=dpos0; dpos0_.xyz= rotMatT( dpos0_.xyz , tipA.xyz, tipB.xyz, tipC.xyz );
~~~~~~~~~~~~~^~~~~~~~~~~~
<kernel>:481:42: error: cannot assign to variable 'dpos0_' with const-qualified type 'const float4' (vector of 4 'float' values)
const float4 dpos0_=dpos0; dpos0_.xyz= rotMatT( dpos0_.xyz , tipA.xyz, tipB.xyz, tipC.xyz );
~~~~~~~~~~^
<kernel>:481:18: note: variable 'dpos0_' declared const here
const float4 dpos0_=dpos0; dpos0_.xyz= rotMatT( dpos0_.xyz , tipA.xyz, tipB.xyz, tipC.xyz );
~~~~~~~~~~~~~^~~~~~~~~~~~
(options: -I /scratch/work/huangj4/.conda_envs/graph-afm/lib/python3.8/site-packages/pyopencl/cl -I/scratch/work/huangj4/Github/Graph-AFM/ProbeParticleModel/cl -I/scratch/work/huangj4/Github/Graph-AFM/ProbeParticleModel/cl)
(source saved as /tmp/tmpxxiyd91v.cl)
I don't think I need to change the code itself. It's probably because I'm using the wrong versions of some libraries. So I changed pyopencl to a 2021 version by using
pip install pyopencl==2021.2.6
and loaded the CUDA version specified in environment.yml
module load cuda/11.3.1
(/scratch/work/huangj4/.conda_envs/graph-afm) [huangj4@gpu32 scripts]$ python generate_data.py
PACKAGE_PATH = /scratch/work/huangj4/Github/Graph-AFM/ProbeParticleModel/pyProbeParticle
CPP_PATH = /scratch/work/huangj4/Github/Graph-AFM/ProbeParticleModel/cpp
OCLEnvironment platform[0] PACKAGE_PATH: /scratch/work/huangj4/Github/Graph-AFM/ProbeParticleModel/pyProbeParticle
i_platform 0
3 errors generated.
Traceback (most recent call last):
File "generate_data.py", line 46, in <module>
oclr.init(env)
File "/scratch/work/huangj4/Github/Graph-AFM/scripts/../ProbeParticleModel/pyProbeParticle/RelaxOpenCL.py", line 52, in init
cl_program = env.loadProgram(env.CL_PATH+"/relax.cl")
File "/scratch/work/huangj4/Github/Graph-AFM/scripts/../ProbeParticleModel/pyProbeParticle/oclUtils.py", line 30, in loadProgram
program = cl.Program(self.ctx, fstr ).build()
File "/scratch/work/huangj4/.conda_envs/graph-afm/lib/python3.8/site-packages/pyopencl/__init__.py", line 536, in build
self._prg, was_cached = self._build_and_catch_errors(
File "/scratch/work/huangj4/.conda_envs/graph-afm/lib/python3.8/site-packages/pyopencl/__init__.py", line 584, in _build_and_catch_errors
raise err
pyopencl._cl.RuntimeError: clBuildProgram failed: BUILD_PROGRAM_FAILURE - clBuildProgram failed: BUILD_PROGRAM_FAILURE - clBuildProgram failed: BUILD_PROGRAM_FAILURE
Build on <pyopencl.Device 'Tesla V100-PCIE-32GB' on 'NVIDIA CUDA' at 0x564d002e0e50>:
<kernel>:377:42: error: cannot assign to variable 'dpos0_' with const-qualified type 'const float4' (vector of 4 'float' values)
const float4 dpos0_=dpos0; dpos0_.xyz= rotMatT( dpos0_.xyz , tipA.xyz, tipB.xyz, tipC.xyz );
~~~~~~~~~~^
<kernel>:377:18: note: variable 'dpos0_' declared const here
const float4 dpos0_=dpos0; dpos0_.xyz= rotMatT( dpos0_.xyz , tipA.xyz, tipB.xyz, tipC.xyz );
~~~~~~~~~~~~~^~~~~~~~~~~~
<kernel>:404:42: error: cannot assign to variable 'dpos0_' with const-qualified type 'const float4' (vector of 4 'float' values)
const float4 dpos0_=dpos0; dpos0_.xyz= rotMatT( dpos0_.xyz , tipA.xyz, tipB.xyz, tipC.xyz );
~~~~~~~~~~^
<kernel>:404:18: note: variable 'dpos0_' declared const here
const float4 dpos0_=dpos0; dpos0_.xyz= rotMatT( dpos0_.xyz , tipA.xyz, tipB.xyz, tipC.xyz );
~~~~~~~~~~~~~^~~~~~~~~~~~
<kernel>:481:42: error: cannot assign to variable 'dpos0_' with const-qualified type 'const float4' (vector of 4 'float' values)
const float4 dpos0_=dpos0; dpos0_.xyz= rotMatT( dpos0_.xyz , tipA.xyz, tipB.xyz, tipC.xyz );
~~~~~~~~~~^
<kernel>:481:18: note: variable 'dpos0_' declared const here
const float4 dpos0_=dpos0; dpos0_.xyz= rotMatT( dpos0_.xyz , tipA.xyz, tipB.xyz, tipC.xyz );
~~~~~~~~~~~~~^~~~~~~~~~~~
(options: -I /scratch/work/huangj4/.conda_envs/graph-afm/lib/python3.8/site-packages/pyopencl/cl -I/scratch/work/huangj4/Github/Graph-AFM/ProbeParticleModel/cl -I/scratch/work/huangj4/Github/Graph-AFM/ProbeParticleModel/cl)
(source saved as /tmp/tmpmrxsylw4.cl)
However, the errors remain. Did I do some step wrong? Thank you @NikoOinonen
This didn't use to be a problem with Nvidia devices, but I ran into it before on some other platforms and already fixed it in https://github.com/Probe-Particle/ppafm/commit/99c152328808989f7a1f6206159b0d28cb03c17a. However, this commit is later than the one pointed to by the README. Fortunately, there do not seem to be any changes between those commits that would affect this repo, so it should work if you replace the ProbeParticleModel version with
git clone https://github.com/ProkopHapala/ProbeParticleModel.git
cd ProbeParticleModel
git checkout 99c152328808989f7a1f6206159b0d28cb03c17a
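To confirm that a checkout no longer contains the statement the compiler rejected, one can scan the kernel source for it. The sketch below is only an illustration: the regular expression is built from the line quoted in the build log (a const float4 that is assigned to later on the same line), the names CONST_REASSIGN and find_const_reassignments are hypothetical, and the location ProbeParticleModel/cl/relax.cl is inferred from the traceback rather than checked against the repository. It does not assume anything about how the commit rewrites that line.

```python
import re

# Matches "const float4 NAME = ...; NAME.<component> = ..." on a single line,
# i.e. the shape of the statement reported at <kernel>:377, 404 and 481.
CONST_REASSIGN = re.compile(
    r"const\s+float4\s+(\w+)\s*=[^;]*;\s*\1\.\w+\s*=(?!=)"
)


def find_const_reassignments(source):
    """Return the 1-based numbers of lines that assign to a const float4."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if CONST_REASSIGN.search(line)
    ]


if __name__ == "__main__":
    # Line copied from the build log above.
    logged = (
        "const float4 dpos0_=dpos0; "
        "dpos0_.xyz= rotMatT( dpos0_.xyz , tipA.xyz, tipB.xyz, tipC.xyz );"
    )
    print(find_const_reassignments(logged))
```

To check a real checkout, read the text of relax.cl and pass it to find_const_reassignments; an empty list means the pattern from the log is gone.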
@NikoOinonen
After running the above commands, changing the numpy entry from numpy to numpy=1.21.4 in the file environment.yml, and recreating the conda environment graph-afm, the problem was solved. 🎉
Without the numpy version pin, this error occurs:
PACKAGE_PATH = /scratch/work/huangj4/Github/Graph-AFM/ProbeParticleModel/pyProbeParticle
CPP_PATH = /scratch/work/huangj4/Github/Graph-AFM/ProbeParticleModel/cpp
OCLEnvironment platform[0] PACKAGE_PATH: /scratch/work/huangj4/Github/Graph-AFM/ProbeParticleModel/pyProbeParticle
i_platform 0
loadSpecies from : /scratch/work/huangj4/Github/Graph-AFM/ProbeParticleModel/pyProbeParticle/defaults/atomtypes.ini
Traceback (most recent call last):
File "generate_data.py", line 72, in <module>
afmulator = AFMulator(**afmulator_args)
File "/scratch/work/huangj4/Github/Graph-AFM/scripts/../ProbeParticleModel/pyProbeParticle/AFMulatorOCL_Simple.py", line 79, in __init__
self.typeParams = hl.loadSpecies('atomtypes.ini')
File "/scratch/work/huangj4/Github/Graph-AFM/scripts/../ProbeParticleModel/pyProbeParticle/HighLevelOCL.py", line 46, in loadSpecies
return PPU.loadSpeciesLines( str_Species.split('\n') )
File "/scratch/work/huangj4/Github/Graph-AFM/scripts/../ProbeParticleModel/pyProbeParticle/common.py", line 235, in loadSpeciesLines
return np.array( params, dtype=[('rmin',np.float64),('epsilon',np.float64),('alpha',np.float64),('atom',np.int),('symbol', '|S10')])
File "/scratch/work/huangj4/.conda_envs/graph-afm-test/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
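As the message itself suggests, an alternative to pinning numpy is to replace np.int with the builtin int in the affected source. This is not what was done in this thread, and it means editing a checked-out copy of ProbeParticleModel locally, so treat it as a sketch. The helper name replace_np_int is made up for illustration; it only rewrites the bare np.int alias and leaves names such as np.int64 untouched.

```python
import re

# \b after "int" prevents matches inside np.int64, np.int32, np.intp or np.int_.
NP_INT_ALIAS = re.compile(r"\bnp\.int\b")


def replace_np_int(source):
    """Return source with the removed alias np.int replaced by the builtin int."""
    return NP_INT_ALIAS.sub("int", source)


if __name__ == "__main__":
    # Line taken from the traceback above (common.py, loadSpeciesLines).
    line = (
        "return np.array( params, dtype=[('rmin',np.float64),"
        "('epsilon',np.float64),('alpha',np.float64),"
        "('atom',np.int),('symbol', '|S10')])"
    )
    print(replace_np_int(line))
```

Applied to the line in common.py, this turns ('atom',np.int) into ('atom',int), which is the replacement the error message describes as safe.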
Hi, @NikoOinonen
I attempted to run this code following the README instructions. Setting up the graph-afm conda environment and performing the build step went smoothly without any issues. However, I encountered a problem when generating the simulated AFM images. Here is what I have done on the HPC cluster:
The output says "No CUDA runtime is found", so I loaded the cuda module, but it seems to have no effect.
Could you please give me some advice on how to deal with this problem? Thank you.
Jie