Open fuglede opened 6 months ago
Thank you for your feedback! Could you please provide us with scripts and the LP instance in .mps
format for us to reproduce the issue? You can email us at chuwzhang@gmail.com and tianhao.sky.liu@gmail.com.
I also ran into this problem, and did a little bit of debugging to figure out why the CPU and the GPU versions work differently. While I have not fully figured it out, I did find something:
If I, in `Ax_single_gpu` in the file `cupdlp_linalg.c`, change `cuda_csr_Ax` to `cuda_csc_Ax` (in case `CSR_CSC`), then the GPU version gives the same result as the CPU version. My first thought was that this might be caused by some error in the creation of the CSR version of A (from what I can tell, the CPU version does not use this matrix). However, I dumped the matrix (in both CSC and CSR formats) and the input data, and re-created the matrices and the matrix-vector products with the help of scipy. From what I can see, the CSC and CSR data represent the same matrix; using the CSC data, the multiplication (via cuSPARSE on the GPU) is correct, while using the CSR data it is wrong. So it appears something goes wrong in the call to `cusparseSpMV` when the CSR data is used.
However, I did these tests in a rush and might have messed up, so don't take my findings on faith.
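The consistency check described above can be reproduced without scipy. Here is a minimal pure-Python sketch of the same idea, using a made-up 2×3 example matrix rather than the dumped cuPDLP-C data: if the CSR and CSC triples encode the same matrix, both matvec routines must agree.

```python
# Sanity check that CSR and CSC array triples encode the same matrix
# and give the same y = A @ x. Pure-Python stand-in for the scipy check
# described above; the small example matrix is made up.

def csr_matvec(m, n, indptr, indices, data, x):
    """y = A @ x with the m-by-n matrix A given in CSR form."""
    y = [0.0] * m
    for i in range(m):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

def csc_matvec(m, n, indptr, indices, data, x):
    """y = A @ x with the m-by-n matrix A given in CSC form."""
    y = [0.0] * m
    for j in range(n):
        for k in range(indptr[j], indptr[j + 1]):
            y[indices[k]] += data[k] * x[j]
    return y

# A = [[1, 0, 2],
#      [0, 3, 0]]
csr = ([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0])    # indptr, indices, data
csc = ([0, 1, 2, 3], [0, 1, 0], [1.0, 3.0, 2.0])

x = [1.0, 1.0, 1.0]
y_csr = csr_matvec(2, 3, *csr, x)
y_csc = csc_matvec(2, 3, *csc, x)
assert y_csr == y_csc == [3.0, 3.0]
```

Dumping the device-side arrays and running them through a check like this (or the scipy equivalent) separates "the CSR data is wrong" from "the CSR multiplication kernel is wrong".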
I'm using CUDA 12.4.1 on Windows 10.
After changing to CUDA 12.3.2, the problem with `cusparseSpMV` is gone, and both the GPU and CPU versions work as expected. Yay!
Changing `cuda_csr_Ax` to `cuda_csc_Ax` did not suffice here.
Bummer! It's probably a different problem then.
I forgot to mention one thing before: when changing `cuda_csr_Ax` to `cuda_csc_Ax`, I also tweaked `cuda_alloc_MVbuffer` (in the file `cupdlp_cudalinalg.cu`) to correct `AxBufferSize` for the CSC case (this value is used to size the buffer `dBuffer` that is passed to `cusparseSpMV`). I have no idea whether this affects things.
Downgraded CUDA from 12.4 to 12.3 (and gcc/g++ from 13 to 12) and now get a different error instead, e.g. when running on the example MPS:
--------------------------------------------------
reading file...
./example/afiro.mps
--------------------------------------------------
Running HiGHS 1.6.0: Copyright (c) 2023 HiGHS under MIT licence terms
Minimize
No obj offset
** On entry to cusparseCreate(): CUDA context cannot be initialized
CUSPARSE API failed at line 186 of /home/user/repos/cuPDLP-C/interface/mps_highs.c with error: initialization error (1)
I also had the case of my problem blowing up the step size; I ended up downgrading to HiGHS v1.6.0 and cuda-toolkit-12-3.
Here are my notes/steps on clean Ubuntu 22.04:
sudo lshw -c video
sudo apt update
sudo apt upgrade
sudo reboot
sudo apt install build-essential cmake htop btop
#### Might work using the 550 driver, but I am pretty sure I used the 535, which is actually 12-2 compatible
####sudo apt install nvidia-headless-550-server nvidia-utils-550-server nvidia-fabricmanager-550 cuda-drivers-fabricmanager-550
sudo apt install nvidia-headless-535-server nvidia-utils-535-server nvidia-fabricmanager-535 cuda-drivers-fabricmanager-535
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-3
sudo reboot
nvidia-smi
git clone https://github.com/ERGO-Code/HiGHS.git
cd HiGHS
git tag
git checkout v1.6.0
mkdir build
cd build
cmake -DFAST_BUILD=ON ..
cmake --build .
ctest
sudo cmake --install .
cd ../..
git clone https://github.com/COPT-Public/cuPDLP-C.git
cd cuPDLP-C/
vim CMakeLists.txt #Downgrade to installed cmake 3.22.1
#https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
vim FindCUDAConf.cmake #older cmake can't do "all" so set(CUDA_ARCHITECTURES 90)
export HIGHS_HOME=/usr/local
export CUDA_HOME=/usr/local/cuda-12
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_CUDA=ON \
-DCMAKE_C_FLAGS_RELEASE="-O2 -DNDEBUG" \
-DCMAKE_CXX_FLAGS_RELEASE="-O2 -DNDEBUG" \
-DCMAKE_CUDA_FLAGS_RELEASE="-O2 -DNDEBUG" ..
cmake --build . --target plc
./bin/plc -fname ../../large_lp.mps
I almost don't want to admit this, but it seems like rebooting did the trick, after downgrading to 12.3.
Hi, I am from NVIDIA; someone reported this case (4601974) to us. I am looking into reproducing it in house on CUDA 12.4, but will need one of the LP instances in .mps format that can reproduce the issue. If it is not confidential, can anyone share the .mps file with the alias NVSDKIssues@nvidia.com? Alternatively, `CUSPARSE_LOG_LEVEL=5` can produce useful logs if none of the .mps files can be shared. Thanks.
It looks like the instance 30_70_45_095_100 (https://miplib2010.zib.de/download/30_70_45_095_100.mps.gz) from MIPLIB 2010 reproduces this problem. It's the only MIPLIB 2010 instance I've tried, so it's probably not the only one in there that triggers it.
Thanks! Another user also volunteered to provide us a .mps case. We can reproduce the issue in house and will report back here with conclusions after investigation.
I think this issue is solved by https://github.com/COPT-Public/cuPDLP-C/pull/32
I have a large-ish LP which `plc` can solve just fine when compiled with `-DBUILD_CUDA=OFF`, but with `-DBUILD_CUDA=ON` it looks like the step sizes are blowing up(?) -- can I somehow control those? Setting `-eLineSearchMethod 0` seems like it would make it easier to control the steps, but that gives a number of errors: