COPT-Public / cuPDLP-C

Code for solving LP on GPU using first-order methods
MIT License

Solver diverges when using CUDA #22

Open fuglede opened 6 months ago

fuglede commented 6 months ago

I have a large-ish LP that plc can solve just fine when compiled with -DBUILD_CUDA=OFF, but with -DBUILD_CUDA=ON I'm getting output along the lines of

Iteration  498
PrimalStep: 1.161839e-12, SumPrimalStep: 2.100882e+02, DualStep: 3.706379e+13, SumDualStep: 2.100882e+02
Stepsize: 6.562176e+00, Primal weight: 5.648094e+12 Ratio: 1.000000e+00
 -- stepsize iteration 1: 6.715268 2550327659779212181504.000000
 -- PrimalStep DualStep: 0.000000 37063786604937.484375
 -- FirstTerm SecondTerm: 2160791130818487255040.000000 6.715268
 -- nStepSizeIter: 524
 -- RED_EXP GRO_EXP: 0.300000 0.600000
     -- iteraction(x) interaction(y): 180960874762558169415680.000000 -213243374940688198458600885579677696.000000
     -- movement (scaled norm)  : 461509524244794076633763648089789999137947648.000000
     -- movement (scaled norm)  : 461509524244794076633763648089789999137947648.000000

It looks like the step sizes are blowing up(?) -- can I somehow control those? Setting -eLineSearchMethod 0 seems like it would make it easier to control the steps, but that gives a number of errors:

CUBLAS API failed at line 564 of /home/user/repos/cuPDLP-C/cupdlp/cupdlp_linalg.c with error: the function failed to launch on the GPU (13)
CUBLAS API failed at line 578 of /home/user/repos/cuPDLP-C/cupdlp/cupdlp_linalg.c with error: the function failed to launch on the GPU (13)
CUBLAS API failed at line 550 of /home/user/repos/cuPDLP-C/cupdlp/cupdlp_linalg.c with error: the requested functionality is not supported (15)
CUBLAS API failed at line 535 of /home/user/repos/cuPDLP-C/cupdlp/cupdlp_linalg.c with error: the function failed to launch on the GPU (13)
CUBLAS API failed at line 550 of /home/user/repos/cuPDLP-C/cupdlp/cupdlp_linalg.c with error: the requested functionality is not supported (15)
SkyLiu0 commented 6 months ago

Thank you for your feedback! Could you please provide us with scripts and the LP instance in .mps format for us to reproduce the issue? You can email us at chuwzhang@gmail.com and tianhao.sky.liu@gmail.com.

hannes-uppman commented 6 months ago

I also ran into this problem, and did a little bit of debugging to figure out why the CPU and the GPU versions work differently. While I have not fully figured it out, I did find something:

If I, in Ax_single_gpu in the file cupdlp_linalg.c, change cuda_csr_Ax to cuda_csc_Ax (in case CSR_CSC), then the GPU version gives the same result as the CPU version. My first thought was that this might be caused by some error in the creation of the CSR version of A (the CPU version does not use this matrix, from what I can tell). However, I dumped the matrix (in both CSC and CSR formats) and the input data, and re-created the matrices and the matrix-vector products with the help of scipy. From what I can see, the CSC and CSR data represent the same matrix; when using the CSC data, the multiplication (using cusparse on the GPU) is correct, and when using the CSR data it is wrong. So, it appears something goes wrong in the call to cusparseSpMV when using the CSR data.
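For anyone who wants to reproduce that kind of check outside the solver, below is a minimal standalone sketch (the tiny 2x3 matrix, file name, and build line are made up; this is not the actual cuPDLP-C code): the same matrix is described once as CSR and once as CSC, both descriptors are handed to cusparseSpMV, and the two products should agree.

// spmv_csr_vs_csc.c -- illustrative sketch only, not cuPDLP-C code.
// Build (assumed): nvcc spmv_csr_vs_csc.c -lcusparse -o spmv_check
#include <cuda_runtime.h>
#include <cusparse.h>
#include <stdio.h>

int main(void) {
  // A = [10  0 20]    x = [1 2 3]^T   =>   A*x = [70 60]^T
  //     [ 0 30  0]
  const int m = 2, n = 3, nnz = 3;
  int    hCsrPtr[] = {0, 2, 3},    hCsrInd[] = {0, 2, 1};
  double hCsrVal[] = {10, 20, 30};
  int    hCscPtr[] = {0, 1, 2, 3}, hCscInd[] = {0, 1, 0};
  double hCscVal[] = {10, 30, 20};
  double hx[] = {1, 2, 3}, hy[2];

  int *dCsrPtr, *dCsrInd, *dCscPtr, *dCscInd;
  double *dCsrVal, *dCscVal, *dx, *dy;
  cudaMalloc((void **)&dCsrPtr, sizeof(hCsrPtr));
  cudaMalloc((void **)&dCsrInd, sizeof(hCsrInd));
  cudaMalloc((void **)&dCsrVal, sizeof(hCsrVal));
  cudaMalloc((void **)&dCscPtr, sizeof(hCscPtr));
  cudaMalloc((void **)&dCscInd, sizeof(hCscInd));
  cudaMalloc((void **)&dCscVal, sizeof(hCscVal));
  cudaMalloc((void **)&dx, sizeof(hx));
  cudaMalloc((void **)&dy, sizeof(hy));
  cudaMemcpy(dCsrPtr, hCsrPtr, sizeof(hCsrPtr), cudaMemcpyHostToDevice);
  cudaMemcpy(dCsrInd, hCsrInd, sizeof(hCsrInd), cudaMemcpyHostToDevice);
  cudaMemcpy(dCsrVal, hCsrVal, sizeof(hCsrVal), cudaMemcpyHostToDevice);
  cudaMemcpy(dCscPtr, hCscPtr, sizeof(hCscPtr), cudaMemcpyHostToDevice);
  cudaMemcpy(dCscInd, hCscInd, sizeof(hCscInd), cudaMemcpyHostToDevice);
  cudaMemcpy(dCscVal, hCscVal, sizeof(hCscVal), cudaMemcpyHostToDevice);
  cudaMemcpy(dx, hx, sizeof(hx), cudaMemcpyHostToDevice);

  cusparseHandle_t handle;
  cusparseCreate(&handle);

  // The same matrix, described once as CSR and once as CSC.
  cusparseSpMatDescr_t Acsr, Acsc;
  cusparseCreateCsr(&Acsr, m, n, nnz, dCsrPtr, dCsrInd, dCsrVal,
                    CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                    CUSPARSE_INDEX_BASE_ZERO, CUDA_R_64F);
  cusparseCreateCsc(&Acsc, m, n, nnz, dCscPtr, dCscInd, dCscVal,
                    CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                    CUSPARSE_INDEX_BASE_ZERO, CUDA_R_64F);

  cusparseDnVecDescr_t vx, vy;
  cusparseCreateDnVec(&vx, n, dx, CUDA_R_64F);
  cusparseCreateDnVec(&vy, m, dy, CUDA_R_64F);

  double alpha = 1.0, beta = 0.0;
  cusparseSpMatDescr_t mats[2] = {Acsr, Acsc};
  for (int i = 0; i < 2; ++i) {
    // Each layout may need a different scratch buffer, so size it per matrix.
    size_t bufSize = 0;
    void *dBuf = NULL;
    cusparseSpMV_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha,
                            mats[i], vx, &beta, vy, CUDA_R_64F,
                            CUSPARSE_SPMV_ALG_DEFAULT, &bufSize);
    cudaMalloc(&dBuf, bufSize);
    cusparseSpMV(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha,
                 mats[i], vx, &beta, vy, CUDA_R_64F,
                 CUSPARSE_SPMV_ALG_DEFAULT, dBuf);
    cudaMemcpy(hy, dy, sizeof(hy), cudaMemcpyDeviceToHost);
    printf("%s: y = [%g %g], expected [70 60]\n", i == 0 ? "CSR" : "CSC",
           hy[0], hy[1]);
    cudaFree(dBuf);
  }
  cusparseDestroy(handle);
  return 0;
}

On a working setup both lines should print y = [70 60]; if the CSR path misbehaves as described above, one would expect the CSR line to be off.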

However, I did these tests in a rush and might have messed up, so don't trust my findings.

I'm using CUDA 12.4.1 on Windows 10.

hannes-uppman commented 6 months ago

After changing to CUDA 12.3.2, the problem with cusparseSpMV is gone, and both the GPU and CPU versions work as expected. Yay!

fuglede commented 6 months ago

Changing cuda_csr_Ax to cuda_csc_Ax did not suffice here.

hannes-uppman commented 6 months ago

Bummer! It's probably a different problem then.

I forgot to mention one thing earlier: when changing cuda_csr_Ax to cuda_csc_Ax, I also tweaked cuda_alloc_MVbuffer (in the file cupdlp_cudalinalg.cu) to correct AxBufferSize for the CSC case (this value is used to size the buffer dBuffer that is passed to cusparseSpMV). I have no idea if this affects things.
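Roughly, that tweak amounts to something like the following sketch (the helper name alloc_spmv_buffer is made up; this is not the real cuda_alloc_MVbuffer): query cusparseSpMV_bufferSize for the descriptor that will actually be passed to cusparseSpMV, and allocate dBuffer from that value instead of reusing a size computed for the other layout.

// Hypothetical helper, not the real cuda_alloc_MVbuffer: size dBuffer from the
// descriptor (CSR *or* CSC) that will actually be used in cusparseSpMV.
#include <cuda_runtime.h>
#include <cusparse.h>

cusparseStatus_t alloc_spmv_buffer(cusparseHandle_t handle,
                                   cusparseSpMatDescr_t matA,   // CSR or CSC
                                   cusparseDnVecDescr_t vecX,
                                   cusparseDnVecDescr_t vecY,
                                   void **dBuffer, size_t *bufferSize) {
  const double alpha = 1.0, beta = 0.0;
  cusparseStatus_t st = cusparseSpMV_bufferSize(
      handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA, vecX, &beta,
      vecY, CUDA_R_64F, CUSPARSE_SPMV_ALG_DEFAULT, bufferSize);
  if (st != CUSPARSE_STATUS_SUCCESS) return st;
  // A buffer sized for one layout is not guaranteed to be large enough for
  // the other, so don't blindly reuse a CSR-sized buffer for a CSC SpMV.
  if (cudaMalloc(dBuffer, *bufferSize) != cudaSuccess)
    return CUSPARSE_STATUS_ALLOC_FAILED;
  return CUSPARSE_STATUS_SUCCESS;
}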

fuglede commented 6 months ago

I downgraded CUDA from 12.4 to 12.3 (and gcc/g++ from 13 to 12) and now get a different error instead, e.g. when running on the example MPS file:

--------------------------------------------------
reading file...
    ./example/afiro.mps
--------------------------------------------------
Running HiGHS 1.6.0: Copyright (c) 2023 HiGHS under MIT licence terms
Minimize
No obj offset
 ** On entry to cusparseCreate(): CUDA context cannot be initialized

CUSPARSE API failed at line 186 of /home/user/repos/cuPDLP-C/interface/mps_highs.c with error: initialization error (1)
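For what it's worth, a tiny standalone check like the sketch below (hypothetical file name cuda_context_check.c, not part of cuPDLP-C) only initializes the CUDA runtime and a cuSPARSE handle, which can help separate a driver/context problem like this from anything in the solver itself.

// cuda_context_check.c -- minimal diagnostic, hypothetical file name.
#include <cuda_runtime.h>
#include <cusparse.h>
#include <stdio.h>

int main(void) {
  int n = 0;
  cudaError_t ce = cudaGetDeviceCount(&n);
  printf("cudaGetDeviceCount: %s, %d device(s)\n", cudaGetErrorString(ce), n);

  // cudaFree(0) forces creation of the CUDA context on the current device.
  ce = cudaFree(0);
  printf("context init (cudaFree(0)): %s\n", cudaGetErrorString(ce));

  cusparseHandle_t h;
  cusparseStatus_t cs = cusparseCreate(&h);
  printf("cusparseCreate: %d (%s)\n", (int)cs, cusparseGetErrorString(cs));
  if (cs == CUSPARSE_STATUS_SUCCESS) cusparseDestroy(h);
  return 0;
}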
henszey commented 6 months ago

I also had the case of my problem blowing up the step size. I ended up downgrading to HiGHS v1.6.0 and cuda-toolkit-12-3.

Here are my notes/steps on clean Ubuntu 22.04:

sudo lshw -c video
sudo apt update
sudo apt upgrade
sudo reboot
sudo apt install build-essential cmake htop btop

####Might work using the 550 driver, but I am pretty sure I used the 535, which is actually 12-2 compatible
####sudo apt install nvidia-headless-550-server nvidia-utils-550-server nvidia-fabricmanager-550 cuda-drivers-fabricmanager-550

sudo apt install nvidia-headless-535-server nvidia-utils-535-server nvidia-fabricmanager-535 cuda-drivers-fabricmanager-535
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-3
sudo reboot
nvidia-smi

git clone https://github.com/ERGO-Code/HiGHS.git
cd HiGHS
git tag
git checkout v1.6.0
mkdir build
cd build
cmake -DFAST_BUILD=ON ..
cmake --build .
ctest
sudo cmake --install .

git clone https://github.com/COPT-Public/cuPDLP-C.git
cd cuPDLP-C/
vim CMakeLists.txt #Downgrade to installed cmake 3.22.1
#https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
vim FindCUDAConf.cmake #older cmake can't do "all" so set(CUDA_ARCHITECTURES 90)
export HIGHS_HOME=/usr/local
export CUDA_HOME=/usr/local/cuda-12
mkdir build
cd build

cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_CUDA=ON \
-DCMAKE_C_FLAGS_RELEASE="-O2 -DNDEBUG" \
-DCMAKE_CXX_FLAGS_RELEASE="-O2 -DNDEBUG" \
-DCMAKE_CUDA_FLAGS_RELEASE="-O2 -DNDEBUG" ..
cmake --build . --target plc

./bin/plc -fname ../../large_lp.mps
fuglede commented 6 months ago

I almost don't want to admit this, but it seems like rebooting did the trick, after downgrading to 12.3.

yukini2009 commented 6 months ago

Hi, I am from NVIDIA, and someone reported this case (4601974) to us. I am working on reproducing it in house on CUDA 12.4, but I will need one of the LP instances (as an .mps file) that can reproduce the issue. If it is not confidential, can anyone share the .mps file with the alias NVSDKIssues@nvidia.com? Alternatively, setting CUSPARSE_LOG_LEVEL=5 can produce useful logs if none of the .mps files can be shared. Thanks.

hannes-uppman commented 6 months ago

It looks like the instance 30_70_45_095_100 (https://miplib2010.zib.de/download/30_70_45_095_100.mps.gz) from MIPLIB 2010 reproduces this problem. It's the only MIPLIB 2010 instance I've tried, so it's probably not the only instance in there that triggers the problem.

yukini2009 commented 6 months ago

Thanks! Another user has also volunteered to provide us with an .mps case. We can reproduce the issue in house and will report back here with conclusions after the investigation.

hannes-uppman commented 2 months ago

I think this issue is solved by https://github.com/COPT-Public/cuPDLP-C/pull/32