cvxgrp / scs

Splitting Conic Solver
MIT License
Issues with Running C++ API through R and Python #221

Closed rbassett3 closed 2 years ago

rbassett3 commented 2 years ago

Specifications

OS: Fedora
SCS Version: Cloned from GitHub 3 weeks ago
Compiler: gcc 9.3.0

Description

I have formulated a specific problem in C++ and compiled it as a shared library. When I call the shared library from a main function in C++, everything works as expected. When I wrap it with pybind11 or Cython, or as an R package, chaos ensues. Everything works correctly until I call an scs function: the problem fails validation in scs and then segfaults.

I suspect this is an issue on my side, because I'm overlooking something about how scs manages memory. But for the life of me I cannot figure it out. It seems that scs is not receiving memory correctly (both the cone K and the vector b).

Here's the output from gdb.

Type "apropos word" to search for commands related to "word"...
Reading symbols from python...
Starting program: /home/robert/.conda/envs/py/bin/python test_script.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Detaching after fork from child process 8495]

cone dimensions -1093076514 not equal to num rows in A = m = 78
cone validation error
ERROR: Validation returned failure

Program received signal SIGSEGV, Segmentation fault.
scs_update (w=0x0, b=0x555555cd9da0, c=0x55555591b6a0) at src/scs.c:760
760     memcpy(w->b_orig, b, w->d->m * sizeof(scs_float));

How to reproduce

My program is structured as follows.

  1. scs is called from within a Solver class. All scs variables are public variables within Solver.
  2. In the Solver declaration, the ScsCone, ScsData, and ScsInfo structs are allocated memory with new.
  3. In the function Solver::DeclareProblem, the A matrix, the b vector, and the c vector are all allocated memory with new and defined. I can print the proper values of A, b, c, and the dimensions of K from within this function. So these variables appear to be allocated properly within my library, but once scs tries to access them all hell breaks loose.
  4. The cone validation error produces nonsense when scs_init is called near the end of Solver::DeclareProblem. Right after that, I call scs_update which produces the segfault.
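The backtrace above shows scs_update receiving w=0x0, which suggests the pointer returned by the init call was used without a check. Below is a minimal sketch of guarding against that. It does not use the SCS API: WorkSketch, init_sketch, and update_sketch are hypothetical stand-ins, and the validation rule is an assumption made only for illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for a solver workspace; not the real SCS type.
struct WorkSketch {
    int m;  // number of rows the workspace was validated against
};

// Hypothetical init: returns nullptr when validation fails, mimicking
// an init call that reports failure through a null workspace.
WorkSketch* init_sketch(int cone_dim, int m) {
    if (cone_dim != m) {
        return nullptr;  // validation failed, nothing was allocated
    }
    return new WorkSketch{m};
}

// Hypothetical update: refuses to touch a null workspace instead of
// dereferencing it, and reports the failure to the caller.
bool update_sketch(WorkSketch* w) {
    if (w == nullptr) {
        return false;
    }
    // Real work on the workspace would go here.
    return w->m > 0;
}
```

Checking the returned pointer right after the init call turns the segfault into an ordinary error that the Python or R wrapper can report.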

I have the proper includes and am linking to the scs library (as evidenced by the fact that this code works correctly when called from C++).

I am stuck. Do you have any suggestions for ways to debug this? Does calling problems written in scs through R or Python require some additional syntax beyond what it takes for C++?

bodono commented 2 years ago

This sounds like a type problem. It looks to me like you have compiled the library with scs_int referring to 32-bit ints and the other wrappers are passing in 64-bit integers (or it could be the scs_float type, but that is less likely). You can try compiling the shared library again with scs_int defined to be a 64-bit integer using DLONG=1.
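To illustrate the kind of mismatch being suspected here, the following is a small sketch of what happens when one side writes 64-bit integers and the other reads the same bytes as 32-bit integers. The names caller_int, library_int, and read_as_library are hypothetical and stand in for the wrapper and the compiled library; nothing here calls SCS.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for the two sides of the boundary: the caller
// writes 64-bit integers, the library was compiled to read 32-bit ones.
using caller_int = std::int64_t;
using library_int = std::int32_t;

// Copy the caller's bytes into a buffer of the library's integer type,
// reading only as many elements as the caller claimed to pass.
// memcpy is used so the reinterpretation is well defined.
std::vector<library_int> read_as_library(const std::vector<caller_int>& src) {
    std::vector<library_int> out(src.size());
    std::memcpy(out.data(), src.data(), out.size() * sizeof(library_int));
    return out;
}
```

On a little-endian machine, passing {78, 3, 5} would be read back as 78, 0, 3: every second value is the upper half of a 64-bit integer, so dimensions and indices no longer line up.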

rbassett3 commented 2 years ago

Thank you for that recommendation. When I do so I get another segfault even earlier in the call to scs_init. Here's the gdb output.

Type "apropos word" to search for commands related to "word"...
Reading symbols from python...
Starting program: /home/robert/.conda/envs/py/bin/python test_script.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Detaching after fork from child process 68690]

Program received signal SIGSEGV, Segmentation fault.
0x00007fffe9019814 in _scs_validate_lin_sys (A=0x0, P=0x555555cd9da0) at linsys/scs_matrix.c:42
42    if (!A->x || !A->i || !A->p) {

I'll keep looking and post the solution here when I find it.

bodono commented 2 years ago

Hmm, that's strange. Can you make sure you deleted all the compilation artifacts and remade the SCS library from scratch with all compilation steps using DLONG=1? E.g., if using make

make purge
make DLONG=1

If it still breaks after that, it could be that the chosen 64-bit integer type is not right. You can see that here we expect the 64-bit integer to be long long, but that might not be right for your system (e.g., it could be int64_t). It could also be that the scs_float type is a different width, but this is unlikely since I assume you are using double.

rbassett3 commented 2 years ago

@bodono. Thank you for your help. It turned out to not be the type inconsistency but your familiarity with the codebase helped me backtrack to the actual issue. I'm leaving it here in case it benefits someone else.

I ran gdb (it turns out you can use gdb to debug Python code linked to C++) and stepped through the cone validation code in scs.c, expecting a type inconsistency to emerge. Instead, it turned out that I had forgotten to initialize ed, the number of dual exponential cone constraints, to zero. I have primal exponential cones in my problems, so as I moved through the initialization I checked off the primal exponential constraints and then moved on. The uninitialized variable happened to be zero when the library was called from the C++ main function, whereas running from R or Python took the number of dual exponential cones to be something arbitrary. This broke the cone validation, so scs_init returned a null pointer instead of the initialized workspace. The segmentation fault happens when trying to access memory in the workspace, which fails simply because the workspace was never initialized.
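For anyone hitting the same thing, here is a minimal sketch of the fix, assuming the cone struct is allocated with new as described earlier. ConeSketch is a simplified, hypothetical stand-in (the real ScsCone has more fields), and cone_matches_rows is an illustrative check rather than the validation code in scs.c.

```cpp
#include <cstddef>

// Simplified, hypothetical stand-in for a cone description.
struct ConeSketch {
    int l;   // size of the linear cone
    int ep;  // number of primal exponential cones
    int ed;  // number of dual exponential cones
};

// Illustrative check: the cone dimensions must add up to the number of
// rows m. Each exponential cone (primal or dual) contributes 3 rows.
bool cone_matches_rows(const ConeSketch& k, int m) {
    return k.l + 3 * k.ep + 3 * k.ed == m;
}

// `new ConeSketch()` value-initializes the struct, so every member
// starts at zero. `new ConeSketch` without parentheses would leave the
// members indeterminate, which is how a forgotten field ends up holding
// an arbitrary value.
ConeSketch* make_cone(int l, int ep) {
    ConeSketch* k = new ConeSketch();  // all fields are 0 here
    k->l = l;
    k->ep = ep;
    // ed is not assigned, but it is already 0 from value-initialization.
    return k;
}
```

With the struct zeroed at allocation, a field that is never set explicitly contributes nothing to the dimension check, regardless of which language the library is called from.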