const-ae / sparseMatrixStats

Implementation of the matrixStats API for sparse matrices
Other
54 stars 3 forks source link

Cannot build sparseMatrixStats on GCC 4 through 8 on Redhat linux #18

Closed ms3389 closed 3 years ago

ms3389 commented 3 years ago

Hello

I've tried nearly every trick I could think of - I even went around the "C++14 standard requested but CXX14 is not defined" error by making a ~/.R/Makevars file and still no luck. It appears the package is broken in it's current form. Anyone had luck building it on linux recently?

Thanks

const-ae commented 3 years ago

No, the package is not broken (see build reports)!

I have just added a new section to the README that describes how to fix problems with the installation https://github.com/const-ae/sparseMatrixStats#installation-problems

Could you check if this fixes your problem and if not, provide the necessary details for me to debug the problem.

Best, Constantin

LTLA commented 3 years ago

I have only just realized that this is a C++14 package, based on MarioniLab/scran#87.

You should really, really, really, really, really re-consider whether C++14 features are really, really, really, really necessary. sparseMatrixStats is close to the bottom of the dependency tree for many of my Bioconductor packages and you are effectively forcing all of us to declare C++14 as an installation requirement. Institutional machines don't update that quickly and there are still systems using versions of GCC that struggle with C++11. (A lesson learnt via beachmat.)

Frankly, you don't really have all that much C++ code, so this seems like a major restriction for not much benefit.

const-ae commented 3 years ago

Hi Aaron,

I am sorry any the inconveniences caused. I am happy to re-write the package to use an earlier C++ standard. However, I would really like to keep the abstractions with the iterators over the rows/columns in place (e.g., here). If you have any suggestions how to do this, I would be very happy to hear. Using the generic lambdas from C++14 was the only solution that I found. However, I am by no means an expert C++ programmer, so any input would be appreciated :)

Best, Constantin

HenrikBengtsson commented 3 years ago

FWIW. Yes, C++14 is still a major hurdle for people on RedHat and CentOS (common in academia and on HPC environments), including CentOS 7, because they come with very dated toolchains, e.g. on an up-to-date CentOS 7 (end of life support until 2024), we have:

$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)

FYI, R is slowly moving toward the C++14 standard. It's currently assuming C++11 is available but not C++14. However, on 2021-01-25, there was an update to R-devel where it'll start expecting C++14 with a fallback to C++11, cf. https://github.com/wch/r-source/commit/815089107221c8e3f7a4bc4b0743ba556e6460fe.

How to install C++14 R packages on RedHat/CentOS

There are ways to get access to more modern toolchains via so-called CentOS Software Collections (SCL) that RedHat distributed. Many systems have these already installed, but on some, you need to ask the sysadm to install them. They're official Linux packages, so shouldn't be a biggy for sysadms to add. Many users don't know about them, and they're definitely contributing to the "this is just too much for me, so I'll give up" problem.

To install sparseMatrixStats using SCLs, you have to:

1. Once

Add the following to your ~/.R/Makevars file (create if missing):

CXX14 = g++
CXX14STD = -std=gnu++14
CXX14FLAGS = -g -O2
CXX14PICFLAGS = -fpic

Verify that you've got it right and R sees these settings by confirming you get:

$ R CMD config --all | grep CXX14
CXX14 = g++
CXX14STD = -std=gnu++14
CXX14FLAGS = -g -O2
CXX14PICFLAGS = -fpic
SHLIB_CXX14LD = g++ -std=gnu++14
SHLIB_CXX14LDFLAGS = -shared

2. Each time you install a C++14 R package

Then, make sure to enable SCL devtoolset-7 before trying to install the R package using:

gcc --version | head -1
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)

$ scl enable devtoolset-7 $SHELL

$ gcc --version | head -1
gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)

Now you can install sparseMatrixStats;

$ R --vanilla

R version 4.0.4 (2021-02-15) -- "Lost Library Book"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
...

> BiocManager::install("sparseMatrixStats")

Bioconductor version 3.12 (BiocManager 1.30.10), R 4.0.4 (2021-02-15)
Installing package(s) 'sparseMatrixStats'
trying URL 'https://bioconductor.org/packages/3.12/bioc/src/contrib/sparseMatrixStats_1.2.1.tar.gz'
Content type 'application/x-gzip' length 698554 bytes (682 KB)
==================================================
downloaded 682 KB

* installing *source* package ‘sparseMatrixStats’ ...
** using staged installation
** libs
g++ -std=gnu++14 -I"/software/R-4.0.4/lib64/R/include" -DNDEBUG  -I'/home/alice/R/x86_64-pc-linux-gnu-library/4.0/Rcpp/include' -I/usr/local/include   -fpic  -g -O2 -c RcppExports.cpp -o RcppExports.o
g++ -std=gnu++14 -I"/software/R-4.0.4/lib64/R/include" -DNDEBUG  -I'/home/alice/R/x86_64-pc-linux-gnu-library/4.0/Rcpp/include' -I/usr/local/include   -fpic  -g -O2 -c SparseMatrixView.cpp -o SparseMatrixView.o
g++ -std=gnu++14 -I"/software/R-4.0.4/lib64/R/include" -DNDEBUG  -I'/home/alice/R/x86_64-pc-linux-gnu-library/4.0/Rcpp/include' -I/usr/local/include   -fpic  -g -O2 -c methods.cpp -o methods.o
g++ -std=gnu++14 -I"/software/R-4.0.4/lib64/R/include" -DNDEBUG  -I'/home/alice/R/x86_64-pc-linux-gnu-library/4.0/Rcpp/include' -I/usr/local/include   -fpic  -g -O2 -c row_methods.cpp -o row_methods.o
g++ -std=gnu++14 -shared -L/software/R-4.0.4/lib64/R/lib -L/usr/local/lib64 -o sparseMatrixStats.so RcppExports.o SparseMatrixView.o methods.o row_methods.o -L/software/R-4.0.4/lib64/R/lib -lR
installing to /home/alice/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-sparseMatrixStats/00new/sparseMatrixStats/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
*** copying figures
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (sparseMatrixStats)

The downloaded source packages are in
    ‘/scratch/alice/RtmpEydtnM/downloaded_packages’

> quit()

To "disable" the SCL, just do:

$ exit
$ gcc --version | head -1
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)

or simply start a new shell/terminal.

3. Load C++14 packages as usual

Note that you only need that SCL when installing the package, i.e. it's not needed when you use it. For example,

$ gcc --version | head -1
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)

$ R --quiet --vanilla

> library(sparseMatrixStats)
Loading required package: MatrixGenerics
Loading required package: matrixStats
...

> packageVersion("sparseMatrixStats")
[1] '1.2.1'

Hope this helps

LTLA commented 3 years ago

@const-ae Your use case seems like a pretty standard application of Functor classes. Roughly speaking:

class colProds {
public:
    colProds(bool n) : na_rm(n) { }
    template<class V, class R> // to handle whatever the structures are, I didn't bother looking at it. 
    double operator()(const V& values, const R& row_indices, int number_of_zeros) {
        if(! na_rm){
          bool any_na = is_any_na(values);
          if(any_na){
            return NA_REAL;
          }
        }
        R_len_t size = values.size();
        if(number_of_zeros > size){
          // Easy escape hatch
          return 0.0;
        }
        if(size + number_of_zeros == 0){
          return NA_REAL;
        }
        return quantile_sparse_impl(values, number_of_zeros, 0.5);
    }
private:
    bool na_rm;
};

such that:

NumericVector dgCMatrix_colProds(S4 matrix, bool na_rm){
    return reduce_matrix_double(matrix, na_rm, colProds(na_rm));
}

I believe this process is close to what happens under the hood with lambdas anyway.

FYI, R is slowly moving toward the C++14 standard. It's currently assuming C++11 is available but not C++14. However, on 2021-01-25, there was an update to R-devel where it'll start expecting C++14 with a fallback to C++11, cf. wch/r-source@8150891.

That sounds... pretty ambitious. I'm expecting that there's a build machine in the basement of one of the R core developers that's running some bizarre OS on some ancient architecture that doesn't have access to a C++14-complaint toolchain.

Many users don't know about them, and they're definitely contributing to the "this is just too much for me, so I'll give up" problem.

TBH that step 1 was already too much for me. Someone would need to be pretty determined to fiddle with Makevars, and then step 2 is too easy to forget. I mean, it's better than nothing when C++14 is the only option, but in this particular case, the C++14 usage is pretty gratuitous.

const-ae commented 3 years ago

I like the suggestions with the Functor classes. I went through all the functions and implemented it as you suggested. The code currently is still on a separate branch (https://github.com/const-ae/sparseMatrixStats/blob/remove_cpp14_features/src/methods.cpp#L203), but it looks okay and passes all unit tests.

So if there aren't any additional comments or issues I should think about from your side, I will merge the code in the coming days :)