RcppCore / Rcpp

Seamless R and C++ Integration
https://www.rcpp.org
GNU General Public License v2.0
742 stars 212 forks source link

Rcpp directly includes the C library header <inttypes.h> instead of C++ <cinttypes>, resulting in macros not working when built with older c standard library headers #1210

Closed pettyalex closed 2 years ago

pettyalex commented 2 years ago

I will work to provide a minimal reproduction example if it helps, but I am encountering a problem building https://github.com/saigegit/SAIGE that seems like a bug in Rcpp. This is manifesting as a build failure where the macros defined in <inttypes.h> are not defined: https://github.com/statgen/savvy/issues/23 .

I'll make an accompanying PR to Rcpp with a fix for this.

In the C99 spec section 7.8.1, footnote 191 says that the macros defined in should only be defined when __STDC_FORMAT_MACROS is defined:

C++ implementations should define these macros only when _ _STDC_FORMAT_MACROS is defined
before <inttypes.h> is included.

<cinttypes>, which is the C++ version of this header has specific code to always set that macro before #including : https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-api-4.6/a00802_source.html

00040 // For 8.11.1/1 (see C99, Note 184)
00041 #if _GLIBCXX_HAVE_INTTYPES_H
00042 # ifndef __STDC_FORMAT_MACROS
00043 #  define _UNDEF__STDC_FORMAT_MACROS
00044 #  define __STDC_FORMAT_MACROS
00045 # endif
00046 # include <inttypes.h>
00047 # ifdef _UNDEF__STDC_FORMAT_MACROS
00048 #  undef __STDC_FORMAT_MACROS
00049 #  undef _UNDEF__STDC_FORMAT_MACROS
00050 # endif
00051 #endif

Because Rcpp includes <inttypes.h> directly without defining __STDC_FORMAT_MACROS, it will prevent any later includes that include <cinttypes> or <inttypes.h> from having the macros defined. This means that in the application, if anything that #includes Rcpp is included before another header that needs <cinttypes>, then the macros will not be configured, causing the build failure that I saw. I'm able to work around this by changing the order of includes, or by manually including <cinttypes> before including anything that depends on Rcpp, but I am fairly certain that Rcpp is a C++ -only library not designed to be consumed from C. If so, then it can use the C++ <cinttypes> instead of C <inttypes.h>.

This failure will happen in any environment with an older version of <inttypes.h>. The C11 standard removed this language, and it looks like a change in 2013 removed this behavior from <inttypes.h>: https://sourceware.org/git/?p=glibc.git;a=commit;h=1ef74943ce2f114c78b215af57c2ccc72ccdb0b7

However, the current latest version of the anaconda package manager is shipping an <inttypes.h> old enough to need __STDC_FORMAT_MACROS, with the end result that tools using Rcpp and <cinttypes> will not work correctly within a conda environment. Conda should probably not be shipping C library headers this old, but Conda is an enormous complicated project and I'm not sure where to make changes for them.

As a C++ library, I think that using the C++ stdlib header <cinttypes> is the right choice for Rcpp.

eddelbuettel commented 2 years ago

Agreed in principle but note that Rcpp is about fifteen years old. It is more often a matter of reordering includes and being careful (and we have done so via STRICT_R_HEADERS on my default, the R_NO_REMAP provision to not shadow length() and error() etc).

That said not sure how much a PR swapping C headers for C++ headers really helps. I have used C++ for decades and the C headers are generally just fine (yet I agree that stylistically speaking C++ headers are preferable).

Lastly, note that R only offers a C interface, and that we must include R headers so some C headers will always be with us anyway.

pettyalex commented 2 years ago

I appreciate the response.

It is certainly possible to work around this through reordering includes, as in SAIGE I was able to work around it by making #include <RcppArmadillo.h> the last include.

I made this suggestion not for stylistic reasons, but because it could make Rcpp easier to consume and prevent more users from encountering similar failures. In this specific case, the C++ header <cinttypes> will always set the formatting macros, while the <inttypes.h> may not unless __STDC_FORMAT_MACROS is set. The two headers have different behavior for this, both by specification and implementation, and I believe that C++ consumers will always prefer the behavior of <cinttypes>

eddelbuettel commented 2 years ago

Yes, making Rcpp* headers the last one is / was often needed due to spillage from R -- some things outside of our control.

Enchufa2 commented 2 years ago

We're probably ready for this change, because R already dropped support for C++98, but note that this was the reason for this header: cinttypes is C++11.

eddelbuettel commented 2 years ago

We talked about it but I don't think we excluded C++98 yet. We still have twice the headers for some impl details.

pettyalex commented 2 years ago

Apologies for my impatience, I'm used to throwing up PRs in progress and letting automated workflows check them.

I'm taking the PR chat to the PR, to keep this focused on the underlying issue.

eddelbuettel commented 2 years ago

Apologies for my impatience, I'm used to throwing up PRs in progress and letting automated workflows check them.

Not a bad workflow in general. We can maybe extend your PR by replacing a few more C headers. I am just not yet convinced it is the root of your issue.

eddelbuettel commented 2 years ago

We talked about it but I don't think we excluded C++98 yet. We still have twice the headers for some impl details.

Empirically confirmed in Docker by setting CXX and friends explicitly in ~/.R/Makevars:

root@25f7e9594baa:/work# R CMD INSTALL Rcpp_1.0.8.3.tar.gz
* installing to library ‘/usr/local/lib/R/site-library’
* installing *source* package ‘Rcpp’ ...
** using staged installation
** libs
g++ -std=c++98 -I"/usr/share/R/include" -DNDEBUG -I../inst/include/     -fpic  -g -O2 -ffile-prefix-map=/build/r-base-lt88el/r-base-4.1.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c api.cpp -o api.o
g++ -std=c++98 -I"/usr/share/R/include" -DNDEBUG -I../inst/include/     -fpic  -g -O2 -ffile-prefix-map=/build/r-base-lt88el/r-base-4.1.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c attributes.cpp -o attributes.o
g++ -std=c++98 -I"/usr/share/R/include" -DNDEBUG -I../inst/include/     -fpic  -g -O2 -ffile-prefix-map=/build/r-base-lt88el/r-base-4.1.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c barrier.cpp -o barrier.o
g++ -std=c++98 -I"/usr/share/R/include" -DNDEBUG -I../inst/include/     -fpic  -g -O2 -ffile-prefix-map=/build/r-base-lt88el/r-base-4.1.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c date.cpp -o date.o
g++ -std=c++98 -I"/usr/share/R/include" -DNDEBUG -I../inst/include/     -fpic  -g -O2 -ffile-prefix-map=/build/r-base-lt88el/r-base-4.1.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c module.cpp -o module.o
g++ -std=c++98 -I"/usr/share/R/include" -DNDEBUG -I../inst/include/     -fpic  -g -O2 -ffile-prefix-map=/build/r-base-lt88el/r-base-4.1.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c rcpp_init.cpp -o rcpp_init.o
g++ -std=c++98 -shared -L/usr/lib/R/lib -Wl,-z,relro -o Rcpp.so api.o attributes.o barrier.o date.o module.o rcpp_init.o -L/usr/lib/R/lib -lR
installing to /usr/local/lib/R/site-library/00LOCK-Rcpp/00new/Rcpp/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (Rcpp)
root@25f7e9594baa:/work# 
pettyalex commented 2 years ago

Not a bad workflow in general. We can maybe extend your PR by replacing a few more C headers. I am just not yet convinced it is the root of your issue.

I was able to work around this in SAIGE by reordering includes as you suggested, and it builds & runs successfully within the problematic conda environment. I dug pretty deep in this, tracing include order with g++ -H and comparing differences between stdlib headers shipped with conda vs Ubuntu. The conda-forge sysroot package is shipping glibc headers from CentOS 7 old enough to cause this issue.

eddelbuettel commented 2 years ago

Ah now you mention Conda. That is no longer a plain Ubuntu system!! You are (more or less) on your own; we have seen issues there and please note that Conda, while it works fine for many, is not an officially supported R or CRAN platform.

pettyalex commented 2 years ago

Yes, I just noticed upon investigation that the glibc headers when installing Python or R from conda are glibc 2.12 from Centos 6, but it's linking against the systems (much newer) glibc. That explains how <inttypes.h> is old enough to cause this problem in the first place.

I'm not a big fan of conda just because of how complex it is and how it will pin to ancient versions of things (as we're seeing here), but I'm in an environment where most of my users and our computing environment would most prefer conda solutions.

eddelbuettel commented 2 years ago

Yes, I know how it goes, and I know the demands one case face there but you're more or less on your own (and Rcpp exists in conda channels). My limited experience is that mixing leads to tears.

pettyalex commented 2 years ago

To summarize my understanding of this, the issue as I encountered it occurs when:

That 3rd bullet point is exactly why this isn't a problem on any supported platforms. I'm confident that this would reproduce on CentOS 6, but Centos6 has been out of support since 2020.

This configuration is also exactly what you get when installing from conda the r-base package from the conda-forge channel, which ships gcc 10.3.0 and glibc headers for glibc 2.12!

eddelbuettel commented 2 years ago

Good summary but I still think we can close this as Rcpp builds fine on any reasonable system.

Replacing some / more / all C headers with C++ headers is something we can try in another branch if someone can volunteer some reverse dependency testing too.