dftlibs / xcfun

XCFun: A library of exchange-correlation functionals with arbitrary-order derivatives
https://dftlibs.org/xcfun/
Mozilla Public License 2.0
58 stars 32 forks source link

Array overflow in xcfun 2.1.1 #151

Closed susilehtola closed 3 years ago

susilehtola commented 3 years ago

I'm packaging xcfun for Fedora. It looks like there may be an array overflow in brx.cpp:

/home/susi/rpmbuild/BUILD/xcfun-2.1.1/src/functionals/brx.cpp: In function 'polarized<ctaylor<double, 0> >(ctaylor<double, 0> const&, ctaylor<double, 0> const&, ctaylor<double, 0> const&, ctaylor<double, 0> const&, ctaylor<double, 0> const&)ctaylor<double, 0> [clone .isra.0]':
/home/susi/rpmbuild/BUILD/xcfun-2.1.1/src/functionals/brx.cpp:57:8: warning: array subscript 1 is outside array bounds of 'struct taylor[1]' [-Warray-bounds]
   57 |   t[1] = 1;
      |   ~~~~~^~~
/home/susi/rpmbuild/BUILD/xcfun-2.1.1/src/functionals/brx.cpp:73:22: note: while referencing 'tmp'
   73 |   taylor<T, 1, Nvar> tmp;
      |                      ^~~
/home/susi/rpmbuild/BUILD/xcfun-2.1.1/src/functionals/brx.cpp:59:12: warning: array subscript 1 is outside array bounds of 'struct taylor[1]' [-Warray-bounds]
   59 |   t[1] = 1 / f[1];
      |          ~~^~~~
/home/susi/rpmbuild/BUILD/xcfun-2.1.1/src/functionals/brx.cpp:54:22: note: while referencing 'f'
   54 |   taylor<T, 1, Ndeg> f, d;
      |                      ^
/home/susi/rpmbuild/BUILD/xcfun-2.1.1/src/functionals/brx.cpp:59:8: warning: array subscript 1 is outside array bounds of 'struct taylor[1]' [-Warray-bounds]
   59 |   t[1] = 1 / f[1];
      |   ~~~~~^~~~~~~~
/home/susi/rpmbuild/BUILD/xcfun-2.1.1/src/functionals/brx.cpp:73:22: note: while referencing 'tmp'
   73 |   taylor<T, 1, Nvar> tmp;
      |                      ^~~
bast commented 3 years ago

Thanks for pointing this out. I will have a look.

bast commented 3 years ago

Which compiler version reports this? I don't see this with g++ (GCC) 10.2.0. Which does not mean that the problem is not there.

susilehtola commented 3 years ago

This is in Fedora 34

$ gcc --version
gcc (GCC) 11.1.1 20210428 (Red Hat 11.1.1-1)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
susilehtola commented 3 years ago

Full build logs for Fedora rawhide x86_64 https://kojipkgs.fedoraproject.org//work/tasks/4519/68624519/root.log https://kojipkgs.fedoraproject.org//work/tasks/4519/68624519/build.log

bast commented 3 years ago

Thanks! Can you recommend a Docker/Singularity image with a similar environment? But if not, I can also build myself one.

susilehtola commented 3 years ago

Fedora has Docker images https://hub.docker.com/_/fedora but I've never used them myself.

bast commented 3 years ago

I finally built myself a Fedora container and in this I cannot see this problem. But what I also don't understand is that I created some obvious out of bounds problems in the code just to see whether the compiler catches it and it didn't.

The log files above have probably already evaporated. Sorry that it took some time but I think we need to carefully look at all the settings so that I try to reproduce it as closely as possible.

This is my singularity image:

BootStrap: docker
From: fedora:latest

%post
    dnf install -y python cmake g++

%environment
    export LC_ALL=C
susilehtola commented 3 years ago

Use the optimization flags from rpm -E %optflags and you should be able to reproduce the issue.

bast commented 3 years ago

I got this:

$ rpm -E %optflags

-O2 -g
bast commented 3 years ago

I will try these flags out later today.

susilehtola commented 3 years ago

I got this:

$ rpm -E %optflags

-O2 -g

That is not correct. E.g. on Fedora 34 x86_64 the flags are

$ rpm -E %optflags
-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64  -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
bast commented 3 years ago

Thanks! But I still cannot reproduce this.

Here is a Singularity recipe (I had to remove the -specs=/usr/lib/rpm/redhat/* which refer to something that I don't have, maybe some package missing):

BootStrap: docker
From: fedora:latest

%post
    dnf install -y python cmake g++ git
    git clone --depth 1 --branch radovan/fedora-flags https://github.com/bast/xcfun.git
    cd xcfun
    ./setup
    cd build
    env VERBOSE=1 make

%environment
    export LC_ALL=C

Built with:

$ sudo singularity build fedora.sif fedora.def

Using your flags: https://github.com/bast/xcfun/commit/4319c81ed47f64af07956d5162df347ce8a397f1

This happily builds. This Singularity image is not meant to be useful as an image, only to build in a well defined environment.

susilehtola commented 3 years ago

Yes, this error doesn't prevent from building; the compiler is just suggesting that there's an array overflow.

susilehtola commented 3 years ago

I think the problem is in

  taylor<T, 1, Ndeg> f, d;
  t = 0;
  t[0] = BR(z0);
  t[1] = 1;
  f = BR_z(t);
  t[1] = 1 / f[1];

that if Ndeg=1 you get nonsense; you need to also check that Ndeg>=1.

susilehtola commented 3 years ago

Sure enough, adding in that check solves the overflow issue.

robertodr commented 3 years ago

I cannot reproduce either. How exactly are you configuring and building the code? What concerns me is that the Ndeg non-type template parameter is supposed to be equal to XCFUN_MAX_ORDER and that must be at least 3 (CMake will stop otherwise) and is by default set to 6. I can see from the generated assembly and the tree dumped from G++ that lower order templates are generated (and yes, they might overflow) but as far as I can understand, they are won't be used in the actual calculation.

susilehtola commented 3 years ago

Logs attached root.log build.log

susilehtola commented 3 years ago

Also #152 is still an issue; the builds fail on 32-bit architectures.

bast commented 3 years ago

Thanks for the logs. I will try to see the same in the Fedora image.

bast commented 3 years ago

Finally looking into it and I can reproduce it using your build recipe. But I have the feeling that using your build recipe some variables remain undefined and hence we see the problem. To be confirmed. More info very soon.

bast commented 3 years ago

Finally looking into it and I can reproduce it using your build recipe. But I have the feeling that using your build recipe some variables remain undefined and hence we see the problem. To be confirmed. More info very soon.

Just documenting here that my suspicion was actually wrong but the difference between a build that shows this problem and the one that doesn't seems to be -DNDEBUG.

robertodr commented 3 years ago

I've pushed my take on fixing this one: #154