Closed orlitzky closed 4 years ago
Not sure I understand this (unless it is a side effect of tne general Fortran/C interfacing problem from #2154), in particular why would it flag jobvt
but not the identically declared jobu
here ? FWIW valgrind does not complain. Which version of gfortran and what other build options do you use ? (One possibility could be that your FFLAGS override Makefile defaults, e.g. dropping the -fno-optimize-sibling-calls used to work around the interface issue)
FFLAGS were overridden as intended by adding required flags in Makefile.system, but the make.inc produced for the lapack-netlib (Reference-LAPACK) subtree got these without the buildtime override. So LAPACK got built with your "-fbounds-check" alone (dropping both the mandatory -frecursive and the -fno-optimize-sibling-calls). With that fixed however I now get a similar - and similarly spurious - bounds check failure for a zpotrs call in test 33 (after similar calls to potrf in the same file completed without issues), which I can only get around by adding the hidden argument for the size of the character argument. Clearly this requires changing the signature of zpotrs in common_interface.h , which would break the established API. (Which is why #2154 is still open, as Reference-LAPACK has not adopted a solution so far and GCC has backtracked on requiring hidden length arguments for character(1))
Did I mess up my testing somehow? If I understand you correctly, you're saying there are two different problems:
FFLAGS
to the build, the important -frecursive
and -fno-optimize-sibling-calls
flags are accidentally clobbered in some places.While testing -- to ensure that it was really the bounds-checking flag that was causing the problem -- I ran the same exact command with and without the bounds-checking flags. This command (inspired by the user's report) failed before https://github.com/gentoo/gentoo/commit/57e2be19d17640729372235c4c73171e069d2cca:
root # FFLAGS="-O2 -pipe -march=native -fstack-protector-strong -fstack-check -fbounds-check" FCFLAGS="-O2 -pipe -march=native -fstack-protector-strong -fstack-check -fbounds-check" USE=test FEATURES=test emerge -v1 =sci-libs/openblas-0.3.9
After that commit, which only filters out the -fbounds-check
flag, the same command works. If passing any FFLAGS
clobbered the -frecursive
and -fno-optimize-sibling-calls
flags, wouldn't they still get clobbered in my example above? After filtering the bounds-checking flag, we still wind up with FFLAGS="-O2 -pipe -march=native -fstack-protector-strong -fstack-check"
.
Yes. For (1) the problem lies with the way the Reference-LAPACK source is docked into the OpenBLAS build system - we still utilize their approach of creating a definitions file (make.inc) with compiler settings and its FFLAGS get overriden by yours. The effects of missing -frecursive
and -fno-optimize-sibling-calls
will not be immediately obvious, but both can lead to wrong results or strange crashes.
For item (2) I am now playing with a new #ifdef HIDDENARGS that adds a set of alternate definitions for the LAPACK functions where the additional length arguments are appended. Defining this in the affected tests should solve the
-fbounds-check` problem while user code
would still be able to use the conventional interface (that only gfortran 8 & 9 took issue with)
Thanks for clarifying. What's bothering me is, why does https://github.com/gentoo/gentoo/commit/57e2be19d17640729372235c4c73171e069d2cca appear to fix the problem? All I've done is remove the -fbounds-check
flag from the user's FFLAGS
. That still leaves all the other FFLAGS
, so I would expect -frecursive
and -fno-optimize-sibling-calls
to still be overridden. I can't say for sure that that's not the case (I'm not 100% sure which lines to look at in the build log), but the build failures do disappear after stripping only -fbounds-check
from FFLAGS="-O2 -pipe -march=native -fstack-protector-strong -fstack-check -fbounds-check"
.
It fixes the problem as the only visible problem leading to your build failure was in the bounds check. You would notice the missing -frecursive only in some specific LAPACK functions when called in user code, and the missing -fno-optimize-sibling-calls could "only" lead to rare and hard to reproduce crashes (stack corruptions) in user code when compiling with gfortran version 8.x or 9.x The fix for the unwanted override would be to change the "FFLAGS=" to "override FFLAGS=" in the "lapack_prebuild" section of the toplevel Makefile in OpenBLAS, or to add the two gfortran options to your own FFLAGS. (Proper fix for the bounds-check fault - instead of just stripping the option - is harder than I hoped as I cannot find a solution that does not affect most of the upstream LAPACKE code)
Thanks again. I took your advice and added "override" to that line. I also forced a rebuild of OpenBLAS for all of our users, in case any had compiled it with FFLAGS
set in the past.
Added an entry to the FAQ in the wiki
Note that this seems to have been done in https://github.com/xianyi/OpenBLAS/commit/2d33e12a119f0cf97e5c41ff4f6499e9229d9bd5 for https://github.com/xianyi/OpenBLAS/pull/3390.
Thanks!
The
-fbounds-check
or-fcheck=bounds
flags to gfortranWhen I build OpenBLAS with
FFLAGS="-fbounds-check" FCFLAGS="-fbounds-check"
, I get a reproducible test failure (on amd64):I'm not sure if it's a problem with the test suite or the file itself. A user reported this to us at https://bugs.gentoo.org/726474