fortran-lang / stdlib

Fortran Standard Library
https://stdlib.fortran-lang.org
MIT License

Slow build and failing tests with `stdlib_experimental_stats_mean` #136

Open nshaffer opened 4 years ago

nshaffer commented 4 years ago

Compiler: gfortran 9.2.0
OS: Arch Linux 32-bit
Processor: Intel Atom (2 logical cores)

When I build stdlib on my (admittedly low-spec) machine, preprocessing and compilation of the submodule stdlib_experimental_stats_mean is strikingly slow, especially the compilation step. I suspect this is because the preprocessor generates dozens of routines for the different kinds, types, and ranks, which the compiler then takes a long time to churn through; see the sketch after the questions below. This raises two questions:

  1. Do other users experience similar slowdowns?
  2. If so, is it enough to raise concern about build times, considering that many stdlib functions will make similarly heavy use of code generation to achieve genericity?
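
To make the combinatorics concrete, here is a minimal fypp-style sketch of the kind of expansion involved (illustrative names and structure, not the actual stdlib_experimental_stats_mean template). Three real kinds times four ranks already means twelve specific procedures, and the real template also covers the complex kinds and higher ranks:

#! Hypothetical fypp template; assumes kind parameters sp, dp, qp are
#! available in the host scope.
#:set KINDS = ["sp", "dp", "qp"]
#:set MAXRANK = 4
#:for k in KINDS
#:for r in range(1, MAXRANK + 1)
module function mean_${r}$_${k}$(x) result(res)
    ! One specific procedure per (kind, rank) combination.
    real(${k}$), intent(in) :: x${'(' + ','.join([':'] * r) + ')'}$
    real(${k}$) :: res
    res = sum(x) / real(size(x), ${k}$)
end function mean_${r}$_${k}$
#:endfor
#:endfor

Every one of these specifics must be parsed, optimized, and code-generated, which is where the build time goes.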

It is also worth noting that both tests related to mean fail for me. Below are the relevant (but not terribly helpful) sections from the test log. Each "test" consists of dozens of assertions, and I have not pinpointed which one borks it all -- gfortran's backtrace has not helped. Am I alone in this?

11/14 Testing: mean
11/14 Test: mean
Command: "/home/nrs/Documents/stdlib/build/src/tests/stats/test_mean" "/home/nrs/Documents/stdlib/build/src/tests/stats"
Directory: /home/nrs/Documents/stdlib/src/tests/stats
"mean" start time: Feb 02 00:02 MST
Output:
----------------------------------------------------------
ERROR STOP Assert failed.

Error termination. Backtrace:
#0  0x7a4d86 in ???
#1  0x4cc596 in ???
#2  0x4ba290 in ???
#3  0x4c7495 in ???
#4  0xb79c1f28 in ???
#5  0x4b9364 in ???
#6  0xffffffff in ???
<end of output>
Test time =   0.17 sec
----------------------------------------------------------
Test Failed.
"mean" end time: Feb 02 00:02 MST
"mean" time elapsed: 00:00:00
----------------------------------------------------------

12/14 Testing: mean_f03
12/14 Test: mean_f03
Command: "/home/nrs/Documents/stdlib/build/src/tests/stats/test_mean_f03" "/home/nrs/Documents/stdlib/build/src/tests/stats"
Directory: /home/nrs/Documents/stdlib/src/tests/stats
"mean_f03" start time: Feb 02 00:02 MST
Output:
----------------------------------------------------------
ERROR STOP Assert failed.

Error termination. Backtrace:
#0  0x735f05 in ???
#1  0x45d715 in ???
#2  0x44a943 in ???
#3  0x458614 in ???
#4  0xb7970f28 in ???
#5  0x44a364 in ???
#6  0xffffffff in ???
<end of output>
Test time =   0.16 sec
----------------------------------------------------------
Test Failed.
"mean_f03" end time: Feb 02 00:02 MST
"mean_f03" time elapsed: 00:00:00
----------------------------------------------------------
jvdp1 commented 4 years ago

@nshaffer The compilation time has been a concern for the GitHub CI (GFortran 8 and 9). Because of that, we limited the number of ranks to 4 in the CMake files for the CI. The number of ranks can be limited with the CMake variable CMAKE_MAXIMUM_RANK.
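(For a local build, the equivalent would be, e.g., `cmake -DCMAKE_MAXIMUM_RANK=4 ..` from a fresh build directory, assuming the usual out-of-tree CMake workflow.)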

The tests were OK on GitHub Actions.

Also, using CMake, I have no issues on my desktop (Fedora 31 64-bit, GFortran 9.2.1, Intel Core i7, 4 cores):

[build]$ time make -j
[  2%] Generating stdlib_experimental_stats_mean.f90
[  7%] Generating stdlib_experimental_stats.f90
[  7%] Generating stdlib_experimental_io.f90
Scanning dependencies of target fortran_stdlib
...
[ 97%] Linking Fortran executable test_mean_f03
[100%] Linking Fortran executable test_mean
[100%] Built target test_mean_f03
[100%] Built target test_mean

real    0m16,172s
user    0m20,246s
sys 0m2,074s
[build]$ ctest
Test project /home/jvandenp/stdlib/build
  ....
      Start 11: mean
11/13 Test #11: mean .............................   Passed    0.00 sec
      Start 12: mean_f03
12/13 Test #12: mean_f03 .........................   Passed    0.04 sec
      Start 13: Sleep
13/13 Test #13: Sleep ............................   Passed    0.35 sec

100% tests passed, 0 tests failed out of 13

Label Time Summary:
quadruple_precision    =   0.01 sec*proc (2 tests)

Total Test time (real) =   0.44 sec

The following tests did not run:
      1 - always_skip (Skipped)

I agree with you that generating so much code to achieve genericity for a single function is a problem. I am a bit worried about what will happen when other functions similar to mean are added to stdlib_experimental_stats. Any ideas on how to avoid that?

ivan-pi commented 4 years ago

I did a clean build on my desktop (Ubuntu 16.04 64-bit, gfortran 9.2.1, Intel Core i5, 4 cores):

(base) 13:35:59 ipribec@ipribec-ThinkPad: ~/TUM/stdlib/build$ time make -j
[  2%] Generating stdlib_experimental_io.f90
[  5%] Generating stdlib_experimental_stats_mean.f90
[  7%] Generating stdlib_experimental_stats.f90
Scanning dependencies of target fortran_stdlib
...
[ 97%] Linking Fortran executable test_mean_f03
[100%] Linking Fortran executable test_mean
[100%] Built target test_mean_f03
[100%] Built target test_mean

real    0m11.350s
user    0m13.542s
sys 0m0.736s

All tests passed.


Each "test" consists of dozens of assertions, and I have not pinpointed which one borks it all -- gfortran's backtrace has not helped. Am I alone in this?

A good reason to move forward with the assert subroutines/macros discussed in https://github.com/fortran-lang/stdlib/issues/121 and https://github.com/fortran-lang/stdlib/issues/72. Have you tried doing a binary search?

nshaffer commented 4 years ago

@jvdp1 @ivan-pi OK, thanks for confirming that you don't get test failures. I'll keep hunting. Until we've formalized our unit testing practices, I think it's helpful to print a message for each conceptually distinct test (one or more "asserts" that go together); see the sketch below. That way, you get a bit more information in the log when hunting down failures. Better still, each conceptually distinct test could be a separate program, which is what CTest seems to expect, but I understand that's a little onerous.
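
For illustration, a minimal sketch of such a labelled assert (a hypothetical helper, not an existing stdlib routine):

program demo_labelled_check
    use, intrinsic :: iso_fortran_env, only: dp => real64
    implicit none
    real(dp) :: x(3)
    x = [1._dp, 2._dp, 3._dp]
    call check(abs(sum(x)/size(x) - 2._dp) < 10*epsilon(1._dp), 'mean of [1,2,3] (dp)')
    print '(a)', 'All checks passed.'
contains
    subroutine check(condition, label)
        ! On failure, print the label so the log names the failing case
        ! instead of leaving only a backtrace.
        logical, intent(in) :: condition
        character(*), intent(in) :: label
        if (.not. condition) then
            print '(a)', 'FAILED: ' // label
            error stop
        end if
    end subroutine check
end program demo_labelled_check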

@jvdp1 I don't have a good solution in mind for avoiding the source-code explosion. C++ handles this with template instantiation. It's possible to emulate that with pre-processing (see https://github.com/SCM-NV/ftl for a cpp-based example), but the instantiation has to happen user-side. As long as we're unwilling to inflict pre-processing on users (which I agree with), that approach seems not to work for stdlib.

Using submodules helps somewhat. We can restrict modules to type definitions and the generic API, and use separate submodules to implement the procedures behind each generic name (sketched below). This looks like the approach you're taking with stats and stats_mean and friends, and I think it's a good one. It's mainly a dev-side benefit, but it's much better than nothing.
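
A minimal sketch of that split (illustrative names; the real stdlib files are fypp-generated and much larger):

module stats_demo
    use, intrinsic :: iso_fortran_env, only: dp => real64
    implicit none
    private
    public :: mean
    interface mean
        module function mean_1_dp(x) result(res)
            import :: dp
            real(dp), intent(in) :: x(:)
            real(dp) :: res
        end function mean_1_dp
    end interface mean
end module stats_demo

submodule (stats_demo) stats_demo_mean
contains
    module function mean_1_dp(x) result(res)
        real(dp), intent(in) :: x(:)
        real(dp) :: res
        res = sum(x) / real(size(x), dp)
    end function mean_1_dp
end submodule stats_demo_mean

Only the interfaces live in the module, so editing an implementation recompiles just its submodule rather than cascading through every unit that uses the module.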

nshaffer commented 4 years ago

I revisited this today. All the failing asserts were double-precision test cases (real and complex alike). I was able to get all tests passing on my machine by increasing dptol to 10000*epsilon(1._dp). Smaller powers of ten out front still led to failures. This was true of both the mean and the mean_f03 tests.
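
In code, the relaxed check looks roughly like this (a sketch: dptol and the 1.0e4 factor come from the test source; the data and the inline mean are illustrative):

program tolerance_demo
    use, intrinsic :: iso_fortran_env, only: dp => real64
    implicit none
    ! Relaxed tolerance; smaller powers of ten out front still failed on 32-bit.
    real(dp), parameter :: dptol = 1.0e4_dp * epsilon(1._dp)
    real(dp) :: x(6)
    x = [1._dp, 2._dp, 3._dp, 4._dp, 5._dp, 6._dp]
    if (abs(sum(x)/size(x) - 3.5_dp) > dptol) error stop 'Assert failed.'
    print '(a)', 'mean within dptol'
end program tolerance_demo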

Since the tests just exercise the implementation, and nobody else reports a problem, I suspect this is a platform-specific issue. If someone has a 32-bit machine they can reproduce this on, that'd be the next thing to look into.

certik commented 4 years ago

@nshaffer Thanks for investigating this. It looks like this is a problem on 32-bit platforms, and we need to fix it. Setting dptol to 1e4_dp*epsilon(1._dp) is an acceptable solution to me.