TriBITSPub / TriBITS

TriBITS: Tribal Build, Integrate, and Test System,
http://tribits.org

TRIBITS_CTEST_DRIVER(): Avoid all later actions if configure fails in a prior ctest -S script invocation #316

Open bartlettroscoe opened 4 years ago

bartlettroscoe commented 4 years ago

A problem that we are having with the ATDM Trilinos builds is that when the configure fails in the first ctest -S invocation (which does the configure and build), the second ctest -S invocation (which runs the tests) does not know that the configure failed (and that the build never happened) and still runs the tests. The problem is that the tests may still be sitting there from a prior day's run that actually built them.

We can see the problem in the ATDM Trilinos builds, as shown here:

| Site | Build Name | Revision | Error | Warn | Not Run | Fail | Pass | Start Time | Labels |
|---|---|---|---|---|---|---|---|---|---|
| vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt | c1bfa4 | 1 | 0 | 28 | 9 | 2256 | Apr 29, 2020 - 03:04 MDT | (28 labels) |
| vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_dbg | c1bfa4 | 1 | 0 | 27 | 1541 | 670 | Apr 29, 2020 - 03:07 MDT | (28 labels) |
| vortex | Trilinos-atdm-ats2-gnu-7.3.1-spmpi-2019.06.24_serial_static_dbg | c1bfa4 | 1 | 0 | 0 | 5 | 2130 | Apr 29, 2020 - 03:09 MDT | (28 labels) |
| vortex | Trilinos-atdm-ats2-gnu-7.3.1-spmpi-2019.06.24_serial_static_opt | c1bfa4 | 1 | 0 | 0 | 1424 | 738 | Apr 29, 2020 - 03:10 MDT | (28 labels) |
| vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_opt_cuda-aware-mpi | | 1 | 0 | 28 | 6 | 2259 | Apr 29, 2020 - 04:12 MDT | (28 labels) |
| vortex | Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_dbg_cuda-aware-mpi | | 1 | 0 | 27 | 1432 | 779 | Apr 29, 2020 - 06:27 MDT | (28 labels) |

and here showing:

Site Build Name Revision Error Warn Not Run Fail Pass Start Time Labels
chama Trilinos-atdm-tlcc2-intel-debug-openmp e89669 1 0 0 0 0 1 2027 Apr 30, 2020 - 05:09 MDT (26 labels)
eclipse Trilinos-atdm-cts1-intel-19.0.5_openmpi-4.0.1_openmp_static_opt e89669 1 0 0 0 0 2 2244 Apr 30, 2020 - 03:07 MDT (28 labels)
eclipse Trilinos-atdm-cts1-intel-19.0.5_openmpi-4.0.1_openmp_static_dbg e89669 1 0 0 0 0 4 2227 Apr 30, 2020 - 03:11 MDT (28 labels)
mutrino Trilinos-atdm-mutrino-intel-opt-openmp-KNL-panzer e89669 1 0     0 0 171 Apr 30, 2020 - 03:02 MDT Panzer
mutrino Trilinos-atdm-mutrino-intel-complex-openmp-opt-KNL e89669 1 0     0 0 1076 Apr 30, 2020 - 05:50 MDT (13 labels)

This is bad and very confusing. If the configure does not pass, then no build or test results should be attempted or posted.

bartlettroscoe commented 4 years ago

The problem is that the two ctest -S drivers don't know about each other: the second does not know what happened in the first.

To fix this, we could do as follows:

I think that logic will ensure that if a configure was attempted and failed in a prior ctest -S invocation, then the follow-up ctest -S script will skip everything and just exit.

Implement logic in TRIBITS_CTEST_DRIVER() that writes /ConfigureAttempted.txt and /ConfiurePassed.txt and use that logic to avoid later building and running tests if the configure failed
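The marker-file idea in the task above can be sketched outside of CMake. The following is a minimal illustration in Python, not the TriBITS implementation: the function names `record_configure` and `should_skip_later_steps` are hypothetical, the file names follow the task text (with the second one assumed to be spelled `ConfigurePassed.txt`), and the directory the files live in is assumed to be the build directory.

```python
from pathlib import Path

# Marker file names taken from the task text; the spelling of the second
# name and their location in the build directory are assumptions.
ATTEMPTED = "ConfigureAttempted.txt"
PASSED = "ConfigurePassed.txt"


def record_configure(build_dir: Path, passed: bool) -> None:
    """Hypothetical helper: called by the invocation that runs the configure."""
    passed_file = build_dir / PASSED
    # Remove a stale "passed" marker from a prior run before recording the
    # new attempt, so an old success cannot mask a new failure.
    passed_file.unlink(missing_ok=True)
    (build_dir / ATTEMPTED).write_text("configure attempted\n")
    if passed:
        passed_file.write_text("configure passed\n")


def should_skip_later_steps(build_dir: Path) -> bool:
    """Hypothetical helper: True if a configure was attempted but did not pass."""
    attempted = (build_dir / ATTEMPTED).exists()
    passed = (build_dir / PASSED).exists()
    return attempted and not passed
```

In this sketch, a follow-up invocation would call `should_skip_later_steps()` first and exit without building, testing, or submitting anything when it returns `True`. Deleting the stale "passed" marker before each new attempt is the step that prevents a prior day's success from being mistaken for the current state.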

bartlettroscoe commented 3 years ago

This should have been done for a while now, as shown in commit https://github.com/bartlettroscoe/TriBITS/commit/4aa837b9d6feed947d533b5314ca77da6bcab7e2, which was merged to TriBITS 'master' in commit https://github.com/TriBITSPub/TriBITS/commit/43f2cc1db722fae65a5d044807d7cdd6f736356a and then merged into Trilinos 'develop' through the Trilinos TriBITS snapshot commit trilinos/Trilinos@3143ca8 in Trilinos PR trilinos/Trilinos#7325.

But I am not sure this is working as it should since it looks like we still have some cases where builds and tests were attempted with configure failures as shown in:

which shows:

Specialized

| Site | Build Name | Update | Update Time | Conf Err | Conf Warn | Conf Time | Build Err | Build Warn | Build Time | Test Not Run | Test Fail | Test Pass | Test Time | Test Proc Time | Start Test Time | Labels |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mutrino | Trilinos-atdm-ats1-hsw_intel-19.0.4_mpich-7.7.6_openmp_static_dbg | 927403 | 2m 18s | 2 | 0 | 11m 5s | 0 | 0 | 49m 46s | 0 | 0 | 0 | 0s | 0s | Jul 20, 2020 - 00:52 MDT | (31 labels) |
| mutrino | Trilinos-atdm-ats1-knl_intel-18.0.5_mpich-7.7.6_openmp_static_dbg | 927403 | 2m 6s | 2 | 8 | 1h 55m 14s | 0 | 0 | 40s | 0 | 0 | 0 | 0s | 0s | Jul 20, 2020 - 00:44 MDT | (31 labels) |
| mutrino | Trilinos-atdm-ats1-knl_intel-19.0.4_mpich-7.7.6_openmp_static_dbg | 927403 | 2m 6s | 2 | 8 | 2h 6m 13s | 0 | 0 | 53m 35s | 0 | 0 | 0 | 0s | 0s | Jul 20, 2020 - 00:32 MDT | (31 labels) |
| mutrino | Trilinos-atdm-ats1-hsw_intel-18.0.5_mpich-7.7.6_openmp_static_opt | 927403 | 2m 18s | 2 | 0 | 53m 30s | 0 | 50 | 13m 42s | 0 | 11 | 2275 | 4h 10m 9s | 9h 12m 9s | Jul 20, 2020 - 00:22 MDT | (31 labels) |
| mutrino | Trilinos-atdm-ats1-hsw_intel-19.0.4_mpich-7.7.6_openmp_static_opt | 927403 | 2m 48s | 2 | 8 | 1h 14m 22s | 0 | 50 | 22m 53s | 0 | 11 | 2096 | 3h 34m 6s | 7h 51m 25s | Jul 20, 2020 - 00:12 MDT | (31 labels) |
| mutrino | Trilinos-atdm-ats1-knl_intel-18.0.5_mpich-7.7.6_openmp_static_opt | 927403 | 2m 6s | 2 | 8 | 3h 4m 11s | 0 | 50 | 28m 22s | 0 | 17 | 2264 | 5h 58m 26s | 12h 14m 51s | Jul 20, 2020 - 00:02 MDT | (31 labels) |
| mutrino | Trilinos-atdm-ats1-knl_intel-19.0.4_mpich-7.7.6_openmp_static_opt | 927403 | 2m 36s | 2 | 8 | 3h 43m 13s | 0 | 50 | 36m 56s | 0 | 16 | 2265 | 5h 58m 34s | 12h 14m 12s | Jul 19, 2020 - 23:52 MDT | (31 labels) |

I am going to have to keep an eye on this or even run a case manually that triggers this.