cdash_analyze_and_report.py: Implement --require-test-history-match-nonpassing-tests=off

bartlettroscoe commented 4 years ago

There are cases where it is useful to add extra filters to the --cdash-nonpassed-tests-filters=<filters> argument to remove tests that hit random system failures that we don't even want to bother dealing with. For example, in https://github.com/trilinos/Trilinos/issues/6861 there may be thousands of failing tests on a given testing day that match a string like Remote JSM server is not responding on host. Ideally, those tests would be handled with an expected_fail_regex field in the --tests-with-issue-trackers-file=<file> CSV file, but it will take some time to implement that feature. It would also be nice to remove tests like this from the global list of failing tests in the first place, because you don't want to click the Non-passing Tests link on CDash and be hit with 2000+ failing tests that you then need to filter out. So even once an expected_fail_regex field is implemented, it would still be useful to be able to add extra filter fields based on test output to the --cdash-nonpassed-tests-filters=<filters> filters.

However, the tool cdash_analyze_and_report.py currently fails with an error when a test with a tracked issue shows as failed for the current testing day in the test history query but has been filtered out of the global list of nonpassing tests. The error looks like:

Traceback (most recent call last):
  File "/jenkins/slave/workspace/Trilinos-atdm-send-email-early/TrilinosATDMStatus/TriBITS/tribits/ci_support/cdash_analyze_and_report.py", line 840, in 
    printDetails=inOptions.printDetails,
  File "/jenkins/slave/workspace/Trilinos-atdm-send-email-early/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 226, in foreachTransform
    list_inout[i] = transformFunctor(list_inout[i])
  File "/jenkins/slave/workspace/Trilinos-atdm-send-email-early/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 1551, in __call__
    "   top test history dict = "+sorted_dict_str(testHistoryLOD[0])+"\n\n" )
Exception: Error, test testDict['status'] = 'None' != top test history testStatus = 'Failed' where:

   testDict = {'buildName': 'Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_dbg', 'issue_tracker': '#6361', 'issue_tracker_url': 'https://github.com/trilinos/Trilinos/issues/6361', 'site': 'vortex', 'testname': 'MueLu_ParameterListInterpreterTpetra_MPI_4'}

   top test history dict = {u'buildName': u'Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_dbg', u'buildSummaryLink': u'buildSummary.php?buildid=5247384', u'buildstarttime': u'2020-02-19T02:07:36 MST', u'details': u'Completed (Failed)\n', u'nprocs': 4, u'prettyProcTime': u'38s 960ms', u'prettyTime': u'9s 740ms', u'procTime': 38.96, u'site': u'vortex', u'siteLink': u'viewSite.php?siteid=341', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'testDetails.php?test=84551703&build=5247384', u'testname': u'MueLu_ParameterListInterpreterTpetra_MPI_4', u'time': 9.74}

In the normal case, this is an important sanity check: the test history for a test should exactly match the status of that nonpassing test in the global query of nonpassing tests. However, it would be nice to provide a mode where this check can be skipped.

To that end, this story is to add the option --require-test-history-match-nonpassing-tests with default value on, but allow setting it to off, which skips this check and allows a mismatch. With the check off, a tracked test can be missing from the global query of nonpassing tests but still show as failing in its test history. In that case, the test should be marked as missing but still be listed as failing in the test history. That will be confusing, but who cares. Once we implement the expected_fail_regex field, those tests will be filtered out of the test history as well, so the two will be consistent.
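A rough sketch of the proposed behavior is below. This is not the actual TriBITS implementation: it uses argparse for brevity, and the helper names parse_on_off, build_parser, and reconcile_status, along with the "Missing" status string, are hypothetical and only illustrate the on/off semantics described above.

```python
import argparse


def parse_on_off(value):
    """Convert an 'on'/'off' string into a bool (hypothetical helper)."""
    lowered = value.lower()
    if lowered not in ("on", "off"):
        raise argparse.ArgumentTypeError(
            "expected 'on' or 'off', got %r" % value)
    return lowered == "on"


def build_parser():
    """Build a parser with only the option proposed in this issue."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--require-test-history-match-nonpassing-tests",
        dest="require_match", type=parse_on_off, default=True,
        help="If 'on' (default), error out when a tracked test's status"
             " does not match the top entry of its test history.")
    return parser


def reconcile_status(test_dict, test_history, require_match):
    """Return the status to report for a tracked test.

    test_dict has no 'status' key when the test was filtered out of the
    global list of nonpassing tests; test_history[0] is the entry for the
    current testing day.
    """
    status = test_dict.get("status")
    top_status = test_history[0]["status"] if test_history else None
    if status == top_status:
        return status
    if require_match:
        # Default mode: keep the sanity check and fail loudly.
        raise Exception(
            "Error, test testDict['status'] = %r != top test history"
            " testStatus = %r" % (status, top_status))
    # Check disabled: a test absent from the global list is treated as
    # missing, even though its history still shows it failing.
    # (Assumption: other mismatches keep the global-list status.)
    return "Missing" if status is None else status
```

With the option off, a call such as reconcile_status({'testname': 'MueLu_ParameterListInterpreterTpetra_MPI_4'}, [{'status': 'Failed'}], False) returns "Missing" instead of raising, while the history entry itself is left untouched and still reports the failure.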

bartlettroscoe commented 4 years ago

This is going to help deal with the mass of random system failures occurring in trilinos/Trilinos#6861.

bartlettroscoe commented 4 years ago

I have deployed this and I updated the TrilinosATDMStatus scripts to use this in the commit:

*** Base Git Repo: TrilinosATDMStatus
d38ab5d "Add option --require-test-history-match-nonpassing-tests=off (trilinos/Trilinos#6861)"
Author: Roscoe A. Bartlett <rabartl@sandia.gov>
Date:   Thu Feb 20 10:56:55 2020 -0700 (2 hours ago)

M       trilinos_atdm_builds_status.sh
M       trilinos_atdm_specialized_cleanup_builds_status.sh

I will leave this in review for a bit to make sure nothing goes bad.

bartlettroscoe commented 4 years ago

This has been working well, for what it is. I will close.

TriBITSPub / TriBITS

cdash_analyze_and_report.py: Implement --require-test-history-match-nonpassing-tests=off #301