bartlettroscoe opened 3 years ago
One proposed solution would be to add an argument called something like:

`--show-history-for-tests-without-issue-trackers-failed-in-last-x-days=<num-days>`

Then, any tests that failed at least once in the last `<num-days>` days would be listed with their test history in a table called something like:

Tests without issue trackers recently failed (at least once last `<num-days>` days, limited to `<max-rows>`): twoirf=`<twoirf>`
For example, for the Trilinos Secondary builds running with `--show-history-for-tests-without-issue-trackers-failed-in-last-x-days=7`, the email sent out would show a table like:
Site | Build Name | Test Name | Status | Details | Days since last Failed | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
---|---|---|---|---|---|---|---|---|
ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-debug | KokkosCore_UnitTest_CudaInterOpInit_MPI_1 | Passed | Completed | 3 | 7 | 17 | |
cee-rhel7 | Trilinos-atdm-cee-rhel7_cuda-10.1.243_gnu-7.2.0_openmpi-4.0.3_shared_dbg | KokkosCore_UnitTest_CudaTimingBased_MPI_1 | Missing | Missing | 4 | 2 | 15 | |
This table would be listed right below the table:

Tests without issue trackers Failed (limited to `<max-rows>`): twoif=`<twoif>`

so it would appear near the top of the email, in the summary paragraph and in the list of tables.
NOTE: The "Consecutive" column in the other test tables would be replaced with the column "Days since last Failed", which gives a link to the most recent failure of the test.
NOTE: In this table, "Failed" means tests that had CDash test status="Failed" and not status="Not Run". Therefore, the number in the column "Non-pass Last 30 Days" would be for tests with status="Failed" and status="Not Run". (Hopefully that will not be too confusing. If it is, we could replace "Failed" with "Non-pass" to be consistent. But since this table 'twoirf' says "Failed", perhaps that is not confusing?)
Also, it is possible that the table 'twoirf' could replace the table 'twoif' and show all tests that failed over the last `<num-days>` days, including the tests failing for the current/reference testing day. But that would mean that you would not see the column "Consecutive Non-pass Days", so you would lose that information (which is critical to see that a test is failing every day and is not random). Also, by losing the table 'twoif' we don't get an actual accounting of how many tests actually failed for the current/reference testing day, so it may not be a good idea to remove the table 'twoif'.
The implementation would really not be that complicated. What you would do is take the cdash/queryTests.php URL and replace `date=YYYY-MM-DD` with `begin=<begin>&end=YYYY-MM-DD`, where `<begin>` is `<num-days>` days before the reference date `YYYY-MM-DD`, and download the non-passing test data from CDash for that range (in addition to the cdash/queryTests.php data with `date=YYYY-MM-DD`).
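The URL rewrite described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the tool's actual code; the function name `expand_query_url_to_date_range` and the example URL are hypothetical.

```python
# Hypothetical sketch: rewrite a cdash/queryTests.php URL so it queries a
# date range (begin=<begin>&end=YYYY-MM-DD) instead of a single testing day
# (date=YYYY-MM-DD), with <begin> being <num-days> days before the end date.
import datetime
import re

def expand_query_url_to_date_range(query_url, num_days):
    # Find the single-day 'date=YYYY-MM-DD' field in the URL.
    m = re.search(r"date=(\d{4}-\d{2}-\d{2})", query_url)
    if not m:
        raise ValueError("URL has no date=YYYY-MM-DD field: " + query_url)
    end_date = datetime.date.fromisoformat(m.group(1))
    # <begin> is <num-days> days before the reference date YYYY-MM-DD.
    begin_date = end_date - datetime.timedelta(days=num_days)
    return query_url.replace(
        "date=" + m.group(1),
        "begin={0}&end={1}".format(begin_date, end_date))

print(expand_query_url_to_date_range(
    "https://some-cdash-site/cdash/queryTests.php?project=Trilinos&date=2021-01-21",
    7))
# -> ...begin=2021-01-14&end=2021-01-21
```

All other query parameters and filters in the URL are left untouched, which is what makes this approach cheap to implement.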
To get the list of tests using that cdash/queryTests.php URL (only showing Failed tests) to show in the table "twoirf", the tool would:

- Get a unique list of tests [`<site>`, `<buildname>`, `<testname>`] over those `<num-days>` days (using the existing tested function `getUniqueTestsListOfDicts()` that needs to be moved into `CDashQueryAnalyzeReport.py`, or a specialized version of that function).
- From that unique list of tests [`<site>`, `<buildname>`, `<testname>`], select those that failed at least once in the last `<num-days>` days but are not failing for the current/reference testing day. A few simple filters can do that easily. And it is that list that becomes the list of tests that the tool would show in the table "twoirf".
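The selection step above can be sketched with simple filters over the downloaded test dicts. This is an illustrative sketch only; the function name, dict keys (`site`, `buildName`, `testname`, `status`), and sample data are assumptions, not the tool's actual data model.

```python
# Hypothetical sketch: from the nonpassing-test dicts downloaded for the
# begin..end range, select the unique [site, buildname, testname] triples
# that failed at least once in the window but are NOT failing on the
# current/reference testing day (those belong in 'twoif', not 'twoirf').
def select_recently_failed_not_failing_today(range_test_dicts, today_failing_keys):
    selected = {}
    for t in range_test_dicts:
        key = (t["site"], t["buildName"], t["testname"])
        if t.get("status") == "Failed" and key not in today_failing_keys:
            selected.setdefault(key, t)  # keep one entry per unique test
    return sorted(selected.keys())

range_tests = [
    {"site": "ride", "buildName": "cuda-dbg", "testname": "UnitTest_A", "status": "Failed"},
    {"site": "ride", "buildName": "cuda-dbg", "testname": "UnitTest_A", "status": "Failed"},
    {"site": "cee",  "buildName": "gnu-opt",  "testname": "UnitTest_B", "status": "Failed"},
]
today_failing = {("cee", "gnu-opt", "UnitTest_B")}
print(select_recently_failed_not_failing_today(range_tests, today_failing))
# -> [('ride', 'cuda-dbg', 'UnitTest_A')]
```

Using a dict keyed on the triple keeps the operation linear in the number of test dicts, which matters for the worst-case scenarios mentioned below.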
NOTE: We need to limit the number of tests shown in the table "twoirf" to `<max-rows>`, or it could end up getting test history for, and showing, thousands of tests after a really bad day in the previous `<num-days>` days.
NOTE: I don't think we want to show "Not Run" tests in the table "twoirf", because that would pollute the table if one or more builds had massive build errors over the last `<num-days>` days. I think we are only interested in failing tests in that table. If there is a persistent build error that results in "Not Run" tests, then we will see that in the table "Tests without issue trackers Not Run: twoinr=???" for the current testing day.
NOTE: To build the URL for this list of tests robustly, and to also filter out known system failures (that are passed to the cdash/queryTests.php filter), I think we should first implement and use the new input arguments in #348. But that is not strictly needed, since we can filter out "not run" tests by just taking the input URL for cdash/queryTests.php and replacing `date=YYYY-MM-DD` with `begin=<begin>&end=YYYY-MM-DD`. In fact, that is the easier solution.
NOTE: To get the list of tests that failed in the last `<num-days>` days but not today, it may be easier and cheaper to split the list of failing tests that failed in the last 7 days from the second cdash/queryTests.php query using `begin=<begin>&end=YYYY-MM-DD`, because to do that, we only need to split that sublist based on the `status` test field (i.e. those with status=Failed go into the table 'twoif' and those with status=Pass or status=Missing go into the table 'twoirf'). Other implementation approaches may be used as well, but we need to be careful to keep down the algorithmic complexity of this operation for worst-case scenarios (or the tool will take a long time if there are thousands of failing tests, either for the current testing day or over the last `<num-days>` days).
NOTE: This will be somewhat hard to write tests for, since it will require some dummy test data for this use case. But hopefully that will not be too hard to manufacture for the current set of reference builds and tests used in the automated testing. (Writing system-level tests for this tool is always harder than writing the production code, but we get very strong tests by doing so.)
@bartlettroscoe: Thanks for looking into this. I like proposed solution 1 above. Thinking about this more now -- in an effort to reduce the size of the emails and provide a general solution, would it be possible to adjust the 'Failed' link under the 'Status' column of the existing tables to link to the most recent failure within the given begin and end dates of the input URL for cdash/queryTests.php? To improve usability, if the input URL spans more than one day, could we make the link text 'Failed on `<date>`' as well as append ' to `<end-date>`' to the subject line? Would this be difficult to implement?
@e10harvey, I am not entirely sure what is being suggested above so we should chat through this. Can we set up a short meeting offline?
What @e10harvey is suggesting is updating the table 'twoif' to include the most recent failure within the date range of `--show-history-for-tests-without-issue-trackers-failed-in-last-x-days=7`. If the `cdash_analyze_and_report.py` input arguments span more than one day, we could make the link text 'Failed on `<date>`' as well as append ' to `<end-date>`' to the subject line. Here is an example of what the table would look like:
Site | Build Name | Test Name | Status | Details | Consecutive Pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
---|---|---|---|---|---|---|---|---|
ride | Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-debug | KokkosCore_UnitTest_CudaInterOpInit_MPI_1 | Failed on 2021-01-21 | Completed (Failed on 2021-01-21) | 3 | 7 | 17 | |
cee-rhel7 | Trilinos-atdm-cee-rhel7_cuda-10.1.243_gnu-7.2.0_openmpi-4.0.3_shared_dbg | KokkosCore_UnitTest_CudaTimingBased_MPI_1 | Failed on 2021-01-17 | Completed (Failed on 2021-01-17) | 2 | 2 | 15 | |
@bartlettroscoe: I updated your comment and deleted a duplicate of that comment.
@e10harvey, it just occurred to me that the "Consecutive ??? Days" column is ill-defined in this new 'twoirf' table, since we are now mixing in tests that could be passing, missing, or not-run (instead of failing) for the current/reference testing day.
One idea to address your desire to provide a link to the most recent failure, and to address the ill-defined "Consecutive ??? Days" column, is to replace that "Consecutive ??? Days" column with "Days since last Failure", showing a number and a link to that most recent failure. For that, I updated "Proposed Solution 1" above with that replaced column and the correct link for this example. That would not be too hard to implement and would hopefully be pretty clear while still providing a very compact table.
Let's discuss.
@e10harvey, let's discuss later today, but some issues with your Proposed Solution 2 compared to Proposed Solution 1 are:

- By replacing the "Status" and "Details" fields with `Failed on YYYY-MM-DD` instead of the current status for the current testing day, the data in the test dict for the "Status" and "Details" fields is actually being corrupted, which changes the meaning of those fields. That will break some internal code. This would make any other analysis of these tests (either by this tool or another tool that works on exported data) unable to function if it needs the current status of these tests. (This could be addressed by adding new fields and new columns in this table and leaving the test dict "Status" and "Details" fields alone.)
- With `Failed on YYYY-MM-DD`, one actually can't tell if that test is currently passing, failing, missing, or not-run (unless it is failing on the current testing day, but giving the date YYYY-MM-DD does not make that clear unless you realize that that is the current reference testing day). (This is really related to the above, but at the user level and not just the data-structure level.)
- By replacing short fields like `Passed` and `Completed` with longer fields like `Failed on YYYY-MM-DD` and `Completed (Failed on YYYY-MM-DD)`, this will force many rows to wrap lines that may not have wrapped before and make the tables harder to read.
- If one knows the tool was run with `--show-history-for-tests-without-issue-trackers-failed-in-last-x-days=7`, then fine, but there is no hint at all in the produced table itself that that is what "failed" means. Alternatively, the table in "Proposed Solution 1" with the name "Tests without issue trackers recently failed (at least once last 7 days, limited to 300): twoirf=2" is unambiguous and does not rely on the user having to know whether the argument `--show-history-for-tests-without-issue-trackers-failed-in-last-x-days=7` was being used.

NOTE: Once one starts clicking on links (especially the history links), all of this information will become clear in either proposal, so it is a matter of keeping the data structures correct and having the data be accurate and clearly displayed in the generated emails.
The experience with https://github.com/trilinos/Trilinos/issues/8759, where someone thought that the random test failures were not occurring anymore (because the tests all happened to pass the last few Sundays, and they did not look at the "Nonpassing tests last 30 days" column), suggests that as part of this we should also consider adding an option:
--show-history-for-tests-with-issue-trackers-failed-in-last-x-days=<num-days>
Then, any test with an issue tracker that failed at least once in the last `<num-days>` days would be listed with its test history in a table called something like:

Tests with issue trackers recently failed (at least once last `<num-days>` days): twirf=`<twirf>`
And actually, this would be more useful to display in GitHub Issue comments for the Grover tool. For issue trackers that have tests failing regularly (i.e. every day), we might set `<num-days>` to something very short, like 3. But for tests that fail randomly, we might set `<num-days>=30`. That would make clear that tests for that issue tracker have failed in the last 30 days.
But to make this most effective, those test failures need to be very specific and need to take into account some expected regexes of test output (and therefore we need to implement the 'expected_fail_regex' field, see TrilinosATDMStatus/TODO.txt). This makes things more complicated, but I think it is really needed if people don't carefully look at the existing test-history tables for randomly failing tests.
@ZUUL42, @jwillenbring, @prwolfe, @william76: Please thumbs up one of the following comments to indicate your preference:
I would be most curious to see if @zuul42 or @prwolfe have a preference. I do personally like @bartlettroscoe's comment about displaying the more detailed info using Grover, but that is just because of my typical interaction with the failures.
And actually, for reporting by Grover to the GitHub Issue, you would need to be careful about missing test results from missing builds. If a build has not been reporting tests for several days but just happened to report the day before, we don't want to give the impression that the tests have been passing for the last `<num-days>` days just because they did not fail in the last `<num-days>` days.
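One way to guard against that is to count missing days explicitly when summarizing a test's history, so missing results are never silently reported as passing. This is an illustrative sketch under assumed names; the daily-status representation is hypothetical, not Grover's actual data model.

```python
# Hypothetical sketch: summarize a test's daily history over the last
# <num-days> days, counting days with no reported result as "Missing"
# rather than treating them as passing days.
def summarize_history(history):
    # history: list of per-day statuses (newest first), e.g. "Passed",
    # "Failed", or "Missing" for days where the build did not report.
    return {
        "Passed":  sum(1 for s in history if s == "Passed"),
        "Failed":  sum(1 for s in history if s == "Failed"),
        "Missing": sum(1 for s in history if s == "Missing"),
    }

print(summarize_history(["Passed", "Missing", "Missing", "Passed", "Failed"]))
# -> {'Passed': 2, 'Failed': 1, 'Missing': 2}
```

A report built from such a summary can then say "passed 2 of the last 5 days (2 days missing)" instead of the misleading "no failures in the last 5 days".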
CC: @e10harvey
This Story is to scope out (and possibly implement) an extension to the `cdash_analyze_and_report.py` tool that lists out all of the tests without issue trackers, with test history, for a set of tests that have failed at least once over some previous time period but are passing for the reference testing day (provided with the `--date` argument). For example, a test [`<site>`, `<buildname>`, `<testname>`] that does not yet have an issue tracker associated with it may be passing for the reference `--date=YYYY-MM-DD` but may have (randomly?) failed 3 times over the previous 7 days. The current implementation of `cdash_analyze_and_report.py` would not display any information about that test. If someone was not looking at the emails for every day in the previous 7 days, they would not notice this (random) test failure.

Therefore, this story is to add a feature to `cdash_analyze_and_report.py` that lists out the test [`<site>`, `<buildname>`, `<testname>`] with its test history if it failed in the last `X` days but not the current day. This would need to include tests that pass or are missing (perhaps because their builds are missing) for the reference testing day.

Motivating Customer: The Trilinos Framework team wants to move to a process where a set of "Secondary" builds are triaged less frequently. To do that, they want to get one email at the end of every week that also includes information on the tests that failed in the last week but not on the reference testing day at the end of the week.
Caused by: https://sems-atlassian-son.sandia.gov/jira/browse/SEPW-281.
Related to: