TriBITSPub / TriBITS

TriBITS: Tribal Build, Integrate, and Test System,
http://tribits.org

cdash_analyze_and_report.py: summary statistics mode #580

Open skyreflectedinmirrors opened 1 year ago

skyreflectedinmirrors commented 1 year ago

For my use case, I am interested in reporting summary statistics, e.g., # of passing tests, # of failed / missing tests, and a % pass rate, broken down over time for various issue tracker types (read: JIRA instances), subprojects, and/or even different GPU architectures.

Specifically, my use-case tracks tickets on a number of internal, and customer-facing JIRA instances, and I would like to (at a glance) be able to share build quality information with folks that I strongly suspect will never actually click through to the dashboard. Ideally, I'd like to have both the information for the current build, and some sort of time-average (or plot, but I noted you said "standard Python 3.x" in https://github.com/TriBITSPub/TriBITS/issues/577 @bartlettroscoe, which probably precludes matplotlib) of the build quality results over say, the last 30 days.

In addition, I also have tests in our test-suite broken down via subproject / ctest-label into things like "runtime" / "compiler" / "hip_and_omp_interop", etc., which form natural groupings for us internally to see how various components we ship are doing. I would also probably want to add a similar type of reporting as described for the JIRA instances above (build quality, time history).

A future goal would be JIRA integration (assuming such a thing can be done programmatically using minimal dependencies) to automatically pull in ticket status / days open / priorities / etc., to combine with the summary stats table.
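
For reference, a minimal sketch of what such a lookup could look like using only the standard library and JIRA's REST API; the instance URL, credentials, and issue key below are placeholders, and the exact fields available will depend on the JIRA configuration:

```python
# Minimal sketch (standard library only) of pulling ticket status from a JIRA
# REST endpoint.  The instance URL, credentials, and issue key are placeholders.
import base64
import json
import urllib.request

def get_jira_issue_summary(jira_base_url, issue_key, user, token):
    """Return a small dict with status/priority/created for one JIRA issue."""
    url = f"{jira_base_url}/rest/api/2/issue/{issue_key}?fields=status,priority,created"
    request = urllib.request.Request(url)
    auth = base64.b64encode(f"{user}:{token}".encode()).decode()
    request.add_header("Authorization", f"Basic {auth}")
    with urllib.request.urlopen(request) as response:
        fields = json.load(response)["fields"]
    return {
        "issue": issue_key,
        "status": fields["status"]["name"],
        "priority": fields["priority"]["name"],
        "created": fields["created"],
    }
```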

I am happy to do the legwork on this, but I'll need some guidance on the design front I suspect.

skyreflectedinmirrors commented 1 year ago

Copied to new issue as discussed here: https://github.com/TriBITSPub/TriBITS/issues/578#issuecomment-1540309240

bartlettroscoe commented 1 year ago

For my use case, I am interested in reporting summary statistics, e.g., # of passing tests, # of failed / missing tests, and a % pass rate, broken down over time for various issue tracker types (read: JIRA instances), subprojects, and/or even different GPU architectures.

@arghdos, thanks for adding this issue :-)

Specifically, my use-case tracks tickets on a number of internal, and customer-facing JIRA instances, and I would like to (at a glance) be able to share build quality information with folks that I strongly suspect will never actually click through to the dashboard. Ideally, I'd like to have both the information for the current build, and some sort of time-average (or plot, but I noted you said "standard Python 3.x" in #577 @bartlettroscoe, which probably precludes matplotlib) of the build quality results over say, the last 30 days.

Can you mock up what you are wanting this to look like?

Note that one can write auxiliary tools as well that use the data downloaded by cdash_analyze_and_report.py (see the option --write-test-data-to-file <file-name>) and can reuse a lot of the code in the underlying module CDashQueryAnalyzeReport.py. I think it is better to have a family of related tools that operate on similar data and share a lot of the underlying code. That way, each of these tools can have more focus. The current tool cdash_analyze_and_report.py is primarily designed to give a compact overview of a set of related builds and tests for a given testing day, along with the history of those builds and tests over a given time horizon (e.g. 30 days), with an emphasis on making it easy to spot new failures that may need to be triaged. There are some related use cases this tool is designed to address as well (like determining a global pass/fail to gate software integrations), but some of the features listed in #578 would need to be implemented to fully cover those use cases for challenging projects (like we had with ATDM).
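
As a concrete illustration of that pattern, here is a minimal sketch of an auxiliary tool that reads the file written by --write-test-data-to-file <file-name> and computes a few summary statistics. The loader assumes the file can be read back as a Python-literal list of test dicts with a 'status' field; the actual on-disk format and field names should be checked against CDashQueryAnalyzeReport.py before relying on this.

```python
# Sketch of an auxiliary summary-statistics tool operating on the test data
# written by 'cdash_analyze_and_report.py --write-test-data-to-file <file-name>'.
# Assumption: the file is a Python-literal list of test dicts with a 'status'
# field using CDash-style values like "Passed" / "Failed" / "Not Run".
import ast
from collections import Counter

def load_tests_lod(file_name):
    """Read back the tests list of dicts written by the main tool."""
    with open(file_name, "r") as f:
        return ast.literal_eval(f.read())

def summarize_pass_rate(tests_lod):
    """Return total/passed/failed/not-run counts and a % pass rate."""
    counts = Counter(test.get("status", "Unknown") for test in tests_lod)
    num_total = sum(counts.values())
    num_passed = counts.get("Passed", 0)
    pass_rate = 100.0 * num_passed / num_total if num_total else 0.0
    return {
        "total": num_total,
        "passed": num_passed,
        "failed": counts.get("Failed", 0),
        "not_run": counts.get("Not Run", 0),
        "pass_rate_pct": round(pass_rate, 1),
    }
```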

but I noted you said "standard Python 3.x" in #577 @bartlettroscoe, which probably precludes matplotlib

Note that it is fine if auxiliary tools use more than just standard Python 3.x (we just don't want the core functionality of the tool cdash_analyze_and_report.py to require more than standard Python 3.x). But if we put these auxiliary tools into the main repo, then we have to make sure the GitHub Actions environments have these installed and we need to have automated testing for them. And local testing becomes harder as well with non-standard dependencies but that is okay.

In addition, I also have tests in our test-suite broken down via subproject / ctest-label into things like "runtime" / "compiler" / "hip_and_omp_interop", etc., which form natural groupings for us internally to see how various components we ship are doing. I would also probably want to add a similar type of reporting as described for the JIRA instances above (build quality, time history).

We have a similar breakdown into subprojects with Trilinos packages. For Trilinos, all tests are prefixed with the package name so it is easy to categorize them just from the test name. However, if you need access to the labels, it looks like that will require a CDash extension as the current test query REST API api/v1/queryTests.php does not seem to provide labels for tests. As of CDash release 3.1.0, the fields currently supplied for each test are:

    {
      "testname": "Amesos2_SolverFactory_UnitTests_MPI_4",
      "site": "rocketman",
      "buildName": "Linux-sems-gcc-8.3.0-SEMS-OPENMPI-4.0.5_RELEASE_DEFAULT",
      "buildstarttime": "2023-05-08T22:16:05 MDT",
      "time": 1.33,
      "prettyTime": "1s 330ms",
      "details": "Completed (Failed)\n",
      "siteLink": "viewSite.php?siteid=509",
      "buildSummaryLink": "build/11535334",
      "testDetailsLink": "test/160210231",
      "status": "Failed",
      "statusclass": "error",
      "nprocs": 4,
      "procTime": 5.32,
      "prettyProcTime": "5s 320ms",
      "measurements": []
    },

(For example, see here).
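
Until labels are available from the API, a name-prefix categorization along the lines used for Trilinos could look like the sketch below; the delimiter and dict shape are assumptions about the naming convention in use:

```python
# Sketch of categorizing test dicts by subproject using a test-name prefix
# convention (as with Trilinos packages), since api/v1/queryTests.php does not
# currently return labels.  The delimiter and dict shape are assumptions.
from collections import defaultdict

def bin_tests_by_name_prefix(tests_lod, delimiter="_"):
    """Group test dicts by the leading component of 'testname'."""
    bins = defaultdict(list)
    for test in tests_lod:
        prefix = test.get("testname", "").split(delimiter, 1)[0]
        bins[prefix].append(test)
    return bins

# e.g. the test shown above would land in bins["Amesos2"]
```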

Adding test labels is something we can ask Kitware to put in for us (and in fact, that is already on our CDash backlog, which is a very long list).

A future goal would be JIRA integration (assuming such a thing can be done programmatically using minimal dependencies) to automatically pull in ticket status / days open / priorities / etc., to combine with the summary stats table.

That is similar to what we started for GitHub Issues with the Grover tool described in:

The initial implementation (i.e. "Minimum Viable Product") of Grover is very simple and just gives the status of the tests associated with each issue tracker on a regular (weekly) basis. With that tool, most of the heavy lifting is done with code in the module CDashQueryAnalyzeReport.py. For example, the class CDashQueryAnalyzeReport.IssueTrackerTestsStatusReporter is used in the Grover tool to create the HTML text for the status of the tests to add to a GitHub Issue comment. (There is some more work to be done with these and related tools to make this a more sustainable process.) For our use case, some of those remaining features are listed in the remaining Tasks in https://github.com/trilinos/Trilinos/issues/3887#issue-381335600. But that project ended and the effort to maintain a larger set of customer-focused Trilinos builds and tests went away (due to lack of funding and staffing). Therefore, there has not been much development on the tool in a couple of years (but there are some internal customers using it). But there is some hope of continuing that work with some other internal customers in FY24 (hence, good timing for this interaction).

skyreflectedinmirrors commented 1 year ago

Can you mock up what you are wanting this to look like?

Note that one can write auxiliary tools as well that use the data downloaded by cdash_analyze_and_report.py (see the option --write-test-data-to-file <file-name>) and can reuse a lot of the code in the underlying module CDashQueryAnalyzeReport.py. I think it is better to have a family of related tools that operate on similar data and share a lot of the underlying code. That way, each of these tools can have more focus.

Agreed -- I can take a look at this over the next week or two, but I like the idea of having multiple related tools that can drive separate reports / functionality while using the same module. Point noted about the wrinkle of extra dependencies. I can perhaps use some of those tools (matplotlib, tabulate, pandas would be my list) as part of the mock-up and learn how the current table generation works, and we can decide where it should live after the fact.
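
For example, the 30-day time-history part of the mock-up could start as something like the following; it uses matplotlib only (so it would live in an auxiliary tool rather than in cdash_analyze_and_report.py), and the per-day pass rates are made-up placeholders standing in for real daily summaries:

```python
# Mock-up only: plot a 30-day pass-rate history with matplotlib.
# The dates and pass rates below are placeholder data for illustration.
import datetime
import matplotlib.pyplot as plt

days = [datetime.date(2023, 5, 1) + datetime.timedelta(days=i) for i in range(30)]
pass_rates = [95.0 + (i % 5) for i in range(30)]  # placeholder pass-rate history

plt.figure(figsize=(8, 3))
plt.plot(days, pass_rates, marker="o")
plt.ylabel("% tests passing")
plt.title("Pass rate over the last 30 days (mock-up)")
plt.tight_layout()
plt.savefig("pass_rate_history.png")
```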

However, if you need access to the labels, it looks like that will require a CDash extension as the current test query REST API api/v1/queryTests.php does not seem to provide labels for tests

Ahhh, I had been experimenting with direct DB access before this, and assumed everything visible there was also visible via REST. I have been enforcing a strict naming convention, however, so I should be OK relying on a string-parsing technique similar to the one you mention for Trilinos. Will have to take a look at Grover as well :)

bartlettroscoe commented 1 year ago

However, if you need access to the labels, it looks like that will require a CDash extension as the current test query REST API api/v1/queryTests.php does not seem to provide labels for tests

Ahhh, I had been experimenting with direct DB access before this, and assumed everything visible there was also visible via REST.

I would try to avoid direct DB access because Kitware does make changes to the DB schema from time to time. The REST API is supposed to be a more stable way to access the data (and they have stayed on 'api/v1' for the last 9 years or so, since they first added that feature). It is not a big deal to have them add labels and a few other fields to the api/v1/queryTests.php JSON data (but we will have to wait until a CDash release).
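
For instance, a standard-library-only fetch through the REST API might look like the sketch below; the site URL and project name are placeholders, and the query parameters and top-level JSON key should be verified against the api/v1/queryTests.php behavior of your CDash version:

```python
# Sketch of pulling test data over the CDash REST API instead of direct DB
# access, using only the standard library.  The site URL and project name are
# placeholders; verify query parameters and JSON layout for your CDash version.
import json
import urllib.request

def query_tests(cdash_site_url, project, date):
    """Fetch the queryTests.php results for one project and testing day."""
    url = (f"{cdash_site_url}/api/v1/queryTests.php"
           f"?project={project}&date={date}")
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    # Historically the test rows have been returned under the 'builds' key.
    return data.get("builds", [])

# e.g. tests = query_tests("https://my.cdash.site", "MyProject", "2023-05-08")
```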

Will have to take a look at Grover as well :)

Currently Grover is not open-source, so you can't see the full tool. But the HTML text for the GitHub comment is completely created by the class CDashQueryAnalyzeReport.IssueTrackerTestsStatusReporter. The code that is unique to Grover just does the communication with GitHub to take that text and put it into the GitHub issue comment. The other bit of code called by Grover, after reading in the tests list of dicts (testsLOD) from a file written by the tool cdash_analyze_and_report.py, is binTestDictsByIssueTracker(testsLOD).