p12tic opened 4 years ago
(will be edited as the discussion goes on)
The use cases that need to be supported by this feature have a lot in common, but at the same time there are significant differences in what users of BuildBot could reasonably expect. This requires the design to be generic enough.
The following is the proposed pseudo-schema of the new tables. The schema is slightly denormalized so that performing queries does not introduce too many table joins. In the schema below, `pk` is primary key and `fk` is foreign key.
TestResultSet table:
- (maybe) project_id (int, fk)
- builder_id (int, fk)
- build_id (int, fk)
- step_id (int, fk)
- testresultset_id (int, pk)
- testresultset_type (str)
- testresultset_value_unit (str)
A `TestResultSet` is an entity that represents all interesting information of a particular type that is produced by a step. For example, this could be a set of code warnings, or a set of performance results. The `TestResultSet` table stores information related to a `TestResultSet`. Additionally, the table also includes `project_id`, `builder_id` and `build_id` fields so that it's possible to easily query for all `TestResultSet`s for a particular project, builder or build. Finally, we will be able to create a clustered index over `project_id`, `builder_id`, `build_id`, `step_id`, which will move related test data together in the table and thus allow very large table sizes without affecting performance too much.
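To make the discussion more concrete, here is a rough SQLAlchemy Core sketch of this table (Buildbot's model.py is SQLAlchemy-based). The table name, string lengths, index name and foreign-key targets here are my assumptions for illustration only:

```python
import sqlalchemy as sa

metadata = sa.MetaData()

# TestResultSet: one row per set of results produced by a step.
test_result_sets = sa.Table(
    'test_result_sets', metadata,
    sa.Column('testresultset_id', sa.Integer, primary_key=True),
    sa.Column('project_id', sa.Integer, sa.ForeignKey('projects.id')),
    sa.Column('builder_id', sa.Integer, sa.ForeignKey('builders.id')),
    sa.Column('build_id', sa.Integer, sa.ForeignKey('builds.id')),
    sa.Column('step_id', sa.Integer, sa.ForeignKey('steps.id')),
    sa.Column('testresultset_type', sa.String(255)),
    sa.Column('testresultset_value_unit', sa.String(255)),
)

# Composite index approximating the clustering described above; true
# clustering is engine-specific (e.g. InnoDB clusters on the primary key).
sa.Index('test_result_sets_locality',
         test_result_sets.c.project_id, test_result_sets.c.builder_id,
         test_result_sets.c.build_id, test_result_sets.c.step_id)
```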
TestResultSetData table:
- testresultset_id (int, fk)
- data_type (str)
- data (blob)
This table stores the unparsed data from which a complete `TestResultSet` is produced. `TestResultSetData` has a 0..many relationship to `TestResultSet`.
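Continuing the sketch above, the raw-data table might look as follows; the 0..many side carries the foreign key, and `sa.LargeBinary` maps to a blob:

```python
# TestResultSetData: 0..many raw-data rows per TestResultSet.
test_result_set_data = sa.Table(
    'test_result_set_data', metadata,
    sa.Column('testresultset_id', sa.Integer,
              sa.ForeignKey('test_result_sets.testresultset_id')),
    sa.Column('data_type', sa.String(255)),
    sa.Column('data', sa.LargeBinary),
)
```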
TestCodePath table:
- builder_id (int, fk)
- filepath_id (int, pk)
- filepath (str)
This table stores the file paths for `TestResult`. `builder_id` is included so that a clustered index can be applied to it, moving related data together in the table. It is expected that this table will be queried for all test paths related to a builder.
TestName table:
- builder_id (int, fk)
- testname_id (int, pk)
- testname (str)
This table stores the test names for `TestResult`. `builder_id` is included so that a clustered index can be applied to it, moving related data together in the table. It is expected that this table will be queried for all test names related to a builder.
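Since `TestCodePath` and `TestName` are structurally identical lookup tables, one sketch covers both; `builder_id` gets a plain secondary index so that per-builder scans stay cheap (again, all names are illustrative):

```python
# TestCodePath: interned file paths, queried per builder.
test_code_paths = sa.Table(
    'test_code_paths', metadata,
    sa.Column('filepath_id', sa.Integer, primary_key=True),
    sa.Column('builder_id', sa.Integer, sa.ForeignKey('builders.id')),
    sa.Column('filepath', sa.Text),
)
sa.Index('test_code_paths_builders', test_code_paths.c.builder_id)

# TestName: interned test names, same pattern.
test_names = sa.Table(
    'test_names', metadata,
    sa.Column('testname_id', sa.Integer, primary_key=True),
    sa.Column('builder_id', sa.Integer, sa.ForeignKey('builders.id')),
    sa.Column('testname', sa.Text),
)
sa.Index('test_names_builders', test_names.c.builder_id)
```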
TestResult table:
- builder_id (int, fk)
- testresultset_id (int, fk)
- testresult_id (int, pk)
- testname_id (int, fk, nullable)
- filepath_id (int, fk, nullable)
- line (int)
- col (int)
- value (???)
This table stores the actual test results. `builder_id` is included so that a clustered index can be applied over `builder_id` and `testresultset_id`, moving related data together in the table. The table includes all information that could possibly be useful to a test, even potentially unneeded data. For example, "code issues" tests will probably not use the `value` field, whereas pass/fail or performance tests will probably not use the file path information. The interpretation of the `value` field depends on the `testresultset_type` and `testresultset_value_unit` of the particular `TestResultSet`.
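And the results table itself in the same sketch. Since the type of `value` is still open (the `???` above), `sa.Text` is used here as a placeholder that each `testresultset_type` could encode into, at the cost of casting in numeric queries:

```python
# TestResult: one row per individual result (warning, pass/fail, metric).
test_results = sa.Table(
    'test_results', metadata,
    sa.Column('testresult_id', sa.Integer, primary_key=True),
    sa.Column('builder_id', sa.Integer, sa.ForeignKey('builders.id')),
    sa.Column('testresultset_id', sa.Integer,
              sa.ForeignKey('test_result_sets.testresultset_id')),
    sa.Column('testname_id', sa.Integer,
              sa.ForeignKey('test_names.testname_id'), nullable=True),
    sa.Column('filepath_id', sa.Integer,
              sa.ForeignKey('test_code_paths.filepath_id'), nullable=True),
    sa.Column('line', sa.Integer, nullable=True),
    sa.Column('col', sa.Integer, nullable=True),
    sa.Column('value', sa.Text, nullable=True),  # placeholder type, see above
)

# Index backing the locality argument made above.
sa.Index('test_results_locality',
         test_results.c.builder_id, test_results.c.testresultset_id)
```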
Hi, this looks great. I am not sure of the necessity of the `TestResultSet` and `TestResultSetData` tables. This sounds a bit redundant, as that data should already be in the logs; why store it in unparsed format?
Actually, commenting in the issue is a bit awkward. Maybe you can send a WIP PR with model.py updated, and with 4 new raml files describing the data model from the REST API point of view. Having both data models reasoned about at the same time sounds useful to me.
Being able to put inline comments also looks very useful.
Also, having some example test data would help me understand what you mean.
Allowing the user to upload a JUnit or similar result file would be very useful. Many test infrastructures already generate such files.
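For example, a minimal sketch of pulling pass/fail results out of a JUnit-style XML report; the `testsuite`/`testcase` layout with `classname`/`name` attributes and `failure`/`error` children is the common convention, though producers vary:

```python
import xml.etree.ElementTree as ET

def parse_junit_results(xml_text):
    """Yield (test name, passed) pairs from a JUnit-style XML report."""
    root = ET.fromstring(xml_text)
    for case in root.iter('testcase'):
        name = '{}.{}'.format(case.get('classname', ''),
                              case.get('name', ''))
        # A testcase with no <failure> or <error> child is taken as a
        # pass; <skipped> handling is omitted for brevity.
        passed = (case.find('failure') is None and
                  case.find('error') is None)
        yield name, passed
```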
Having test results shown somewhere easily accessible from a build page would be a great productivity boost. Currently people need to dig into the logs. If we tracked which tests fail on which builds, we would have a great deal of useful information that would save people a lot of time.
The feature should implement the following use cases:
- Code issues:
  - Analyze logs produced by e.g. pylint or flake8.
  - Submit review comments to a third-party code browser such as GitHub, GitLab, Bitbucket, etc.
  - Store the test results in the database for analytics purposes.
- Test pass/fail results:
  - Analyze test pass/fail logs produced by various testing frameworks.
  - Track which tests have been failing since which commit. That would allow people to immediately know which commits are to blame. Further tooling on the BuildBot side could automatically start tests to bisect the failure.
  - Store the test results in the database for analytics purposes, e.g. to detect the most unstable areas of the test suite and give information about the root cause.
- Test quantitative results. This use case covers any interesting numeric information coming from a test or some analysis tool: for example, performance metrics, or binary size and memory usage metrics categorized by file or code area.
  - Analyze logs produced by various testing frameworks or other tooling.
  - Track the results of tests across time, which allows detection of performance regressions (a query sketch follows this list).
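As an illustration of the kind of query the proposed schema enables, here is a hedged sketch (SQLAlchemy 1.x style, reusing the table sketches earlier in the thread; all names are assumptions) that pulls the history of one quantitative result set type for one builder:

```python
# History of a quantitative metric for one builder, ordered by build;
# plotting value against build_id would expose performance regressions.
history = (
    sa.select([test_result_sets.c.build_id, test_results.c.value])
    .select_from(test_results.join(
        test_result_sets,
        test_results.c.testresultset_id ==
            test_result_sets.c.testresultset_id))
    .where(test_result_sets.c.builder_id == 42)  # hypothetical builder id
    .where(test_result_sets.c.testresultset_type == 'performance')
    .order_by(test_result_sets.c.build_id)
)
```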