This repository contains all the data used to build the flaky tests' website, hosted here: http://mir.cs.illinois.edu/flakytests. While the website is only for Java tests for now, this repository includes flaky tests in both Java and Python. Specifically, the pr-data.csv file for Java projects using Maven, gr-data.csv file for Java projects using Gradle and py-data.csv file for Python contain all the information about the flaky tests detected or fixed in the International Dataset of Flaky Tests (IDoFT).
To contribute a newly detected or fixed flaky test to the dataset, please see Contributing detected flaky test or Contributing fixed flaky test, respectively.
Sort command for Java Maven:
`echo "$(head -n1 pr-data.csv && tail -n +2 pr-data.csv | LC_ALL=C sort -k1,1 -k4,4 -t, -f)" > pr-data.csv`
Sort command for Java Gradle:
`echo "$(head -n1 gr-data.csv && tail -n +2 gr-data.csv | LC_ALL=C sort -k1,1 -k4,4 -t, -f)" > gr-data.csv`
Sort command for Python:
`echo "$(head -n1 py-data.csv && tail -n +2 py-data.csv | LC_ALL=C sort -k1,1 -k3,3 -t, -f)" > py-data.csv`
Columns for Java: `Project URL, SHA Detected, Module Path, Fully-Qualified Test Name (packageName.ClassName.methodName), Category`
Columns for Python: `Project URL,SHA Detected,Pytest Test Name (PathToFile::TestClass::TestMethod or PathToFile::TestMethod),Category`
Detailed information for the columns can be found here. Note that for Python, we do not have the `Module Path` column. In the `Pytest Test Name` column, the notion of `TestClass` does not always apply (it depends on how developers write their tests). We expect that the Python testing framework `pytest` can run the test directly with `pytest $Pytest_Test_Name`. The documentation of `pytest` can be found here.
`format_checker/forked_projects.json` if the project does not exist.
`Project URL, Module Path, Fully-Qualified Test Name` for Java and `Project URL,Pytest Test Name` for Python.
`[[ $(git show -s --pretty=%at "$SHA1") -gt $(git show -s --pretty=%at "$SHA2") ]] && echo "$SHA1" || echo "$SHA2"`
`flaky-list.json` file for ID and OD tests, respectively). The specific format for the notes is described in Adding notes.
For Java:
Project URL | SHA Detected | Module Path | Fully-Qualified Test Name (packageName.ClassName.methodName) | Category | Status | PR Link | Notes |
---|---|---|---|---|---|---|---|
https://github.com/alibaba/fastjson | e05e9c5e4be580691cc55a59f3256595393203a1 | . | com.alibaba.json.bvt.date.DateTest_tz.test_codec | OD |
For Python:
Project URL | SHA Detected | Pytest Test Name (PathToFile::TestClass::TestMethod or PathToFile::TestMethod) | Category | Status | PR Link | Notes |
---|---|---|---|---|---|---|
https://github.com/AguaClara/aguaclara | 9ee3d1d007bc984b73b19520d48954b6d81feecc | tests/core/test_cache.py::test_ac_cache | NIO |
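
Before opening a pull request, it can help to confirm that a data file is already in the order the sort commands above would produce, without rewriting the file. The following is a minimal sketch, not part of this repository: `is_sorted` is a hypothetical helper name, and it assumes the same sort keys as the documented commands (column 1, then column 4 for the Java files or column 3 for the Python file).

```shell
# Hypothetical helper (an assumption, not shipped with this repository):
# succeed if the rows of a data file are already in the documented order.
#   $1 = CSV file
#   $2 = second sort key column (4 for pr-data.csv and gr-data.csv, 3 for py-data.csv)
is_sorted() {
  # Skip the header row, then let "sort -c" check the order without writing anything.
  tail -n +2 "$1" | LC_ALL=C sort -c -t, -f -k1,1 -k"$2","$2" 2>/dev/null
}
```

For example, `is_sorted pr-data.csv 4 || echo "pr-data.csv needs sorting"` prints the message only when the rows are out of order.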
Note that to submit the fix to the developers you likely need to reproduce the flaky-test failure in the latest commit of the repository. We expect that the fix you submit (or a very similar fix) would also remove the flakiness at the SHA Detected commit. If your fix does not remove the flaky-test failure at the existing SHA Detected commit, please create a new row with the SHA Detected as the latest commit of the repository.
For Java:
Project URL | SHA Detected | Module Path | Fully-Qualified Test Name (packageName.ClassName.methodName) | Category | Status | PR Link | Notes |
---|---|---|---|---|---|---|---|
https://github.com/alibaba/fastjson | e05e9c5e4be580691cc55a59f3256595393203a1 | . | com.alibaba.json.bvt.date.DateTest_tz.test_codec | OD | Opened | https://github.com/alibaba/fastjson/pull/2148 |
For Python:
Project URL | SHA Detected | Pytest Test Name (PathToFile::TestClass::TestMethod or PathToFile::TestMethod) | Category | Status | PR Link | Notes |
---|---|---|---|---|---|---|
https://github.com/drtexx/volux | c41339aceeab4295967ea88b2edd05d0d456b2ce | tests/test_operator.py::Test_operator::test_add_module | NIO | Opened | https://github.com/DrTexx/Volux/pull/37 |
For Java:
Project URL | SHA Detected | Module Path | Fully-Qualified Test Name (packageName.ClassName.methodName) | Category | Status | PR Link | Notes |
---|---|---|---|---|---|---|---|
https://github.com/alibaba/fastjson | e05e9c5e4be580691cc55a59f3256595393203a1 | . | com.alibaba.json.bvt.date.DateTest_tz.test_codec | OD | Accepted | https://github.com/alibaba/fastjson/pull/2148 |
For Python:
Project URL | SHA Detected | Pytest Test Name (PathToFile::TestClass::TestMethod or PathToFile::TestMethod) | Category | Status | PR Link | Notes |
---|---|---|---|---|---|---|
https://github.com/chaosmail/python-fs | 2567922ced9387e327e65f3244caff3b7af35684 | fs/tests/test_touch.py::test_touch_on_new_file | NIO | Accepted | https://github.com/chaosmail/python-fs/pull/9 |
To add more information about any test:
`flaky-list.json` and `original-order` located under the `.dtfixingtools` directory generated by iDFlakies. Note that GitHub requires that these files be of `.txt` format (e.g., upload `flaky-list.txt` and `original-order.txt`, respectively).
`nondexMode` and `nondexSeed` from the `.nondex/{testid}/config` file generated by NonDex in the comments of the issue. You may also include the whole `config` file instead of adding a comment.
An example issue can be found here.
For Java:
Project URL | SHA Detected | Module Path | Fully-Qualified Test Name (packageName.ClassName.methodName) | Category | Status | PR Link | Notes |
---|---|---|---|---|---|---|---|
https://github.com/wso2/carbon-apimgt | a82213e40e7e6aa529341fdd1d1c3de776949e64 | components/apimgt/org.wso2.carbon.apimgt.rest.api.commons | org.wso2.carbon.apimgt.rest.api.commons.util.RestApiUtilTestCase.testConvertYmlToJson | ID | Skipped | https://github.com/TestingResearchIllinois/flaky-test-dataset/issues/1 |
For Python:
We do not have an example yet. If someone wants to open an issue for a Python test, the only two differences from Java are that (1) Python tests do not have the `Module Path` column, and (2) the `Test Name` is `Pytest Test Name (PathToFile::TestClass::TestMethod or PathToFile::TestMethod)`.
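
Because the `TestClass` part is optional, a script that reads the Python rows cannot assume a fixed number of `::`-separated parts. The sketch below shows one way to split a Pytest test name; `split_pytest_name` is a hypothetical helper, and it assumes the name contains no `::` other than the separators described above.

```shell
# Hypothetical helper (an assumption, not part of this repository):
# print "path|class|method" for a Pytest test name; class is empty when absent.
split_pytest_name() (
  name=$1
  path=${name%%::*}     # text before the first "::"
  method=${name##*::}   # text after the last "::"
  rest=${name#*::}      # everything after the path
  class=
  case $rest in
    *::*) class=${rest%%::*} ;;   # a middle part exists only when a class is given
  esac
  printf '%s|%s|%s\n' "$path" "$class" "$method"
)
```

Calling it as `split_pytest_name "tests/core/test_cache.py::test_ac_cache"` leaves the class field empty, since that test is a module-level function.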
Example: https://github.com/wso2/carbon-apimgt
Example: a82213e40e7e6aa529341fdd1d1c3de776949e64
`.` if the test is located at the base of the repository.
Example: components/apimgt/org.wso2.carbon.apimgt.rest.api.commons
`lv.ctco.cukes.plugins.RunCukesTest.Given wait for 1 second`).
Example: org.wso2.carbon.apimgt.rest.api.commons.util.RestApiUtilTestCase.testConvertYmlToJson
`;`, sorted alphabetically, and contains no spaces. Please use `UD` if you do not know the category. When adding OD-related tests, it is much appreciated if you can provide the passing and failing order of the test (e.g., the `flaky-list.json` file created by iDFlakies).
The accepted categories are:
Category | Description |
---|---|
OD | Order-Dependent flaky tests as defined in iDFlakies |
OD-Brit | Order-Dependent Brittle tests as defined in iFixFlakies |
OD-Vic | Order-Dependent Victim tests as defined in iFixFlakies |
ID | Implementation-Dependent tests found by NonDex |
ID-HtF | Implementation-Dependent tests that are hard to fix. Brief description given in https://github.com/kaiyaok2/ID-HtF. |
NIO | Non-Idempotent-Outcome Tests as defined in ICSE’22 work. Tests that pass in the first run but fail in the second. |
NOD | Non-Deterministic tests |
NDOD | Non-Deterministic Order-Dependent tests that fail non-deterministically but with significantly different failure rates in different orders as defined in our ISSRE’20 work |
NDOI | Non-Deterministic Order-Independent tests that fail non-deterministically but with similar failure rates in all orders as defined in our ISSRE’20 work |
UD | Unknown Dependency tests that pass and fail in a test suite or in isolation |
OSD | Operating System Dependent tests that pass and fail depending on the operating system |
TZD | Tests that fail in machines on different time zones, usually failing time-related assertions |
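
The rules for the Category value can be checked mechanically. The following is a minimal sketch under stated assumptions: `valid_category` is a hypothetical helper, multiple categories are assumed to be joined with `;`, "sorted alphabetically" is interpreted as byte order (`LC_ALL=C`), and duplicates are rejected. The repository's own format checker may apply different rules.

```shell
# Accepted categories, copied from the table above.
ACCEPTED="OD OD-Brit OD-Vic ID ID-HtF NIO NOD NDOD NDOI UD OSD TZD"

# Hypothetical helper (an assumption, not part of this repository):
# succeed only if $1 is a ;-separated, sorted, space-free list of accepted categories.
valid_category() (
  set -f                       # do not glob-expand the value
  value=$1
  case $value in
    ''|*' '*|';'*|*';'|*';;'*) exit 1 ;;   # empty, contains a space, or has an empty part
  esac
  parts=$(printf '%s\n' "$value" | tr ';' '\n')
  for part in $parts; do
    case " $ACCEPTED " in
      *" $part "*) ;;          # known category
      *) exit 1 ;;             # anything else is rejected
    esac
  done
  # Re-sort the parts (byte order, duplicates dropped) and compare with the input.
  sorted=$(printf '%s\n' "$parts" | LC_ALL=C sort -u | paste -sd ';' -)
  [ "$sorted" = "$value" ]
)
```

With these assumptions, `valid_category "NIO;OD"` succeeds, while `valid_category "OD;NIO"` fails because the parts are not in sorted order.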
Status | Description |
---|---|
Blank | A blank value denotes that a flaky test was detected and is yet to be inspected |
Opened | For tests where a PR was opened to fix the flaky test |
Accepted | For tests where a PR was accepted to fix the flaky test |
InspiredAFix | The work (e.g., issue report, pull request) inspired a fix from the developer, but did not directly change any code. The PR Link should be the link of the PR that the developer merged to fix the flakiness and some Notes should be added to explain how the work inspired the fix |
DeveloperWontFix | For tests where developers claimed that they do not want a fix |
DeveloperFixed | For tests where a developer fixed the tests before a PR was made |
Deleted | For tests that can no longer be fixed as the tests have been removed from the repository after the tests were detected |
Rejected | For tests where a PR was rejected/closed as the developers did not think a fix was necessary |
Skipped | For tests that were inspected and should not be fixed (e.g., the test is annotated with @Ignore). To use this status, please provide some Notes on why the test should be skipped |
MovedOrRenamed | For tests that have a different fully-qualified name at two different SHAs. This status should be added only to the row with the older SHA. To use this status, please also provide some Notes on what the test was renamed to |
RepoArchived | For tests in an archived repository, which GitHub indicates with messages such as "This repository has been archived by the owner. It is now read-only." |
Deprecated | For tests in a deprecated repository, which is usually indicated in the project README or description as "Deprecated" or a similar message. To use this status, please also provide some Notes that contain a link to the commit that marks the repository as deprecated. |
RepoDeleted | For tests in a repository that no longer exists, where the link to the repository returns a 404 status code. |
MovedToGradle | For tests in a repository that moved from Maven to Gradle. |
FixedOrder | The test has a fixed order, e.g., using @Order in JUnit 5. |
Unmaintained | For tests that are in a repository that does not have any commits to main/master in the past 2 years. To use this status, please also provide some Notes that contain the last commit date. |
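
Several statuses in the table above ask for accompanying Notes. The sketch below encodes that reading of the table; `check_status` is a hypothetical helper, and the set of statuses treated as requiring Notes is an interpretation of the descriptions above rather than the repository's actual checker.

```shell
# Hypothetical helper (an assumption, not part of this repository):
#   $1 = Status value (may be blank), $2 = Notes value (may be blank)
# Succeeds if the status is known and Notes are present where the table asks for them.
check_status() {
  case $1 in
    ''|Opened|Accepted|DeveloperWontFix|DeveloperFixed|Deleted|Rejected|RepoArchived|RepoDeleted|MovedToGradle|FixedOrder)
      return 0 ;;              # blank or a status with no Notes requirement in the table
    InspiredAFix|Skipped|MovedOrRenamed|Deprecated|Unmaintained)
      [ -n "$2" ] ;;           # these descriptions ask for Notes
    *)
      return 1 ;;              # not a status listed in the table
  esac
}
```

For instance, `check_status Skipped ""` fails, while `check_status Skipped "annotated with @Ignore"` succeeds.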
Example: https://github.com/alibaba/fastjson/pull/2148
Example: https://github.com/TestingResearchIllinois/flaky-test-dataset/issues/1
If the test method body is the same between two versions (e.g., if in `old_sha`, `some.test.name` has the same test method body as `some.other.test.name` in `new_sha`), we consider the two versions to be the same test. For the row with the older sha, please change the Status to `MovedOrRenamed` and add Notes describing the version in which the test was found to be renamed or moved.
If the test method body is different, we consider the two versions to be two different tests. For the row with the older sha, please change the Status to `Deleted` and add Notes describing the version in which the test was found to be renamed or moved and how the test method body differs between the two versions.
For either case, please also add a new row for the newer sha and test name. Once the preceding changes are made, all future pull request updates should only be made to the row with the newer sha.
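
The choice between the two statuses can be sketched as a small comparison. This is an illustration only: `status_for_older_row` is a hypothetical helper, it assumes you have already saved the test method body from each sha into its own file, and a byte-for-byte comparison is a crude stand-in for judging whether two method bodies are the same.

```shell
# Hypothetical helper (an assumption, not part of this repository):
#   $1 = file holding the test method body at the older sha
#   $2 = file holding the test method body at the newer sha
# Prints the Status to record on the row with the older sha.
status_for_older_row() {
  if cmp -s "$1" "$2"; then
    echo MovedOrRenamed        # identical bodies: treated as the same test
  else
    echo Deleted               # different bodies: treated as two different tests
  fi
}
```

Whitespace-only differences would be reported as `Deleted` by this sketch, so manual inspection is still needed before updating the row.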
Please update all rows of the old repository owner and name with the new repository owner and name.
If you use the dataset, please cite this website and our original dataset:
@misc{InternationalDatasetofFlakyTests,
title = {{International Dataset of Flaky Tests (IDoFT)}},
author = {Lam, Wing},
year = {2020},
url = {http://mir.cs.illinois.edu/flakytests}
}
@inproceedings{LamETAL19iDFlakies,
author = "Wing Lam and Reed Oei and August Shi and Darko Marinov and Tao Xie",
title = "{iDF}lakies: {A} framework for detecting and partially classifying flaky tests",
booktitle = "ICST 2019: 12th IEEE International Conference on Software Testing, Verification and Validation",
month = "April",
year = "2019",
address = "Xi'an, China",
pages = "312--322"
}
Wing Lam is the author of this dataset. He thanks all contributors and the students from the Fall 2020 CS 527 class from the University of Illinois at Urbana-Champaign for their contributions.
For any questions about the dataset, please email testflaky@gmail.com.