NationalSecurityAgency / timely

Accumulo backed time series database
https://code.nsa.gov/timely/
Apache License 2.0

Flaky TimeSeriesGroupingIteratorTest #242

Open Agorguy opened 1 year ago

Agorguy commented 1 year ago

Hello,

We tried running your project and discovered that it contains some flaky tests (i.e., tests that nondeterministically pass and fail). We found these tests fail more frequently on some of our machines than on others.

To keep others from hitting this flakiness, we suggest adding a note to the README.md stating the minimum resource configuration for running this project's tests (a possible wording is sketched after the list below).

If we run this project on a machine with 1 CPU and 500 MB of RAM, we observe flaky tests. We observed no flaky tests when we ran it on machines with 2 CPUs and 4 GB of RAM.

Here is a list of the tests we identified and their likelihood of failure on a system below the recommended 2 CPUs and 2 GB of RAM.

  1. timely.store.iterators.TimeSeriesGroupingIteratorTest#testTimeSeriesDropOff (3 out of 50)
  2. timely.store.iterators.TimeSeriesGroupingIteratorTest#testManySparseTimeSeries (4 out of 50)
  3. timely.store.iterators.TimeSeriesGroupingIteratorTest#testMultipleTimeSeriesMovingAverage (4 out of 50)
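
For concreteness, the README note we have in mind could be as small as the following (this wording is our suggestion only, not the maintainers'):

    Minimum test environment: running the full test suite reliably requires at least
    2 CPUs and 2 GB of RAM. On machines below that (e.g. 1 CPU / 500 MB), tests in
    timely.store.iterators.TimeSeriesGroupingIteratorTest may fail intermittently.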

Please let me know if you would like us to open a pull request on this matter (likely against this project's README).

Thank you for your attention to this matter. We hope these recommendations help improve the quality of the project and the experience of others using it.

Reproducing


FROM maven:3.5.4-jdk-11

WORKDIR /home/

# fetch the project sources
RUN git clone https://github.com/NationalSecurityAgency/timely

WORKDIR /home/timely

# build and install artifacts up front so the test run does not rebuild
RUN mvn install -DskipTests

ENTRYPOINT ["mvn", "test", "-fn"]
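
(For reference: -fn is Maven's --fail-never flag, so the run continues through all modules even when a module's tests fail, and the failures are still reported in the Surefire output.)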

Build the image:

$> mkdir tmp
$> cp Dockerfile tmp
$> cd tmp
$> docker build -t timely . # build takes roughly 3 minutes

Running. This configuration appears to prevent flakiness (no failures in 10 runs):

$> docker run --rm --memory=2g --cpus=2 --memory-swap=-1 timely | tee output.txt
$> grep "Failures:"  output.txt # checking results


This other, more constrained configuration does not prevent the flaky tests (failures observed within 10 runs):

$> docker run --rm --memory=500mb --cpus=1 --memory-swap=-1 timely | tee output2.txt
$> grep "Failures:"  output2.txt # checking results
ctubbsii commented 1 year ago

@Agorguy Thanks for the report, but if the tests are flaking out solely because they are being run in a severely resource-constrained environment, that's probably not a problem for this project. This project is designed to work with Apache Accumulo, an inherently "big data" application, and it is not expected to be run in such a constrained environment.

So, if the tests are failing with timeouts, out-of-memory errors, or similar, that's probably not a problem for this project. However, if the tests are failing with failed assertions, then voluntary PRs that make the test assertions more robust might be appreciated (for example, a test that checks for a condition that usually holds immediately could instead wait a bit longer for the condition to occur before timing out and failing).
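
To illustrate that kind of change, here is a minimal sketch of a polling assertion helper (JUnit 4 style; the class and method names are hypothetical, not from the timely code base):

    import static org.junit.Assert.fail;

    public final class WaitFor {

        /** A condition the test wants to become true. */
        public interface Condition {
            boolean isMet() throws Exception;
        }

        // Poll until the condition holds or the deadline passes, instead of
        // asserting once and failing immediately on a slow machine.
        public static void await(Condition condition, long timeoutMs) throws Exception {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (System.currentTimeMillis() < deadline) {
                if (condition.isMet()) {
                    return;
                }
                Thread.sleep(50); // brief back-off between checks
            }
            fail("condition not met within " + timeoutMs + " ms");
        }
    }

A test would then call, e.g., WaitFor.await(() -> results.size() == expected, 30_000); rather than asserting on the size immediately.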

Agorguy commented 1 year ago

@ctubbsii I appreciate your perspective on resource constraints and robust test assertions. However, we still think it's worth specifying minimum system requirements so that users of this project can avoid these flaky test failures.

While the project is designed for resource-rich environments, users may unintentionally run it in leaner ones, where flaky test failures unrelated to their changes lead to unnecessary debugging effort.

Our aim is to make developers and users aware of the expected environment: specifying minimum requirements guards against flaky tests, documents the expected setup, and avoids confusion or false bug reports caused by insufficient resources.

Including this information is a low-effort change that saves debugging time whenever a machine falls below the minimum specification. I understand the project is complex; fixing the flaky tests proved challenging for me, and despite my efforts I couldn't resolve the issue given the project's intricacies.

To provide context, I'll share the stack traces of the flaky tests we encountered. Let me know if they give you any ideas about how to solve the problem, and I can give it one more try.

Agorguy commented 1 year ago

Stacktraces:

[ERROR] testTimeSeriesDropOff(timely.store.iterators.TimeSeriesGroupingIteratorTest)  Time elapsed: 4.017 s  <<< FAILURE!
java.lang.AssertionError: expected:<21.0> but was:<4.0>
    at timely.store.iterators.TimeSeriesGroupingIteratorTest.checkNextResult(TimeSeriesGroupingIteratorTest.java:297)
    at timely.store.iterators.TimeSeriesGroupingIteratorTest.testTimeSeriesDropOff(TimeSeriesGroupingIteratorTest.java:160)
ctubbsii commented 1 year ago

@Agorguy I ran the build on my laptop with 32GB RAM, and the TimeSeriesGroupingIteratorTest also failed there, so I think this test is flaky regardless of resources. I don't know how to fix it, though, so I will leave that to its primary maintainers to address when they have time. Thanks for bringing this test flakiness to our attention.