feiwang3311 / Lantern

BSD 3-Clause "New" or "Revised" License

Investigate non-deterministic CI failures #19

Closed dan-zheng closed 5 years ago

dan-zheng commented 5 years ago

Travis CI is failing non-deterministically for some reason. It usually takes me two or three CI retriggers before the tests pass.

feiwang3311 commented 5 years ago

What does the error message look like when it fails? I've noticed it sometimes too. It's a problem with my test design: some of the tests need to run the generated C++ code. The testRun function always generates the C++ code in /tmp/snippet.cpp, compiles it to /tmp/snippet, then runs it. Sometimes the file is not available, and I see error messages such as: "/tmp/snippet" file is not available (probably because tests normally run on multiple threads, so concurrent tests share the same path). I think I can add a fix so that the file name is a random string.
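
For illustration, a minimal sketch of the random-name idea, not the actual Lantern change. The object name `SnippetFiles`, the method `freshSource`, and the `snippet-` prefix are hypothetical; the only library call is the JDK's `Files.createTempFile`, which picks a name that does not already exist in the temp directory.

```scala
import java.nio.file.{Files, Path}

object SnippetFiles {
  // Hypothetical helper: creates an empty, uniquely named .cpp file in the
  // default temp directory, so concurrent tests never share a path.
  def freshSource(): Path =
    Files.createTempFile("snippet-", ".cpp")
}
```

Each call returns a different path, so one test can no longer overwrite or delete the file another test is about to compile or run.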

dan-zheng commented 5 years ago

From my observations, error messages differed from run to run. Here's one example:

[info] - add_broadcast4 *** FAILED ***
[info]   java.io.IOException: Cannot run program "/tmp/snippet": error=2, No such file or directory
[info]   at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
[info]   at scala.sys.process.ProcessBuilderImpl$Simple.run(ProcessBuilderImpl.scala:71)
[info]   at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.lineStream(ProcessBuilderImpl.scala:143)
[info]   at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.lineStream(ProcessBuilderImpl.scala:109)
[info]   at scala.sys.process.ProcessBuilder.lines(ProcessBuilder.scala:178)
[info]   at scala.sys.process.ProcessBuilder.lines$(ProcessBuilder.scala:178)
[info]   at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.lines(ProcessBuilderImpl.scala:87)
[info]   at lantern.DslDriverC.eval(dslapi.scala:501)
[info]   at lantern.BroadCastingTest.$anonfun$new$5(test_broadcast.scala:85)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)

Perhaps randomizing snippet filenames fixes this issue. I'll observe and close this issue if that is the case.

feiwang3311 commented 5 years ago

Yep. That was the same problem. My recent push should use random names now.

dan-zheng commented 5 years ago

One downside to randomizing snippet filenames is that each test invocation generates a new file in /tmp. If tests are run many times, there'll be an explosion in the number of snippet files.

A better strategy may be to use the testcase name as the snippet filename, e.g. test("vector-vector-dot") -> /tmp/vector-vector-dot.cpp.

Or, if the testcase name is not accessible, a good half-measure would be to use a common prefix for all snippet filenames, e.g. /tmp/lantern-<...>.cpp. This makes it easier to identify the snippet files and delete them all at once, when desired.
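
A rough sketch of how both suggestions could be combined, assuming the test name is available as a string. `SnippetNames` and `sourceFor` are hypothetical names, and the sanitizing rule is an assumption, not something the thread specifies.

```scala
import java.nio.file.{Path, Paths}

object SnippetNames {
  // Hypothetical helper: maps a test name to a stable path in the temp
  // directory, with a common "lantern-" prefix for easy bulk deletion.
  def sourceFor(testName: String): Path = {
    // Replace anything outside [A-Za-z0-9._-] so the result is a safe file name.
    val safe = testName.replaceAll("[^A-Za-z0-9._-]", "_")
    Paths.get(System.getProperty("java.io.tmpdir"), s"lantern-$safe.cpp")
  }
}
```

Rerunning a test reuses the same path instead of adding a new file. One caveat of this sketch: two test names that differ only in replaced characters would map to the same file.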

TiarkRompf commented 5 years ago

I think it's better to run tests sequentially anyways (for reproducibility) so let's just turn parallelism off (there's an sbt flag to do that, check LMS repo).

But let's also name files according to test names, as you suggest @dan-zheng.
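
The flag is not named in this thread. One sbt setting that disables parallel execution of test suites is `parallelExecution`; whether it is the one used in the LMS repo is an assumption. In a `build.sbt` it might look like:

```scala
// build.sbt (sketch): run test suites one at a time instead of in parallel
Test / parallelExecution := false
```

Older sbt builds write the same setting as `parallelExecution in Test := false`.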

dan-zheng commented 5 years ago