Closed by fonsp 3 years ago
My "guess of what happened" was wrong: the issue is still there after adding sleep calls.
I still don't know what is causing the CI failures, but I will close this issue until I have something more concrete. (My next guess is that the GitHub Actions runner ran out of memory due to some leak.)
(I released the next Pluto update which is Julia 1.6 compatible!)
Sorry for the trouble!
I am running the tests for Pluto.jl on GitHub Actions with Windows on Julia 1.6.0-beta, and they fail ~50% of the time with an error that seems unrelated to my code. On previous Julia versions, Pluto's tests pass very consistently.
The 1.6-compatibility PR (https://github.com/fonsp/Pluto.jl/pull/842) changed just a single line to support 1.6, unrelated to the failures; all other commits are there only to trigger a rerun of the tests.
To run these tests:
Context
The tests fail in @testset "WorkspaceManager". This is the part of Pluto's codebase that uses Distributed to launch and control worker processes for notebooks. The code for Pluto.WorkspaceManager is here.
Summary of test results
On Windows with Julia 1.6, about half of the test runs fail. On other OS/Julia versions all tests pass. The Windows 1.6 failures fall into these categories:
WorkspaceManager tests failed after the 6-hour timeout with a ReadOnlyMemoryError() ([1], [2], [3])
WorkspaceManager tests failed with an EXCEPTION_ACCESS_VIOLATION ([1])
WorkspaceManager tests failed with an InitError(mod=:Profile, error=ErrorException("could not allocate space for 10000000 instruction pointers")) ([1])
WorkspaceManager tests failed with an OutOfMemoryError ([1])
My guess of what happened
Some of the test failures have stack traces that point to https://github.com/fonsp/Pluto.jl/blob/598bd4384f29444631a7c67da8da971ef545e4db/test/WorkspaceManager.jl#L31 (line 31). This line and the one before it (line 30) each create a new Distributed process and initialize the notebook runner environment. This happens synchronously, but in quick succession. Many earlier tests also create a new process, but this is the first test where it happens twice with little code in between.
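For readers unfamiliar with the pattern, here is a minimal sketch of what "two new Distributed processes in quick succession" looks like. It is not Pluto's test code: spawn_worker is a hypothetical helper, and the remotecall_fetch round trip is only a stand-in for initializing the notebook runner environment.

```julia
using Distributed

# Hypothetical helper (not part of Pluto): launch one local worker process
# and make a blocking round trip to it. The round trip stands in for the
# real notebook-runner initialization, which is assumed to be synchronous.
function spawn_worker()
    pid = only(addprocs(1))                      # addprocs returns a vector of new worker ids
    @assert remotecall_fetch(myid, pid) == pid   # waits until the worker answers
    return pid
end

a = spawn_worker()   # first new process (like line 30 of the test file)
b = spawn_worker()   # second new process right after it (like line 31)

println("started workers ", a, " and ", b)

rmprocs(a, b)        # shut both workers down again
```

Adding a sleep call between the two spawn_worker() calls is the experiment mentioned in the closing comment at the top, which did not make the failures go away.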
Why I am posting this issue
I understand that this is far from a MWE, and I only have a vague idea of where the tests fail. My hopes are: