Closed rikhuijzer closed 5 months ago
I've set fail-fast: false in 3aec1fb so that GitHub doesn't automatically cancel all other runs if one fails.
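For reference, a minimal sketch of where this setting lives in a workflow file. The job name, matrix entries, and action versions below are placeholders, not the contents of 3aec1fb. The key is spelled fail-fast in workflow YAML and defaults to true when omitted.

```yaml
# Hypothetical workflow; only the fail-fast line reflects the change discussed here.
name: CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      # false: let the remaining matrix jobs finish even if one fails.
      # true (the default): cancel in-progress and queued matrix jobs on first failure.
      fail-fast: false
      matrix:
        version: ['1']                        # placeholder Julia version
        os: [ubuntu-latest, windows-latest]   # placeholder runners
    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v1
        with:
          version: ${{ matrix.version }}
      - uses: julia-actions/julia-runtest@v1
```

The setting only affects jobs generated from the same matrix; unrelated jobs in the workflow run independently either way.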
I think generally in SciML it's better to use fail-fast: true, since CI in SciML already carries a very high computational burden and usually involves a large number of jobs. So IMO everything that fails should fail immediately.
I think that makes sense if you assume developer time has little value
What is this caching doing? Before and after?
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (4f8e26f) 86.09% compared to head (94a7c6e) 86.09%. Report is 2 commits behind head on master.
Bump
> What is this caching doing? Before and after?
Simply put, it is similar to what DifferentialEquations.jl had, but the complexity is moved inside https://github.com/julia-actions/cache. This allowed cache to come up with some improvements over time, including caching of the compiled directory so that precompiled binaries can be re-used between jobs, and caching of packages. Before this patch, DifferentialEquations.jl only cached ~/.julia/artifacts.
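To make the before/after concrete, here is a rough sketch of the two approaches as workflow steps. The "before" step is an assumption modeled on the common actions/cache pattern for ~/.julia/artifacts; the exact cache key that DifferentialEquations.jl used is not shown in this thread. The "after" step uses julia-actions/cache with its default settings. Action versions are placeholders.

```yaml
# Before (assumed shape): a hand-written cache of the artifacts directory only.
steps:
  - uses: actions/checkout@v4
  - uses: julia-actions/setup-julia@v1
    with:
      version: '1'
  - uses: actions/cache@v4
    with:
      path: ~/.julia/artifacts
      # Hypothetical key; invalidated when any Project.toml changes.
      key: ${{ runner.os }}-test-${{ hashFiles('**/Project.toml') }}
      restore-keys: |
        ${{ runner.os }}-test-
  - uses: julia-actions/julia-buildpkg@v1
  - uses: julia-actions/julia-runtest@v1
```

```yaml
# After: the cache paths and keys are handled inside julia-actions/cache.
steps:
  - uses: actions/checkout@v4
  - uses: julia-actions/setup-julia@v1
    with:
      version: '1'
  - uses: julia-actions/cache@v1
  - uses: julia-actions/julia-buildpkg@v1
  - uses: julia-actions/julia-runtest@v1
```

Placing the cache step after setup-julia and before the build step means the restored depot is in place before packages are installed and precompiled.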
> I think that makes sense if you assume developer time has little value
I said this because at Pluto.jl there was a lot of developer time wasted by this. What would happen there is that about a dozen jobs that took about 20 minutes would be cancelled as soon as one job started to fail. In practice, this was problematic because often most jobs were about 80% done with their execution when they were cancelled. By cancelling, this 80% progress was thrown away. Sometimes, this was no problem because the problem was obvious, but sometimes the problem was not obvious and having more information about which jobs did pass could be very helpful in debugging.
Looking at the Pluto tests, those tests are quite short and light. In comparison, an OrdinaryDiffEq run (https://github.com/SciML/OrdinaryDiffEq.jl/pull/2092) can easily take around 10 hours of compute time. That is split across jobs, but it's still a lot. With the tens of contributors active across SciML, if we're not actively canceling jobs it can take a few hours to get an open machine to start running tests, let alone finish. So fail fast generally makes tests run hours faster, because the queue is no longer clogged and jobs start sooner. We're trying to secure more money for more resources, but it's not easy to come by.
Does this caching work for the way we have the groups set up in OrdinaryDiffEq?
Is there a way to precompile once and then use that for all of the groups?
I don’t know
It doesn't look like this made it any faster? https://github.com/SciML/DifferentialEquations.jl/actions/runs/7332505023/job/19966734944?pr=1004#step:4:52
Is this incorporated into setup-julia now or something? What's the reason to close?
The close was unintentional. I was just removing old forks from GitHub. Sorry for the noise
@thazhemadam will this kind of caching be addressed in the centralization changes?
Yes, I was planning on having it available as a default, with an option to opt out.
Thanks to Ian Butterworth, julia-actions/cache caches ~/.julia/compiled too (https://github.com/julia-actions/cache/pull/71). Maybe interesting for the SciML ecosystem.