SciML / DifferentialEquations.jl

Multi-language suite for high-performance solvers of differential equations and scientific machine learning (SciML) components. Ordinary differential equations (ODEs), stochastic differential equations (SDEs), delay differential equations (DDEs), differential-algebraic equations (DAEs), and more in Julia.
https://docs.sciml.ai/DiffEqDocs/stable/

Use `julia-actions/cache` in CI #1004

Closed rikhuijzer closed 5 months ago

rikhuijzer commented 10 months ago

Thanks to Ian Butterworth, julia-actions/cache caches ~/.julia/compiled too (https://github.com/julia-actions/cache/pull/71).

Maybe interesting for the SciML ecosystem.

rikhuijzer commented 10 months ago

I've set `fail-fast: false` in 3aec1fb so that GitHub doesn't automatically cancel all runs if one fails.
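
For readers less familiar with GitHub Actions, this setting sits under a job's `strategy` key. The fragment below is only a sketch of where it goes: the job name, matrix values, and action versions are placeholders, not this repository's actual configuration.

```yaml
# Illustrative workflow fragment (placeholder job name and matrix values).
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      # With `false`, the remaining matrix jobs keep running after one fails.
      # With `true` (the default), GitHub cancels in-progress and queued matrix jobs.
      fail-fast: false
      matrix:
        version: ['1.6', '1']
        os: [ubuntu-latest, macos-latest]
    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v1
        with:
          version: ${{ matrix.version }}
      - uses: julia-actions/julia-runtest@v1
```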

devmotion commented 10 months ago

I think generally in SciML it's better to use `fail-fast: true` since there's already a very high computational burden due to CI in SciML and usually a large number of CI jobs. So IMO everything that fails should fail immediately.

rikhuijzer commented 10 months ago

> I think generally in SciML it's better to use `fail-fast: true` since there's already a very high computational burden due to CI in SciML and usually a large number of CI jobs. So IMO everything that fails should fail immediately.

I think that makes sense if you assume developer time has little value

ChrisRackauckas commented 10 months ago

What is this caching doing? Before and after?

codecov[bot] commented 10 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (4f8e26f) 86.09% compared to head (94a7c6e) 86.09%. Report is 2 commits behind head on master.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master    #1004   +/-   ##
=======================================
  Coverage   86.09%   86.09%
=======================================
  Files          11       11
  Lines         151      151
=======================================
  Hits          130      130
  Misses         21       21
```

ChrisRackauckas commented 9 months ago

Bump

rikhuijzer commented 9 months ago

> What is this caching doing? Before and after?

Simply put, it is similar to what DifferentialEquations.jl already had, but the complexity is moved into https://github.com/julia-actions/cache. This has allowed the cache action to gain improvements over time, including caching of the `compiled` directory, so that precompiled binaries can be re-used between jobs, and caching of packages. Before this patch, DifferentialEquations.jl only cached `~/.julia/artifacts`.
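
As a rough before/after sketch of that change (not the exact diff of this PR): the commented-out "before" step, its cache key, the job name, and the action versions are assumptions modelled on the common Julia CI template.

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v1
        with:
          version: '1'

      # Before (assumed shape): a hand-written step caching only artifacts.
      # - uses: actions/cache@v3
      #   with:
      #     path: ~/.julia/artifacts
      #     key: ${{ runner.os }}-test-${{ hashFiles('**/Project.toml') }}

      # After: delegate to julia-actions/cache with its default settings.
      # Per the description above, this also covers packages and ~/.julia/compiled.
      - uses: julia-actions/cache@v1

      - uses: julia-actions/julia-buildpkg@v1
      - uses: julia-actions/julia-runtest@v1
```

The cache step comes after `setup-julia` and before the build and test steps, so a restored depot is in place before packages are installed and precompiled.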

rikhuijzer commented 9 months ago

> > I think generally in SciML it's better to use `fail-fast: true` since there's already a very high computational burden due to CI in SciML and usually a large number of CI jobs. So IMO everything that fails should fail immediately.
>
> I think that makes sense if you assume developer time has little value

I said this because at Pluto.jl there was a lot of developer time wasted by this. What would happen there is that about a dozen jobs that took about 20 minutes would be cancelled as soon as one job started to fail. In practice, this was problematic because often most jobs were about 80% done with their execution when they were cancelled. By cancelling, this 80% progress was thrown away. Sometimes, this was no problem because the problem was obvious, but sometimes the problem was not obvious and having more information about which jobs did pass could be very helpful in debugging.

ChrisRackauckas commented 9 months ago

> I said this because at Pluto.jl there was a lot of developer time wasted by this. What would happen there is that about a dozen jobs that took about 20 minutes would be cancelled as soon as one job started to fail. In practice, this was problematic because often most jobs were about 80% done with their execution when they were cancelled. By cancelling, this 80% progress was thrown away. Sometimes, this was no problem because the problem was obvious, but sometimes the problem was not obvious and having more information about which jobs did pass could be very helpful in debugging.

Looking at the Pluto tests, those tests are quite short and light. In comparison, an OrdinaryDiffEq run (https://github.com/SciML/OrdinaryDiffEq.jl/pull/2092) can easily take around 10 hours of compute time split across jobs, which is a lot. With the tens of contributors active across SciML, if we're not actively canceling jobs it can take a few hours to get an open machine to start running tests, let alone finish. So fail-fast generally makes tests finish hours sooner, because the queue is no longer clogged and jobs start sooner. We're trying to secure more money for more resources, but it's not easy to come by.

> Simply put, it is similar to what DifferentialEquations.jl already had, but the complexity is moved into https://github.com/julia-actions/cache. This has allowed the cache action to gain improvements over time, including caching of the `compiled` directory, so that precompiled binaries can be re-used between jobs, and caching of packages. Before this patch, DifferentialEquations.jl only cached `~/.julia/artifacts`.

Does this work for the way we have the groups set up in OrdinaryDiffEq?
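
For context on the question, a sketch of the kind of group setup being referred to: each matrix entry runs one test group, selected through a `GROUP` environment variable that the test suite is assumed to read. The group names and job layout below are placeholders, not OrdinaryDiffEq's actual workflow, and whether the groups can share one precompile cache is exactly what is being asked here.

```yaml
# Hypothetical group-based test matrix (GroupA/GroupB/GroupC are placeholders).
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        group: [GroupA, GroupB, GroupC]
    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v1
        with:
          version: '1'
      - uses: julia-actions/cache@v1
      - uses: julia-actions/julia-buildpkg@v1
      - uses: julia-actions/julia-runtest@v1
        env:
          # Assumed convention: test/runtests.jl reads ENV["GROUP"] to pick the tests to run.
          GROUP: ${{ matrix.group }}
```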

ChrisRackauckas commented 9 months ago

Is there a way to precompile once and then use that for all of the groups?

rikhuijzer commented 9 months ago

> Is there a way to precompile once and then use that for all of the groups?

I don’t know

ChrisRackauckas commented 9 months ago

It doesn't look like this made it any faster? https://github.com/SciML/DifferentialEquations.jl/actions/runs/7332505023/job/19966734944?pr=1004#step:4:52

ChrisRackauckas commented 5 months ago

Is this incorporated into setup-julia now or something? What's the reason to close?

rikhuijzer commented 5 months ago

> Is this incorporated into setup-julia now or something? What's the reason to close?

The close was unintentional. I was just removing old forks from GitHub. Sorry for the noise

ChrisRackauckas commented 5 months ago

@thazhemadam will this kind of caching be addressed in the centralization changes?

thazhemadam commented 5 months ago

Yes, I was planning on having it available as a default, with an option to opt out.