Slurm job array interface

AnderGray commented 5 months ago

Adds the functionality discussed in #153

We can now pass a SlurmInterface to an ExternalModel, and any call to execute! will then be submitted as a job array.

struct SlurmInterface
    name::String
    account::String
    partition::String
    nodes::Integer
    ntasks::Integer
    batchsize::Integer
    extras::Vector{String}
    time::String
end

Julia waits until the job array completes. In a Slurm environment, this also lets you run the sampling in parallel while Julia itself runs on a single node (e.g. the login node). Submitted external model runs can now be as heavy as you like.
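For illustration, constructing the interface might look like the sketch below. The struct is copied from above so the snippet runs on its own. The keyword constructor, its defaults, and every field value (account, partition, module line) are placeholders and assumptions, not part of this PR.

```julia
# Copy of the struct from this PR so the snippet is self-contained.
struct SlurmInterface
    name::String
    account::String
    partition::String
    nodes::Integer
    ntasks::Integer
    batchsize::Integer
    extras::Vector{String}
    time::String
end

# Hypothetical keyword constructor; the default values are assumptions.
function SlurmInterface(;
    name="UQ_array",
    account,
    partition,
    nodes=1,
    ntasks=1,
    batchsize=50,
    extras=String[],
    time="01:00:00",
)
    return SlurmInterface(name, account, partition, nodes, ntasks, batchsize, extras, time)
end

slurm = SlurmInterface(;
    account="my_account",           # placeholder account name
    partition="cpu",                # placeholder partition name
    ntasks=32,
    extras=["module load openmc"],  # placeholder extra script line
)
```

A keyword constructor like this would keep call sites readable, since the positional form needs eight arguments in a fixed order.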

Also added

slurm-opensees.jl example
slurm-openmc.jl example

Working, but some things to do:

More constructors
Docstrings
Variable names check
Testing?

FriesischScott commented 5 months ago

For testing, some of the code could be extracted into internal functions, for example one that writes the slurm script. That function can then be tested independently of actually running the script.
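A rough sketch of that idea follows. The function name `write_slurm_script`, its `command` argument, and the exact set of `#SBATCH` lines are assumptions for illustration, not the code in this PR. The struct is repeated so the snippet runs on its own. Writing to an `IO` lets a test inspect the generated script without a cluster.

```julia
using Test

# Copy of the struct from this PR so the snippet is self-contained.
struct SlurmInterface
    name::String
    account::String
    partition::String
    nodes::Integer
    ntasks::Integer
    batchsize::Integer
    extras::Vector{String}
    time::String
end

# Hypothetical helper: writes the batch script for `n` samples to `io`.
function write_slurm_script(io::IO, s::SlurmInterface, n::Integer, command::String)
    println(io, "#!/bin/bash")
    println(io, "#SBATCH --job-name=", s.name)
    println(io, "#SBATCH --account=", s.account)
    println(io, "#SBATCH --partition=", s.partition)
    println(io, "#SBATCH --nodes=", s.nodes)
    println(io, "#SBATCH --ntasks=", s.ntasks)
    println(io, "#SBATCH --time=", s.time)
    # Assumption: batchsize caps how many array tasks run at the same time.
    println(io, "#SBATCH --array=1-", n, "%", s.batchsize)
    # Extra lines (e.g. module loads) are copied verbatim.
    foreach(line -> println(io, line), s.extras)
    println(io, command)
    return nothing
end

slurm = SlurmInterface("uq", "acct", "cpu", 1, 4, 10, ["module load openmc"], "00:10:00")

# sprint calls write_slurm_script(io, slurm, 100, command) on an internal buffer.
script = sprint(write_slurm_script, slurm, 100, "./run_sample.sh \$SLURM_ARRAY_TASK_ID")

@test startswith(script, "#!/bin/bash")
@test occursin("#SBATCH --array=1-100%10", script)
@test occursin("module load openmc", script)
```

The function that actually submits the job could then open a file and pass the handle to the same helper, so only the submission step needs a Slurm installation.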

codecov[bot] commented 5 months ago

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (49298ac) 98.27% compared to head (9610f10) 97.96%.

| Files | Patch % | Lines |
|---|---|---|
| src/hpc/slurm.jl | 92.53% | 5 Missing :warning: |
| src/models/externalmodel.jl | 97.56% | 1 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #155      +/-   ##
==========================================
- Coverage   98.27%   97.96%   -0.31%
==========================================
  Files          27       28       +1
  Lines        1042     1133      +91
==========================================
+ Hits         1024     1110      +86
- Misses         18       23       +5
```

AnderGray commented 4 months ago

I think this is ready for another look.

As discussed offline, the SlurmInterface is tested only on Linux. The tests import and overwrite the run_HPC_job function so that it calls a dummy script, which loops over the requested samples.
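The override pattern can be sketched as follows. Everything here is a stand-in: the module `FakeUQ`, the two-argument signature of `run_HPC_job`, and the `submit` wrapper are assumptions for illustration, and the replacement loops over the samples directly in Julia instead of calling a dummy shell script.

```julia
# Stand-in for the package; the real run_HPC_job would submit to Slurm.
module FakeUQ
run_HPC_job(n::Integer, workdir::String) = error("Slurm is not available here")
# Stand-in for the code path that ends up calling run_HPC_job.
submit(n::Integer, workdir::String) = run_HPC_job(n, workdir)
end

# Importing the function allows the test code to add or replace its methods.
import .FakeUQ: run_HPC_job

# Test-only replacement: same signature, so it overwrites the method above.
function run_HPC_job(n::Integer, workdir::String)
    for i in 1:n
        # Pretend each array task ran and left an output file behind.
        write(joinpath(workdir, "sample-$i.out"), "done")
    end
    return nothing
end

workdir = mktempdir()
FakeUQ.submit(3, workdir)
outputs = sort(readdir(workdir))
```

Because the replacement has the same signature as the original method, every caller inside the module picks it up without any change to the package code.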

The new folder tests/test_utilities contains helper functions for these tests. ExternalModel is also tested for sample interpolation into source files.

We discussed that aliasing the slurm command in the environment would be a better solution. However, we can't seem to alias commands using Julia's shell commands. See here, and here for discussion. We could perhaps pre-set this using GitHub's CI, and only test slurm on GitHub?

Also, the Gaussian copula test seems to fail sometimes. We should perhaps consider migrating to Copulas.jl at some point, although it is missing some features (such as the Rosenblatt transform).

FriesischScott / UncertaintyQuantification.jl

Slurm job array interface #155