HPC Episode - Githubissues

multimeric commented 1 year ago

Added an optional HPC episode, which uses Slurm as an example
~~This has been written using the current plan writing system. Once/if #19 is merged, some of this will need re-writing to remove redundancy~~

github-actions[bot] commented 1 year ago

:warning: WARNING :warning:

This pull request contains a mix of workflow files and regular files. This could be malicious. No preview will be created.

regular files:

FIXME.Rproj
config.yaml
episodes/files/packages.R
episodes/files/plans/plan_slurm.R
episodes/files/plans/plan_slurm_gpu.R
episodes/files/plans/plan_slurm_memory.R
episodes/hpc.Rmd
renv/activate.R
renv/profiles/lesson-requirements/.Rprofile
renv/profiles/lesson-requirements/renv.lock
renv/profiles/lesson-requirements/renv/.gitignore
renv/profiles/lesson-requirements/renv/activate.R
renv/profiles/lesson-requirements/renv/settings.json

workflow files:

.github/workflows/sandpaper-main.yaml

joelnitta commented 1 year ago

@multimeric thanks for your patience with #19, which has now been merged.

Is this PR basically ready to go in terms of content? There was a fair bit of discussion about it earlier, but seems to have quieted down recently (I personally don't use an HPC server so don't have much to add).

Also, in the OP you mentioned Once/if https://github.com/carpentries-incubator/targets-workshop/pull/19 is merged, some of this will need re-writing to remove redundancy. Does that still need to happen?

multimeric commented 1 year ago

Yep, the content is all sorted, it's what I presented in July. But indeed I do need to rewrite some of this now that #19 has been merged.

multimeric commented 4 months ago

Hi @joelnitta, I've merged in the main branch changes now. However I just realised that, since the sandpaper builds are mostly going to be happening on the CI server, it won't be able to use Slurm. Should I:

Don't build the HPC lesson on the CI
Include pre-computed results, and make all chunks eval: false
Add slurm to the CI server, e.g. https://github.com/koesterlab/setup-slurm-action

joelnitta commented 4 months ago

Thanks @multimeric.

1. Don't build the HPC lesson on the CI

2. Include pre-computed results, and make all chunks `eval: false`

3. Add slurm to the CI server, e.g. https://github.com/koesterlab/setup-slurm-action

My preference is 3 > 2 > 1. The lesson webpage is rendered with CI, so 1 would mean we don't have the HPC lesson on the webpage either (it would only be available when locally rendered).

joelnitta commented 4 months ago

@multimeric

I realized that I needed to update renv.lock in main, so I did that and cherry-picked the commits here (4eb5f06, 9489bea), as well as make one more update to renv.lock (I think due to using CRAN instead of the carpentries r-universe, e012aaf).

Can you check if you can still build the HPC lesson locally? I can't because I don't have access to a computer with slurm.

(sorry - I realized cherry-pick may not have been the best approach here. Feel free to roll those back and re-base instead if you want)

multimeric commented 4 months ago

I can do that, but can we also test if it's working on the CI? I'm not sure if it actually runs the lesson build for pull requests at the moment?

joelnitta commented 4 months ago

Sure - I assume you mean implementing suggestion 3?

Add slurm to the CI server, e.g. https://github.com/koesterlab/setup-slurm-action

multimeric commented 4 months ago

Oh, I already added that to .github/workflows/sandpaper-main.yaml, but I'm not sure if it will even run that workflow for a pull request. Also it doesn't like the fact that I've changed the content and the workflow in the same PR.

joelnitta commented 4 months ago

How about this: I just set up another repo and merged hpc into main over there, let's see how it goes.

GH action is running now... https://github.com/joelnitta/targets-hpc/actions/runs/9867906104

hm, seems to be hung up, I predict it will git killed by time-out. I've actually seen this behavior before, I'm not sure what's causing it but I don't think it is because of this PR...

multimeric commented 4 months ago

No, I think it is a fault in my PR. It's failing at the second workflow which uses different memory requirements, which is likely the issue. I've reduced the memory requirements, but I now realise that it's guaranteed to fail at the GPU workflow since the mini cluster isn't going to have a GPU node.

joelnitta commented 4 months ago

ah... so perhaps CI is not an option then?

multimeric commented 4 months ago

Well not for the last workflow, but that's inside a challenge solution so even if I just paste in the pre-computed result there I don't think that's too bad.

joelnitta commented 4 months ago

Sounds good, let's try that then. If you push the change here, I can try it again over on my other fork.

carpentries-incubator / targets-workshop

HPC Episode #18

:warning: WARNING :warning: