Assigning extra resources for one target is not trivial in clustermq.
I can either run the entire pipeline with 100GB per node (wasteful) or run it in stages, with some stages requesting more memory.
See https://github.com/ropensci/targets/issues/198#issuecomment-712333764 for an example of re-running the pipeline with job-specific resources.
I can use tidyselect statements, so perhaps I can do something like this:
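(A rough sketch only: the target names, the `heavy_` prefix, and the worker counts below are placeholders, and it assumes the clustermq backend is already configured.)

```r
library(targets)

# Re-run only the memory-hungry targets, selected with tidyselect helpers.
# The target names and the "heavy_" prefix are made up for illustration.
tar_make_clustermq(names = any_of(c("big_model", "big_summary")), workers = 2)

# Or select by a naming convention:
tar_make_clustermq(names = starts_with("heavy_"), workers = 2)
```

Worker memory would still have to be raised for such a run through the clustermq template; see the submission-script sketch further down.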
A different approach is to just list the target names in the submission script. The list has to be maintained by hand, but the submission script is a good place for it, given that worker resource requirements are a function of the machine we run on, not of the pipeline itself.
```r
library(targets)

# Pull every stem and dynamic pattern name from the pipeline metadata.
xx <- tar_meta(targets_only = TRUE)
xx[xx$type %in% c("stem", "pattern"), "name"]$name
```
This is useful for getting all the target names in the pipeline, assuming the _targets folder is up to date.
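Building on that, here is a sketch of what the submission-script approach might look like. It assumes a SLURM scheduler, a clustermq template file with a `{{ memory }}` placeholder, and clustermq's `clustermq.defaults` option for filling template values; the target names, memory values (MB), and worker counts are placeholders.

```r
# submission_script.R (sketch only)
library(targets)

options(
  clustermq.scheduler = "slurm",
  clustermq.template  = "slurm_clustermq.tmpl"  # assumed template file
)

# Stages maintained by hand: which targets to run and how much memory (MB)
# each worker should request. Names and values are hypothetical.
stages <- list(
  list(names = c("big_model", "big_summary"), memory = 102400, workers = 2),
  list(names = NULL,                          memory = 20480,  workers = 30)
)

for (stage in stages) {
  # Fill the {{ memory }} field of the template for this batch of workers.
  options(clustermq.defaults = list(memory = stage$memory))
  if (is.null(stage$names)) {
    tar_make_clustermq(workers = stage$workers)  # run everything still outdated
  } else {
    tar_make_clustermq(names = any_of(stage$names), workers = stage$workers)
  }
}
```

Keeping the stage list at the top of the script keeps the machine-specific resource knowledge out of _targets.R, which matches the point above about resources being a property of the machine rather than the pipeline.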
Addressed by f6deec2, but not tested yet.
Closing
The issue of some targets needing different resources is handled in a few ways:
Batching is now implemented in key targets that use very large amounts of memory.
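For reference, a minimal sketch of that batching pattern using dynamic branching; `process_one_batch()`, `combine_batches()`, the input path, and the batch count are all hypothetical stand-ins for the real pipeline code.

```r
library(targets)

# _targets.R (sketch): split the heavy work into batches so that each dynamic
# branch only holds a slice of the data in memory at once.
list(
  tar_target(batch_index, seq_len(20)),  # 20 batches, purely illustrative
  tar_target(
    survey_batch,
    process_one_batch("data/big_input.rds", batch_index),  # hypothetical helper
    pattern = map(batch_index)  # one branch (and one worker task) per batch
  ),
  tar_target(survey_summary, combine_batches(survey_batch))  # hypothetical summary over the combined branches
)
```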
The amount of resources needed by each target varies.
The minimum memory I can request on Getafix is 16GB; any less and the scheduler puts the job onto a node that does not have Singularity. I have been rounding up to 20GB just to be sure.
Time
All targets share the same walltime pool, including targets that end up being skipped because they are already up to date. No single target takes more than 24-48 hours, even on my slow laptop, so setting the walltime to around 3 days or more is fine. If I knew how many targets would actually run, I could estimate this more tightly and get through the scheduler queue faster, but asking for too much time is generally not a big problem. I have it set to 7 days and don't see a need to change that until I know exactly how long things should take.
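If the clustermq template file exposes a walltime placeholder (for example a line like `#SBATCH --time={{ walltime }}`; the field name and time format below are assumptions that have to match the actual template), the 7-day limit can be filled the same way as memory:

```r
# Assumed template field names; adjust to match the actual clustermq template.
options(clustermq.defaults = list(walltime = "7-00:00:00", memory = 20480))
```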
CPUs
Some targets contain code that could make use of multiple CPUs. For now, debugging nested parallelism (parallel code running inside already-parallel workers) is not worth the speedups I could gain in a few places, especially because the top-level parallelisation already provides a lot of speed-up potential: 20-30 times faster or more, depending on how many surveys are included.
A list of places that can use parallel processing, if I come back to it later (EDIT to keep up to date):
Memory
20GB is the minimum I request per target, but some targets have crashed even with 20GB.
A list of targets that need more than 20GB, and roughly how much they need: