Open shellywanamaker opened 8 hours ago
We can also split read files, then merge alignments.
My config file, which I got from Carson Miller at UWpeds, has a parameter that sends the first attempt
at running a task to ckpt. If that attempt fails, the task is retried on our node.
process {
    executor = 'slurm'
    queue = { task.attempt == 1 ? 'ckpt' : 'cpu-g2-mem2x' }
    maxRetries = 1
    clusterOptions = { "-A srlab" }
    scratch = '/gscratch/scrubbed/srlab/'
}
The methylseq pipeline I'm currently running shows that for the sample EF07-EM01-Zygote_2 it first attempted bismark align on ckpt.
When this failed because of the time limit, it re-attempted on our node,
but that attempt is currently stalled because the requested time conflicts with the scheduled maintenance.
I restarted the pipeline and I updated my config file to prevent individual tasks from running more than 3 days.
process {
    executor = 'slurm'
    queue = { task.attempt == 1 ? 'ckpt' : 'cpu-g2-mem2x' }
    maxRetries = 1
    clusterOptions = { "-A srlab" }
    scratch = '/gscratch/scrubbed/srlab/'
    resourceLimits = [
        cpus: 16,
        memory: '150.GB',
        time: '72.h'
    ]
}
Adding this thread from the nf-core Slack direct message I have going with Carson Miller:
Shelly Wanamaker Hi Carson, I have a question about the config file and using the ckpt resource on hyak. I read here https://hyak.uw.edu/docs/compute/checkpoint/ that jobs are stopped and requeued every 4-5 hours. If I have a sample that takes longer than 5 hours to process, will it be able to requeue from where it left off in the pipeline (for instance, if it was halfway through aligning reads)? Or does it restart the alignment from the beginning, in which case would it end up in a loop and never finish aligning given the 4-5 hour time constraint of ckpt?
Carson Miller Hi Shelly, unfortunately the way that Nextflow caches jobs means that the job will have to be completely restarted. The way I have handled this is with dynamic resource requests. I’m not sure if my config has this, but the idea is that on attempt 1 I submit to the ckpt queue, and if that fails I resubmit to another queue (e.g. compute) with an increased time/memory request. I can send you an example if this doesn’t make sense
Shelly Wanamaker ok I get that and i think your config file does do that. i modified it to use the resources I have access to and I think it's this part:
process {
    executor = 'slurm'
    queue = { task.attempt == 1 ? 'ckpt' : 'cpu-g2-mem2x' }
    maxRetries = 1
    clusterOptions = { "-A srlab" }
    scratch = '/gscratch/scrubbed/srlab/'
}
Carson Miller That looks great to me!
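The dynamic resource request Carson describes could be sketched roughly as below. This is only an illustration under stated assumptions: the selector `BISMARK_ALIGN` is a guess at the alignment process name, and the time and memory values are placeholders, not settings taken from either config.

```groovy
// Sketch only: the process selector and all numeric values are assumptions.
process {
    executor       = 'slurm'
    clusterOptions = '-A srlab'
    maxRetries     = 1

    // Attempt 1 goes to ckpt; the retry goes to the lab partition.
    queue = { task.attempt == 1 ? 'ckpt' : 'cpu-g2-mem2x' }

    // Hypothetical selector: assumes the alignment process is named BISMARK_ALIGN.
    withName: 'BISMARK_ALIGN' {
        // Short request for the ckpt attempt, longer request for the retry.
        time   = { task.attempt == 1 ? 4.h : 48.h }
        // Memory request grows with each attempt.
        memory = { 64.GB * task.attempt }
    }
}
```

Because `task.attempt` is evaluated for each submission, the same closure yields the smaller request on the first attempt and the larger one on the retry.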
Shelly Wanamaker Looking at the .command.run file for a task that failed I can see it tried the ckpt resource and second attempt tried the cpu-g2-mem2x resource but these won't run because the time specified conflicts with the scheduled maintenance. can i modify that parameter in the config file?
Carson Miller Yes, you should be able to modify the resources requested by a specific module in the conf/modules.config file:
withName: ASSEMBLYANNOTATE {
    array = 100
    cpus = { 2 }
    memory = { 7.GB * task.attempt }
    time = { 4.h * task.attempt }
}
And you can set a max like this in your nextflow.config or modules.config so that you can make sure the job request doesn't conflict with the scheduled maintenance
params {
    resourceLimits = [
        cpus: 16,
        memory: '200.GB',
        time: '72.h'
    ]
}
Shelly Wanamaker i do have this in my nextflow.config file (copied from yours)
params {
    config_profile_description = 'UW Hyak Roberts labs cluster profile provided by nf-core/configs.'
    config_profile_contact = 'Shelly A. Wanamaker @shellywanamaker'
    config_profile_url = 'https://faculty.washington.edu/sr320/'
    max_memory = 742.GB
    max_cpus = 40
    max_time = 72.h
}
but it seems like i need the resourceLimits parameter
Carson Miller Yeah, there has been a recent shift away from max_memory and those other parameters in Nextflow/nf-core pipelines
Shelly Wanamaker I just added the following modification to my nextflow.config file
params {
    config_profile_description = 'UW Hyak Roberts labs cluster profile provided by nf-core/configs.'
    config_profile_contact = 'Shelly A. Wanamaker @shellywanamaker'
    config_profile_url = 'https://faculty.washington.edu/sr320/'
    resourceLimits = [
        cpus: 16,
        memory: '150.GB',
        time: '72.h'
    ]
}
and tried resuming my pipeline but got an invalid input values warning
Carson Miller Try running nextflow self-update
Shelly Wanamaker interesting, it updated and is now running nextflow 24.10.2 but still throwing the same warning
Carson Miller My mistake, this should be in the process section and not the params section. Sorry for the confusion! https://www.nextflow.io/docs/latest/reference/process.html#resourcelimits
Shelly Wanamaker oh that makes sense! thank you so much for your help with this!
Carson Miller Not a problem! Hopefully this will allow the pipeline to work correctly for you!
Shelly Wanamaker yes! no more warning
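For reference, the layout the thread converges on can be assembled from the snippets above: the profile metadata stays in `params`, and `resourceLimits` moves into `process`. This is a sketch pieced together from this thread, not a copy of the actual file. The warning presumably appeared because `resourceLimits` is a process directive and is not a recognized pipeline parameter.

```groovy
// nextflow.config (sketch assembled from the snippets in this thread)
params {
    config_profile_description = 'UW Hyak Roberts labs cluster profile provided by nf-core/configs.'
    config_profile_contact     = 'Shelly A. Wanamaker @shellywanamaker'
    config_profile_url         = 'https://faculty.washington.edu/sr320/'
}

process {
    executor       = 'slurm'
    queue          = { task.attempt == 1 ? 'ckpt' : 'cpu-g2-mem2x' }
    maxRetries     = 1
    clusterOptions = { "-A srlab" }
    scratch        = '/gscratch/scrubbed/srlab/'
    // resourceLimits is a process directive, so it belongs here rather than in params.
    resourceLimits = [
        cpus: 16,
        memory: '150.GB',
        time: '72.h'
    ]
}
```

With the 72-hour cap in place, no task request should exceed 3 days, which is what keeps the retry from colliding with the maintenance window.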
Cool! Thanks for all of this!!!
GPU options?