DecodeGenetics / Ratatosk

Hybrid error correction of long reads using colored de Bruijn graphs
BSD 2-Clause "Simplified" License
96 stars 7 forks source link

Reference-guided SLURM script errors #22

Closed ASLeonard closed 3 years ago

ASLeonard commented 3 years ago

Hi, I am hoping to use the reference-guided mode of Ratatosk to correct ONT reads. I tried using the new version, but there seem to be some errors.

In line 78 of the SLURM script, there is a missing fi causing an EOF error.

Also in line 226, the BIN_SZ variable appears undefined and causes an error in the for pos in $(seq 0 $((${BIN_SZ})) $((${chr_len}))) loop as it defaults to 0. I found buffer_sz = 1000000 in segmentBAM.py, but it isn't clear if this is related.

Thanks, Alex

GuillaumeHolley commented 3 years ago

Hi @ASLeonard,

Thanks for letting me know! I pushed a quick fix, can you let me know if this works for you?

Guillaume

ASLeonard commented 3 years ago

Looks like it is working so far.

As a possibly related question, I am modifying the script as I am on an LSF cluster, not slurm. It looks like the sbatch bin.sh submission on L252 is a single submission of many many Ratatosk commands. I don't see any slurm array code, so is this meant to be a single job, or can each Ratatosk job be submitted individually?

ASLeonard commented 3 years ago

Also there is a missing .fa in either L179 NAME_SR_UNMAPPED_IN_FILE="${PREFIX_PATH_SEG}/sample_sr_unmapped" or in L236 -u ${NAME_SR_UNMAPPED_IN_FILE}

GuillaumeHolley commented 3 years ago

Hi @ASLeonard,

Thank you for notifying me on the file extension, it has been fixed. To answer your previous question about the sbatch bin.sh, the intention was to have each Ratatosk correction job submitted independently with SLURM, which is not what I have achieved here. I will work on a fix for that.

GuillaumeHolley commented 3 years ago

I pushed a quick fix for it which changes the Ratatosk bin corrections to independent job submissions in a job array. Note that an additional parameter -n must be now provided to the script and refers to the number of nodes that can be used at a given time. The parameter for the SLURM partition name to use has been changed to -m.

ASLeonard commented 3 years ago

Everything seems to be working otherwise, so thanks for updating the scripts so quickly.

I'm actually on an LSF cluster, so I ported the slurm script to snakemake. Some parts still aren't working as smoothly as desired (similar issues to reported snakemake bugs), but I'd be happy to share that when it works more smoothly.