Closed: GACGAMA closed this issue 11 months ago
By leveraging GNU parallel and cat, I was able to make it work:
CSV file
a.R1.fq.gz,a.R2.fq.gz,a
b.R1.fq.gz,b.R2.fq.gz,b
cat mycsvfile.csv | parallel -j 1 --verbose --colsep ',' --link 'makeAlignmentScripts.py --output-directory /scratch4/bams/{3} --in-fastq1s /scratch4/fastq/{1} --in-fastq2s /scratch4/fastq/{2} --out-fastq1-name {3}.R1.merged.fq.gz --out-fastq2-name {3}.R2.merged.fq.gz --genome-reference /scratch4/references/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --out-bam {3}.bam --bam-header "@RG\tID:read_group_001\tPL:illumina\tLB:library_001\tSM:{3}" --container-tech singularity --threads 6 --run-trimming --split-input-fastqs --run-alignment --run-mark-duplicates --run-workflow' & wait
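What parallel does with `--colsep ','` here can be sketched as a plain while-read loop: each CSV row is split into three fields that replace {1}, {2}, {3}. A self-contained demo (it recreates the two-row CSV from above and just echoes the command instead of running it):

```shell
# Recreate the two-row CSV shown above, for a self-contained demo.
printf '%s\n' 'a.R1.fq.gz,a.R2.fq.gz,a' 'b.R1.fq.gz,b.R2.fq.gz,b' > mycsvfile.csv

# Split each row on commas into fq1 / fq2 / sample, like parallel's {1} {2} {3}.
while IFS=',' read -r fq1 fq2 sample; do
  echo "makeAlignmentScripts.py --in-fastq1s /scratch4/fastq/$fq1 --in-fastq2s /scratch4/fastq/$fq2 --out-bam $sample.bam"
done < mycsvfile.csv
```

Dropping the `echo` would execute one command per row, sequentially, like `parallel -j 1`.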
I had to put --bam-header in double quotes "" because I can't put single quotes inside single quotes (parallel executes the command, which is itself wrapped in single quotes). This did work, but now I'm running into another problem: every time I use makeAlignmentScripts, it downloads images from Docker Hub. Free accounts can only pull 100 times every 6 hours.
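For what it's worth, shell does let you embed a literal single quote inside a single-quoted string: close the quote, add an escaped quote, and reopen, i.e. `'\''`. A minimal sketch:

```shell
# A single-quoted string cannot contain a raw ', but you can splice one in
# by closing the quote, escaping a quote, and reopening: 'it'\''s quoted'
tricky='it'\''s quoted'
echo "$tricky"

# The @RG header itself needs no inner quotes; single quotes preserve the \t
# sequences literally for the downstream tool to interpret.
header='@RG\tID:read_group_001\tPL:illumina\tLB:library_001\tSM:{3}'
echo "$header"
```

So double quotes work here, but `'\''` is an alternative when the string must stay inside an outer single-quoted command.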
So I'm getting: FATAL: Unable to handle docker://lethalfang/bwa:0.7.17_samtools uri: failed to get checksum for docker://lethalfang/bwa:0.7.17_samtools: reading manifest 0.7.17_samtools in docker.io/lethalfang/bwa: toomanyrequests:You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit INFO 2023-07-05 13:17:14,685 run_script FINISHED RUNNING /scratch4/bams/a/logs/align.2023.07.05.13.15.08.184.cmd in 30.229 seconds with an exit code of 255.
Is there any way to use local images to solve this issue?
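One common workaround (a sketch, assuming Singularity 3.x and the image URI taken from the error message above): pull each Docker image once into a local .sif file, then execute from that file so Docker Hub is never contacted again.

```shell
# One-time conversion of the Docker image to a local SIF file.
# The .sif filename is arbitrary; the URI is from the error message above.
SIF="bwa_0.7.17_samtools.sif"
URI="docker://lethalfang/bwa:0.7.17_samtools"

# Guarded so the sketch is a no-op on machines without Singularity.
if command -v singularity >/dev/null 2>&1; then
  singularity pull "$SIF" "$URI"   # counts as one Docker Hub pull
  singularity exec "$SIF" bwa      # later runs reuse the local file, no pull
fi
```

Whether makeAlignmentScripts.py can be pointed at a local .sif instead of a docker:// URI is a question for the maintainers; the pull/exec pattern itself is standard Singularity usage.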
UPDATE:
Running makeSomaticScripts.py single with -tech singularity on only one sample with 20 threads runs into the same Docker pull-limit problem very quickly.
To your first post: aligned.bwa.bam is the intermediate BAM file before markdup takes place, so a.bam is the designated final BAM file. Other than that, the file names seem to be the ones you designated.
I don't quite know how Singularity works. With Docker, an image that is already downloaded won't be downloaded again, but I don't know how to cache images for Singularity.
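Singularity does have a layer cache: by default it lives under ~/.singularity/cache, and the documented SINGULARITY_CACHEDIR environment variable relocates it, so repeated pulls of the same tag can hit the cache instead of Docker Hub. A sketch (the cache path here is a hypothetical choice):

```shell
# SINGULARITY_CACHEDIR is documented by Singularity; the path is hypothetical.
# Pointing it at persistent storage lets all runs share one download.
export SINGULARITY_CACHEDIR="$HOME/singularity_cache"
mkdir -p "$SINGULARITY_CACHEDIR"
```

Note the cache still needs one successful pull per image tag, and some tools force a fresh manifest check, so converting to a local .sif file is the more robust fix.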
I'm trying to work out how to use SomaticSeq with Singularity on an HPC. The problem, when running things in parallel with Singularity installed on a server, is that only the admins can download the images permanently. This means I will always be limited to 200 pulls from Docker Hub; otherwise it works fine!
But as for the first post, I'm still facing the same problem: I can't set multiple outputs with the same script. Even though I can pass multiple inputs (sample a and sample b R1 and R2 fastqs), I can't set multiple outputs like a.bam and b.bam; the script gives an error that multiple outputs were given where only one output is expected. Is SomaticSeq writing all samples to the same BAM file? Or is it overwriting the output, even though it should produce multiple final BAMs for multiple sample inputs?
Yes, when you have multiple input fastq files, it is assumed that those fastq files all belong to the same sample (e.g., multiple sequencing lanes), and yes, they are combined into a single BAM file.
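Given that, getting a.bam and b.bam means one invocation per sample, each with its own --out-bam. A minimal sketch (the commands are echoed for inspection; paths follow the ones used earlier in this thread):

```shell
# One run per sample: each invocation gets its own output directory,
# its own fastq pair, and its own --out-bam name.
for sample in a b; do
  echo "makeAlignmentScripts.py --output-directory /scratch4/bams/$sample --in-fastq1s /scratch4/fastq/$sample.R1.fq.gz --in-fastq2s /scratch4/fastq/$sample.R2.fq.gz --out-bam $sample.bam"
done
```

This is also exactly what the parallel + CSV invocation in the first post does, one CSV row per sample.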
I recently installed SomaticSeq using a conda env, cloning the repo and installing with pip install -e (SomaticSeq v3.7.3).
Using
makeAlignmentScripts.py --output-directory /scratch4/bams --in-fastq1s /scratch4/fastq/a.R1.fastq /scratch4/fastq/b.R1.fastq --in-fastq2s /scratch4/fastq/a.R1.fastq /scratch4/fastq/b.R2.fastq --out-fastq1-name a.R1.fq.gz --out-fastq2-name a.R2.fq.gz --genome-reference /scratch4/references/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --out-bam a.bam --bam-header '@RG\tID:read_group_001\tPL:illumina\tLB:library_001\tSM:patient_001' --container-tech singularity --threads 6 --run-trimming --split-input-fastqs --run-alignment --run-mark-duplicates --run-workflow
Produces:
a.R1.fq.gz a.R2.fq.gz a.bam aligned.bwa.bam
Even selecting --trim-software trimmomatic and --markdup-software picard, and removing --split-input-fastqs, does not produce correctly named files!