Microbial-Ecology-Group / AMRplusplus

AMR++ is a bioinformatic pipeline meant to aid in the analysis of raw sequencing reads to characterize the profile of antimicrobial resistance genes, or resistome.
https://www.meglab.org/
GNU General Public License v3.0

Fail to open file 'null' when running rm_host using slurm.sh #37

Closed ReedGigabyte closed 1 month ago

ReedGigabyte commented 1 month ago

I'm trying to run the rm_host part of the pipeline with the run_AMR++_slurm.sh script. I am getting the error seen below.

I'm not sure what the first command error is, but I think I understand the second command error.

My current understanding is that in AMRplusplus/modules/Alignment/bwa.nf the process bwa_rm_contaminant_fq takes a tuple as input, but for whatever reason, there's only one file in the tuple.

In AMRplusplus/subworkflows/fastq_host_removal.nf bwa_rm_contaminant_fq takes reference_index_files and read_pairs_ch as input.

In AMRplusplus/main_AMR++.nf, the workflow FASTQ_RM_HOST_WF gets params.host and fastq_files as inputs.

My theory is that there's an issue with fastq_files that is causing problems downstream.

Where does fastq_files get its files? How do I make ${reads[1]} in bwa_rm_contaminant_fq not null?

I tried replacing 1 with * in the reads path in my params.config file in case that would solve it, but it did not: the pipeline treated the two files as two separate tasks rather than as two files for the same task.

In addition, what does the first command error mean?

Error:

ERROR ~ Error executing process > 'FASTQ_RM_HOST_WF:bwa_rm_contaminant_fq (SRR15123516_1)'

Caused by: Process FASTQ_RM_HOST_WF:bwa_rm_contaminant_fq (SRR15123516_1) terminated with an error exit status (1)

Command executed:

bwa mem genome.fa SRR15123516_1.fastq null -t 8 > SRR15123516_1.host.sam
samtools view -bS SRR15123516_1.host.sam | samtools sort -@ 8 -o SRR15123516_1.host.sorted.bam
rm SRR15123516_1.host.sam
samtools index SRR15123516_1.host.sorted.bam && samtools idxstats SRR15123516_1.host.sorted.bam > SRR15123516_1.samtools.idxstats
samtools view -h -f 12 -b SRR15123516_1.host.sorted.bam -o SRR15123516_1.host.sorted.removed.bam
samtools sort -n -@ 8 SRR15123516_1.host.sorted.removed.bam -o SRR15123516_1.host.resorted.removed.bam
samtools fastq -@ 8 -c 6 SRR15123516_1.host.resorted.removed.bam -1 SRR15123516_1.non.host.R1.fastq.gz -2 SRR15123516_1.non.host.R2.fastq.gz -0 /dev/null -s /dev/null -n

rm *.bam

Command exit status: 1

Command output: (empty)

Command error:
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[E::main_mem] fail to open file `null'.

ReedGigabyte commented 1 month ago

I figured out the issue. In my params.config file, the reads path was pointing to only one of the .fastq files because I wasn't using a glob pattern that matches both mates. I changed 1.fastq to {1,2}.fastq and that fixed it.
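
For anyone hitting the same `null`, here is a rough Python sketch of why the pattern matters. It is not Nextflow's or AMR++'s actual code; it only assumes that the reads for a task are collected by matching a glob pattern against file names. The first file name comes from the log above, `SRR15123516_2.fastq` is an assumed name for the mate file, and the helpers `expand_braces` and `matching_reads` are hypothetical.

```python
import fnmatch
import re


def expand_braces(pattern):
    """Expand {a,b,...} groups into one plain glob pattern per alternative."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for alternative in match.group(1).split(","):
        # Recurse in case the pattern holds further brace groups.
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded


def matching_reads(filenames, pattern):
    """Return the sorted file names matched by a glob pattern with optional braces."""
    plain_patterns = expand_braces(pattern)
    return sorted(
        name for name in filenames
        if any(fnmatch.fnmatchcase(name, p) for p in plain_patterns)
    )


# First name is from the log above; the second is the assumed mate file.
files = ["SRR15123516_1.fastq", "SRR15123516_2.fastq"]

single = matching_reads(files, "*_1.fastq")      # matches the forward read only
paired = matching_reads(files, "*_{1,2}.fastq")  # matches both mates

print(single)
print(paired)
```

With the first pattern only one file is matched, so there is no second element for `${reads[1]}` to refer to, which is consistent with the literal `null` in the `bwa mem` command line. The brace pattern matches both mates, so both can end up in the same task.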