bioinfologics / w2rap

WGS (Wheat) Robust Assembly Pipeline

lmp_processing Error #10

Closed AllisonStander closed 2 years ago

AllisonStander commented 5 years ago

Good day Jon,

I am trying to process my 8 kb mate-pair library, but the lmp_processing script fails while it is running.

The error log contains:

terminate called after throwing an instance of 'std::regex_error'
what(): regex_error
./run.sh: line 6: 6888 Aborted dedup_fastq -i MP_8000_100_S2L2.extendedFrags.fastq -o MP_8000_100_S2L2.extendedFrags.dedup > MP_8000_100_S2L2_dedup.log
cat: MP_8000_100_S2L2.extendedFrags.dedup.fastq: No such file or directory
cat: MP_8000_100_S2L2.extendedFrags.dedup_rc.fastq: No such file or directory
Traceback (most recent call last):
File "/apps/chpc/bio/w2rap/scripts/lmp_processing", line 219, in <module>
print get_flash_stats(lib["prefix"]) 
File "/apps/chpc/bio/w2rap/scripts/lmp_processing", line 76, in get_flash_stats
return "{0}:\n{1}\n{2}\n{3}\n{4}\n\nAfter de-duplication: {5}\n".format(lib_prefix, f_fields[1], f_fields[2], f_fields[3], f_fields[4], lines[5])
IndexError: list index out of range

The output log contains:

#### w2rap LMP processing ####

FLASH found: /apps/chpc/bio/w2rap/bin/flash
dedup_fastq found: /apps/chpc/bio/w2rap/bin/dedup_fastq
Nextclip found: /apps/chpc/bio/w2rap/bin/nextclip
Number of libraries to process: 1
/home/astander2/lustre/00_GenomeRooibos/01_Data/02_RawData/MP_8000_100_S2L2_R1.fastq /home/astander2/lustre/00_GenomeRooibos/01_Data/02_RawData/MP_8000_100_S2L2_R2.fastq

Running FLASh and de-duplicating combined reads...

The FLASH log file:

[FLASH] Starting FLASH v1.2.11
[FLASH] Fast Length Adjustment of SHort reads
[FLASH] 
[FLASH] Input files:
[FLASH] /home/astander2/lustre/00_GenomeRooibos/01_Data/02_RawData/MP_8000_100_S2L2_R1.fastq
[FLASH] /home/astander2/lustre/00_GenomeRooibos/01_Data/02_RawData/MP_8000_100_S2L2_R2.fastq
[FLASH] 
[FLASH] Output files:
[FLASH] ./MP_8000_100_S2L2.extendedFrags.fastq
[FLASH] ./MP_8000_100_S2L2.notCombined_1.fastq
[FLASH] ./MP_8000_100_S2L2.notCombined_2.fastq
[FLASH] ./MP_8000_100_S2L2.hist
[FLASH] ./MP_8000_100_S2L2.histogram
[FLASH] 
[FLASH] Parameters:
[FLASH] Min overlap: 10
[FLASH] Max overlap: 100
[FLASH] Max mismatch density: 0.250000
[FLASH] Allow "outie" pairs: false
[FLASH] Cap mismatch quals: false
[FLASH] Combiner threads: 32
[FLASH] Input format: FASTQ, phred_offset=33
[FLASH] Output format: FASTQ, phred_offset=33
[FLASH] 
[FLASH] Starting reader and writer threads
[FLASH] Starting 32 combiner threads
[FLASH] Processed 25000 read pairs
[FLASH] Processed 50000 read pairs

....
[FLASH] Processed 68900000 read pairs
[FLASH] Processed 68914397 read pairs
[FLASH] 
[FLASH] Read combination statistics:
[FLASH] Total pairs: 68914397
[FLASH] Combined pairs: 7515731
[FLASH] Uncombined pairs: 61398666
[FLASH] Percent combined: 10.91%
[FLASH] 
[FLASH] Writing histogram files.
[FLASH] 
[FLASH] FLASH v1.2.11 complete!
[FLASH] 130.723 seconds elapsed

Do you have any idea what is causing this?

My jobscript:

#!/bin/bash
#PBS -l select=1:ncpus=32:mpiprocs=32:mem=300GB
#PBS -l walltime=48:00:00
#PBS -q bigmem
#PBS -W group_list=bigmemq
#PBS -P CBBI1133
#PBS -o /home/astander2/lustre/00_GenomeRooibos/00_SubsetRuns/07_w2rap/01_MP8_SameLength/lmp_processing.out
#PBS -e /home/astander2/lustre/00_GenomeRooibos/00_SubsetRuns/07_w2rap/01_MP8_SameLength/lmp_processing.err
#PBS -N lmp_processing
#PBS -M 3859586@myuwc.ac.za

module load gcc/6.1.0
module load chpc/python/3.5.2_gcc-6.2.0

cd /home/astander2/lustre/00_GenomeRooibos/00_SubsetRuns/07_w2rap/01_MP8_SameLength

export PATH=/apps/chpc/bio/w2rap/bin:$PATH
/apps/chpc/bio/w2rap/scripts/lmp_processing ListOfReads 32

Kind regards, Allison

jonwright99 commented 5 years ago

Hi Allison,

Is there a *_dedup.log file in the flash directory? It's dedup_fastq that's failing for some reason.

A completed log file would look like this:

Reading FASTQ...
completed in 7.20664 secs
10927600 reads to process.
De-duplicating reads...
completed in 294.363 secs
10682360 (97.76%) reads remaining.
DONE.

Jon

AllisonStander commented 5 years ago

Hi Jon,

Yes there is, but it is empty.

Allison
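An empty dedup log would be consistent with the IndexError in the traceback above: in the completed log Jon shows, the "reads remaining" line is the sixth line, which is presumably what `lines[5]` refers to. Below is a minimal sketch of a more defensive way to read that log. The helper name `read_dedup_summary` is hypothetical and is not part of w2rap; the log format is assumed to match the example Jon posted.

```python
def read_dedup_summary(log_path):
    """Return the 'reads remaining' line from a dedup_fastq log.

    Returns None when the log is empty or incomplete, instead of
    raising IndexError. Hypothetical helper, not part of w2rap.
    """
    with open(log_path) as handle:
        # Keep only non-blank lines, stripped of surrounding whitespace.
        lines = [line.strip() for line in handle if line.strip()]

    # Assumption: a completed log ends with "DONE.", as in the example log.
    if not lines or lines[-1] != "DONE.":
        return None

    # Search by content rather than relying on a fixed line index.
    for line in lines:
        if line.endswith("reads remaining."):
            return line
    return None
```

A caller could then report that de-duplication failed when the function returns None, rather than crashing while formatting the statistics.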

jonwright99 commented 5 years ago

Is the output file from FLASH in that directory, MP_8000_100_S2L2.extendedFrags.fastq?

AllisonStander commented 5 years ago

Output in the flash directory:

MP_8000_100_S2L2_dedup.log
MP_8000_100_S2L2_extended_R1.fastq
MP_8000_100_S2L2_extended_R2.fastq
MP_8000_100_S2L2_flash.log
run.sh

run.sh:

#!/usr/bin/env bash
flash -t 32 -M 100 -o MP_8000_100_S2L2 /home/astander2/lustre/00_GenomeRooibos/01_Data/02_RawData/MP_8000_100_S2L2_R1.fastq /home/astander2/lustre/00_GenomeRooibos/01_Data/02_RawData/MP_8000_100_S2L2_R2.fastq > MP_8000_100_S2L2_flash.log &
wait

dedup_fastq -i MP_8000_100_S2L2.extendedFrags.fastq -o MP_8000_100_S2L2.extendedFrags.dedup > MP_8000_100_S2L2_dedup.log &
wait

cat MP_8000_100_S2L2.notCombined_1.fastq MP_8000_100_S2L2.extendedFrags.dedup.fastq > MP_8000_100_S2L2_extended_R1.fastq
cat MP_8000_100_S2L2.notCombined_2.fastq MP_8000_100_S2L2.extendedFrags.dedup_rc.fastq > MP_8000_100_S2L2_extended_R2.fastq
wait

rm MP_8000_100_S2L2.*
wait

jonwright99 commented 5 years ago

It looks like the file generated by flash (MP_8000_100_S2L2.extendedFrags.fastq) isn't there as input to dedup_fastq, but it's difficult to know because all the intermediate files get deleted at the end of run.sh. Can you run the flash command from run.sh on its own and make sure MP_8000_100_S2L2.extendedFrags.fastq is generated? I might need to modify the script to cope with the case where no reads are flashed.
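One way the check described above could be sketched, before dedup_fastq is run and before any intermediate files are removed, is shown below. Both function names are hypothetical and not part of w2rap. The log pattern is assumed from the FLASH v1.2.11 log posted earlier in this thread, and the output file name follows the `<prefix>.extendedFrags.fastq` convention listed in that log.

```python
import os
import re


def combined_pairs(flash_log_text):
    """Return the 'Combined pairs' count from FLASH log text, or None.

    Assumes a line of the form '[FLASH] Combined pairs: <number>'.
    """
    match = re.search(
        r"^\[FLASH\]\s+Combined pairs:\s+(\d+)",
        flash_log_text,
        re.MULTILINE,
    )
    return int(match.group(1)) if match else None


def flash_output_ready(prefix, directory="."):
    """Return True if <prefix>.extendedFrags.fastq exists and is non-empty."""
    path = os.path.join(directory, prefix + ".extendedFrags.fastq")
    return os.path.isfile(path) and os.path.getsize(path) > 0
```

If `combined_pairs` returns 0 or None, or `flash_output_ready` returns False, a wrapper could skip de-duplication and keep the intermediate files for inspection rather than deleting them.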

AllisonStander commented 5 years ago

Hi Jon,

So sorry about this. Apparently there is something wrong with the gcc compiler on the cluster I work on, which is why I am getting the error.