bxlab / metaWRAP

MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis
MIT License
394 stars 190 forks source link

bmtagger remove host error #298

Open neptuneyt opened 4 years ago

neptuneyt commented 4 years ago

Hi, I trimmd host plant sequence via kneaddata, and then run below command to trim human sequence,

nohup time metawrap read_qc -1 STA-032A_1.fastq -2 STA-032A_2.fastq -t 20 -o ./test/STA-032A --skip-pre-qc-report &> ./test/STA-032A_read_qc.log &

and return below error information:

>>>>> Now validing the length of the 2 paired-end infiles: STA-032A_1_trimmed.fq and STA-032A_2_trimmed.fq <<<<<
Writing validated paired-end read 1 reads to STA-032A_1_val_1.fq
Writing validated paired-end read 2 reads to STA-032A_2_val_2.fq

Total number of sequences analysed: 16645775

Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 12550 (0.08%)

Deleting both intermediate output files STA-032A_1_trimmed.fq and STA-032A_2_trimmed.fq

====================================================================================================

-----------------------------------------------------------------------------------------------------------------------
-
-----                 Trimmed reads saved to: ../../Metawrap_QC_and_trim_human//trimmed_1.fastq and                ----
-
-----                              ../../Metawrap_QC_and_trim_human//trimmed_2.fastq     
------------------------------------------------------------------------------------------------------------------------
-----             running bmtagger with /mnt/data/share/database/Human/HG38/BMTAGGER_INDEX/hg38.bitmask            -----
-----                  /mnt/data/share/database/Human/HG38/BMTAGGER_INDEX/hg38.srprism indexes...                  -----
------------------------------------------------------------------------------------------------------------------------

Info: no ./bmtagger.conf found
Using following programs:
/mnt/data/share/software/Miniconda3/envs/metawrap/bin/bmfilter
/mnt/data/share/software/Miniconda3/envs/metawrap/bin/srprism
/mnt/data/share/software/ncbi-blast-2.9.0+-src/c++/ReleaseMT/bin/blastn
/mnt/data/share/software/Miniconda3/envs/metawrap/bin/extract_fullseq
MAIN SCRIPT IS /mnt/data/share/software/Miniconda3/envs/metawrap/bin/bmtagger.sh (PID=22610)
RUNNING bmfilter
* Attaching /mnt/data/share/database/Human/HG38/BMTAGGER_INDEX/hg38.bitmask (pattern = 0b111111111111111111 of len 18 using 36 bits)
* Notice: creating CFastqFileReader( 1, 0, ../../Metawrap_QC_and_trim_human//trimmed_1.fastq, ../../Metawrap_QC_and_trim_human//trimmed_2.fastq)
terminate called after throwing an instance of 'std::runtime_error'
  what():  cshortreader.cpp:194 runtime_error   Error: bad read id in file ../../Metawrap_QC_and_trim_human//trimmed_2.fastq: expected E00603:99:HHWJLCCXY:1:2205:13220:7831:N:0#0/1, got E00603:99:HHWJLCCXY:2:1114:7101:4649:N:0#0/2
/mnt/data/share/software/Miniconda3/envs/metawrap/bin/bmtagger.sh: 行 179: 22618 已放弃               $BMFILTER $accession $input1 $input2 -q $quality $bmfiles $spotId_only -o "$TMPDIR"/bmtagger.$tmpstr $bmoptions

It DID WORK when I try to use RAW fastq which did not process by kneaddata, the problem seems to derive from this step, there are demo fastq file which processed by kneaddata: " STA-032A_1.fastq "

@E00603:99:HHWJLCCXY:1:2205:13220:7831:N:0#0/1
GCCGGGATTCTGGCCCACGTGGCGGATGAAGAACTCCAGCGCAGCACTCTCCGGGGAGGAGACGCCCAGCTGGGGGATGGTCCAGCGCAGGGAGGTGGCATTGAGCATGGTGGCGCTGCCGGCGGAGGGCGTCAGGATGCTGGTGATGAC
+
``eeeiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiieiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiieiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiVeiiiieiiiii`iiiiii
@E00603:99:HHWJLCCXY:2:1210:7923:34483:N:0#0/1
TTTATTATGTAAGTAGGAGGTGTTTATATGGTATCTCAAAAAATTAAACAAATAATGAAAATGAAAAAAATTACAAATATTCAAGTTGCTGAACATCTAGGAACTTCACCACAAGCACTAGCTAACAAGTTTTCCAGAGAAACGCTTTCT
+
``eeeeiiiiiiieeiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiieiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiieiiiiiiiiiiiiiiii`iii
@E00603:99:HHWJLCCXY:2:1207:18781:46894:N:0#0/1
GCCATCGTCCACGCCTATCCGGACGAGCTGGGGAAAATCGTGATGAAGCTACGCATTCTGGAAGGCGGCGTGCTGGAGCTGACGGTGCGGGACTGGGGCCGGGGCATCGACGACATCCGCCAGGCCCGGCAGCCGCTGTTCACCACCGGC

"STA-032A_2.fastq"

@E00603:99:HHWJLCCXY:2:1114:7101:4649:N:0#0/2
CTTACGATGTTAGCGGTTGCGGCACTCTTCGCGACCGGGTGTGTAAACGAAGAGCCCCCGCACAAGGAGGATCCGAACCCCGAGCCCGCCGGGACGACCGGCTATCTGTCGATCGGCAACCTCTCGATGACAGTCGTCTACGACGAAACC
+
````eiL[[V`i``i`ii[`iieiieiiiiiiiiiiiii`ei`ieeiii`Le[LiiieeiLVe[LV[iiiiiiieL`i`eeiiiiiiL`eV`[eiiiii[```[[ii[iiiieeL[`eeiiiiie[iie``eiL`i``eei[`ee`i[ei
@E00603:99:HHWJLCCXY:2:1213:20466:39721:N:0#0/2
AAATTGAAGCGCTTGGACAGTCATTTGGACTTGAAATTGAACCGGAGAAAAAGGTCTACGACATGTCGGTCAGCGAGAAGCAGACGGTCGAGATCCTGAAGGTATTATATAGAGGCGCCCAGATCCTGATTCTGGACGAGCCAACGGCAG
+
`[L`eiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiieiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii[ieiiii
@E00603:99:HHWJLCCXY:1:2108:2290:33269:N:0#0/2
GTAGTAGCTGATTTCCGGCTCAAACGGGAGCCGCTCGCCATCATCGTTGTACTCGACTTGACCGACGGCAATGTCATCTTTGATGGAGGTCTGGGCGTATGCCCAGTGGTTGTCCAGCCTTTCGGCCAGACGCTTGAGGGAAGCGCGAAT

Who anyone faces this problem, and how can I handle it? Thanks a lot.

ursky commented 4 years ago

Honestly, no idea - i have never used kneaddata. If you absolutely cannot get it to work, another way to handle this is to assemble all the reads without filtering out the host, and then remove the host sequences from the assembly itself by using any alignment software (even just blastn should work). Or you could just leave the host sequences in the data and proceed all the way - the extra filtering rarely makes a huge difference on the quality besides it taking longer to process, since if the host is eukaryotic its sequences will be easily clustered into their own bin during the binning, or left in the "unbinned" pile. Good luck!

neptuneyt commented 4 years ago

This problem seems to be due to the Bmfilter can not process unpaired and not sorted fastq file processed by kneaddata.

wurunbiao162 commented 1 year ago

----- running bmtagger with ----- ----- /public1/home/db/BMTAGGER_INDEX/GCF_009193385.2_Nvit_psr_1.1_genomic.bitmask ----- ----- /public1/home/db/BMTAGGER_INDEX/GCF_009193385.2_Nvit_psr_1.1_genomic.srprism ----- ----- indexes... -----

Info: no ./bmtagger.conf found Using following programs: /public1/home/mambaforge-pypy3/envs/metawrap-env/bin/bmfilter /public1/home/mambaforge-pypy3/envs/metawrap-env/bin/srprism /public1/home/mambaforge-pypy3/envs/metawrap-env/bin/blastn /public1/home/mambaforge-pypy3/envs/metawrap-env/bin/extract_fullseq MAIN SCRIPT IS /public1/home/mambaforge-pypy3/envs/metawrap-env/bin/bmtagger.sh (PID=453912) RUNNING bmfilter

real 1m18.859s user 0m0.526s sys 0m22.310s FAILED: bmfilter with rc=134


Something went wrong with running Bmtagger! Exiting.


What should I do next? thank you.