CMU-SAFARI / BLEND

BLEND is a mechanism that can efficiently find fuzzy seed matches between sequences to significantly improve the performance and accuracy while reducing the memory space usage of two important applications: 1) finding overlapping reads and 2) read mapping. Described by Firtina et al. (published in NARGAB https://doi.org/10.1093/nargab/lqad004)

undesired 'mm_map_frag rechain' in sam file #4

TDDB-limagrain closed this issue 1 year ago

TDDB-limagrain commented 1 year ago

Dear BLEND development team, I was interested in testing BLEND for short-read mapping. Mapping paired-end Illumina reads against a tomato genome worked perfectly, but the output SAM file contained 252 lines reading "mm_map_frag rechain" after the @PG line:

@SQ     SN:17-PSC-SL_TK14181.1.0_Chr11  LN:53848686
@SQ     SN:17-PSC-SL_TK14181.1.0_Chr12  LN:68218429
@RG     ID:var1    SM:var1     LB:Solution     PL:illumina     PU:none
@PG     ID:blend        PN:blend        VN:1.0  CL:blend -ax sr -t 4 -R @RG\tID:var1\tSM:var1\tLB:Solution\tPL:illumina\tPU:none slycopersicum.fasta.ind reads_1.fastq.gz reads_2.fastq.gz
mm_map_frag rechain
mm_map_frag rechain
...

These lines seem to be problematic for further processing with samtools:

samtools flagstat tmp.sam
[W::sam_read1_sam] Parse error at line 16
samtools flagstat: error reading from "tmp.sam"

Best regards,

Thomas

canfirtina commented 1 year ago

Dear @TDDB-limagrain,

Thanks for this catch. The line producing this output was originally added for debugging purposes and was inadvertently left in the code.

I have now removed this line and fixed the issue in commit 6f19e37. Could you please pull the latest version of the code, compile, and try again?

Thanks, Can
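
For SAM files already written by an affected build, re-running the mapping may not be necessary. The following is a minimal sketch of a workaround, not part of BLEND itself: it copies a text SAM file while dropping every line that consists solely of the stray debug string reported above. The function name `clean_sam` and the output file name are hypothetical, and the sketch assumes an uncompressed SAM file, not BAM.

```python
# Stray debug text reported in this issue; it is not valid SAM content.
STRAY_LINE = "mm_map_frag rechain"


def clean_sam(src_path, dst_path, stray=STRAY_LINE):
    """Copy src_path to dst_path, skipping lines equal to the stray text.

    Returns the number of lines that were dropped. Header lines and
    alignment records are written unchanged.
    """
    removed = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            # Compare without the line terminator so both LF and CRLF match.
            if line.rstrip("\r\n") == stray:
                removed += 1
            else:
                dst.write(line)
    return removed
```

Calling `clean_sam("tmp.sam", "tmp.clean.sam")` should return the number of removed lines (252 for the file described above, if the report is exact), after which `samtools flagstat tmp.clean.sam` can be tried again. Matching the whole line, not a substring, avoids touching the @PG header or any alignment record that happens to contain the same text.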

TDDB-limagrain commented 1 year ago

Hi Can, it is fixed now! Thanks a lot.

Best regards,

Thomas