arq5x / lumpy-sv

lumpy: a general probabilistic framework for structural variant discovery
MIT License
314 stars, 118 forks

Can't find first and/or second of pair in sam block of length 2 #389

Open jonahcullen opened 1 year ago

jonahcullen commented 1 year ago

Hello, thank you for producing and supporting this tool, it is most excellent!

I have been struggling with a problem that crops up for a small proportion of the samples I am attempting to process, namely the samblaster error below:

samblaster: Can't find first and/or second of pair in sam block of length 2 for id: A00351:169:HLCY7DSXX:3:1220:15591:9862

I am running lumpyexpress as:

lumpyexpress -B sample.bam -o sample.lumpy.vcf

The analysis-ready, coordinate-sorted sample.bam was generated from 16 paired FASTQs following GATK's best practices. When I look at the region where the error occurred, I think I understand what it is complaining about, but I am not entirely sure:

A00351:169:HLCY7DSXX:3:1220:15591:9862  163     chr1    1452 ... RG:Z:D06257_J
A00351:169:HLCY7DSXX:3:1220:15591:9862  163     chr1    1452 ... RG:Z:D06257_B
A00351:169:HLCY7DSXX:3:2220:16215:7592  163     chr1    1452 ... RG:Z:D06257_J
A00351:169:HLCY7DSXX:3:2220:16215:7592  163     chr1    1452 ... RG:Z:D06257_B
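To make the records above easier to read, here is a minimal sketch that decodes a SAM FLAG value into its named bits, using the bit definitions from the SAM specification. The helper name `decode_flag` and the short bit labels are illustrative, not part of samblaster or lumpy. It shows that flag 163 marks a record as the second read of a pair, so two records that share a read name and both carry flag 163 give no "first of pair" for that name.

```python
# Bit meanings of the SAM FLAG field, as defined in the SAM specification.
# The short labels are illustrative names chosen for this sketch.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


# 163 = 0x1 + 0x2 + 0x20 + 0x80
print(decode_flag(163))
# ['paired', 'proper_pair', 'mate_reverse', 'second_in_pair']
```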

I was hoping to avoid pulling the split and discordant reads before running lumpy, as I've already processed a couple hundred samples without issue. Is there some other filtering I should apply to sample.bam, or should I rethink/reprocess everything to be consistent?
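One way to check how widespread the pattern is would be to list read names that occur in more than one read group. The following is a rough sketch under stated assumptions: it works on plain SAM text lines (for example, the output of `samtools view`), the function name `names_in_multiple_read_groups` is hypothetical, and the demo records are synthetic placeholders rather than data from sample.bam. It only reports the collisions; it does not say whether they are the cause of the samblaster error.

```python
from collections import defaultdict


def names_in_multiple_read_groups(sam_lines):
    """Map each read name seen in more than one read group to its sorted RG list.

    sam_lines: iterable of SAM text lines; header lines starting with '@'
    and lines with fewer than the 11 mandatory fields are skipped.
    """
    groups = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        # Optional tags start at column 12; look for the RG:Z: tag.
        rg = next(
            (f[len("RG:Z:"):] for f in fields[11:] if f.startswith("RG:Z:")),
            "<no RG>",
        )
        groups[fields[0]].add(rg)
    return {name: sorted(rgs) for name, rgs in groups.items() if len(rgs) > 1}


# Synthetic records for illustration only (hypothetical read names and
# sequences); only the read group IDs are taken from the post.
demo = [
    "@HD\tVN:1.6\tSO:coordinate",
    "readA\t163\tchr1\t1452\t60\t10M\t=\t1600\t158\tACGTACGTAC\tFFFFFFFFFF\tRG:Z:D06257_J",
    "readA\t163\tchr1\t1452\t60\t10M\t=\t1600\t158\tACGTACGTAC\tFFFFFFFFFF\tRG:Z:D06257_B",
    "readB\t99\tchr1\t1500\t60\t10M\t=\t1700\t210\tACGTACGTAC\tFFFFFFFFFF\tRG:Z:D06257_J",
]

for name, rgs in names_in_multiple_read_groups(demo).items():
    print(name, ",".join(rgs))
```

On real data the same function could be fed the lines of `samtools view sample.bam`, streamed through a small wrapper that reads standard input. Names are held in memory, so restricting the input to a region may be more practical for a whole-genome BAM.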

Thanks again, Jonah.