arq5x / lumpy-sv

lumpy: a general probabilistic framework for structural variant discovery
MIT License

Will min_non_overlap influence sensitivity of lumpy? #293

Open xyw1 opened 5 years ago

xyw1 commented 5 years ago

Hi Ryan, I've recently been using lumpy on some datasets with known structural variants (three known SVs in total), and I'm trying to optimize my parameters. I found that when I change the value of min_non_overlap (ranging from 0 to the read length, 100), the sensitivity of lumpy varies (the blue line in the figure below). At some values of min_non_overlap (such as 25, 30, 35, and 40), lumpy missed some of the variants in the datasets. Why does this happen? Or did I just not run it correctly?

[Figure: lumpy sensitivity (blue line) as a function of min_non_overlap]

The documentation describes min_non_overlap as follows:

Number of base pair positions that must be unique to each end of a read pair. Some library preps are created with long reads and small library sizes such that reads overlap; in all other cases, overlapping reads tend to be a sign of an error. We typically set this to the read length (pairs cannot overlap).
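To make the documented definition concrete, here is a minimal Python sketch of one plausible reading of it: treat each end of a pair as a half-open reference interval, count the positions covered by only that end, and keep the pair only if both ends have at least min_non_overlap such positions. The names `ReadEnd`, `overlap`, `unique_positions`, and `passes_min_non_overlap` are hypothetical and are not taken from lumpy's source; lumpy's real check may differ in detail (for example, in how it handles clipped or unequal-length reads).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ReadEnd:
    """One end of a read pair as a half-open reference interval [start, end)."""
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


def overlap(a: ReadEnd, b: ReadEnd) -> int:
    """Reference positions covered by both ends (0 if the ends are disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def unique_positions(a: ReadEnd, b: ReadEnd) -> tuple[int, int]:
    """Positions covered by only one end: (unique to a, unique to b)."""
    shared = overlap(a, b)
    return a.length - shared, b.length - shared


def passes_min_non_overlap(a: ReadEnd, b: ReadEnd, min_non_overlap: int) -> bool:
    """Hypothetical filter: keep the pair only if each end has at least
    min_non_overlap positions that the other end does not cover."""
    unique_a, unique_b = unique_positions(a, b)
    return min(unique_a, unique_b) >= min_non_overlap


if __name__ == "__main__":
    read_len = 100
    # Hypothetical forward/reverse pairs with 100 bp reads and different fragment sizes.
    for fragment in (150, 200, 300):
        r1 = ReadEnd(0, read_len)
        r2 = ReadEnd(fragment - read_len, fragment)
        ua, ub = unique_positions(r1, r2)
        kept = [t for t in (0, 25, 50, 75, 100) if passes_min_non_overlap(r1, r2, t)]
        print(f"fragment={fragment}: unique={ua}/{ub}, kept at thresholds {kept}")
```

Under these assumptions, a 150 bp fragment sequenced with 100 bp reads leaves only 50 unique positions per end, so that pair survives thresholds up to 50 and is dropped above that, while pairs from fragments of 200 bp or longer never overlap and survive every threshold, including the read length. Setting min_non_overlap to the read length therefore discards any pair whose ends overlap at all, which matches the documented "pairs cannot overlap" setting.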

What does "overlapping reads tend to be a sign of an error" mean?

Thanks a lot.