arq5x / lumpy-sv

lumpy: a general probabilistic framework for structural variant discovery
MIT License
309 stars 118 forks

Question of `min_non_overlap` option meaning #287

Open Yiming-Shen opened 5 years ago

Yiming-Shen commented 5 years ago

Hi Ryan,

I have recently been working on calling gene fusions from paired-end reads using lumpy. I read that the min_non_overlap option is used to ignore reads that overlap their mates up to a threshold, and that the suggested value is the read length (to ignore all reads that overlap their mates). My reads are 100 bp long, and I was surprised to see that lumpy produces more SVs when min_non_overlap is 150 than when it is 100. If I understand correctly, setting min_non_overlap to 150 should have the same effect as 100 when the read length is 100. Do you have any idea what causes this difference? Or am I misunderstanding this option?

ryanlayer commented 5 years ago

I do not suggest making min_non_overlap greater than the read length. I am not sure what happens, but I am guessing that some values we assume are positive become negative, and then strange things happen.
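
To make the threshold concrete, here is a minimal Python sketch of one plausible reading of the option: a pair is kept only if each read has at least `min_non_overlap` bases that its mate does not cover. This is an illustration only, not lumpy's actual code; the function names `non_overlap` and `passes_filter` and the half-open `(start, end)` interval convention are assumptions made for this example.

```python
def non_overlap(read, mate):
    """Number of bases in `read` not covered by `mate`.

    Both arguments are hypothetical half-open (start, end) alignment
    intervals on the same chromosome.
    """
    start, end = read
    mate_start, mate_end = mate
    # Overlap is clamped at zero so disjoint reads never go negative.
    overlap = max(0, min(end, mate_end) - max(start, mate_start))
    return (end - start) - overlap


def passes_filter(read, mate, min_non_overlap):
    """Keep the pair only if both reads have enough unique bases."""
    return (non_overlap(read, mate) >= min_non_overlap
            and non_overlap(mate, read) >= min_non_overlap)


# Two 100 bp reads overlapping by 50 bp, and two disjoint 100 bp reads.
overlapping = ((0, 100), (50, 150))
disjoint = ((0, 100), (300, 400))

for threshold in (50, 100, 150):
    print(threshold,
          passes_filter(*overlapping, threshold),
          passes_filter(*disjoint, threshold))
```

Under this simplified model, a threshold above the read length can never be met, so every pair would be dropped. That does not match the increase in SVs reported above, which suggests lumpy's own arithmetic differs from this model for such values. Keeping min_non_overlap at or below the read length avoids the question.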

On Feb 2, 2019, at 2:20 AM, YSwithoutL wrote: (original question, quoted in full above)