arq5x / lumpy-sv

lumpy: a general probabilistic framework for structural variant discovery
MIT License

extremely large deletions and duplications #322

Open lee039 opened 4 years ago

lee039 commented 4 years ago

Dear,

I ran Lumpy on some non-human samples and found some extremely large deletions and duplications at almost the same positions.

The examples are:

var_id | BIN | SVLEN
--- | --- | ---
DEL_1 | chr1:13725489-74725721 | 61000232
DUP_1 | chr1:13725496-74725224 | 60999728
DEL_2 | chr1:51839810-101987685 | 50147875
DUP_2 | chr1:51839640-101987485 | 50147845

I ran Lumpy using smoove with the duphold option. The depth change (DHFFC) does not support these large variants, so these calls are probably false positives. However, do you know of any known artefacts that cause this kind of behaviour?

I don't know whether it will help, but I found 95 calls (deletions and duplications, >100 kb) on chr1, and 71 of them belong to deletion-duplication pairs like those shown above.

Thanks a lot for comments in advance! :)

Lim
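For readers who want to count such pairs themselves, here is a minimal sketch of one way to flag a deletion and a duplication whose breakpoints nearly coincide. The `Call` tuple, the `find_del_dup_pairs` helper, and the 1000 bp tolerance are illustrative assumptions, not part of Lumpy or smoove; the coordinates are the four calls from the table above.

```python
from typing import List, NamedTuple, Tuple


class Call(NamedTuple):
    var_id: str  # e.g. "DEL_1" or "DUP_1"
    chrom: str
    start: int
    end: int


def find_del_dup_pairs(calls: List[Call], tol: int = 1000) -> List[Tuple[str, str]]:
    """Return (DEL id, DUP id) pairs whose start AND end both differ by <= tol bp.

    `tol` is an arbitrary tolerance chosen for illustration.
    """
    dels = [c for c in calls if c.var_id.startswith("DEL")]
    dups = [c for c in calls if c.var_id.startswith("DUP")]
    pairs = []
    for d in dels:
        for u in dups:
            if (d.chrom == u.chrom
                    and abs(d.start - u.start) <= tol
                    and abs(d.end - u.end) <= tol):
                pairs.append((d.var_id, u.var_id))
    return pairs


calls = [
    Call("DEL_1", "chr1", 13725489, 74725721),
    Call("DUP_1", "chr1", 13725496, 74725224),
    Call("DEL_2", "chr1", 51839810, 101987685),
    Call("DUP_2", "chr1", 51839640, 101987485),
]
print(find_del_dup_pairs(calls))
```

The nested loop is quadratic, which is fine for a few hundred calls per chromosome; a sorted sweep would be the obvious alternative for larger call sets.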

ryanlayer commented 4 years ago

It really depends on your experiment, but I typically filter out SVs that are larger than 1 Mb.

Try visualizing your SVs with samplot. For big SVs, use the --zoom 500 option.

https://github.com/ryanlayer/samplot
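The size filter described above could be sketched roughly as follows. This assumes uncompressed VCF text lines that carry an `SVLEN` entry in the INFO column; the helper names and the two example records are made up for illustration, and the 1 Mb cutoff should be adjusted to the experiment.

```python
from typing import Iterable, Iterator, Optional

MAX_SV_LEN = 1_000_000  # 1 Mb cutoff from the comment above; adjust as needed


def sv_length(info: str) -> Optional[int]:
    """Return |SVLEN| from a VCF INFO string, or None if SVLEN is absent."""
    for field in info.split(";"):
        if field.startswith("SVLEN="):
            # SVLEN is negative for deletions; take the first value if several
            return abs(int(field[len("SVLEN="):].split(",")[0]))
    return None


def drop_large_svs(lines: Iterable[str], max_len: int = MAX_SV_LEN) -> Iterator[str]:
    """Yield header lines and records whose |SVLEN| is <= max_len.

    Records without SVLEN (e.g. breakends) are passed through unchanged.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th VCF column
        length = sv_length(info)
        if length is None or length <= max_len:
            yield line


# Hypothetical records for illustration only
vcf = [
    "##fileformat=VCFv4.2\n",
    "chr1\t13725489\tDEL_1\tN\t<DEL>\t.\t.\tSVTYPE=DEL;SVLEN=-61000232;END=74725721\n",
    "chr1\t500\tDEL_x\tN\t<DEL>\t.\t.\tSVTYPE=DEL;SVLEN=-1200;END=1700\n",
]
kept = list(drop_large_svs(vcf))
print(len(kept))
```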

lee039 commented 4 years ago

Hi,

Thanks for the suggestion. I confess that I did not use the program you recommended. But I use smoove and duphold together, so I already knew that the extremely long CNVs show no depth change and are therefore false positives.

What I realized is that many of these deletion-duplication pairs were caused by repeats. I checked the flanking sites of 20 deletion-duplication pairs in IGV and found that 16 were due to repetitive elements. For the remaining 4, I could not find an evident repeat causing this. Maybe this was obvious to you, but I did not know that repeats can cause this many false positives. It would be nice if SV callers were repeat-aware and set a low QUAL or a non-PASS filter flag in the VCF file! :) Would that be possible?

Lim
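The depth-based check mentioned in this comment can be written down as a small predicate. This is only a sketch: the thresholds (DHFFC < 0.7 for deletions, DHBFC > 1.3 for duplications) are the ones I recall the duphold README suggesting, so treat them as assumptions and tune them for your data. The function name and the example values are hypothetical.

```python
def depth_supported(svtype: str, dhffc: float, dhbfc: float,
                    del_max_dhffc: float = 0.7,
                    dup_min_dhbfc: float = 1.3) -> bool:
    """Return True if duphold-style fold-change values support the call.

    dhffc: depth fold-change relative to flanking regions
    dhbfc: depth fold-change relative to bins with similar GC content
    Thresholds are assumed defaults, not values taken from this thread.
    """
    if svtype == "DEL":
        return dhffc < del_max_dhffc
    if svtype == "DUP":
        return dhbfc > dup_min_dhbfc
    return True  # other SV types are not judged by depth here


# Hypothetical values: a 60 Mb "deletion" with essentially unchanged depth
print(depth_supported("DEL", dhffc=0.98, dhbfc=1.00))
```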

ryanlayer commented 4 years ago

Yep. Those big SVs are usually repeat artifacts that can be difficult to remove. Lumpy takes an exclude-region file that you can add to if you want. We typically subtract SVs with repeat tracks at the end, but the results are the same.

lee039 commented 4 years ago

Can you elaborate on how you "subtract SVs with repeat tracks"?

I was thinking of making a BED file with columns $1 (chr), $2 (cnv_st_pos - CI95), $3 (cnv_st_pos + CI95), and then running bedtools intersect -a cnv_start_position_CI -b RepeatMasker, and doing the same for the CNV end position. If the same repeat(s) are identified at both the start and the end position, I would filter the call. Does this sound logical to you, and is it more or less in line with how you filter repeat-induced false positives? Or do you have an even better approach? Let me know if you have better ideas! :)
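The logic proposed above can be sketched in plain Python, as a stand-in for the two bedtools intersect runs. Everything here is illustrative: the `Repeat` tuple layout, the helper names, matching repeats by name, and the toy coordinates are all assumptions, and a real repeat track would be far too large for this linear scan.

```python
from typing import List, Set, Tuple

# (chrom, start, end, repeat name), BED-style half-open coordinates
Repeat = Tuple[str, int, int, str]


def repeats_hit(chrom: str, lo: int, hi: int, repeats: List[Repeat]) -> Set[str]:
    """Names of repeats that overlap the window [lo, hi) on chrom."""
    return {name for c, s, e, name in repeats
            if c == chrom and s < hi and lo < e}


def shares_repeat(chrom: str, start: int, end: int, ci: int,
                  repeats: List[Repeat]) -> bool:
    """True if the same repeat name overlaps both breakpoint windows.

    Each window is the breakpoint position +/- ci, clamped at 0.
    """
    left = repeats_hit(chrom, max(0, start - ci), start + ci, repeats)
    right = repeats_hit(chrom, max(0, end - ci), end + ci, repeats)
    return bool(left & right)


# Toy repeat annotations, invented for this example
repeats = [
    ("chr1", 950, 1300, "L1"),
    ("chr1", 9900, 10200, "L1"),
    ("chr1", 4950, 5100, "AluY"),
]
print(shares_repeat("chr1", 1000, 10000, 50, repeats))  # both ends in an L1
print(shares_repeat("chr1", 1000, 5000, 50, repeats))   # L1 at start, AluY at end
```

Matching on the repeat name means two different copies of the same element count as "the same repeat", which seems to be the intent for repeat-induced pairs; matching on family or class instead would be a stricter or looser variant of the same idea.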

ryanlayer commented 4 years ago

Yep. Looks good.
