Cutoff for duplications

brentp / duphold

don't get DUP'ed or DEL'ed by your putative SVs.

MIT License

Cutoff for duplications #53

Status: Open. Opened by robertzeibich 1 year ago.

robertzeibich commented 1 year ago

The cutoff for deletions is DHFFC 0.7. What is the recommended DHBFC cutoff for duplications?

brentp commented 1 year ago

Hi, duplications are harder, but 1.3 is a reasonable start.
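
For readers who want to try the two starting thresholds mentioned so far (DHFFC < 0.7 for deletions, DHBFC > 1.3 for duplications), here is a minimal Python sketch. It assumes the SV type and the sample's DHFFC and DHBFC values have already been parsed from a duphold-annotated VCF. The function name `passes_depth_filter` and its parameters are hypothetical and not part of duphold.

```python
def passes_depth_filter(svtype, dhffc, dhbfc,
                        del_max_dhffc=0.7, dup_min_dhbfc=1.3):
    """Return True if the depth fold-change supports the called SV type.

    svtype: "DEL" or "DUP"; any other type is passed through unfiltered.
    dhffc:  depth fold-change relative to the flanking regions.
    dhbfc:  depth fold-change relative to bins with similar GC content.
    The defaults are the starting points from this thread, not tuned values.
    """
    if svtype == "DEL":
        return dhffc < del_max_dhffc
    if svtype == "DUP":
        return dhbfc > dup_min_dhbfc
    return True


# Hypothetical calls: (svtype, DHFFC, DHBFC)
calls = [("DEL", 0.52, 0.55), ("DUP", 1.45, 1.41), ("DUP", 1.35, 1.12)]
kept = [c for c in calls if passes_depth_filter(*c)]
print(kept)
```

Keeping the thresholds as parameters makes it easy to experiment with other values, since the duplication cutoff in particular is only a starting point.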

Qijie0615 commented 10 months ago

Detecting duplications is harder. I'm unsure whether to use DHFFC > 1.3 or DHBFC > 1.3. After population genotyping and duphold annotation of 30x WGS data, I found that some duplications (0/1, 1/1) have DHBFC < 1.3 but DHFFC > 1.3, and Samplot confirms that they are real. Could you give me some advice?

brentp commented 10 months ago

As you found, it's hard to come up with a good cutoff for duplications. The 1.2 cutoff might work in many cases, but it would miss when there is already a large cassette that adds a single copy in a tandem dup. You'll have to experiment with what works.

Qijie0615 commented 10 months ago

Thanks for the quick reply.

To confirm: does 1.2 mean DHBFC > 1.2?
I'm sorry, I don't understand this sentence: "it would miss when there is already a large cassette that adds a single copy in a tandem dup". What does "cassette" mean?
I would like to use DHBFC > 1.2 to further filter the population genotyping data and reduce the false positive rate. Do you think this is a good idea?

brentp commented 10 months ago

> Thanks for the quick reply.
>
> To confirm: does 1.2 mean DHBFC > 1.2?

Yes, you could try this.

> I'm sorry, I don't understand this sentence: "it would miss when there is already a large cassette that adds a single copy in a tandem dup". What does "cassette" mean?

I mean if you have a tandem duplication with 10 copies and then you add another single copy, you only expect a 10% increase in depth.
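
The arithmetic behind that 10% figure can be sketched as follows. This is a simplified illustration that assumes read depth scales linearly with total copy number and that the baseline already contains the existing copies. The function name `expected_fold_change` is hypothetical.

```python
def expected_fold_change(existing_copies, added_copies=1):
    """Expected depth ratio after adding copies to an existing tandem array.

    Simplifying assumption: depth is proportional to total copy number,
    and the comparison baseline already holds `existing_copies` copies.
    """
    if existing_copies <= 0:
        raise ValueError("existing_copies must be positive")
    return (existing_copies + added_copies) / existing_copies


for copies in (1, 2, 10):
    print(copies, round(expected_fold_change(copies), 2))
```

Under these assumptions, adding one copy to a 10-copy array gives a ratio of 11/10 = 1.1, which falls below both the 1.2 and 1.3 cutoffs, while adding one copy to 2 existing copies gives 1.5.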

> I would like to use DHBFC > 1.2 to further filter the population genotyping data and reduce the false positive rate. Do you think this is a good idea?

It's worth trying, but you'll have to evaluate for yourself how effective it is. If you have trios, you can look at Mendelian violations and transmissions. Otherwise, you can look at samplots of variants that are filtered.
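
One way to approach the trio check is sketched below in Python. It assumes autosomal, diploid, fully called genotypes that have already been parsed into tuples of allele indices. The function names are hypothetical, and this is not how duphold or any specific tool implements the check.

```python
from itertools import product


def is_mendelian_consistent(child, father, mother):
    """True if the child's genotype can be built from one allele per parent.

    Genotypes are tuples of allele indices, e.g. (0, 1) for 0/1.
    """
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((f, m))) == child_sorted
        for f, m in product(father, mother)
    )


def violation_rate(trios):
    """Fraction of (child, father, mother) genotype triples that are violations."""
    if not trios:
        return 0.0
    violations = sum(not is_mendelian_consistent(*t) for t in trios)
    return violations / len(trios)


# Hypothetical genotypes at three duplication sites
trios = [((0, 1), (0, 1), (0, 0)),   # consistent
         ((0, 1), (0, 0), (0, 0)),   # violation: no parent carries the allele
         ((1, 1), (0, 1), (1, 1))]   # consistent
print(violation_rate(trios))
```

Comparing the violation rate before and after applying a given DHBFC cutoff gives a rough measure of whether the filter is removing false positives.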

Qijie0615 commented 10 months ago

Thank you for your quick reply.