jiadong324 closed this issue 4 years ago
Hi @jiadong324,
Best,
Davide
Hi @davidebolo1993,
Thanks for the reply. I've closed the previous issue.
As you mentioned, the coverage is 5X on each haplotype. However, for the adjacent tandem duplications shown in IGV, the first tandem duplication has more normally aligned read pairs.
Flanking inversion is just one case. I want to build nested events by randomly combining the basic events supported by VISOR, so that I can produce different types of complex SVs.
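A minimal sketch of what such a random combination could look like is given here. It is only an illustration: the function name `make_nested_event`, the chromosome, the start position, and the candidate sizes are hypothetical, and the tuples it prints are not VISOR's BED input format. The event names are the ones mentioned in this thread.

```python
import random

# Event names taken from this thread; a real run should use whatever
# names the installed VISOR version accepts.
BASIC_EVENTS = ["tandem duplication", "inverted tandem duplication", "inversion"]


def make_nested_event(chrom, start, sizes, rng, n_parts=2):
    """Place n_parts randomly chosen basic events back to back.

    Returns a list of (chrom, start, end, event_type) tuples in which
    each event starts where the previous one ends.
    """
    parts = []
    pos = start
    for _ in range(n_parts):
        event = rng.choice(BASIC_EVENTS)
        size = rng.choice(sizes)
        parts.append((chrom, pos, pos + size, event))
        pos += size
    return parts


rng = random.Random(42)  # fixed seed so the combination is reproducible
nested = make_nested_event("chr1", 1_000_000, sizes=[500, 1000, 2000], rng=rng)
for part in nested:
    print(*part, sep="\t")
```

Passing the random generator in explicitly keeps each simulated site reproducible, which helps when a specific combination needs to be inspected again in IGV.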
Thanks!
Hi @jiadong324,
I'm not sure I completely got your question. You said that you see more "normal" read pairs for the first tandem duplication, but from IGV I can only see an increase in coverage for the 2 simulated tandem duplications, as expected. I see that the first tandem duplication has a few more reads than the second (the inverted one), but this makes sense: SHORtS draws reads at random across the whole region/chromosome specified in the BED, and the same thing happens in true-to-life duplications.
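The effect of random sampling can be illustrated with a small sketch. This is not how SHORtS is implemented; it only assumes that read start positions are drawn uniformly over a region, and the region size, window coordinates, and read count are made-up values.

```python
import random


def count_reads(region_len, n_reads, windows, rng):
    """Draw read start positions uniformly and count the starts per window."""
    counts = [0] * len(windows)
    for _ in range(n_reads):
        pos = rng.randrange(region_len)
        for i, (lo, hi) in enumerate(windows):
            if lo <= pos < hi:
                counts[i] += 1
    return counts


rng = random.Random(0)
# Two equal-sized, adjacent 1 kb windows inside a 100 kb region.
windows = [(40_000, 41_000), (41_000, 42_000)]
counts = count_reads(100_000, n_reads=10_000, windows=windows, rng=rng)
print(counts)  # both close to the expected 100, but rarely identical
```

Both windows have the same expected count, yet the observed counts usually differ by a few reads, which is the kind of difference visible between the two duplications.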
Let me know if I missed something.
Best,
Davide
Hi @davidebolo1993,
The simulation is correct if you look at the coverage. My concern is:
For example, if we sequence 10 reads for this region from two haplotypes, ideally 5 reads would be sequenced from the SV haplotype. If this is true, then I would expect more abnormal read pairs in green (reverse-forward mapping) than are observed in IGV. The second, inverted tandem duplication looks fine.
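As a rough check on this expectation, the number of fragments from the SV haplotype that overlap a single junction can be estimated. This is a back-of-the-envelope sketch, not VISOR output: it assumes fragments start uniformly along the haplotype, and the read length (150 bp) and fragment length (500 bp) are assumed values that are not stated in this thread. Only the 5X per-haplotype coverage comes from the discussion.

```python
def expected_junction_fragments(hap_coverage, read_len, fragment_len):
    """Expected number of paired-end fragments overlapping one junction.

    Assumes uniform fragment starts and that coverage counts the bases
    of both reads of each pair.
    """
    starts_per_bp = hap_coverage / (2 * read_len)  # fragment starts per base
    spanning = starts_per_bp * fragment_len        # fragment covers the junction
    # Junction falls in the unsequenced gap between the two reads.
    in_gap = starts_per_bp * max(fragment_len - 2 * read_len, 0)
    in_read = spanning - in_gap                    # junction falls inside a read
    return spanning, in_gap, in_read


spanning, in_gap, in_read = expected_junction_fragments(5, 150, 500)
print(f"{spanning:.1f} spanning, {in_gap:.1f} in the inner gap, {in_read:.1f} inside a read")
# → 8.3 spanning, 3.3 in the inner gap, 5.0 inside a read
```

Under these assumptions only a handful of pairs per junction are expected at 5X, so small counts in IGV are not surprising by themselves.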
Thanks!
Hi @jiadong324,
Sorry, but there is something I'm still missing. If the second tandem duplication (the inverted one) looks fine, then the first one looks fine as well. Indeed, for a non-inverted tandem duplication there shouldn't be any abnormal read pairs, if I understood correctly what you mean.
Best,
Davide
Hi,
I've done the following steps to simulate: 1) Adding known variants to each chromosome, per haplotype, to create .h1.fa and .h2.fa. 2) Adding complex SVs to haplotype one (h1). 3) Running SHORtS to simulate reads at 10X coverage. Purity and contamination are both set to 100.
Here is one site after simulation; the red lines are the outer breakpoints of a nested event made of two adjacent duplications. I wrote a script that randomly combines these basic events into nested events, as you suggested before.
My questions are: 1) What fraction of the short reads is simulated from each haplotype? 2) For tandem duplications, the duplicated sequence is appended directly after the original sequence. So, from my understanding, two adjacent tandem duplications may not be suitable for the simulation.
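To make the second question concrete, here is a toy sketch of appending a duplicated segment directly after the original one. It is an assumption about the resulting sequence layout, not VISOR's actual code, and the helper names and the 9 bp reference are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def tandem_duplicate(seq, start, end, inverted=False):
    """Insert a copy of seq[start:end] right after position end.

    Coordinates are 0-based and end-exclusive.
    """
    copy = seq[start:end]
    if inverted:
        copy = reverse_complement(copy)
    return seq[:end] + copy + seq[end:]


ref = "AAACCGTTT"
# Two adjacent events in reference coordinates: [3, 5) and [5, 7).
# Applying the rightmost one first keeps the left coordinates valid.
hap = tandem_duplicate(ref, 5, 7, inverted=True)
hap = tandem_duplicate(hap, 3, 5)
print(hap)  # → AAACCCCGTACTT
```

In this toy layout the copy of the first segment sits between the two original segments, so the order in which adjacent events are applied determines where the junctions end up.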
Thanks!