Closed by jiadong324 4 years ago
Hi @jiadong324,
Indeed, you got it right. When simulating from a single clone (a single HACk folder with one or more haplotypes), the purity column indicates the percentage of reads that are simulated from the reference (for the same region specified in the BED for SHORtS/LASeR) with respect to the total coverage. That is, if you simulate 50X at AF 50%, you get 25X from the 2 haplotypes and 25X from the reference.
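To make the 50X example concrete, here is a minimal sketch of the arithmetic only. It is not VISOR code: `split_coverage` and its parameter names are hypothetical, and it assumes the BED value is read as the percentage of reads drawn from the clone, with the remainder coming from the reference.

```python
def split_coverage(total_coverage, clone_percent):
    """Split a total coverage between the clone and the reference.

    Assumption: clone_percent is the share of reads (0-100) taken from the
    clone haplotypes; everything else is simulated from the reference.
    """
    clone_coverage = total_coverage * clone_percent / 100.0
    reference_coverage = total_coverage - clone_coverage
    return clone_coverage, reference_coverage


# 50X in total with a 50% split, as in the example above
clone_cov, ref_cov = split_coverage(50, 50)
print(clone_cov, ref_cov)  # 25.0 25.0
```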
Best,
Davide
Hi @davidebolo1993,
According to your supplementary note, I think purity indicates the percentage of reads simulated from the hacked genome, with the rest coming from the reference genome.
Let me make the AF example clearer. Suppose I hack the simulated SVs into only one of the two haplotypes, say h1. By introducing purity, you effectively bring in the reference as a third virtual haplotype (g) alongside the hacked h1 and the non-hacked h2. When sequencing starts, reads will come from g, h1 and h2.
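The way I picture the three read sources can be sketched as below. This is only my reading of the model, not VISOR's implementation: the function name `source_percentages` is hypothetical, and I am assuming the clone share is divided equally among its haplotypes.

```python
def source_percentages(purity_percent, haplotypes=("h1", "h2")):
    """Return the percentage of reads expected from each source.

    Assumptions: purity_percent (0-100) is the share of reads from the
    hacked clone, split equally among its haplotypes; the remainder is
    drawn from the reference, treated as a virtual haplotype "g".
    """
    shares = {"g": 100 - purity_percent}
    for name in haplotypes:
        shares[name] = purity_percent / len(haplotypes)
    return shares


for purity in (100, 80):
    print(purity, source_percentages(purity))
```

With purity 100 the virtual haplotype g contributes nothing, so only h1 and h2 are sequenced.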
If you think the description in 2 is correct, please go to 3.
Sorry for so many questions; I am trying to understand the details and use VISOR properly.
Thanks!
Hi @jiadong324,
no worries. I'm happy VISOR stimulates such an interest. The description you gave is perfectly fine and is, indeed, what VISOR does.
Best,
Davide
Yes, VISOR is really helpful for what I am doing now.
So, if I set purity to 100% and hack the simulated SVs into h1, I will get SVs with an AF of 50%.
Thanks!
Hi,
I read through the supplementary note about purity, and based on what you described in a previous issue, I am not sure whether I am using the parameter correctly.
For example, I am using 50X for two haplotypes, h1 and h2, where h1 is hacked with SVs and h2 only with SNPs. These simulated SVs on h1 are therefore heterozygous (HET, according to your previous explanation), so their allele fraction is 50%, if I understand correctly. If I set purity to 100, SHORtS will sequence equally from each haplotype. If I instead set purity to 80 and keep everything else unchanged, 80% of the reads come from the two hacked haplotypes and the remaining 20% are generated from the reference genome. The allele fraction of these simulated SVs should then be 40%.
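Here is the calculation I have in mind, as a small sketch. The helper `expected_allele_fraction` is hypothetical (it is not part of VISOR), and it assumes that the clone's reads are spread evenly over its haplotypes and that reference reads never carry the SV.

```python
def expected_allele_fraction(purity_percent, n_haplotypes, n_carrying):
    """Expected SV allele fraction, in percent.

    Assumptions: purity_percent (0-100) is the share of reads from the
    clone, spread evenly over n_haplotypes; n_carrying of those
    haplotypes carry the SV; reads from the reference never carry it.
    """
    return purity_percent * n_carrying / n_haplotypes


# SVs hacked into h1 only, out of two haplotypes
print(expected_allele_fraction(100, 2, 1))  # 50.0
print(expected_allele_fraction(80, 2, 1))   # 40.0
```

At 50X total coverage, the purity 80 case would correspond to roughly 20X of reads carrying the SV and 30X not carrying it (20X from h2 plus 10X from the reference).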
In my previous non-haplotype simulation, when I wanted to simulate SVs with an allele fraction of 50% at 50X coverage, I would first use wgsim to sequence 25X of reads from the hacked genome and another 25X from the reference genome.
Please let me know whether I understand this parameter correctly. Thanks a lot!
Best, Jiadong