fritzsedlazeck / SURVIVOR

Toolset for SV simulation, comparison and filtering
MIT License

SV simulation #148

Open asylvz opened 3 years ago

asylvz commented 3 years ago

Hello,

Just to clarify things, I have a couple of questions:

Thanks so much, Arda

fritzsedlazeck commented 3 years ago

Hey Arda, yes, sorry, this is a new feature, so it is not very well documented.

  1. Yes, set it to 2 and it simulates a diploid genome. Higher ploidy is not supported.
  2. This option controls how the simulated sequence is used: with option 1 (real reads), the simulated sequence serves as the reference for your real reads; with option 0, you simulate reads from the simulated sequence. The option is needed because the coordinates of the variants are shifted depending on which one you choose.
  3. Correct.
  4. Sadly not. You can only set one size range within each SV type.

Hope that clears it up, Fritz

asylvz commented 3 years ago

Thanks so much Fritz,

Sorry, but I'm not clear about question 2. I want to test my algorithm with simulated data, so I want SURVIVOR to add the variants to the human reference, and then I will use ART to generate reads from that. So I should use option 0 to get a FASTA file of the human reference with the variants added, right?

fritzsedlazeck commented 3 years ago

Exactly.

If you want to use real reads and test your algorithm on those, it would be option 1, and you would then map the reads to the genome generated this way.

asylvz commented 3 years ago

I have one last question: I'm using GRCh37, which is haploid, so what happens when I set "NUMBER_haploid" to 2? Since the genome is not diploid, it has no effect as far as I can see from the output, right?

fritzsedlazeck commented 3 years ago

The simulator (simSV) simulates each variant on one or the other copy of the GRCh37 chromosomes, so internally it uses a diploid model.

You should see the difference in the output FASTA (more chromosomes) and in the VCF file (heterozygous vs. homozygous genotypes).
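One quick way to check both things is a small script like the sketch below. It is not part of SURVIVOR; it only assumes a plain FASTA file and a standard VCF with a GT field in the first sample column. The function names are hypothetical, and nothing is assumed about how the extra chromosome copies are named.

```python
from collections import Counter


def fasta_headers(lines):
    """Return sequence names: the text after '>' up to the first whitespace."""
    names = []
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.append(parts[0])
    return names


def genotype_counts(lines):
    """Count het / hom_alt / hom_ref / other genotypes in the first sample column."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.split("\t")
        if len(fields) < 10:
            continue  # no FORMAT / sample columns on this record
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue
        gt = fields[9].split(":")[keys.index("GT")]
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            counts["other"] += 1
        elif alleles[0] != alleles[1]:
            counts["het"] += 1
        elif alleles[0] == "0":
            counts["hom_ref"] += 1
        else:
            counts["hom_alt"] += 1
    return counts
```

Both functions accept any iterable of lines, so an open file handle can be passed directly and a genome-sized FASTA is never loaded into memory at once. With a diploid simulation, the header list should be longer than the one from the input reference, and the genotype counts should contain both heterozygous and homozygous entries.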

asylvz commented 3 years ago

Oh sorry, yes, you are right. I was expecting to see "_maternal" and "_paternal" added to the chromosome headers in the FASTA file, and since I did not see them on the first chromosomes, I had not gone to the end of the file :)

Thanks, Arda