eldariont / svim-asm

Structural Variant Identification Method using Genome Assemblies
GNU General Public License v3.0

Can SVIM-asm be used for polyploid assemblies? #2

Closed zhen0506 closed 3 years ago

zhen0506 commented 3 years ago

Hi there, I wonder if this tool can be used for polyploid assemblies. Thank you.

eldariont commented 3 years ago

Hi Zhen,

thanks for your question. Currently, SVIM-asm is designed for assemblies with 1 (haploid) or 2 (diploid) assembled haplotypes only. In the diploid case, SVIM-asm is able to infer the genotype of each variant from the two given contig sets.

If you want to analyze a polyploid assembly consisting of, say, 4 assembled haplotypes (4 sets of contigs) then SVIM-asm does not offer a simple solution, I'm afraid. One option in this case would be to call SVs separately for each haplotype using the haploid mode of SVIM-asm. This will give you 4 VCF files containing the variants present in each haplotype. In a next step, you could merge those VCF files into one VCF containing the complete genotype across all four haplotypes. However, I'm not sure whether a tool for the second step exists already or whether you would need to come up with your own script.
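A minimal sketch of what such a merging script could look like is shown below. It is not part of SVIM-asm; the function names `parse_vcf_records` and `merge_haploid_calls` are hypothetical. It assumes that the same variant appears with identical CHROM, POS, REF and ALT in every haplotype that carries it, and that a haplotype without a call carries the reference allele. Real SV calls often differ slightly in position or sequence between haplotypes, and missing alignment coverage is not the same as a reference allele, so a practical script would likely need fuzzy matching and missing-genotype handling.

```python
def parse_vcf_records(lines):
    """Yield (chrom, pos, ref, alt) for each data line of a VCF."""
    for line in lines:
        # Skip blank lines and header lines ("##..." and "#CHROM ...").
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        yield chrom, int(pos), ref, alt


def merge_haploid_calls(per_haplotype_lines):
    """Combine per-haplotype call sets into one polyploid genotype per variant.

    per_haplotype_lines: one iterable of VCF lines per haplotype.
    Returns a list of (chrom, pos, ref, alt, genotype) tuples sorted by
    chromosome name and position, where genotype is e.g. "1|0|0|1".

    Assumption: variants are matched by exact (chrom, pos, ref, alt),
    and a haplotype without a call is treated as reference ("0").
    """
    ploidy = len(per_haplotype_lines)
    carriers = {}  # variant key -> set of haplotype indices carrying it
    for hap_index, lines in enumerate(per_haplotype_lines):
        for key in parse_vcf_records(lines):
            carriers.setdefault(key, set()).add(hap_index)

    merged = []
    for key in sorted(carriers):
        genotype = "|".join(
            "1" if hap in carriers[key] else "0" for hap in range(ploidy)
        )
        merged.append(key + (genotype,))
    return merged
```

Each argument can be an open file handle or a list of lines, one per haploid VCF, in a fixed haplotype order; the position of each allele in the genotype string follows that order.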

Cheers, David

zhen0506 commented 3 years ago

Hi David, thank you for your reply. Another question: does it affect the results whether the genome is masked or unmasked? Thanks, Zhen

eldariont commented 3 years ago

Hi Zhen,

SVIM-asm does not directly operate on a genome but on a genome-genome alignment. Therefore, masking does not have any direct effect on SVIM-asm. However, the masking might affect the genome-genome alignment given as input to SVIM-asm and as a consequence the results obtained by SVIM-asm.

Best, David

ddrichel commented 3 years ago

Hi David,

I am doing this with my own script, following your suggestion above:

"If you want to analyze a polyploid assembly consisting of, say, 4 assembled haplotypes (4 sets of contigs) then SVIM-asm does not offer a simple solution, I'm afraid. One option in this case would be to call SVs separately for each haplotype using the haploid mode of SVIM-asm. This will give you 4 VCF files containing the variants present in each haplotype. In a next step, you could merge those VCF files into one VCF containing the complete genotype across all four haplotypes. However, I'm not sure whether a tool for the second step exists already or whether you would need to come up with your own script."

Is there a good reason why genotypes in calls from a haploid assembly are "1/1"? Shouldn't they be just "1"?

Kind regards

Dmitriy

eldariont commented 3 years ago

Hi Dmitriy,

No, there is no good reason, I'm afraid. You are right that, for haploid assemblies, it should be just "1". Thanks for making me aware of it. I need to change that.
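Until that is changed, the genotype column can be rewritten in a post-processing step. The following is only a sketch, not something SVIM-asm provides: the function name `haploidize_genotypes` is hypothetical, and it assumes a standard VCF layout with a FORMAT column containing a `GT` key followed by one or more sample columns. Genotypes whose alleles are all identical (such as "1/1") are collapsed to a single allele; anything else is left untouched.

```python
def haploidize_genotypes(lines):
    """Yield VCF lines with homozygous diploid GT values collapsed to haploid.

    Example: a sample value "1/1" becomes "1". Header lines, lines without
    sample columns, and genotypes with differing alleles pass through as-is.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            # No FORMAT/sample columns, nothing to rewrite.
            yield line
            continue
        format_keys = fields[8].split(":")
        if "GT" in format_keys:
            gt_index = format_keys.index("GT")
            for i in range(9, len(fields)):
                values = fields[i].split(":")
                if gt_index < len(values):
                    # Treat phased ("|") and unphased ("/") separators alike.
                    alleles = values[gt_index].replace("|", "/").split("/")
                    if len(set(alleles)) == 1:
                        values[gt_index] = alleles[0]
                    fields[i] = ":".join(values)
        yield "\t".join(fields) + "\n"
```

Applied to each haploid VCF before merging, this keeps the per-haplotype files consistent with the ploidy they actually describe.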

Best, David