jhcaddisfly closed this issue 2 years ago
Hello J.,
I'm not sure what the question is here. I presume you have a diploid that is predicted to be a tetraploid.
From what I can tell, there is very little heterozygosity in the species (judging by the GenomeScope estimates). Perhaps it's the same problem as in the diploid strawberry case, which also looked tetraploid because the paralogy signal dominated the smudgeplot due to very low heterozygosity: https://github.com/KamilSJaron/smudgeplot/wiki/tutorial-strawberry
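To make the paralogy argument concrete, the sketch below computes where an idealized k-mer pair of a given genotype should land on the plane smudgeplot uses: total coverage A + B against the minor k-mer's share B / (A + B). The function name `expected_smudge` and the 25x haploid coverage are hypothetical illustrations, not part of smudgeplot, and real data scatter around these ideal points.

```python
from fractions import Fraction


def expected_smudge(genotype: str, haploid_cov: float) -> tuple[float, Fraction]:
    """Expected (A + B, B / (A + B)) for an idealized k-mer pair.

    `genotype` lists copies of each k-mer, e.g. "AB", "AAB", "AABB".
    `haploid_cov` is an assumed 1n k-mer coverage (hypothetical input).
    B is always taken to be the less covered k-mer of the pair.
    """
    a = genotype.count("A")
    b = genotype.count("B")
    if a < b:
        a, b = b, a  # swap so that B is the minor k-mer
    total_cov = (a + b) * haploid_cov
    minor_fraction = Fraction(b, a + b)
    return total_cov, minor_fraction


for g in ["AB", "AAB", "AABB"]:
    print(g, expected_smudge(g, 25.0))
```

Under these assumptions AB lands at (50, 1/2) and AABB at (100, 1/2). In a diploid, two homozygous loci that differ by a single base each contribute a k-mer at 2n coverage, so that pair sits exactly where a tetraploid AABB smudge would. When true heterozygous (AB) pairs are scarce, that smudge can dominate the plot.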
Hello,
Thank you very much for your quick response!
Yes, my question was why smudgeplot predicts my study organism to be tetraploid, since I would expect it to be diploid based on the haploid genome size estimates from FCM and GenomeScope2.
To conclude, can I argue that because of the low heterozygosity (0.35-0.39%), the duplication signal is relatively stronger than the heterozygosity signal? So, are the duplications rather recent? Can I say that my organisms have a lot of closely related paralogs
(paralogs: AABB 81% / 69% vs. heterozygous loci: AB 19% / 17%), and that this is caused by smudgeplot picking up two homozygous loci that differ by exactly one nucleotide as AABB?
So, do you think the evidence for tetraploidy according to smudgeplot is rather low?
Sorry for asking so many questions. I just want to make sure I understand the results correctly. I really like smudgeplot! It is super helpful and I am happy I came across this tool!
Thanks, J.
> To conclude, can I argue that because of the low heterozygosity (0.35-0.39%) the duplication signal is relatively stronger than the heterozygosity signal?
Yeah, that's a documented problem of very homozygous genomes.
> So, are the duplications rather recent? Can I say that my organisms have a lot of closely related paralogs
We don't really infer the evolutionary history, so it's hard to say whether these are paralogs or other kinds of -logs, nor do we date them. Although they are probably recent paralogs, all we can tell is that there are plenty of close-but-not-identical duplications in the genome. So yes, but you might want to be careful with the phrasing.
> that this is caused by smudgeplot picking up two homozygous loci that differ by exactly one nucleotide as AABB?
Smudgeplot picks all pairs of k-mers that differ by exactly 1 nt and, by projecting them onto the plane of total coverage A + B versus relative minor coverage B / (A + B) (where B is the less covered k-mer of the pair), determines their copy numbers and relative coverage. How to interpret AABB depends on your genome; what it tells you is that there are plenty of pairs of closely related k-mers that are each present in two copies in the genome. If the genome is diploid, they are most likely two homozygous loci that differ by 1 nt, but in theory they could also be two heterozygous loci that are perfect duplicates with the same genotype (I guess that's extremely unlikely, but what do I know).
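The pair-finding and projection step described above can be sketched on a toy k-mer count table. This is a naive illustration under stated assumptions, not smudgeplot's implementation: the function names are hypothetical, the O(n^2) scan stands in for efficient indexing, reverse complements and coverage cutoffs are ignored, and `smudge_label` is a simple rounding rule assumed here rather than smudgeplot's actual smudge assignment.

```python
from itertools import combinations


def differ_by_one(x: str, y: str) -> bool:
    """True if two equal-length k-mers differ at exactly one position."""
    return len(x) == len(y) and sum(p != q for p, q in zip(x, y)) == 1


def kmer_pair_points(counts: dict[str, int]) -> list[tuple[str, str, int, float]]:
    """Project every 1-nt-distant k-mer pair onto the (A + B, B / (A + B)) plane.

    Naive O(n^2) scan over a toy count table; A is the larger count, B the smaller.
    Reverse complements and coverage filtering are deliberately ignored.
    """
    points = []
    for x, y in combinations(sorted(counts), 2):
        if differ_by_one(x, y):
            a = max(counts[x], counts[y])
            b = min(counts[x], counts[y])
            points.append((x, y, a + b, b / (a + b)))
    return points


def smudge_label(total_cov: float, minor_frac: float, haploid_cov: float) -> str:
    """Round a point to a genotype label such as "AB" or "AABB" (assumed rule)."""
    copies = max(2, round(total_cov / haploid_cov))  # total copies of the pair
    minor = max(1, round(copies * minor_frac))       # copies of the minor k-mer
    return "A" * (copies - minor) + "B" * minor


# Toy counts with an assumed haploid coverage of 20x:
# ACGTA/ACGTC mimic a heterozygous site, TTGCA/TTGCG two homozygous paralogs.
counts = {"ACGTA": 20, "ACGTC": 20, "TTGCA": 40, "TTGCG": 40, "GGGGG": 40}
for x, y, total, frac in kmer_pair_points(counts):
    print(x, y, total, frac, smudge_label(total, frac, haploid_cov=20))
```

With these toy counts the heterozygous-like pair lands at (40, 0.5) and is labeled AB, while the paralog-like pair lands at (80, 0.5) and is labeled AABB, even though the toy genome is meant to be diploid. GGGGG has no 1-nt neighbour and is never plotted.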
I think you mostly got this right now, no worries :-)
K
I'm having trouble understanding my smudgeplots for two individuals of the same species. I used v0.2.3 and the following command to generate the plots:
and they look like this:
I already have genome size estimates from GenomeScope and flow cytometry (flow cytometry: 651.30 Mbp, measured on a different individual of the same species). The GenomeScope results look like this:
http://qb.cshl.edu/genomescope/genomescope2.0/analysis.php?code=b7yzvd5lbmeuMEEx5swf
http://qb.cshl.edu/genomescope/genomescope2.0/analysis.php?code=e68aubTXZEgN9Ix2u9td
This does not fit with the smudgeplot, which predicts an unexpected ploidy.
How should I understand my smudgeplot?
Thanks, J.