Closed VictorHerp closed 2 years ago
Hi VIctor, sorry for the delay, this was a long post and I wanted to find more than few minutes so I could actually give it a more proper thought.
Anyways, this is an interesting genome and exactly the type of a genome we developed smudgeplot for (it was mostly polyploid root-knot nematodes we tried to understand). I agree the smudgeplot has some labeling issues. And the genomescope model also looks a bit funky, but I got to say, this is one of those datasets where I would prefer to try to fit a few models before I will attempt to guess what is what. Would you be willing to share me the kmer pairs file and the k-mer histogram?
Finally, you probably know the expected 1C value / genome size, right? That can be really useful when trying to fit genome models.
Hi Kamil, Thank you so much for the help, I sent you a message to your email to be able to send you the files that you asked for. Greetings.
Cool (I did see the email - that is one big k-mer pair file :-D )
I consider this issue resolved, but if anythimg, don't hesitate to reopen the issue please.
Hi Kamil, I would like to ask for your advice with a fairly problematic genome. I used Genomescope to estimate the genome length of the Adriatic sturgeon (Acipenser naccarii, 2n=239 +/- 7 chromosomes, I attached the karyotype) considered to be functionally tetraploid, but the results are confusing. Although I have “good” nucleotide coverage (89x, that is, ~22x per haploid) there is a first peak that seems informative but Genomescope, Tetmer and Smudgeplot indicate that it is probably a peak of errors (although it has a kcov of ~ 12, I attached the genomescope plot). One thing to consider is the evolutionary history of this species which could create a bias in the analysis. It is hypothesized that this species had reached a level of octaploidy (with at least two whole-genome duplications) before undergoing a process of diploidization, therefore it is currently considered to be tetraploid even if some loci could still be octaploid or even diploid. One interesting thing that Hannes Becher told me is that the k-mer spectrum might look like an octaploid (if the multiplicity 12 peak is a data peak and isn’t due to contamination). I was wondering if maybe what Genomescope does is detect the maximum level of polyploidy in the genome even though in theory most of the loci in this genome are tetraploid. So, to corroborate the haplotype structure of the genome I used smudgeplot, which left me with more doubts than answers. According to smudgeplot the genome is pentaploid (something maybe not very likely), but it gives me this warning:
This is the code that I used (L and U based on the genomescope plot), I also tried with “smudgeplot.py cutoff, result: L10, U980”:
I know that the coverage I have is low compared to the requirements for smudgeplot, so I would like to ask you if in your opinion the fact that there are no octaploid haplotype structures and that there are pentaploid structures instead of tetraploids is a problem related to the low kcov for haploid or is there something else that I am missing?
Waiting for your answer, thank you! Víctor Muñoz