KamilSJaron / smudgeplot

Inference of ploidy and heterozygosity structure using whole genome sequencing data
Apache License 2.0

Wrong result of Salix #52

Closed BeardLi closed 4 years ago

BeardLi commented 4 years ago

Hi,

I ran into some problems when running smudgeplot on Salix data. Here are my results: ljs_smudgeplot, ljs_smudgeplot_log10

My L and U were determined after I ran GenomeScope: plot

I'm confused because this picture says that my read coverage is only ~7, when it should actually be ~18.

After all, according to this picture, my L is 7, and I got a terrible result. What I want to know is: is there anything wrong with my read data, or is something wrong elsewhere?

Thanks for your program.

KamilSJaron commented 4 years ago

Hi there,

Regarding 7.3x vs. 18x: 7.3x is an estimate of monoploid coverage (how many times each haplotype was covered), while I expect 18x was your estimate of 2n coverage (how many times each site in the genome was covered, regardless of which haplotype the reads come from). So the k-mers still predict a somewhat smaller coverage (~15x), but the difference is nothing as dramatic as 7 vs. 18.
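To make the arithmetic explicit, here is a minimal sketch of the conversion between monoploid (1n) and total (2n) coverage, assuming a diploid genome. The function name `total_coverage` is hypothetical and is not part of smudgeplot or GenomeScope.

```python
def total_coverage(monoploid_cov: float, ploidy: int = 2) -> float:
    """Convert per-haplotype (monoploid) coverage to per-site coverage.

    Assumption: every haplotype is sequenced to the same depth, so the
    per-site coverage is simply monoploid coverage times ploidy.
    """
    if monoploid_cov < 0 or ploidy < 1:
        raise ValueError("coverage must be >= 0 and ploidy >= 1")
    return monoploid_cov * ploidy


# GenomeScope's 7.3x monoploid estimate, read as a diploid:
print(round(total_coverage(7.3), 1))  # 14.6, i.e. roughly 15x rather than 18x
```

Under this assumption the k-mer estimate and the expected ~18x differ by only a few x, not by a factor of more than two.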

And regarding the noisy smudgeplot: you have very low coverage. From the documentation:

To sum up, if you have >100x per haplotype, the smudgeplot should be really nice. If you have nice libraries and >25x per haplotype, the smudgeplot should be really nice. If you have less than that or there is something else going on (like whole genome amplification), smudgeplot probably won't be very informative.

The smudgeplot often looks quite weird when the coverage is so low that haploid and error k-mers are impossible to separate.
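As a rough illustration of the rule of thumb quoted from the documentation, the sketch below maps a per-haplotype coverage to the expectation stated there. The function `smudgeplot_expectation` and its return strings are hypothetical; only the 100x and 25x thresholds come from the quoted text, and smudgeplot itself performs no such check as far as this thread shows.

```python
def smudgeplot_expectation(per_haplotype_cov: float, nice_libraries: bool = True) -> str:
    """Apply the documentation's rule of thumb to a per-haplotype coverage.

    Thresholds (100x, 25x) are taken from the quoted documentation;
    everything else here is an illustrative assumption.
    """
    if per_haplotype_cov > 100:
        return "should be really nice"
    if per_haplotype_cov > 25 and nice_libraries:
        return "should be really nice"
    return "probably won't be very informative"


# The Salix data set from this issue: ~7.3x per haplotype.
print(smudgeplot_expectation(7.3))
```

For the 7.3x estimate discussed here, the rule of thumb lands in the last category, which matches the noisy plot.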

In conclusion, there is nothing wrong with your data besides the really low coverage. My best guess, looking just at the GenomeScope plot, is that it's a diploid (I don't think you really need a smudgeplot here; it's clear-cut). Hope this helps :-)

BeardLi commented 4 years ago

Hi, thank you very much for your help! Indeed, the coverage of all my datasets is between 16x and 26x, which is obviously not enough for smudgeplot. Thank you again for your cool program and patience!!!

KamilSJaron commented 4 years ago

With pleasure.

Hope you'll find at least the GenomeScope results informative (it seems to me that it worked quite nicely). Closing the issue for now, but feel free to reopen if something...