TGAC / KAT

The K-mer Analysis Toolkit (KAT) contains a number of tools that analyse and compare K-mer spectra.
http://www.earlham.ac.uk/kat-tools
GNU General Public License v3.0

Need help interpreting results #184

Open jiangqiuqiuu opened 9 months ago

jiangqiuqiuu commented 9 months ago

Hi,

I am assessing a genome that I assembled with Hifiasm. Flow cytometry puts the genome size of the target plants at around 120 Mb, but the Hifiasm assembly is 272 Mb.

I suspect one or more of the following could be going on with my data: 1) The genome is highly heterozygous, and the assembly contains two very divergent haplotypes. 2) The sequence data we obtained came from multiple individuals. 3) There was cross-contamination from other samples during the sequencing procedures. I am trying to figure out which of these applies, so I ran some k-mer analyses to see whether they could provide any insight.
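To put a number on the discrepancy, here is a minimal sketch of the arithmetic in Python. It assumes the flow-cytometry figure is a haploid (1C) genome size, which I have not confirmed, and the function name `size_ratio` is only illustrative. A ratio close to 2 would be consistent with both haplotypes being assembled separately; anything beyond that would need another explanation.

```python
def size_ratio(assembled_mb: float, expected_mb: float) -> float:
    """Return the assembled size divided by the expected genome size."""
    if expected_mb <= 0:
        raise ValueError("expected_mb must be positive")
    return assembled_mb / expected_mb


assembled_mb = 272.0  # Hifiasm assembly size, in Mb
expected_mb = 120.0   # flow-cytometry estimate, in Mb (ASSUMED to be haploid size)

ratio = size_ratio(assembled_mb, expected_mb)
# Sequence left over after accounting for two full haplotype copies.
excess_over_two_copies_mb = assembled_mb - 2 * expected_mb

print(f"assembly / estimate = {ratio:.2f}x")
print(f"excess beyond two haplotype copies = {excess_over_two_copies_mb:.0f} Mb")
# prints:
# assembly / estimate = 2.27x
# excess beyond two haplotype copies = 32 Mb
```

So the assembly is a bit more than twice the estimate, which is why I am not sure heterozygosity alone explains it.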

I have attached three plots and the log files. The plots are: 1) the k-mer distribution plot for this sample, 2) the GC KAT contig length and duplication plot, and 3) the comparison between the reads and the assembly.

sorg_k27.pdf sorg_k27_cold.pdf sorg_comp_fq_assem-main.mx.density.pdf

nohup1.txt nohup2.txt nohup3.txt

Thank you for making this amazing tool available; we have benefited from it a lot! I would really appreciate any insights on how to interpret these plots. Thanks a lot.

Cheers,
Qiuyu