Hi Giacomo. One way to identify the distributions is by the approximate genome size each one appears to sample: count the number of k-mers under each distribution. For this, you need to know your genome sizes roughly. Another option is to make a draft assembly and then use kat sect to project the k-mer counts onto the sequences (contigs should be enough). Then BLAST a few sequences to identify which contig comes from which distribution. Hope it helps! Best, Gonza
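The first suggestion above (counting the k-mers under each distribution) can be sketched in Python as follows. This is only an illustration under stated assumptions, not KAT's own method: it assumes the k-mer spectrum is already available as (multiplicity, count) pairs, that the two distributions are well separated in coverage, and that the multiplicity window for each distribution is picked by eye from the plot. The function name and the toy numbers are hypothetical.

```python
def estimate_genome_size(histogram, low, high):
    """Rough genome size for the distribution whose k-mer multiplicities lie in [low, high].

    histogram: iterable of (multiplicity, number_of_distinct_kmers) pairs.
    low, high: inclusive multiplicity window chosen by eye (an assumption).
    """
    window = [(m, c) for m, c in histogram if low <= m <= high]
    if not window:
        raise ValueError("no histogram bins fall inside the window")
    # The multiplicity with the most distinct k-mers approximates the
    # k-mer coverage of this distribution.
    peak, _ = max(window, key=lambda mc: mc[1])
    # Total k-mer occurrences in the window divided by the coverage gives
    # an approximate number of genomic positions, i.e. the genome size.
    total_kmers = sum(m * c for m, c in window)
    return total_kmers / peak


# Toy spectrum with made-up numbers: one distribution peaks at 10x,
# the other at 50x.
toy = [(8, 100), (9, 300), (10, 500), (11, 300), (12, 100),
       (48, 10), (49, 30), (50, 50), (51, 30), (52, 10)]

size_a = estimate_genome_size(toy, 5, 20)
size_b = estimate_genome_size(toy, 40, 60)
print(size_a, size_b)
```

Comparing each estimate with the roughly known genome sizes of the host and the symbiont then suggests which distribution belongs to which organism. The estimate becomes unreliable if the distributions overlap in coverage or if low-multiplicity error k-mers fall inside the window.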
Hello,
I am working with two animal species that live in symbiosis with bacteria, so the Illumina reads we obtained come from both the host and the symbiont. I have used kat gcp to obtain the attached plots. Now I would like to extract just the host's k-mers so that I can estimate the host genome size properly. I have seen that I should be able to do this using "kat filter kmer" and "kat filter seq", but I am not sure which k-mers come from the host and which come from the bacteria. Is there a way to tell this by looking at the gcp plots? Finally, which --threshold would you suggest I use with "kat filter seq"?
Best regards,
Giacomo
kat-sp2.pdf kat-sp1.pdf