Contamination in GCP plots

TGAC / KAT

The K-mer Analysis Toolkit (KAT) contains a number of tools that analyse and compare K-mer spectra.

GNU General Public License v3.0

Hello,

I am working with two animal species that live in symbiosis with bacteria, so the Illumina reads we obtained come from both the host and the symbiont. I used kat gcp to produce the attached plots. I would now like to extract only the host's k-mers so that I can estimate the host genome size properly. I understand this should be possible with "kat filter kmer" and "kat filter seq", but I am not sure which k-mers come from the host and which come from the bacteria. Is there a way to tell this from the gcp plots? Finally, which --threshold would you suggest I use with "kat filter seq"?
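
To make the question concrete, the selection being asked about can be pictured as a rectangular window on the GCP plot: keep only k-mers whose GC count and multiplicity both fall inside chosen bounds. The plain-Python sketch below illustrates that idea only. It is not KAT's implementation, the function names are hypothetical, the bounds in the example are placeholders rather than values read from the attached plots, and for simplicity it counts k-mers as written instead of merging reverse complements.

```python
from collections import Counter


def gc_count(kmer):
    """Number of G or C bases in a k-mer (the GC axis of a GCP-style plot)."""
    return sum(base in "GC" for base in kmer)


def count_kmers(reads, k):
    """Count every k-mer of length k in the reads, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def filter_kmers(counts, low_count, high_count, low_gc, high_gc):
    """Keep k-mers whose multiplicity and GC count lie inside the given inclusive bounds."""
    return {
        kmer: count
        for kmer, count in counts.items()
        if low_count <= count <= high_count
        and low_gc <= gc_count(kmer) <= high_gc
    }


# Toy reads: an AT-rich sequence seen twice and a GC-rich sequence seen once.
reads = ["ATATAT", "ATATAT", "GCGCGC"]
counts = count_kmers(reads, k=3)

# Placeholder window: multiplicity 3 to 10, at most one G/C base per k-mer.
selected = filter_kmers(counts, low_count=3, high_count=10, low_gc=0, high_gc=1)
print(sorted(selected))
```

On real data the two populations would need to form separate clusters in GC and coverage for such a window to isolate the host, which is exactly what I am unsure how to judge from the plots.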

Best regards,

Giacomo

kat-sp2.pdf kat-sp1.pdf

Contamination in GCP plots #150