KAT is a suite of tools that analyse jellyfish hashes or sequence files (fasta or fastq) using kmer counts. The following tools are currently available in KAT:
In addition, KAT contains a python script for analysing the mathematical distributions present in the K-mer spectra in order to determine how much content is present in each peak.
This README only contains some brief details of how to install and use KAT. For more extensive documentation please visit: https://kat.readthedocs.org/en/latest/
From brew
If you have brew installed on your system you should be able to install a recent version of KAT by simply typing:
brew install brewsci/bio/kat
Many thanks to @sjackman for this one!
From bioconda
If you use bioconda you can install KAT using:
conda install -c bioconda kat
From Source
If you wish to install KAT from source, because you don't have brew installed, or wish to ensure you have the latest version, first ensure these dependencies are installed and configured on your system:
NOTE ON INSTALLING PYTHON: Many system python installations do not come with the C API immediately available, which prevents KAT from embedding python code. We typically recommend installing anaconda3, as this includes the latest version of python, all required python packages and the C API. If you are running a debian system where the C libraries are not available by default and you wish to use the system python installation, then you can install them using: sudo apt-get install python-dev
Also, if you are using a python installation outside your system directory, please make sure you have your PATH and LD_LIBRARY_PATH (or LD_RUN_PATH) environment variables set appropriately.
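As a rough illustration of the environment note above, the following sketch puts a non-system python installation first on PATH and LD_LIBRARY_PATH. The location $HOME/anaconda3 and the variable name PYTHON_HOME are assumptions made for this example; substitute wherever your python actually lives.

```shell
# Hypothetical location of a python installation outside the system directories.
PYTHON_HOME="$HOME/anaconda3"

# Put its executables first on PATH.
export PATH="$PYTHON_HOME/bin:$PATH"

# Let the dynamic linker find its shared libraries.
# ${VAR:+...} only appends the old value (and a colon) when it was already set.
export LD_LIBRARY_PATH="$PYTHON_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Adding these lines to your shell startup file would make the setting persist across sessions.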
Then proceed with the following steps:
Clone the git repository into a directory on your machine. For ssh: git clone git@github.com:TGAC/KAT.git ; or for https: git clone https://github.com/TGAC/KAT.git
Change into the KAT directory: cd KAT
Run: ./build_boost.sh
Run: ./autogen.sh
Run: ./configure
The configure script can take several options as arguments. One commonly modified option is --prefix, which installs KAT to a custom directory. By default this is /usr/local, so the KAT executable would be found in /usr/local/bin. Python functionality can be disabled using --disable-pykat. Type ./configure --help for a full list of options. Please check the output to ensure the configuration is set up as you'd expect.
Compile: make
You can leverage extra cores during compilation using the -j <#cores> option. You can also see all command lines used to build the software by setting V=1.
Run the tests: make check
(The -j and V=1 options described above are also supported here.)
Install: make install
If you've not provided a specific installation directory, you will likely need to prefix this command with sudo in order to obtain the permissions required to install to /usr/local.
If sphinx is installed and detected on your system then html documentation and man pages are automatically built during the build process. If it is not detected then this step is skipped. Should you wish to create a PDF version of the manual you can do so by entering the doc directory and typing make pdf; this is not executed by default.
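Put together, the build steps might look like the following sketch. It is wrapped in a shell function so that nothing runs until you call it; the name build_kat and its prefix argument are made up for this example. It assumes the https clone URL, a four-core build and an install prefix you can write to without sudo.

```shell
# Hypothetical helper: clone, build, test and install KAT under the given prefix.
# The body runs in a subshell, so the caller's working directory is unchanged.
build_kat() (
    # Example default prefix; this is not a KAT convention.
    prefix="${1:-$HOME/kat}"

    # Each step only runs if the previous one succeeded.
    git clone https://github.com/TGAC/KAT.git &&
    cd KAT &&
    ./build_boost.sh &&
    ./autogen.sh &&
    ./configure --prefix="$prefix" &&
    make -j 4 V=1 &&
    make check &&
    make install
)
```

Calling build_kat "$HOME/kat" would then leave the kat executable in $HOME/kat/bin, provided every step succeeds.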
NOTE: if KAT is failing at the ./autogen.sh step you will likely need to install autotools. The following command should do this on MacOS: brew install autoconf automake libtool
On a debian system this can be done with: sudo apt-get install autoconf automake libtool
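Before running ./autogen.sh it can be handy to check which of these tools are already on your PATH. The loop below is only a sketch: the executable names checked (autoconf, automake, libtoolize) are an assumption about what the packages provide, and on MacOS the brew libtool package may install its executables under different names.

```shell
# Collect any autotools executables that cannot be found on PATH.
missing=""
for tool in autoconf automake libtoolize; do
    if ! command -v "$tool" >/dev/null 2>&1; then
        missing="$missing $tool"
    fi
done

# Report the result; an empty list means every tool was found.
if [ -n "$missing" ]; then
    echo "Not found on PATH:$missing"
else
    echo "autoconf, automake and libtoolize are all available"
fi
```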
Python scripts
KAT will install some python scripts to your <prefix>/bin directory. If you selected a custom location for prefix and wish to access these scripts directly, then it may be necessary to modify your PYTHONPATH environment variable. Ensure that <prefix>/lib/python<version>/site-packages is on your PYTHONPATH, where <version> is the version of python you are using, for example /home/me/kat/lib/python3.6/site-packages. Alternatively, you could install the kat python package into a python environment by changing into the scripts directory and typing: python setup.py install
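For instance, with the example path /home/me/kat/lib/python3.6/site-packages, the variable could be set as in this sketch. KAT_PREFIX and PY_VERSION are names invented for the example, and their values are taken from the example path rather than detected from your system.

```shell
# Values from the example path; adjust both to match your installation.
KAT_PREFIX="/home/me/kat"
PY_VERSION="3.6"

# Prepend KAT's site-packages directory, keeping any existing entries.
export PYTHONPATH="$KAT_PREFIX/lib/python$PY_VERSION/site-packages${PYTHONPATH:+:$PYTHONPATH}"

echo "$PYTHONPATH"
```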
After KAT has been installed, the kat executable should be available, which contains a number of subtools.
Running kat --help will bring up a list of available tools within kat. To get help on any of these subtools simply type: kat <tool> --help
For example, kat sect --help will show details on how to use the sequence coverage estimator tool.
KAT supports file globbing for input, which is particularly useful when counting and analysing kmers for paired end files. For example, assuming you had two files, LIB_R1.fastq and LIB_R2.fastq, in the current directory, then kat hist -C -m27 LIB_R?.fastq will consume any files matching the pattern LIB_R?.fastq as input, i.e. LIB_R1.fastq and LIB_R2.fastq. The same result could be achieved by listing the files at the command line: kat hist -C -m27 LIB_R1.fastq LIB_R2.fastq
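To preview which files a pattern like LIB_R?.fastq matches without running KAT, you can let the shell expand it in a scratch directory. This is only an illustration: the empty files created here stand in for real reads, and the variable names are made up.

```shell
# Create a throwaway directory holding two empty stand-in files.
demo_dir="$(mktemp -d)"
touch "$demo_dir/LIB_R1.fastq" "$demo_dir/LIB_R2.fastq"

# '?' matches exactly one character, so both files match the pattern.
matched="$(cd "$demo_dir" && echo LIB_R?.fastq)"
echo "$matched"    # prints: LIB_R1.fastq LIB_R2.fastq
```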
Note that the KAT comp subtool takes two or three groups of inputs as positional arguments, so we need to distinguish between the file groups. This is achieved by surrounding any glob patterns or file lists in single quotes. For example, assuming we have LIB1_R1.fastq, LIB1_R2.fastq, LIB2_R1.fastq and LIB2_R2.fastq in the current directory, and we want to compare LIB1 against LIB2, instead of catting the files together we might run either: kat comp -C -D 'LIB1_R?.fastq' 'LIB2_R?.fastq' ; or: kat comp -C -D 'LIB1_R1.fastq LIB1_R2.fastq' 'LIB2_R1.fastq LIB2_R2.fastq'
Both commands do the same thing.
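Why the quotes matter can be seen with a small shell function that just counts its arguments (count_args is a stand-in written for this example, not part of KAT). Unquoted, the shell expands each pattern before the program starts, so the two groups blur into four separate arguments. Quoted, each group arrives as a single argument.

```shell
# Stand-in for a program: report how many arguments it received.
count_args() { echo "$#"; }

# Create a throwaway directory holding four empty stand-in files.
demo_dir="$(mktemp -d)"
touch "$demo_dir/LIB1_R1.fastq" "$demo_dir/LIB1_R2.fastq" \
      "$demo_dir/LIB2_R1.fastq" "$demo_dir/LIB2_R2.fastq"

# Unquoted: the shell expands both patterns, giving four arguments.
unquoted="$(cd "$demo_dir" && count_args LIB1_R?.fastq LIB2_R?.fastq)"

# Quoted: each pattern is passed through untouched, giving two arguments.
quoted="$(cd "$demo_dir" && count_args 'LIB1_R?.fastq' 'LIB2_R?.fastq')"

echo "unquoted: $unquoted, quoted: $quoted"    # prints: unquoted: 4, quoted: 2
```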
GNU GPL V3. See COPYING file for more details.
If you use KAT in your work and wish to cite us please use the following citation:
Daniel Mapleson, Gonzalo Garcia Accinelli, George Kettleborough, Jonathan Wright, and Bernardo J. Clavijo. KAT: A K-mer Analysis Toolkit to quality control NGS datasets and genome assemblies. Bioinformatics, 2016. doi: 10.1093/bioinformatics/btw663
See AUTHORS file for more details.
We would also like to thank the authors of Jellyfish: https://github.com/gmarcais/Jellyfish; and SeqAn: http://www.seqan.de/. Both are embedded inside KAT.