TGAC / KAT

The K-mer Analysis Toolkit (KAT) contains a number of tools that analyse and compare K-mer spectra.
http://www.earlham.ac.uk/kat-tools
GNU General Public License v3.0

Interpreting Heterozygosity Rate #141

Closed AllisonStander closed 4 years ago

AllisonStander commented 4 years ago

Good day,

I would just like to make sure whether the value given for heterozygosity rate is in %.

In the .out file it gives the value as 0.03%, but in the .json file it gives it as 0.0289883741650187. Is this 0.03% or 2.89% heterozygosity?

Using other programs (BBnorm, GenomeScope, and FindGSE) I get values between 1% and 3%.

Kind regards, Allison

jonwright99 commented 4 years ago

Hi Allison, If you are using the spectra.py script to do this, I wouldn't trust the result too much. It was written a long time ago and only works if you have a very clean spectrum, as it tries to fit the distributions. The HET or HOM rate is basically the amount of sequence in the HET or HOM peaks divided by the total amount of sequence. I would use the result from GenomeScope, as it uses a much more accurate method. Best wishes, Jon
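The ratio described in this comment can be sketched roughly as follows. This is not the spectra.py code: the function name `peak_rate_percent` and the idea of passing in pre-computed peak "volumes" are assumptions made for illustration, and the numbers in the usage note are made up rather than taken from any real spectrum.

```python
def peak_rate_percent(peak_volume: float, total_volume: float) -> float:
    """Return the share of sequence in one peak (HET or HOM) as a percentage.

    peak_volume  -- amount of sequence attributed to the peak (hypothetical input)
    total_volume -- total amount of sequence across the whole spectrum
    """
    if total_volume <= 0:
        raise ValueError("total_volume must be positive")
    # A rate expressed as a percentage is the fraction multiplied by 100.
    return 100.0 * peak_volume / total_volume
```

For example, with made-up volumes of 3 in the HET peak out of 10000 in total, `peak_rate_percent(3, 10000)` gives roughly 0.03, i.e. 0.03%, not 3%.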

AllisonStander commented 4 years ago

Hi Jon,

Thank you for the reply. I am looking at the output from KAT hist.

The two values given for heterozygosity rate are the same (0.03 and 0.028...) but I am unsure whether this result is a percentage, or a fraction of 1? In the output file it says it is a percentage, but in the .json file it just gives the value without a "%".

For other programs I have assumed that a value is given only as a percentage when specified. But I am unsure what the KAT value is being given as.

Kind regards, Allison

jonwright99 commented 4 years ago

Hi Allison,

According to the code, the heterozygosity rate is a percentage in both cases: the value is rounded to 2 decimal places when output to the screen and left unrounded in the JSON file. Although this is an output from kat hist, it is running one of the old scripts that come with KAT, so I would recommend that this value be interpreted with caution. It's likely that GenomeScope would give a more accurate heterozygosity rate.
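As a small illustration of the rounding described here, the sketch below takes the unrounded JSON value quoted in this thread and formats it to 2 decimal places. The helper names are hypothetical and this is not KAT's own code; it only shows that the two reported numbers are consistent if both are percentages.

```python
def format_rate_for_screen(rate_percent: float) -> str:
    """Format a percentage to 2 decimal places, as in the rounded screen output."""
    return f"{rate_percent:.2f}%"


def percent_to_fraction(rate_percent: float) -> float:
    """Convert a percentage to a fraction of 1."""
    return rate_percent / 100.0


# Unrounded value from the .json file quoted in this issue (a percentage).
json_value = 0.0289883741650187

print(format_rate_for_screen(json_value))  # prints "0.03%"
print(percent_to_fraction(json_value))     # the same rate as a fraction of 1
```

Read this way, the rate is about 0.03% (a fraction of roughly 0.00029), not 2.89%.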

Kind regards, Jon

AllisonStander commented 4 years ago

Thank you Jon :)