barricklab / breseq

breseq is a computational pipeline for finding mutations relative to a reference sequence in short-read DNA resequencing data. It is intended for haploid microbial genomes (<20 Mb). breseq is a command line tool implemented in C++ and R.
http://barricklab.org/breseq
GNU General Public License v2.0

NA frequency for a JC-based variant. What does it mean? #337

Closed: vr1087 closed this issue 1 year ago

vr1087 commented 1 year ago

While casting mutation frequency strings to floats, our variant database transformer broke when it encountered an 'NA' string. This was for a large_substitution mutation based on JC evidence only. I see in the source code that JC evidence can have an NA frequency annotation. I have two questions:

https://github.com/barricklab/breseq/blob/b3d4eda7d7caa10ebd56fbe65eabdc928898d922/src/c/breseq/output.cpp#L342-L346

using: breseq version 0.37.0 revision 25ce8c36ad2f
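A minimal sketch of how a downstream transformer could tolerate this value instead of crashing, assuming frequencies arrive as strings such as "1", "0.42" or "NA". `parse_frequency` is a hypothetical helper for illustration, not part of breseq.

```python
from typing import Optional


def parse_frequency(value: str) -> Optional[float]:
    """Convert a frequency string to a float, mapping 'NA' to None.

    Hypothetical helper; assumes the input is text such as "1", "0.42" or "NA".
    """
    text = value.strip()
    if text.upper() == "NA":
        # breseq could not estimate a frequency; store it as missing
        # rather than forcing it into a number.
        return None
    # Any other non-numeric string still raises ValueError, so real
    # formatting problems are not silently hidden.
    return float(text)


print(parse_frequency("0.25"))  # 0.25
print(parse_frequency("NA"))    # None
```

Returning `None` keeps "not estimable" distinct from a measured frequency; if the database column must be a float, `math.nan` is an alternative, at the cost of making missing values easier to overlook in later arithmetic.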

jeffreybarrick commented 1 year ago

It's intentional: an NA frequency occurs when both sides of the new junction map to repeat/redundant regions in the reference genome. Because we can't trust the read counts for those regions to be accurate, the denominator in the frequency calculation becomes zero. Is this the case for your junction?
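To illustrate the idea, here is a simplified, hypothetical model of a junction frequency; it is not breseq's exact formula (see the linked `output.cpp` for the real logic). It assumes that reference-side read counts are averaged only over sides that map to unique sequence, so when both sides are redundant there is nothing to average over and the result is reported as NA (`None` here).

```python
from typing import Optional, Sequence


def junction_frequency(new_junction_reads: int,
                       side_reads: Sequence[int],
                       side_redundant: Sequence[bool]) -> Optional[float]:
    """Hypothetical, simplified JC frequency (not breseq's exact formula)."""
    # Keep read counts only for sides in unique (non-redundant) sequence.
    trusted = [reads for reads, redundant in zip(side_reads, side_redundant)
               if not redundant]
    # Both sides redundant: the mean below would divide by zero -> NA.
    if not trusted:
        return None
    ref_reads = sum(trusted) / len(trusted)
    total = new_junction_reads + ref_reads
    if total == 0:
        return None  # no reads at all; frequency is also undefined
    return new_junction_reads / total


print(junction_frequency(10, [10, 30], [True, False]))  # 0.25
print(junction_frequency(10, [10, 30], [True, True]))   # None
```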

vr1087 commented 1 year ago

I'll have to generate the full HTML output to find out. Thanks for the quick response!