Gardner-BinfLab / deltaBS

Quantifying the significance of genetic variation using probabilistic profile-based methods.
MIT License

No reporting of zero DBSs (for all matching gene families)?! #4

Closed by molleraj 3 years ago

molleraj commented 3 years ago

Hi! I came across an error that I think may be a serious flaw in my analysis. For two core genes in a set of 380 genomes I am testing, I calculate DBS with deltaBS.pl for the genome copy against a reference (representative) copy of the core gene determined by PIRATE. Unfortunately, I only get a results.dbs output for 199 out of 380 genomes. In the other cases, I get this error:

    Calculating empirical cutoff: 0
    Use of uninitialized value $sum in division (/) at /home/agmolle/deltaBS/src/deltaBS.pl line 507.
    Illegal division by zero at /home/agmolle/deltaBS/src/deltaBS.pl line 559.

I then added this line at line 227 for debugging: print $score."\n";

This is the new output:

    Determining and filtering domain architecture, calculating delta BS... 0 0 0 0 0 done.
    Calculating empirical cutoff: 0
    Use of uninitialized value $sum in division (/) at /home/agmolle/deltaBS/src/deltaBS2.pl line 509.
    Illegal division by zero at /home/agmolle/deltaBS/src/deltaBS2.pl line 561.

Does this mean the delta bit score was found to be zero six times? If so, are zero delta bit scores not reported when all cases (all matching protein families) are 0?

I do not get this error when one of the printed scores is different from zero. Thanks, Jon

lbarquist commented 3 years ago

Hi,

So, I think the problem is that all of your DBS values for the comparison are 0; the calculation of the mean assumes at least some are non-zero. I'm assuming this is happening because the sequences you're comparing are identical or nearly identical?

I've put in a quick fix for this. Can you test whether it works for you now and let me know?

Thanks, -Lars

molleraj commented 3 years ago

Hi Lars,

This is exactly the issue. I modified the calc_mean subroutine to count zero values when calculating the mean delta bitscore. Indeed the sequences I am comparing are identical or nearly identical.
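
For anyone hitting the same error, here is a minimal sketch of the failure mode and of the zero-counting approach. It is written in Python for illustration and is not the Perl code in deltaBS.pl. Both function names are hypothetical, and the idea that the original mean skipped zero values is an assumption inferred from this thread.

```python
def mean_skipping_zeros(scores):
    """Hypothetical mean that ignores zero scores (assumed original behavior)."""
    nonzero = [s for s in scores if s != 0]
    # Raises ZeroDivisionError when every score is zero, because nothing is left to average.
    return sum(nonzero) / len(nonzero)


def mean_counting_zeros(scores):
    """Mean over all scores, zeros included; returns 0.0 for an empty list."""
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


if __name__ == "__main__":
    all_zero = [0, 0, 0, 0, 0]  # delta bit scores for identical sequences
    print(mean_counting_zeros(all_zero))
    try:
        mean_skipping_zeros(all_zero)
    except ZeroDivisionError as err:
        print("skipping zeros fails:", err)
```

Counting zeros keeps the mean defined for identical sequences. Any later step that divides by a spread estimate, such as a standard deviation, would still need its own guard when all values are equal.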

I will also try your fix.

Thanks! Jon