Kingsford-Group / bloomtree

GNU General Public License v3.0

Having trouble getting sbt to work on simple test data #4

Closed: ctb closed this issue 9 years ago

ctb commented 9 years ago

Hi, please see data and scripts in --

https://github.com/ctb/2015-sbt-demo

(note the README)

I'm having trouble getting positive query results; my output is here:

https://github.com/ctb/2015-sbt-demo/blob/master/query.out

and the build script is here:

https://github.com/ctb/2015-sbt-demo/blob/master/run.sh

When I run a simple khmer script I get what I think are the right results:

% python check-with-khmer.py genome-a.fa reads-*.fa 
genomeA 11 1000 0.989
% python check-with-khmer.py genome-c.fa reads-*.fa
genomeC 980 1000 0.02

Any help? thanks!

Bradsol commented 9 years ago

A very interesting problem, so thank you for that! The issue is in your build script, specifically where you build the original filters (your 'count' commands). The existing code base apparently parses 1e7 as 1, whereas 10000000 correctly produces filters of the intended size.
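The symptom is consistent with a parser that keeps only the leading digits of the size argument, as C's atoi and strtoul do: parsing stops at the 'e' in 1e7 and the result is 1. The Python sketch below illustrates that failure mode next to a stricter parser that accepts either form. naive_parse and parse_size are hypothetical helpers, not functions from bloomtree, and the sketch makes no claim about how bloomtree actually reads its arguments.

```python
import re


def naive_parse(text: str) -> int:
    """Mimic C's atoi/strtoul: keep the leading digits, ignore everything after."""
    match = re.match(r"\d+", text.strip())
    return int(match.group()) if match else 0


def parse_size(text: str) -> int:
    """Hypothetical strict parser: accept a positive size as a plain integer or in scientific notation."""
    try:
        value = int(text)
    except ValueError:
        as_float = float(text)  # raises ValueError for non-numeric input
        if not as_float.is_integer():  # also rejects inf and nan
            raise ValueError(f"filter size must be a whole number, got {text!r}")
        value = int(as_float)
    if value <= 0:
        raise ValueError(f"filter size must be positive, got {text!r}")
    return value


if __name__ == "__main__":
    for arg in ("10000000", "1e7"):
        print(f"{arg:>9}  naive={naive_parse(arg)}  strict={parse_size(arg)}")
```

Here naive_parse("1e7") returns 1, while parse_size("1e7") returns 10000000. Rejecting bad input outright, rather than silently truncating it, would have surfaced the problem at build time.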

What was happening is that the Bloom tree was being built on filters of size one, which the Jellyfish hash functions we use cannot map into properly. (Some modulus calculations in the code that checks whether a particular k-mer is present were always returning zero.)
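The index arithmetic alone shows why a size-one filter is useless: any hash value taken modulo 1 is 0, so every k-mer is sent to the same slot. The sketch below uses SHA-256 from Python's hashlib as a stand-in for the Jellyfish hash functions (an assumption; bloomtree does not use it) and a hypothetical TinyBloom class. In this toy filter, once one k-mer has been inserted into a size-1 filter, a k-mer that was never added is also reported present, so the filter cannot distinguish k-mers at all.

```python
import hashlib


def slot(kmer: str, size: int, seed: int) -> int:
    """Map a k-mer to a filter position: hash it, then reduce modulo the filter size.

    SHA-256 is a stand-in here, not the Jellyfish hash used by bloomtree.
    """
    digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
    return int.from_bytes(digest[:8], "little") % size


class TinyBloom:
    """Hypothetical minimal Bloom filter with `size` bits and `num_hashes` hash functions."""

    def __init__(self, size: int, num_hashes: int = 1) -> None:
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray(size)  # one byte per bit, for clarity rather than compactness

    def add(self, kmer: str) -> None:
        for seed in range(self.num_hashes):
            self.bits[slot(kmer, self.size, seed)] = 1

    def __contains__(self, kmer: str) -> bool:
        return all(self.bits[slot(kmer, self.size, seed)] for seed in range(self.num_hashes))


if __name__ == "__main__":
    kmers = ["ACGTACGTACGTACGTACGTA", "TTTTGGGGCCCCAAAATTTTG", "GATTACAGATTACAGATTACA"]
    for size in (1, 10_000_000):
        # With size 1 every index is 0; with a realistic size the k-mers spread out.
        print(size, [slot(k, size, seed=0) for k in kmers])

    tiny = TinyBloom(size=1)
    tiny.add(kmers[0])
    print("size 1, never-added k-mer reported present:", kmers[1] in tiny)
```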

I will certainly work on making the input parsing handle scientific notation properly, but until then, try rerunning your existing code with plain integers.

Let me know if that works for you!

ctb commented 9 years ago

(In reply to Bradsol's comment of Sun, May 31, 2015, 11:52:55 AM -0700:)

Thanks, this fix worked!