Kingsford-Group / splitsbt

GNU General Public License v3.0

fatal: Split Bloom Filter bf() should be SBF. #3

Closed: membranepotential closed this issue 6 years ago

membranepotential commented 6 years ago

Hello,

Unfortunately, bt build currently does not work. Apparently, the dynamic cast of a BF* to an SBF* fails. I tried to track down the cause, but was unable to narrow it down.

Maybe it is connected to the removal of the overloaded SBF::load() functions?

Cheers!

Bradsol commented 6 years ago

Is this an error in the release version or the most recent commit (or some specific commit prior to the latest)?

Bradsol commented 6 years ago

Additionally, can you provide the command you used to build a Bloom filter?

membranepotential commented 6 years ago

The error appeared with the latest commit. These are the commands I used:

# Create hash function
$ ~/packages/splitsbt/bt hashes ssbt_hashes 1

# Make bloomfilters with
$ BF_SIZE=2000000000
$ ~/packages/splitsbt/bt count --threads 4 ssbt_hashes.bin $BF_SIZE <(gunzip -c raw/$SRA/*) "bloomfilters/${SRA}.bf.bv"

# Make file listing the bloomfilters
$ head -n1 bloomfilter_list.txt
/home/sra_index/bloomfilters/DRR008828.bf.bv

# Build the SSBT
$ ~/packages/splitsbt/bt build ssbt_hashes.bin bloomfilter_list.txt ssbt.bloomtree
Starting Bloom Tree
Kmer size = 20
Building...
With build type 2
Building tree on 1763 bloom filters.
Loading hashes from ssbt_hashes.bin
# Hash applications=1
Read hashes for k=20
Inserting leaves into tree...
Inserting leaf /home/sra_index/bloomfilters/DRR008828.bf.bv ...
Inserting leaf /home/sra_index/bloomfilters/DRR015133.bf.bv ...
At node: /home/sra_index/bloomfilters/DRR008828.bf.bv
Load_size: 2000000000
Load_size: 2000000000
fatal: Split Bloom Filter bf() should be SBF.

I couldn't find a release version. Which commit would you suggest I try?

Thank you!

membranepotential commented 6 years ago

I now tried again at commit a09178588b7f82a9d0e922ef4fdb3cc466a606dd and get the same error.

Bradsol commented 6 years ago

Ah, you are attempting to build an SSBT using SBT filters. If you rename your filters to use the extension '.sim.bf.bv', it should work fine. For example, DRR008828.sim.bf.bv instead of DRR008828.bf.bv.
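For anyone with many filters, the rename can be scripted. The following is only a minimal sketch, not something taken from the splitsbt repository: the function name rename_filters and the directory argument are hypothetical, and it assumes that the list file passed to bt build must afterwards point at the renamed files.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of splitsbt): rename every '*.bf.bv'
# filter in a directory to '*.sim.bf.bv' and print the resulting paths.
rename_filters() {
    local dir="$1"
    local f
    for f in "$dir"/*.bf.bv; do
        [ -e "$f" ] || continue          # glob matched nothing
        case "$f" in
            *.sim.bf.bv) ;;              # already has the new extension
            *) mv -- "$f" "${f%.bf.bv}.sim.bf.bv" ;;
        esac
    done
    # Print one renamed filter path per line
    for f in "$dir"/*.sim.bf.bv; do
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Example usage (directory and list file names are placeholders):
# rename_filters /home/sra_index/bloomfilters > bloomfilter_list.txt
```

Passing an absolute directory yields absolute paths in the output, matching the style of the list file shown earlier in the thread. Files that already end in '.sim.bf.bv' are left untouched, so the function can safely be run more than once.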

As a fun fact, the current commit actually has a new build algorithm, which I'm in the process of organizing into a release. It's significantly faster to build and produces a better tree, but requires constructing minhash sketches of the input bit vectors. I'll see if I can write an updated user manual for it by the week's end.

membranepotential commented 6 years ago

Oh wow that was much easier to fix than I thought :-)

So would you expect that bt build works with the current commit? I see the bt minhash command in the source, but bt build seems unchanged in main.cc.

Thanks!

Bradsol commented 6 years ago

Yes - the development of the new construction method is orthogonal to the existing build and doesn't use any of the same code. In theory (though I haven't rigorously tested it) the old build should be unchanged.