bcgsc / btl_bloomfilter

The BTL C/C++ Common bloom filters for bioinformatics projects, as well as any APIs created for other programming languages.
GNU General Public License v3.0
18 stars 4 forks source link

Template Parameters for bloomfilter #29

Closed jwcodee closed 5 years ago

jwcodee commented 5 years ago

To extend btl_bloomfilter to work with konnector, I have come up with the following implementation based on suggestion from Shaun.

[jowong@jowong05 github]$ cat std_hash_test2.cpp 
#include <iostream>

int main()
{
    std::hash<uint64_t> hash_f;
    std::cout << hash_f(0) << std::endl;
    std::cout << hash_f(1) << std::endl;
    std::cout << hash_f(2) << std::endl;
    std::cout << hash_f(3) << std::endl;
}
[jowong@jowong05 github]$ clang++ std_hash_test2.cpp -std=c++11
[jowong@jowong05 github]$ ./a.out 
0
1
2
3
[jowong@jowong05 github]$ g++ std_hash_test2.cpp -std=c++11
[jowong@jowong05 github]$ ./a.out 
0
1
2
3

Based on this std::hash<uint64_t> does appear to be the identity function

sjackman commented 5 years ago

Try a second pass at this PR. Leave the existing functions as is. The insert(uint64_t precomputed[]) functions take a list of hash values, which are generated from a single key object. Instead of modifying the existing API, which should remain as is, for users that want to pass in hash values, you should add a new API that's instead based on key objects. Although it may be possibly to name that function insert, to avoid confusion, you may want to instead call it insertKey.

jwcodee commented 5 years ago

Note: Default BloomFilter class needs instantiated as BloomFilter<>. I believe with C++17, it is possible to do BloomFilter if you are using default template arguments

sjackman commented 5 years ago

If it's not too late to change the API of this library, I'd consider renaming the existing insert(uint64_t[]) function to insertHash, and add a new insert(const Key& key) function that calls insert(uint64_t[]).

jwcodee commented 5 years ago

If it's not too late, I'd consider renaming the existing insert(uint64_t[]) function to insertHash, and add a new insert(const Key& key) function.

I'm fairly certain that would break a lot of things. What is wrong with just adding a new insert(const Key& key) function without renaming? Im guessing it would be a problem if the user puts uint64_t[] as a key

JustinChu commented 5 years ago

Consider looking into the idea behind KmerBloomFilter.hpp. This class inherts from the default bloom filter, we didn't want to couple the standard bloom filter to a type. You could do the same, and add overloaded functions for konnector.

Edit: this class was designed solely for the perl swig interface, but you could make one similar which expects a template parameter.

jwcodee commented 5 years ago

Consider looking into the idea behind KmerBloomFilter.hpp. This class inherts from the default bloom filter, we didn't want to couple the standard bloom filter to a type. You could do the same, and add overloaded functions for konnector.

The reason why we ended up with type is so that we don't have to import the class Kmer and citihash into this repo.

jwcodee commented 5 years ago

@sjackman , @JustinChu and I talked a bit regarding the new implementation and we agree that we can instead put the new implementation in an inherited inherited class. That way we can avoid any changes to BloomFilter.hpp and users won't be affected either.

sjackman commented 5 years ago

Sure. Works for me. Maybe call it BloomFilterTemplate?