bcgsc / btl_bloomfilter

The BTL C/C++ Common bloom filters for bioinformatics projects, as well as any APIs created for other programming languages.
GNU General Public License v3.0
18 stars 4 forks source link

SWIG - Problems with Passing by Reference #2

Closed JustinChu closed 8 years ago

JustinChu commented 8 years ago

There are a number of functions for keeping track of hash value status in the forward and reverse for rolling, requring us to pass some values by reference and mutate the value.

These do not seem to be easily fixed to work with SWIG. I also think that the user should not directly have to store these temporary varibles for the current rolling value.

Furthermore, there seems to be some indication that our current design may not be optimal or easy to understand even in our C++ projects. In fact @benvvalk has created rolling hash iterators that return a vector of values to be inserted, rather than use the existing framework. For me, I have been finding the current interface hard to work with (despite having designed a lot of it).

I propose a redesign, where we keep only functions that either intesrt a single k-mer ones (i.e. single variable or one off) or a precomputed vector of hash values generated from external functions or objects (like the rolling hash iterator). Basically, the role and perhaps some functionality of the iterators may need to be expanded, and some base functions in our Bloom filter may need to be removed.

It is better that this change occur sooner rather than later if you agree on the changes proposed.

JustinChu commented 8 years ago

I removed some functions but retained basic insert and query. All rolling must be done with an iterator now, meaning we have to include the iterator in the perl swig module if we want to use the rolling hash.

benvvalk commented 8 years ago

I agree completely about the interface to the BloomFilter class being too complex. It would be very hard for an outsider to read that source code and figure out how to use it (e.g. Sara).

I also agree that is a good idea to remove those rolling hash methods and separate the rolling hash functionality from the Bloom filter functionality. As per the principle of "loose coupling", we should avoid making the different components of our software depend on each other for their existence (when possible). IMO, there is nothing wrong with a Bloom filter class that is "just a bit vector".