edawson / rkmh

Classify sequencing reads using MinHash.
MIT License

Threading is bad #1

Closed edawson closed 8 years ago

edawson commented 8 years ago

Threads seem to be spending a lot of time waiting/instantiating, rather than doing real work. This is probably just a matter of moving around the OpenMP calls and moving from a map<string, hashes> to a vector<pair<string, hashes> > to allow parallel comparison of reads and refs.
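A minimal sketch of what that layout change could look like, not the code in rkmh. It assumes `hashes` is a sorted vector of 64-bit k-mer hashes; `named_hashes`, `shared_hashes` and `best_refs` are hypothetical names made up for this example. The point is that a vector can be indexed, so the outer loop over reads fits a plain `omp parallel for`, and each iteration writes only to its own slot of the result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Assumed stand-in for the project's hash container: sorted k-mer hashes.
using hashes = std::vector<std::uint64_t>;
// Indexable replacement for map<string, hashes>.
using named_hashes = std::vector<std::pair<std::string, hashes>>;

// Count hashes present in both sorted lists with a linear merge.
std::size_t shared_hashes(const hashes& a, const hashes& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::size_t i = 0, j = 0, shared = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] == b[j]) { ++shared; ++i; ++j; }
        else if (a[i] < b[j]) { ++i; }
        else { ++j; }
    }
    return shared;
}

// For each read, the index of the reference sharing the most hashes
// (-1 if there are no references). Ties keep the earlier reference.
std::vector<long> best_refs(const named_hashes& reads, const named_hashes& refs) {
    std::vector<long> best(reads.size(), -1);
    // Each iteration touches only best[r], so no locking is needed.
    #pragma omp parallel for schedule(dynamic)
    for (long r = 0; r < static_cast<long>(reads.size()); ++r) {
        std::size_t best_shared = 0;
        for (std::size_t f = 0; f < refs.size(); ++f) {
            const std::size_t s = shared_hashes(reads[r].second, refs[f].second);
            if (best[r] < 0 || s > best_shared) {
                best[r] = static_cast<long>(f);
                best_shared = s;
            }
        }
    }
    return best;
}
```

Built without OpenMP support the pragma is ignored and the loop simply runs serially, which makes it easy to compare results between the two builds.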

edawson commented 8 years ago

In this vein: C++ string operations are slow. I think moving to char* should provide a decent boost at the expense of some loss of clarity.

ekg commented 8 years ago

You can always take a char* into the existing strings as needed. Which operations are slower?

edawson commented 8 years ago

I use 'std::string::substr' to make the kmers and pass the newly generated vector of strings to the hash function. I should probably just take slices of the char* pointing to the input sequence, which would save roughly 7,000 string instantiations per read.
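A rough sketch of the slicing idea, under stated assumptions: `fnv1a` is a stand-in hash chosen only to keep the example self-contained (it is not the hash rkmh uses), and `hash_kmers` is a hypothetical name. Each k-mer is hashed straight from a pointer into the original sequence, so no per-k-mer `std::string` is created.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a over a (pointer, length) slice. Stand-in hash for illustration only.
std::uint64_t fnv1a(const char* data, std::size_t len) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= static_cast<unsigned char>(data[i]);
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Hash every k-mer of seq by pointing into the existing buffer
// instead of copying each k-mer out with substr.
std::vector<std::uint64_t> hash_kmers(const std::string& seq, std::size_t k) {
    assert(k > 0);
    std::vector<std::uint64_t> out;
    if (seq.size() < k) return out;      // sequence too short for any k-mer
    out.reserve(seq.size() - k + 1);
    const char* base = seq.data();
    for (std::size_t i = 0; i + k <= seq.size(); ++i) {
        out.push_back(fnv1a(base + i, k));  // slice = (base + i, k), no allocation
    }
    return out;
}
```

The slices are only valid while `seq` is alive and unmodified, which is the loss of clarity mentioned earlier in the thread: the hash function now has to take a pointer and a length rather than a string.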

edawson commented 8 years ago

Threading is fixed since I switched from std::map to std::vector for most data structures and put in omp critical sections where absolutely necessary. While it's not perfect, there are no more segfaults and it's fast enough. Closing this.
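For readers landing here later, a small sketch of the general pattern, not the actual rkmh code. `reads_at_least` and its length filter are hypothetical and exist only to give the loop something to do. Each thread fills a private vector, and the shared vector is touched only inside one short `omp critical` section per thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return the indices of reads with length >= min_len, in ascending order.
std::vector<std::size_t> reads_at_least(const std::vector<std::string>& reads,
                                        std::size_t min_len) {
    std::vector<std::size_t> kept;  // shared between threads
    #pragma omp parallel
    {
        std::vector<std::size_t> local;  // private to each thread
        #pragma omp for nowait
        for (long i = 0; i < static_cast<long>(reads.size()); ++i) {
            if (reads[i].size() >= min_len) {
                local.push_back(static_cast<std::size_t>(i));
            }
        }
        // Only the merge is serialized; the loop above runs without locks.
        #pragma omp critical
        kept.insert(kept.end(), local.begin(), local.end());
    }
    assert(kept.size() <= reads.size());
    // Merge order depends on thread timing, so sort for a stable result.
    std::sort(kept.begin(), kept.end());
    return kept;
}
```

Concurrent `push_back` calls on one shared `std::vector` are a data race and a common source of segfaults, so keeping the critical section to a single merge per thread avoids both the crash and most of the waiting.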