This is GSoC2012 fork of 'Mothur'. We are trying to implement a number of 'Feature Selection' algorithms for microbial ecology data and incorporate them into mother's main codebase.
I was profiling the random forest implementation and found a bottleneck which can be possibly improved.
I found the following portion of code, which involves just transposing a matrix, comprises about 14% of the time of a classify.shared command's run:
for (int i = 0; i < numSamples; i++) {
if (m->control_pressed) { break; }
for (int j = 0; j < numFeatures; j++) { bootstrappedFeatureVectors[j][i] = bootstrappedTrainingSamples[i][j]; }
}
It is located at line 45 of the class rftreenode.cpp. The main reason of this bottleneck is, here we are accessing the bootstrappedFeatureVectors in column major order, which involves a lot of cache misses (details can be found here: http://stackoverflow.com/questions/4716125/accessing-elements-of-a-matrix-row-wise-versus-column-wise). May be the class can be designed in a way so that we don't need to transpose the matrix at all? (I've already tried, but couldn't come up with a possible solution).
Here is the call tree I've got by running the following command on mothur:
classify.shared(shared=dataset.shared, design=dataset.design, numtrees=10)
It is generated using valgrind together with kcachegrind.
I was profiling the random forest implementation and found a bottleneck which can be possibly improved.
I found the following portion of code, which involves just transposing a matrix, comprises about 14% of the time of a classify.shared command's run:
It is located at line 45 of the class
rftreenode.cpp
. The main reason of this bottleneck is, here we are accessing thebootstrappedFeatureVectors
in column major order, which involves a lot of cache misses (details can be found here: http://stackoverflow.com/questions/4716125/accessing-elements-of-a-matrix-row-wise-versus-column-wise). May be the class can be designed in a way so that we don't need to transpose the matrix at all? (I've already tried, but couldn't come up with a possible solution). Here is the call tree I've got by running the following command on mothur:classify.shared(shared=dataset.shared, design=dataset.design, numtrees=10)
It is generated using valgrind together with kcachegrind.