Closed — recursion-ninja closed this issue 4 years ago
These commits — https://github.com/amnh/PCG/commit/f694a90cdf323d2eb8ebc36e73cc529893ba70a5 and https://github.com/amnh/PCG/commit/9c6d011ccf901a944ebf9e5b8b06d7ac267fa570 — add unboxed instances for the direction and cost types, laying the groundwork for rewriting these functions to use unboxed Vectors or the arrays in the repa library (which also uses unboxed vectors for its "manifest" array representation).
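As a rough illustration of why the unboxed instances matter, here is a minimal sketch (not the PCG code; `Direction`, `packDirection`, and `directionRow` are hypothetical names) of packing a direction type into `Word8` so a traceback row can live in a flat, unboxed array rather than a boxed structure with one pointer per element:

```haskell
import Data.Word (Word8)
import Data.Array.Unboxed (UArray, listArray, (!))

-- Hypothetical direction type for the alignment traceback matrix.
data Direction = DiagArrow | LeftArrow | UpArrow
  deriving (Enum, Eq, Show)

-- Pack a Direction into a Word8 so it can be stored unboxed.
packDirection :: Direction -> Word8
packDirection = fromIntegral . fromEnum

unpackDirection :: Word8 -> Direction
unpackDirection = toEnum . fromIntegral

-- An unboxed array of packed directions: one contiguous byte buffer,
-- with no per-element pointer indirection.
directionRow :: UArray Int Word8
directionRow = listArray (0, 2) (packDirection <$> [DiagArrow, LeftArrow, UpArrow])

main :: IO ()
main = print (unpackDirection (directionRow ! 1))
```

The real instances in the commits above target `Data.Vector.Unboxed`; the `UArray` here is just the dependency-light way to show the same packed layout.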
We should do our best to make the entire benchmarking suite run in parallel. This will allow us to run the benchmarks on the cluster, utilizing the hundreds of cores we have available.
I have added the string alignment benchmarking suite. See a0550f669dcfcfc4032fd2a02526881114386217.
After adding INLINE and SPECIALISE pragmas where applicable, there was a roughly 12.8% reduction in runtime for the string alignment functions.
Benchmarking shows that for large-alphabet inputs, the majority of time is spent on string alignment. This is to be expected. However, the current implementations of the naive and Ukkonen Needleman-Wunsch string alignment algorithms suffer from not being able to inline code across module boundaries.
We should add INLINEABLE and SPECIALISE pragmas to the low-level, top-level string alignment functions. This will likely net us some "free" performance gains.

In order to quantify any gains from said methods, we need to have a benchmarking suite in place. This should not be too difficult to construct; FASTA/FASTC files with two taxa's sequences would suffice as inputs for the benchmarks.
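To make the pragma placement concrete, here is a hedged sketch (the function name, type, and cost model are stand-ins, not the PCG API): INLINEABLE exposes the definition's unfolding to other modules, and SPECIALISE pre-compiles a monomorphic copy for the type we expect to hit most often, so cross-module call sites avoid dictionary passing.

```haskell
-- Stand-in for a top-level alignment cost function; the real PCG
-- functions and their types are not shown here.
{-# INLINEABLE alignmentCost #-}
{-# SPECIALISE alignmentCost :: [Word] -> [Word] -> Word #-}
alignmentCost :: (Num a, Ord a) => [a] -> [a] -> a
alignmentCost xs ys = sum (zipWith step xs ys)
  where
    -- Hypothetical per-position cost: zero on match, one on mismatch.
    step x y = if x == y then 0 else 1

main :: IO ()
main = print (alignmentCost [1, 2, 3 :: Word] [1, 0, 3])
```

INLINEABLE is usually preferable to INLINE here: it leaves the inlining decision to GHC at each call site instead of forcing it, which keeps code growth in check across a large codebase.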
Benchmarking would also provide empirical data for determining the input-size threshold below which we should always use the naive Needleman-Wunsch algorithm, and above which we should attempt Ukkonen's space- and time-saving method.
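The dispatch itself is straightforward once benchmarks pin down the cutoff; a minimal sketch, assuming hypothetical `naiveAlignment` and `ukkonenAlignment` functions and a placeholder threshold value:

```haskell
-- Placeholder cutoff; the real value should come from the benchmark data.
alignmentThreshold :: Int
alignmentThreshold = 64

-- Choose an algorithm by input size. The two branches are hypothetical
-- stand-ins that just report which algorithm would run.
chooseAlignment :: String -> String -> String
chooseAlignment lhs rhs
  | max (length lhs) (length rhs) <= alignmentThreshold = naiveAlignment lhs rhs
  | otherwise                                           = ukkonenAlignment lhs rhs
  where
    naiveAlignment   _ _ = "naive"
    ukkonenAlignment _ _ = "ukkonen"

main :: IO ()
main = putStrLn (chooseAlignment "ACGT" "ACGA")
```

Keying on the longer of the two inputs is one reasonable choice; the benchmarks may show that the length difference (which bounds Ukkonen's band width) is the better predictor.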
More aggressive optimizations than the "free" methods can be implemented after the benchmarks are in place to quantify the gains.