amnh / PCG

๐™‹๐™๐™ฎ๐™ก๐™ค๐™œ๐™š๐™ฃ๐™š๐™ฉ๐™ž๐™˜ ๐˜พ๐™ค๐™ข๐™ฅ๐™ค๐™ฃ๐™š๐™ฃ๐™ฉ ๐™‚๐™ง๐™–๐™ฅ๐™ โธบ Haskell program and libraries for general phylogenetic graph search
28 stars 1 forks source link

Add a benchmarking suite for string alignment operations #141

Closed recursion-ninja closed 4 years ago

recursion-ninja commented 5 years ago

Benchmarking shows that for large alphabet inputs, the majority of time is spent on string alignment. This is to be expected. However the current implementation of the naive and Ukkonen Needleman-Wunsch string alignment algorithms suffer from not being able to in line code between modules.

We should add INLINEABLE and SPECIALISE pragmas to the low-level string alignment, top-level functions. This will likely net us some "free" performance gains.

In order to quantify any gains from said methods, we need to have a benchmarking suite in place. This should not be too difficult to construct, fasta/c files with two taxa sequences would suffice as inputs for the benchmarks.

Benchmarking would also be useful to provide some empirical data for determining at what limit beneath which to always use the naive needleman wunsch algorithm, and above which to attempt to use Ukkonen's space and time saving method.

More aggressive optimizations than the "free" methods can be implemented after the benchmarks are in place to quantify the gains.

Boarders commented 4 years ago

These commits: https://github.com/amnh/PCG/commit/f694a90cdf323d2eb8ebc36e73cc529893ba70a5 and https://github.com/amnh/PCG/commit/9c6d011ccf901a944ebf9e5b8b06d7ac267fa570 add unboxed instances for the direction and cost types and so lay the groundwork for rewriting these functions making use of Unboxed Vectors or the arrays in the repa library (which also uses unboxed vectors to for the "manifest" array representation).

recursion-ninja commented 4 years ago

We should do our best to make the entire benchmarking suite run in parallel. This will allow us to run the benchmarks on the cluster, utilizing the 100s of cores we have available.

recursion-ninja commented 4 years ago

I have added the string alignment benchmarking suite. See a0550f669dcfcfc4032fd2a02526881114386217.

After adding INLINE and SPECIALISE pragmas where applicable, there was a a roughly 12.8% reduction in runtime on the string alignment functions.