Martinsos / opal

SIMD C/C++ library for massive optimal sequence alignment (local/SW, infix, overlap, global)
MIT License
35 stars 10 forks source link

Process multiple cells in one iteration, add padding to database sequences, optimize for cache #3

Open Martinsos opened 10 years ago

Martinsos commented 10 years ago

This method is explained in Rognes's article, under "Processing four consecutive cells along the database sequences". They claim that by processing multiple cells they have better usage of SSE registers and caching. Another thing is adding padding to database sequences so they always have the length that is mutliple of 4. That way I can check every 4 columns if some sequence ended instead of doing it every column, which could make things faster.

Martinsos commented 9 years ago

First, I am going to pad database sequences so they have length that is multiple of 4, and I am going to do it for SW. When I have this, I will be able to do householding jobs (checking for overflow and sequence end) every 4 columns instead of every column, and I hope to get some speedup on that.

Later, I can try processing 4 cells in a row at once, for extra speedup.

Martinsos commented 9 years ago

While doing this, I did some learning about cache optimizations, so I also did some work on that. I also fixed few things that I found were not optimal. All together:

Martinsos commented 9 years ago

I also did some measurements of code speed using very primitive method: when I want to see how slow is some part of code, I duplicate it and see for how longer does program execute now: that is how much time this piece of code normally takes. I have to do this carefully, because duplicated code must not change results of execution (because it might affect execution time then) and it also can not be trivial because compiler may optimize it out. Based by those measurements, core loop takes about 70% of execution time, and building query profile takes about 25% of execution time! This is good to know, because I was not sure how much time does query profile building take. TLDR:

Core loop: 70% of time Query profile building: 25% of time