Bioconductor / pwalign

Perform pairwise sequence alignments

Is there a way to obtain the trace back matrix in align_pairwiseAlignment.c? #8

Open peteryzheng opened 3 months ago

peteryzheng commented 3 months ago

Hi,

TL;DR: Would it be possible to have pairwiseAlignment return the traceback matrix?

I am interested in doing global-local alignments between pattern string A and a set of subject strings B, C, D, etc. In this case, the subject strings are longer than the pattern string. I hope to find within the subject strings those that have significantly better alignment scores than we would expect by chance.

Furthermore, I want to account for how far from the start of the subject string a good alignment is found, which is biologically important in my use case. Because my pattern strings tend to be short, some hits are expected by chance: if we look for a 6-mer in a set of 100-mers, we are likely to find it somewhere. However, finding the 6-mer right at the start of a particular 100-mer would indicate something biologically significant in our problem.

To control for this, I iteratively run global-local alignments with pairwiseAlignment between pattern string A and every substring of each subject string that starts at its first position [for instance, from substring(B, 1, nchar(A)) to substring(B, 1, nchar(B))]. We then use these alignment scores, which are controlled for distance from the starting point, to assess significance. A major problem with this approach is that the underlying alignment algorithm has to be run once per substring, that is, many times for each subject string.

However, if we had access to the traceback matrix currTraceMatrix in align_pairwiseAlignment.c from the pairwiseAlignment call, that would save me a tremendous amount of time.
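To illustrate why a single pass could replace the repeated calls, here is a minimal sketch in C. It is not the code in align_pairwiseAlignment.c: the function name prefix_scores, the match/mismatch scoring, and the linear gap penalty are all assumptions made for illustration, whereas pairwiseAlignment supports substitution matrices and separate gap opening and extension penalties. The idea is that, when the alignment may start anywhere in the subject at no cost, the last row of the score matrix for the full subject already contains the last row for every prefix, so the best score for each prefix is a running maximum over that row.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of pwalign.
 * On success, best[j] (j = 0..strlen(subject)) holds the best score of
 * aligning ALL of `pattern` against ANY substring of the first j subject
 * letters, using simple match/mismatch scores and a linear gap penalty.
 * `best` must have room for strlen(subject) + 1 ints.
 * Returns 0 on success, -1 if memory allocation fails. */
static int prefix_scores(const char *pattern, const char *subject,
                         int match, int mismatch, int gap, int *best)
{
    size_t m = strlen(pattern), n = strlen(subject);
    int *prev = malloc((n + 1) * sizeof *prev);
    int *curr = malloc((n + 1) * sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1;
    }

    /* Row 0: the alignment may start anywhere in the subject at no cost. */
    for (size_t j = 0; j <= n; j++)
        prev[j] = 0;

    for (size_t i = 1; i <= m; i++) {
        /* Column 0: the first i pattern letters aligned against gaps only. */
        curr[0] = prev[0] + gap;
        for (size_t j = 1; j <= n; j++) {
            int diag = prev[j - 1] +
                       (pattern[i - 1] == subject[j - 1] ? match : mismatch);
            int up = prev[j] + gap;       /* pattern letter against a gap */
            int left = curr[j - 1] + gap; /* subject letter against a gap */
            int s = diag;
            if (up > s) s = up;
            if (left > s) s = left;
            curr[j] = s;
        }
        int *tmp = prev; /* the row just filled becomes the previous row */
        prev = curr;
        curr = tmp;
    }

    /* prev is now the last row: prev[j] is the best score of an alignment
     * of the whole pattern that ends at subject position j. The running
     * maximum turns "ends at j" into "ends anywhere within the first j". */
    best[0] = prev[0];
    for (size_t j = 1; j <= n; j++)
        best[j] = prev[j] > best[j - 1] ? prev[j] : best[j - 1];

    free(prev);
    free(curr);
    return 0;
}
```

Under these assumptions, best[j] for j from nchar(A) to nchar(B) plays the role of the score for substring(B, 1, j), and all of them come out of one O(nchar(A) * nchar(B)) pass. With a different scoring scheme the values would not match what pairwiseAlignment reports, so this only shows the shape of the computation.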

Would this be possible? I would imagine some folks in my field would be interested in having this feature.

Thank you in advance.

Best, Peter

hpages commented 3 months ago

Hi Peter,

I just moved this issue to the pwalign repository. Please note that starting with Bioconductor 3.19 (to be released in about a month), pairwiseAlignment() and all related functionalities currently found in Biostrings will be in the new pwalign package.

To answer your question: we have limited resources at the moment to implement the feature you are requesting, but we would welcome a PR.

Best, H.