cbielow / PTXQC

A Quality Control (QC) pipeline for Proteomics (PTX) results generated by MaxQuant

LCSn Does Not Return Longest Substring #43

Closed carpenitoThomas closed 6 years ago

carpenitoThomas commented 6 years ago

I'm trying to use the LCSn function with the following function call:

LCSn(c("AAAAACBBBBB", "AAAAADBBBBB", "AAAABBBBBEF", "AAABBBBBDGH"))

I was expecting the result to be either "BBBBB" or at least "AAA"; however, what is returned is "".

Thank you!

cbielow commented 6 years ago

Good catch. It's only a heuristic, but after giving it some thought, I've improved it to the point where it now reports "BBBBB". It can still be fooled by certain inputs, though: since it is a greedy approach, it might overlook candidates and end up reporting the empty string. This happens, for example, when the LCS of all inputs is shorter than seemingly good-looking initial candidates, as in

LCSn(c("AAAXXBBB", "BBBXXDDD", "XXAAADDD")) ## --> fails due to greedy approach; should be "XX"
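For cases where the greedy heuristic is not good enough, an exhaustive search is one alternative. The following is a minimal sketch only: `lcs_exact` is a hypothetical helper, not part of PTXQC and not how `LCSn` works. It assumes plain, non-NA character input and trades speed for correctness by testing every substring of the shortest input, longest first.

```r
# Hypothetical helper (NOT part of PTXQC): exact longest common substring
# via exhaustive search. Assumes a character vector without NA values.
lcs_exact <- function(strings) {
  if (length(strings) == 0) return("")
  # Any common substring must occur in the shortest input, so search there.
  shortest <- strings[which.min(nchar(strings))]
  n <- nchar(shortest)
  # Try candidate lengths from longest to shortest; the first hit is optimal.
  for (len in rev(seq_len(n))) {
    starts <- seq_len(n - len + 1)
    candidates <- unique(substring(shortest, starts, starts + len - 1))
    for (cand in candidates) {
      # fixed = TRUE matches the candidate literally, not as a regex.
      if (all(grepl(cand, strings, fixed = TRUE))) return(cand)
    }
  }
  ""  # no common substring at all
}

lcs_exact(c("AAAXXBBB", "BBBXXDDD", "XXAAADDD"))
# [1] "XX"
lcs_exact(c("AAAAACBBBBB", "AAAAADBBBBB", "AAAABBBBBEF", "AAABBBBBDGH"))
# [1] "BBBBB"
```

If several common substrings share the maximal length, this sketch returns the one that appears first in the shortest input. The running time grows quickly with string length, so it is only suitable for short inputs.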

Hope this helps.

cbielow commented 6 years ago

See commit 534c6e4d45740fa6ddd008d52969eac95ff3380f.