Closed antmarakis closed 4 years ago
Hi @antmarakis, thanks for your message.
Since max-length LCP is given by a debug option in egsa, I don't think it is a good idea adding this feature here.
But, you could use gsufsort tool: https://github.com/felipelouza/gsufsort. I just created the option --lcp_max_text
, which gives you the max-length LCP text.
Example:
/gsufsort dataset/input.txt --lcp --lcp_max --lcp_max_text
dataset/input.txt.4.lcp 20199320 bytes (n = 5049830)
LCP max: 71
Longest LCP: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Note that gsufsort tool is not external memory. It needs 9n bytes to compute SA and LCP array.
Please, let me know if you need any help.
Hi @felipelouza, that solved my problem, thank you!
I am wondering, do you know how I could go about computing the LCS from a Suffix Array? Would your gsufsort library be useful in this regard? I didn't have much time to take a proper look yet, I am afraid.
Thanks!
Hi again,
Yes, it is easy to compute the LCP between two strings T_1 and T_2, given SA, LCP and DA. Have a look at Section 5.4.2 of Ohlebusch's book: https://www.uni-ulm.de/in/theo/m/ohlebusch/book-bioinformatics-algorithms/
I think you can do that using simply the output (.sa, .lcp and .da) of gsufsort.
Sorry I cannot do this for you now, but let me know if you have problems.
Cheers!
I am going to check the book out, thanks for the suggestion! I think I should be able to figure it out, cheers!
Hi!
This is an excellent library, thanks for posting!
I am wondering, is there a way to get the actual max-length LCP text? When I ran the code I can see the maximum length, but not the actual LCP. Is there a way to get that done?
I am afraid I do not have much experience with advanced C/C++, so any help is greatly appreciated! Thanks!