Closed buddha314 closed 5 years ago
Someone who reviewed the code in question (@ben-albrecht, @benharsh?) should correct me if I'm wrong, but I think the implementation in that test is exponential in nature (i.e., will start to take forever quickly as string sizes grow) and that we'd probably want to see a dynamic programming implementation before putting it into a blessed library.
blessed? So mote it be... Nice theming!
@bradcray - That sounds right to me. It probably wouldn't be too hard to write a dynamic programming implementation.
For anyone interested in this good first issue, here's a Chapel implementation derived from a python implementation to start:
proc lcs(a: string, b: string): string {
var lengths: [1..a.size+1, 1..b.size+1] int;
for (i, x) in zip(1..a.size, a) {
for (j, y) in zip(1..b.size, b) {
if x == y {
lengths[i+1, j+1] = lengths[i, j] + 1;
} else {
lengths[i+1, j+1] = max(lengths[i+1, j], lengths[i, j+1]);
}
}
}
var result = '';
var x = a.size+1,
y = b.size+1;
while x != 1 && y != 1 {
if lengths[x, y] == lengths[x-1, y] then x -= 1;
else if lengths[x, y] == lengths[x, y-1] then y -= 1;
else {
assert(a[x-1] == b[y-1]);
result = a[x-1] + result;
x -= 1;
y -= 1;
}
}
return result;
}
// Tests
writeln(lcs('1234', '1224533324')); // 1234
writeln(lcs('thisisatest', 'testing123testing')); // tsitest
I'm not sure what standard module this would fit into - maybe it ought to live in a mason package to begin with.
I wrote something like this a long time ago. test/trivial/diten/lcs/lcs.chpl. It could probably use some cleaning up and modernization if we wanted to make it an official thing.
I tried writing code for common subsequence whose algorithm looks similar to @ben-albrecht . Kindly take a look at it .
`proc input_string(out str1: string, out str2: string) { var infile=open(filename, iomode.r).reader();
infile.read(str1); infile.read(str2); infile.close(); }
proc lcs ( str1:string,str2:string,len1,len2) { var L:[0..len1,0..len2] int;
for i in 0..len1
{
for j in 0..len2
{
if i==0 or j==0
{
L[i,j]=0;
}
elif str1[i-1]==str2[j-1]
{
L[i,j]=L[i-1,j-1]+1 ;
}
else
{
L[i][j] = max(L[i-1][j], L[i][j-1])
}
}
}
var index : int = L[len1][len2];
var lcs[index+1] : string;
lcs[index] = '';
var m:int = len1;
var n:int = len2;
while m > 0 and n > 0
{
if str1[m-1] == str2[n-1])
{
lcs[index-1] = str1[i-1];
m-=1; n-=1; index-=1;
}
elif (L[m-1][n] > L[m][n-1])
m-=1;
else
n-=1;
}
return lcs;
}
proc main() {
config var filename="commonsubs";
var str1,str2 : string ;
var len1,len2 : int ;
input_string(str1,str2);
len1= str1.size;
len2= str2.size;
writeln(lcs(str1,str2,len1,len2));
}`
Hi @ankugupt - thanks for your interest. I expect this issue to be closed by the addition of the StringUtils package to the mason registry.
If you think your implementation is an improvement over that one, I encourage you to contribute to the StringUtils package: https://github.com/AnubhavUjjawal/Chapel-StringUtils
Following my habit of using issues as feature requests.
I came across this commit for longest common subsequence and I would love to see this pushed into the string library or other basic text processing library with some of the functionality from python's difflib specifically
SequenceMatcher()