DIGI-VUB / text.alignment

Text Alignment with Smith-Waterman
MIT License

repeating words at the beginning #2

Open agricolamz opened 4 months ago

agricolamz commented 4 months ago

Hi, thank you for your package! I found strange behavior when there are repeating units at the beginning of the string:

library(text.alignment)
smith_waterman("a b c d", "a b c d", type = "words") # expected
#> Swith Waterman local alignment score: 8
#> ----------
#> Document a
#> ----------
#> a b c d
#> ----------
#> Document b
#> ----------
#> a b c d
smith_waterman("a a b c d", "a a b c d", type = "words") # not expected
#> Swith Waterman local alignment score: 10
#> ----------
#> Document a
#> ----------
#> # a b c d
#> ----------
#> Document b
#> ----------
#> a a b c d
smith_waterman("a a a b c d", "a a a b c d", type = "words") # not expected
#> Swith Waterman local alignment score: 12
#> ----------
#> Document a
#> ----------
#> # a # a b c d
#> ----------
#> Document b
#> ----------
#> a # a a b c d

If I put the repetitions anywhere in the string other than the beginning, everything works as I expect.

Linux
R 4.4.1
text.alignment v 0.1.4

jwijffels commented 4 months ago

Thanks for the report. That is indeed strange. I wonder what the alignment matrix looks like; it looks like the gap score was the same as the alignment score.
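
To see what the scoring matrix looks like for the failing input, here is a minimal base R sketch of the Smith-Waterman recurrence. It does not call into text.alignment, and `sw_matrix` is a hypothetical helper written only for this thread. The match score of 2 is consistent with the scores reported above (8 for four identical words, 10 for five); the mismatch and gap scores of -1 are assumptions and may differ from what the package actually uses.

```r
# Sketch of the Smith-Waterman scoring matrix for two token vectors.
# Not the package's implementation. match = 2 agrees with the reported
# scores; mismatch = -1 and gap = -1 are assumed values.
sw_matrix <- function(a, b, match = 2, mismatch = -1, gap = -1) {
  # Row 1 and column 1 are the all-zero boundary of the local alignment.
  H <- matrix(0, nrow = length(a) + 1, ncol = length(b) + 1,
              dimnames = list(c("", a), c("", b)))
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      s <- if (a[i] == b[j]) match else mismatch
      H[i + 1, j + 1] <- max(0,
                             H[i, j] + s,        # diagonal: align a[i] with b[j]
                             H[i, j + 1] + gap,  # from above: a[i] against a gap
                             H[i + 1, j] + gap)  # from the left: b[j] against a gap
    }
  }
  H
}

a <- strsplit("a a b c d", " ")[[1]]
b <- strsplit("a a b c d", " ")[[1]]
H <- sw_matrix(a, b)
print(H)
max(H)   # 10, the same score as in the report
diag(H)  # 0 2 4 6 8 10
```

Under these assumed scores the maximum sits at the bottom-right cell and every cell on the main diagonal is reached only from its diagonal neighbour, so a traceback would be expected to align all five words without inserting any `#` gaps. If the package's matrix looks the same, the problem is more likely in the traceback than in the scoring.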