GrammarViz2 / grammarviz2_src

GrammarViz 2.0 public release:
http://grammarviz2.github.io/grammarviz2_site
GNU General Public License v2.0
120 stars 39 forks source link

Large window algorithm problem #15

Closed seninp closed 7 years ago

seninp commented 9 years ago

When a best discord has been found, say length 100, that starts at the position 1000, the interval 900-1100 is marked as visited, since we are looking for non-self matches. But then if the new interval of length 150 is going to be considered for a second discord and happened to be at the position 880, it is in fact overlapping the best discord, but the Large window algorithm will allow it to be considered. This issue need to be resolved.

seninp commented 9 years ago

Commit SHA: 71720a390273df80c190431ae9279a4a76631acb should fix this issue.

trace for HOTSAX: position 307207, NN distance 1953.6773019104255, elapsed time: 0h 2m 33s 911ms, distance calls: 37139944 position 262481, NN distance 1204.4363827118475, elapsed time: 0h 2m 44s 144ms, distance calls: 43544780 position 82020, NN distance 998.3366165777953, elapsed time: 0h 3m 45s 924ms, distance calls: 116213078"

trace old GrammarViz2: position 307281, length 156, NN distance 12.509567344553888, elapsed time: 0h 0m 25s 302ms, distance calls: 9391273 position 307121, length 255, NN distance 8.376467834324288, elapsed time: 0h 0m 21s 783ms, distance calls: 7188725 position 262438, length 159, NN distance 7.507426804227783, elapsed time: 0h 0m 22s 30ms, distance calls: 8236383

trace GrammarViz2 after the fix: position 307281, length 156, NN distance 12.509567344553888, elapsed time: 0h 0m 23s 141ms, distance calls: 8794835 position 262438, length 159, NN distance 7.507426804227783, elapsed time: 0h 0m 22s 967ms, distance calls: 8015379 position 82062, length 153, NN distance 6.277889204320179, elapsed time: 0h 0m 25s 162ms, distance calls: 9281109