cfinke / Typo.js

A client-side JavaScript spellchecker that uses Hunspell-style dictionaries.
Other
501 stars 110 forks source link

High CPU and memory usage with long misspelled words #56

Open jhuckaby opened 7 years ago

jhuckaby commented 7 years ago

Hello, and thank you for developing this module. We need more pure JavaScript solutions like this!

I noticed that when trying to lookup suggestions for long words, the library seems to chug resources and lag pretty hard. The word djfhjfhdjfhskdfhskhdfksjdfhksdjfhksdf takes over 7 seconds to process using typo-js, and the Node process ends up eating over 800 MB of RAM.

Example code:

var Typo = require("typo-js");
var dictionary = new Typo( "en_US" );

var time_start = (new Date()).getTime();
var word = 'djfhjfhdjfhskdfhskhdfksjdfhksdjfhksdf';

var correct = dictionary.check(word);

if (!correct) {
    console.log( word + " is NOT spelled correctly." );
    var suggestions = dictionary.suggest(word);
    console.log( "Suggestions: " + JSON.stringify(suggestions) );
}
else {
    console.log( word + " is spelled correctly." );
}

var elapsed = Math.floor( (new Date()).getTime() - time_start );
var mem = process.memoryUsage();

console.log( elapsed + "ms elapsed, " + mem.rss + " bytes used" );

Output:

djfhjfhdjfhskdfhskhdfksjdfhksdjfhksdf is NOT spelled correctly.
Suggestions: []
7743ms elapsed, 857202688 bytes used

This is on a late 2016 MacBook Pro (2.9 GHz Intel Core i7) with OS X 10.12.3 and Node.js v6.9.1.

cfinke commented 7 years ago

Thanks for the report; I've confirmed the same buggy behavior on my own machine.

Not only does it take longer to retrieve suggestions for longer words, it takes exponentially longer based on the number of letters in the misspelled word:

5 letters: 22 ms per letter
6 letters: 17 ms per letter
7 letters: 19 ms per letter
8 letters: 23 ms per letter
9 letters: 24 ms per letter
10 letters: 27 ms per letter
11 letters: 30 ms per letter
12 letters: 49 ms per letter
13 letters: 49 ms per letter
14 letters: 51 ms per letter
15 letters: 63 ms per letter
16 letters: 59 ms per letter
17 letters: 62 ms per letter
18 letters: 76 ms per letter
19 letters: 72 ms per letter
20 letters: 75 ms per letter
21 letters: 86 ms per letter
22 letters: 110 ms per letter
23 letters: 179 ms per letter
24 letters: 201 ms per letter
25 letters: 210 ms per letter
26 letters: 196 ms per letter
27 letters: 114 ms per letter
28 letters: 149 ms per letter
29 letters: 243 ms per letter
30 letters: 286 ms per letter
31 letters: 218 ms per letter
32 letters: 183 ms per letter
33 letters: 268 ms per letter
34 letters: 324 ms per letter
35 letters: 333 ms per letter
36 letters: 239 ms per letter
37 letters: 352 ms per letter
38 letters: 354 ms per letter
39 letters: 366 ms per letter
cfinke commented 7 years ago

bf580beae00cc5437c1f7c008f75f35730fe1a27 helps by not saving edit-2-distance possible suggestions in memory unless they're actual dictionary words.

jhuckaby commented 7 years ago

Still takes about 7 seconds to lookup suggestions for djfhjfhdjfhskdfhskhdfksjdfhksdjfhksdf on my machine, but you have definitely reduced the memory footprint. It's down to 200MB. Nice job!

djfhjfhdjfhskdfhskhdfksjdfhksdjfhksdf is NOT spelled correctly.
Suggestions: []
6867ms elapsed, 211562496 bytes used

I wonder if there is any way to do a faster rejection of long word suggestions, because there are so few of them in the dictionary. Like index them all by character length, so you only have to compare a word against other words of similar length.

cfinke commented 7 years ago

4aa816245c247a36d67826c574a1ba566dec498f improves memory usage further, reducing the usage for the djfhjfhdjfhskdfhskhdfksjdfhksdjfhksdf by about 15% on my machine. Lookup time is still in the 6-7 second range for me.

jhuckaby commented 7 years ago

Yup, down to about 166 MB RAM now, but still takes 7 seconds. Good progress tho!

djfhjfhdjfhskdfhskhdfksjdfhksdjfhksdf is NOT spelled correctly.
Suggestions: []
7065ms elapsed, 166289408 bytes used
blurymind commented 5 years ago

sorry guys, had to move to nspell, as this bug was causing noticeable lag here: https://github.com/InfiniteAmmoInc/Yarn/pull/80

nspell doesn't have the delay when suggesting difficult words. Will consider going back to typo.js if this is fixed 👍

My suggestion here is to have a look at how nspell does it and copy the approach

The issue for me is not the cpu/memory, as much as the HUGE delay (over 7 seconds looks like a bug to the user)

FeepingCreature commented 5 years ago

You guys might want to grab my changes from over at hunspell-spellchecker.

Lookup for alternatives is a lot faster if you pre-build a dictionary tree.