Jason3S / cSpell-Tools

Tools used to assist cSpell Development
MIT License

Out of memory occurs when compiling a very large dictionary file #2

Open uyu423 opened 5 years ago

uyu423 commented 5 years ago

I tried to compile the Korean dictionary into a trie.gz file, but every attempt ran out of memory.

The number of lines in the dic file is 102252, and the number of lines in the aff file is 122134.

I tried raising the heap limit with the --max_old_space_size option, but that only increased the run time; the process still ran out of memory.

node --max_old_space_size=32000 ./node_modules/.bin/cspell-tools compile-trie ./ko-aff-dic-0.7.1/ko.dic
Compile:
 output: default
 compress: true
 files:
  ./ko-aff-dic-0.7.1/ko.dic

Process "./ko-aff-dic-0.7.1/ko.dic" to "ko-aff-dic-0.7.1/ko.trie.gz"

<--- Last few GCs --->

[60298:0x104000000]   822651 ms: Mark-sweep 31996.2 (32543.8) -> 31995.9 (32544.3) MB, 59351.9 / 0.0 ms  (+ 0.1 ms in 27 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 59359 ms) (average mu = 0.120, current mu = 0.002)
[60298:0x104000000]   890885 ms: Mark-sweep 31997.6 (32544.3) -> 31997.5 (32545.8) MB, 68226.9 / 0.1 ms  (+ 0.0 ms in 26 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 68233 ms) (average mu = 0.059, current mu = 0.000)

<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x9332f2dbe3d]
Security context: 0x300afab9e6e1 <JSObject>
    1: DoJoin(aka DoJoin) [0x300afab85e89] [native array.js:~87] [pc=0x9332f2e633a](this=0x300a638026f1 <undefined>,l=0x300aadd4b2b9 <JSArray[2]>,m=2,A=0x300a638028c9 <true>,w=0x300a63809d61 <String[1]:  >,v=0x300a638029a1 <false>)
    2: Join(aka Join) [0x300afab85ed9] [native array.js:~112] [pc=0x9332f6a0478](this=0x300a638026f1 <undefined>,l=0x300aadd4b2b9...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x10003ae75 node::Abort() [/Users/yowu/.nvs/default/bin/node]
 2: 0x10003b07f node::OnFatalError(char const*, char const*) [/Users/yowu/.nvs/default/bin/node]
 3: 0x1001a7ae5 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/Users/yowu/.nvs/default/bin/node]
 4: 0x100572ef2 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/Users/yowu/.nvs/default/bin/node]
 5: 0x1005759c5 v8::internal::Heap::CheckIneffectiveMarkCompact(unsigned long, double) [/Users/yowu/.nvs/default/bin/node]
 6: 0x10057186f v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/Users/yowu/.nvs/default/bin/node]
 7: 0x10056fa44 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/Users/yowu/.nvs/default/bin/node]
 8: 0x10057c2dc v8::internal::Heap::AllocateRawWithLigthRetry(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/Users/yowu/.nvs/default/bin/node]
 9: 0x10057c35f v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/Users/yowu/.nvs/default/bin/node]
10: 0x10054e1e4 v8::internal::Factory::NewRawTwoByteString(int, v8::internal::PretenureFlag) [/Users/yowu/.nvs/default/bin/node]
11: 0x10082784d v8::internal::Runtime_StringBuilderJoin(int, v8::internal::Object**, v8::internal::Isolate*) [/Users/yowu/.nvs/default/bin/node]
12: 0x9332f2dbe3d
[1]    60298 abort      node --max_old_space_size=32000 ./node_modules/.bin/cspell-tools compile-trie

If the time complexity of processing the dic and aff files is the problem, let me know which code is responsible and I would be glad to help improve it.

As a test, I also compiled the en_US hunspell dictionary. It compiled in a very short time.

The hunspell dictionary I used comes from https://github.com/spellcheck-ko/hunspell-dict-ko/releases

I would like to help the many Korean developers who use the code spell checker. Please help. Thank you.

uyu423 commented 5 years ago

I set --max_old_space_size to 70000 and let the compile run for 80 minutes, but it still ran out of memory. Sad.

$ node --max_old_space_size=70000 ./node_modules/.bin/cspell-tools compile-trie ./ko-aff-dic-0.7.1/ko.dic
Compile:
 output: default
 compress: true
 files:
  ./ko-aff-dic-0.7.1/ko.dic

Process "./ko-aff-dic-0.7.1/ko.dic" to "ko-aff-dic-0.7.1/ko.trie.gz"

<--- Last few GCs --->

[61972:0x103800c00]  2429772 ms: Mark-sweep 69892.7 (71183.8) -> 69892.7 (71185.3) MB, 132854.4 / 0.1 ms  (average mu = 0.099, current mu = 0.000) allocation failure scavenge might not succeed
[61972:0x103800c00]  2699584 ms: Mark-sweep 69894.1 (71185.3) -> 69894.0 (71186.8) MB, 269797.7 / 0.0 ms  (average mu = 0.035, current mu = 0.000) allocation failure scavenge might not succeed

<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x36f4c45be3d]
Security context: 0x2a7707d1e6e1 <JSObject>
    1: DoJoin(aka DoJoin) [0x2a7707d05e89] [native array.js:~87] [pc=0x36f4c4670fa](this=0x2a77155026f1 <undefined>,l=0x2a84e9088039 <JSArray[2]>,m=2,A=0x2a77155028c9 <true>,w=0x2a7715509d61 <String[1]:  >,v=0x2a77155029a1 <false>)
    2: Join(aka Join) [0x2a7707d05ed9] [native array.js:~112] [pc=0x36f4c81d758](this=0x2a77155026f1 <undefined>,l=0x2a84e9088039...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x10003ae75 node::Abort() [/Users/yowu/.nvs/default/bin/node]
 2: 0x10003b07f node::OnFatalError(char const*, char const*) [/Users/yowu/.nvs/default/bin/node]
 3: 0x1001a7ae5 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/Users/yowu/.nvs/default/bin/node]
 4: 0x100572ef2 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/Users/yowu/.nvs/default/bin/node]
 5: 0x1005759c5 v8::internal::Heap::CheckIneffectiveMarkCompact(unsigned long, double) [/Users/yowu/.nvs/default/bin/node]
 6: 0x10057186f v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/Users/yowu/.nvs/default/bin/node]
 7: 0x10056fa44 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/Users/yowu/.nvs/default/bin/node]
 8: 0x10057c2dc v8::internal::Heap::AllocateRawWithLigthRetry(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/Users/yowu/.nvs/default/bin/node]
 9: 0x10057c35f v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/Users/yowu/.nvs/default/bin/node]
10: 0x10054e1e4 v8::internal::Factory::NewRawTwoByteString(int, v8::internal::PretenureFlag) [/Users/yowu/.nvs/default/bin/node]
11: 0x10082784d v8::internal::Runtime_StringBuilderJoin(int, v8::internal::Object**, v8::internal::Isolate*) [/Users/yowu/.nvs/default/bin/node]
12: 0x36f4c45be3d
[1]    61972 abort      node --max_old_space_size=70000 ./node_modules/.bin/cspell-tools compile-trie

uyu423 commented 5 years ago

Is there a way to use the hunspell dic and aff files in cspell without compiling?

Jason3S commented 5 years ago

Sorry for not responding sooner. There is a known issue with very large word lists. If you have another way to convert a hunspell file into a list of words, it might work better.
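
To illustrate what "convert a hunspell file into a list of words" could look like, here is a minimal sketch that expands dictionary entries lazily, one word at a time, rather than building the whole list in memory. It is not how cspell-tools works internally. The rule table `demoRules` and the helpers `expandEntry` and `expandAll` are hypothetical names invented for this example, and the rule format is a heavily simplified stand-in for real `SFX` lines in an aff file.

```javascript
// Toy model of Hunspell suffix expansion. Real .aff files also define
// prefixes, cross products, multi-character flags, and compounding,
// none of which is handled here.

// Hypothetical rule table: flag -> suffix rules, loosely mirroring "SFX" lines.
const demoRules = new Map([
  ['D', [
    { strip: '', add: 'ed', condition: /[^y]$/ },
    { strip: 'y', add: 'ied', condition: /[^aeiou]y$/ },
  ]],
  ['S', [
    { strip: '', add: 's', condition: /[^sxzhy]$/ },
  ]],
]);

// Lazily yield the base word and every suffixed form of one .dic entry
// written as "word/FLAGS" (single-character flags assumed).
function* expandEntry(entry, rules) {
  const [word, flags = ''] = entry.split('/');
  yield word;
  for (const flag of flags) {
    for (const { strip, add, condition } of rules.get(flag) || []) {
      if (!condition.test(word)) continue;              // rule does not apply
      if (strip && !word.endsWith(strip)) continue;     // nothing to strip
      yield word.slice(0, word.length - strip.length) + add;
    }
  }
}

// Chain the per-entry generators so only one word is materialized at a time.
function* expandAll(entries, rules) {
  for (const entry of entries) {
    yield* expandEntry(entry, rules);
  }
}

// Demo only: spreading into an array is fine for three entries,
// but defeats the purpose for a large dictionary.
console.log([...expandAll(['walk/DS', 'try/D', 'cat'], demoRules)].join(', '));
// prints "walk, walked, walks, try, tried, cat"
```

For a large dictionary, the words yielded by `expandAll` would be written to a file one line at a time instead of being collected into an array. Whether this keeps memory bounded for the Korean dictionary depends on how many forms its affix rules generate, which this sketch does not attempt to model.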