Gapminder / ddf-validation

heap out of memory for faostat #513

Closed · semio closed this issue 6 years ago

semio commented 6 years ago

ddf-validation version: 1.16.4
system: macOS 10.13
dataset: https://github.com/open-numbers/ddf--unfao--faostat

I tested with and without multithread mode; in both cases the following exception was raised:

```
<--- Last few GCs --->

[8803:0x104800000]  1727298 ms: Mark-sweep 1293.1 (1433.8) -> 1293.1 (1433.8) MB, 413.2 / 0.0 ms  allocation failure GC in old space requested
[8803:0x104800000]  1727726 ms: Mark-sweep 1293.1 (1433.8) -> 1293.1 (1425.8) MB, 428.0 / 0.0 ms  last resort GC in old space requested
[8803:0x104800000]  1728022 ms: Mark-sweep 1293.1 (1425.8) -> 1293.1 (1425.8) MB, 295.8 / 0.0 ms  last resort GC in old space requested

<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x2f0a380a5529 <JSObject>
    1: /* anonymous */(aka /* anonymous */) [/usr/local/lib/node_modules/ddf-validation/lib/stories/process-one-data-points-chunk.js:~33] [pc=0x2ed7a8b210eb](this=0x2f0a99b022d1 <undefined>,record=0x2f0ae5ffff49 <Object map = 0x2f0ad7b35f79>,line=733)
    2: /* anonymous */ [/usr/local/lib/node_modules/ddf-validation/lib/utils/file.js:~126] [pc=0x2ed7a8b0fb09](this=0x2f0a5d03f321 <ParserStream m...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: node::Abort() [/usr/local/Cellar/node/9.8.0/bin/node]
 2: node::OnFatalError(char const*, char const*) [/usr/local/Cellar/node/9.8.0/bin/node]
 3: v8::Utils::ReportOOMFailure(char const*, bool) [/usr/local/Cellar/node/9.8.0/bin/node]
 4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [/usr/local/Cellar/node/9.8.0/bin/node]
 5: v8::internal::Factory::NewFixedArray(int, v8::internal::PretenureFlag) [/usr/local/Cellar/node/9.8.0/bin/node]
 6: v8::internal::HashTable<v8::internal::StringTable, v8::internal::StringTableShape>::New(v8::internal::Isolate*, int, v8::internal::PretenureFlag, v8::internal::MinimumCapacity) [/usr/local/Cellar/node/9.8.0/bin/node]
 7: v8::internal::HashTable<v8::internal::StringTable, v8::internal::StringTableShape>::EnsureCapacity(v8::internal::Handle<v8::internal::StringTable>, int, v8::internal::PretenureFlag) [/usr/local/Cellar/node/9.8.0/bin/node]
 8: v8::internal::StringTable::LookupKey(v8::internal::Isolate*, v8::internal::StringTableKey*) [/usr/local/Cellar/node/9.8.0/bin/node]
 9: v8::internal::StringTable::LookupString(v8::internal::Isolate*, v8::internal::Handle<v8::internal::String>) [/usr/local/Cellar/node/9.8.0/bin/node]
10: v8::internal::Runtime_KeyedGetProperty(int, v8::internal::Object**, v8::internal::Isolate*) [/usr/local/Cellar/node/9.8.0/bin/node]
11: 0x2ed7a55042fd
12: 0x2ed7a8b210eb
```
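The trace itself narrows things down: the allocation fails inside the per-record callback in lib/stories/process-one-data-points-chunk.js, called from the ParserStream in lib/utils/file.js, i.e. while datapoint CSVs are being parsed. As a minimal sketch (not ddf-validation's actual code), the streaming pattern that avoids this class of failure is to validate each record as it arrives and keep only aggregates, instead of accumulating parsed records until the chunk completes; the file name in the usage line is an assumption:

```js
const fs = require('fs');
const readline = require('readline');

// Illustrative only: stream a large CSV line by line, validate each record
// immediately, and retain only counters so old-space usage stays flat.
function streamRecords(path, onRecord) {
  return new Promise((resolve, reject) => {
    const input = fs.createReadStream(path);
    input.on('error', reject);
    const rl = readline.createInterface({ input, crlfDelay: Infinity });
    let count = 0;
    rl.on('line', (line) => {
      onRecord(line); // report issues here instead of collecting records
      count += 1;
    });
    rl.on('close', () => resolve(count));
  });
}

// Usage sketch (the file name is hypothetical):
streamRecords('ddf--datapoints--example.csv', () => {})
  .then((n) => console.log(`${n} lines processed`));
```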
buchslava commented 6 years ago

@semio please try this kind of command: `validate-ddf --heap 7168`

`--heap` sets a custom heap size in megabytes: 1024 raises the heap to 1 GB, 2048 to 2 GB, 3072 to 3 GB, 4096 to 4 GB, 5120 to 5 GB, 6144 to 6 GB, 7168 to 7 GB, and 8192 to 8 GB.
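For what it's worth, assuming `--heap` is passed through to V8's `--max-old-space-size` flag (an assumption, not confirmed in this thread), the limit a Node process actually ends up with can be checked with the standard built-in v8 module:

```js
// Prints the effective old-space heap limit of the current Node process.
// Run with e.g. `node --max-old-space-size=7168 check-heap.js`
// (the file name is illustrative).
const v8 = require('v8');
const limitMb = v8.getHeapStatistics().heap_size_limit / 1024 / 1024;
console.log(`heap limit: ${limitMb.toFixed(0)} MB`);
```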

semio commented 6 years ago

Unfortunately I tried increasing it to --head 8192 and it's still not working. It fails at around 45% no matter how I set the heap. I also tested on a Linux server with 4 CPUs / 8 GB RAM, and it failed there too. Could you have a try on your side?
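The "fails at around 45% regardless of heap" behavior would match steady accumulation rather than a single oversized allocation: if heap usage climbs roughly in proportion to the files processed, a larger `--heap` only postpones the crash. As a hedged diagnostic sketch, periodic logging with the standard `process.memoryUsage()` API makes the difference visible:

```js
// Diagnostic sketch: log heap usage every 5 s while validation runs.
// Steady growth suggests accumulation/leak; a sudden spike suggests one
// pathological file. unref() keeps the timer from holding the process open.
setInterval(() => {
  const { heapUsed, heapTotal } = process.memoryUsage();
  console.error(
    `heap: ${(heapUsed / 2 ** 20).toFixed(1)} / ${(heapTotal / 2 ** 20).toFixed(1)} MB`);
}, 5000).unref();
```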

buchslava commented 6 years ago

@semio you wrote 'head', but 'heap' is expected... did you actually run it with 'head'?

semio commented 6 years ago

Ah, that's a typo, sorry. I checked my command-line history; it was heap that I was using.

buchslava commented 6 years ago

ok, I'll check

buchslava commented 6 years ago

@semio I had the same result. This is definitely an issue.

semio commented 6 years ago

Hi @buchslava

I tried to validate the faostat dataset again, and it finished successfully once, but after that it fails again with a heap out of memory error similar to the one in my first post. I tested on both my macOS computer (node v10.2.1) and an Ubuntu 16.04 server (node v10.3.0); both have 8 GB RAM.

But interestingly, I tried to validate harpal's population dataset (https://github.com/harpalshergill/ddf--unpop--wpp_population), which has more files and is bigger than the faostat dataset, and it finished successfully.
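That observation suggests peak memory may track the largest single file (or the record count per chunk) rather than total dataset size. A quick hedged check, assuming both repos are cloned locally (the directory names below are assumptions), is to compare the biggest CSV in each:

```js
const fs = require('fs');
const path = require('path');

// Returns the largest .csv directly inside `dir` (no recursion; sketch only).
function largestCsv(dir) {
  return fs.readdirSync(dir)
    .filter((f) => f.endsWith('.csv'))
    .map((f) => ({ file: f, bytes: fs.statSync(path.join(dir, f)).size }))
    .sort((a, b) => b.bytes - a.bytes)[0];
}

console.log(largestCsv('ddf--unfao--faostat'));
console.log(largestCsv('ddf--unpop--wpp_population'));
```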

Can you think of any reason that could cause this behavior? If you need more input from me, please let me know.