ffreemt / deepl-scraper-playwright

Scrape deepl using playwright
MIT License

"FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory" #1

Open ElornMarsk opened 1 year ago

ElornMarsk commented 1 year ago

In general, everything works well, thank you! But sometimes the following error appears:

<--- Last few GCs --->

0.[3460:000001FC3DF1C280] 6299388 ms: Mark-sweep (reduce) 1022.8 (1038.1) -> 1022.6 (1038.1) MB, 27.1 / 0.1 ms (+ 0.2 ms in 3 steps since start of marking, biggest step 0.1 ms, walltime since start of marking 105 ms) (average mu = 0.723, current mu = 0.81
[3460:000001FC3DF1C280] 6299462 ms: Mark-sweep (reduce) 1024.2 (1039.0) -> 1023.5 (1039.0) MB, 23.5 / 0.0 ms (+ 11.2 ms in 8 steps since start of marking, biggest step 10.9 ms, walltime since start of marking 64 ms) (average mu = 0.650, current mu = 0.5

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
1: 00007FF7BDE7B34F v8::internal::CodeObjectRegistry::~CodeObjectRegistry+123599
2: 00007FF7BDE08CB6 v8::internal::MicrotaskQueue::GetMicrotasksScopeDepth+65206
3: 00007FF7BDE09D8D node::OnFatalError+301
4: 00007FF7BE73C36E v8::Isolate::ReportExternalAllocationLimitReached+94
5: 00007FF7BE72694D v8::SharedArrayBuffer::Externalize+781
6: 00007FF7BE5C9F2C v8::internal::Heap::EphemeronKeyWriteBarrierFromCode+1468
7: 00007FF7BE5C7044 v8::internal::Heap::CollectGarbage+4244
8: 00007FF7BE5C49C0 v8::internal::Heap::AllocateExternalBackingStore+2000
9: 00007FF7BE5E26D0 v8::internal::FreeListManyCached::Reset+1408
10: 00007FF7BE5E2D85 v8::internal::Factory::AllocateRaw+37
11: 00007FF7BE5F864F v8::internal::FactoryBase::NewRawTwoByteString+79
12: 00007FF7BE3DBCED v8::internal::String::SlowFlatten+477
13: 00007FF7BE1467BB v8::internal::WasmTableObject::Fill+603
14: 00007FF7BE746126 v8::String::Utf8Length+22
15: 00007FF7BDE26DB7 v8::internal::Malloced::operator delete+17495
16: 00007FF7BE6F6D46 v8::internal::Builtins::code_handle+172806
17: 00007FF7BE6F6939 v8::internal::Builtins::code_handle+171769
18: 00007FF7BE6F6BFC v8::internal::Builtins::code_handle+172476
19: 00007FF7BE6F6A60 v8::internal::Builtins::code_handle+172064
20: 00007FF7BE7CA141 v8::internal::SetupIsolateDelegate::SetupHeap+494641
21: 000001FC3FC9A75F

ffreemt commented 1 year ago

Beats me... This appears to be a JavaScript-related issue, which means a fix is beyond me. Maybe try setting PWBROWSER_HEADFUL=1 in .env (refer to the DEBUG section in README.md) to bring up a visible browser window, then check the messages in the DevTools console. Hopefully the messages will give some hints.
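One more thing that might be worth trying, since the GC lines in the report show the heap sitting at roughly 1 GB: Node reads extra command-line flags from the NODE_OPTIONS environment variable, and --max-old-space-size raises V8's old-space heap limit (value in MB). Below is a minimal sketch, assuming the Node process started by Playwright inherits the environment of the Python script (this is not verified anywhere in this thread). with_node_heap_limit is a hypothetical helper and 4096 is an arbitrary value. If memory is really leaking, this would only postpone the crash, not prevent it.

```python
import os


def with_node_heap_limit(env, megabytes):
    """Return a copy of `env` whose NODE_OPTIONS requests a larger V8 old-space heap.

    Hypothetical helper: other options already present in NODE_OPTIONS are kept,
    and any earlier --max-old-space-size value is replaced.
    """
    new_env = dict(env)
    kept = [
        option
        for option in new_env.get("NODE_OPTIONS", "").split()
        if not option.startswith("--max-old-space-size")
    ]
    new_env["NODE_OPTIONS"] = " ".join(kept + [f"--max-old-space-size={megabytes}"])
    return new_env


# Assumed usage: apply it before anything starts the browser, i.e. before
# importing deepl_scraper_pw, so that the Node process can see the variable.
#
#   os.environ.update(with_node_heap_limit(os.environ, 4096))
#   from deepl_scraper_pw import deepl_tr
```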

ElornMarsk commented 1 year ago

I tried to ignore this error with "try:" + "except Exception:", but I didn't succeed. Perhaps I lack knowledge. But maybe it is possible to get around this error with simple methods? So far, restarting does the trick.

ElornMarsk commented 1 year ago

I tried restarting the processes "node.exe" and "firefox.exe" (Nightly) by closing them inside a script. But if the script is a loop, "node.exe" and "firefox.exe" (Nightly) won't run the second time. How can I start these closed processes again in a loop (as if I stop the script and start it again)? Simply put, how to restart the entire script automatically? Most likely this will help to bypass this error.
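Restarting the entire script automatically can be sketched with a small supervisor that runs the real script as a child process and starts it again whenever it exits with an error. This is only a sketch under assumptions: run_until_success is a hypothetical helper, translate_job.py is a placeholder name for your own script, and it assumes the script actually exits with a non-zero code (or hangs long enough to hit the timeout) when the heap error occurs.

```python
import subprocess
import sys
import time


def run_until_success(cmd, max_attempts=5, delay=5.0, timeout=None):
    """Run `cmd` in a fresh process and rerun it from scratch after each failure.

    Returns the number of attempts that were needed. Raises RuntimeError when
    every attempt fails. A run that exceeds `timeout` seconds is killed by
    subprocess.run and counted as a failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            returncode = subprocess.run(cmd, timeout=timeout).returncode
        except subprocess.TimeoutExpired:
            returncode = None  # the child hung and was killed
        if returncode == 0:
            return attempt
        print(f"attempt {attempt} failed (exit code {returncode})", file=sys.stderr)
        if attempt < max_attempts:
            time.sleep(delay)  # short pause before the next fresh start
    raise RuntimeError(f"{cmd!r} failed {max_attempts} times in a row")


# Assumed usage, with a placeholder script name:
#   run_until_success([sys.executable, "translate_job.py"], timeout=3600)
```

Each attempt is a new Python process, so nothing from the crashed run is reused. On a timeout only the direct child is killed; leftover node.exe or firefox.exe processes may still need to be closed separately. The script itself should save its progress so that a restart can continue where it stopped.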

ffreemt commented 1 year ago

I suspect the headless browser (which uses node and firefox in the background, I suppose) is having a problem. To restart, try reloading the get_pwbrowser_sync and deepl_scraper_pw modules, something along these lines:

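import importlib
import get_pwbrowser_sync
import deepl_scraper_pw

# re-run the module-level code of both modules
importlib.reload(get_pwbrowser_sync)
importlib.reload(deepl_scraper_pw)

# re-import so that deepl_tr refers to the reloaded module
from deepl_scraper_pw import deepl_tr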

If you have set PWBROWSER_HEADFUL=1, this should start a new browser, which is equivalent to the restart you described.
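The reload steps can also be wrapped in a small function so they are easy to call from inside a loop. A rough sketch: reload_and_get is a hypothetical helper, not part of either package. Note that importlib.reload only re-executes module-level code; whether that also shuts down a browser that is already running depends on how the modules manage it, which is an open assumption here.

```python
import importlib


def reload_and_get(module_names, attr):
    """Reload the named modules in the given order and return `attr` from the last one.

    Hypothetical helper: list dependencies first, so that the last module is
    re-executed against the freshly reloaded ones.
    """
    if not module_names:
        raise ValueError("module_names must not be empty")
    module = None
    for name in module_names:
        # import_module makes sure the module is loaded before reloading it
        module = importlib.reload(importlib.import_module(name))
    return getattr(module, attr)


# Assumed usage with the two modules named in this comment:
#   deepl_tr = reload_and_get(["get_pwbrowser_sync", "deepl_scraper_pw"], "deepl_tr")
```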

If possible, can you post the relevant error messages with context (line numbers of the related python files)?

ElornMarsk commented 1 year ago

Unfortunately, it didn't help... Now I've automated restarting the ENTIRE script after this error with a not-so-convenient additional PARALLEL script. Alas, at the moment I do not have enough knowledge for the best solution. It's not as bad a problem as this one: https://github.com/ffreemt/deepl-scraper-playwright/issues/2

DimaDDM commented 1 year ago

Same problem here. Maybe some memory cleanup should be added.

The problem appears after several calls to deepl_tr.
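Since the error shows up only after several calls, one possible workaround is to translate in small batches and give every batch its own short-lived Python process, so the browser and node start fresh each time. The sketch below is an assumption-laden illustration, not something provided by this package: run_in_batches and chunked are hypothetical helpers, the worker is assumed to read a JSON list from stdin and print a JSON list to stdout, and the batch size of 20 is arbitrary.

```python
import json
import subprocess
import sys


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(worker_code, texts, batch_size=20, timeout=None):
    """Run `worker_code` in a fresh Python process for each batch of `texts`.

    The worker must read a JSON list from stdin and print a JSON list to
    stdout (and nothing else). json.dumps escapes non-ASCII characters by
    default, so the pipes carry plain ASCII regardless of console encoding.
    """
    results = []
    for batch in chunked(texts, batch_size):
        finished = subprocess.run(
            [sys.executable, "-c", worker_code],
            input=json.dumps(batch),
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,  # raise CalledProcessError if the worker crashes
        )
        results.extend(json.loads(finished.stdout))
    return results


# Assumed worker for this package; calling deepl_tr with a single text
# argument is an assumption about its interface:
#   DEEPL_WORKER = (
#       "import json, sys\n"
#       "from deepl_scraper_pw import deepl_tr\n"
#       "print(json.dumps([deepl_tr(t) for t in json.load(sys.stdin)]))\n"
#   )
#   translations = run_in_batches(DEEPL_WORKER, texts, batch_size=20)
```

Starting a browser for every batch is slow, so the batch size is a trade-off between speed and how much memory is allowed to build up before the process is thrown away.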