getappmap / appmap-js

Client libraries for AppMap

Indexer crashes: FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory #886

Open symwell opened 1 year ago

symwell commented 1 year ago

The indexer crashes with an out of memory error. Also described here.

<--- Last few GCs --->

[821998:0x5b21020]   521524 ms: Mark-sweep 2023.1 (2089.5) -> 2011.8 (2094.0) MB, 1277.7 / 0.0 ms  (average mu = 0.383, current mu = 0.035) allocation failure; scavenge might not succeed
[821998:0x5b21020]   523403 ms: Mark-sweep 2027.7 (2094.0) -> 2016.5 (2098.5) MB, 1840.4 / 0.0 ms  (average mu = 0.204, current mu = 0.021) allocation failure; scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb6e260 node::Abort() [/home/test/.nvm/versions/node/v18.8.0/bin/node]
 2: 0xa7cdea  [/home/test/.nvm/versions/node/v18.8.0/bin/node]
 3: 0xd45f00 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/home/test/.nvm/versions/node/v18.8.0/bin/node]
 4: 0xd462a7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/home/test/.nvm/versions/node/v18.8.0/bin/node]
 5: 0xf235b5  [/home/test/.nvm/versions/node/v18.8.0/bin/node]
 6: 0xf35a9d v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/home/test/.nvm/versions/node/v18.8.0/bin/node]
 7: 0xf1019e v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/home/test/.nvm/versions/node/v18.8.0/bin/node]
 8: 0xf11567 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/home/test/.nvm/versions/node/v18.8.0/bin/node]
 9: 0xef273a v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/home/test/.nvm/versions/node/v18.8.0/bin/node]
10: 0x12b5c4f v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/home/test/.nvm/versions/node/v18.8.0/bin/node]
11: 0x16e78b9  [/home/test/.nvm/versions/node/v18.8.0/bin/node]

To reproduce

# in one terminal start the indexer
packages/cli$ yarn start index --watch --appmap-dir ~/src/sample_app_6th_ed/tmp/appmap/minitest

# in another terminal keep creating and deleting appmaps for a little over 5 minutes.
# (the original command repeated this sequence ten times)
~/src/sample_app_6th_ed$ for i in {1..10}; do
  rm tmp/appmap/minitest/*
  APPMAP=true DISABLE_SPRING=true bundle exec rails test
  sleep 60
done

Also reproduced with sleep 30 instead of sleep 60.

kgilpin suggests

The first thing I would check is that the indexer uses a single queue
across all the create/update events. It's probably running a separate queue
for each event and so with many create/update events, it can eventually run
out of memory with all that AppMap data concurrently loaded.

Same for the scanner.
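A minimal sketch of that idea, assuming (hypothetically) a queue that holds only file names, deduplicates on push, and loads one AppMap at a time inside the processing callback. This is not the actual indexer code, just an illustration of the single-shared-queue pattern:

```javascript
// Hypothetical sketch: one queue shared by all watcher events.
// Items are plain file-name strings; the AppMap data is only loaded
// inside process(), one file at a time, so memory stays bounded.
class IndexQueue {
  constructor(process) {
    this.process = process;   // async (file) => void; indexes one file
    this.items = [];          // FIFO of file names
    this.queued = new Set();  // dedup: at most one pending entry per file
  }

  push(file) {
    if (this.queued.has(file)) return; // already scheduled, skip duplicate
    this.queued.add(file);
    this.items.push(file);
  }

  remove(file) {
    // e.g. the file was deleted before we got to index it
    this.queued.delete(file);
  }

  async drain() {
    while (this.items.length > 0) {
      const file = this.items.shift();
      if (!this.queued.has(file)) continue; // removed in the meantime
      this.queued.delete(file);
      await this.process(file); // one AppMap in memory at a time
    }
  }
}
```

All create/update watchers would push into one instance of this queue, rather than each event spawning its own worker with its own copy of the data in memory.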
symwell commented 1 year ago

Some progress:

1) Before, we could end up with more files in pending than there were in the queue:

num in pending      67
num in queue        65

Now, after changing how items are added (checking whether a file is in pending when the queue item is processed, not when it is pushed), we end up with few files in pending, but the queue never drains.

num in pending      2
num in queue        269

It's strange that QueueObject doesn't have a pop method.

2) Later I removed the pending variable, still got OOM.

3) Set MaxMSBetween in EventAggregator to a high value of 10 * 1000 * 10000000 ms (about 1,157 days); still got OOM.

4) I saw the queue go up to

num in queue        381

even though there are only 67 appmap.json files:

find $HOME/src/sample_app_6th_ed/tmp/appmap/minitest | grep -i appmap.json | wc -l
67

This suggests that items might not be getting removed from the queue.

5) After adding the ability to remove items from the queue, it takes some time for the queue to empty. Indexing takes a long time to complete, anywhere from 50 ms to 37,000 ms per file (observed: roughly 50, 150, 3,500, and 37,000 ms):

queue processed     /home/test/src/sample_app_6th_ed/tmp/appmap/minitest/Microposts_controller_should_redirect_destroy_for_wrong_micropost.appmap.json in 19168
queue processed     /home/test/src/sample_app_6th_ed/tmp/appmap/minitest/Microposts_controller_should_redirect_destroy_when_not_logged_in.appmap.json in 37423
queue processed     /home/test/src/sample_app_6th_ed/tmp/appmap/minitest/Relationships_controller_destroy_should_require_logged-in_user.appmap.json in 18835
queue processed     /home/test/src/sample_app_6th_ed/tmp/appmap/minitest/Site_layout_layout_links.appmap.json in 18753
queue processed     /home/test/src/sample_app_6th_ed/tmp/appmap/minitest/Following_followers_page.appmap.json in 31041
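As an aside, the MaxMSBetween setting in item 3 above behaves like a debounce window: events arriving within the window are batched and emitted together. Here is a hypothetical sketch of that pattern (assumed behavior, not the real EventAggregator code). Note that in this shape, if events keep arriving within the window, the pending batch is never flushed and grows without bound, which would be consistent with an OOM when the window is set very large:

```javascript
// Hypothetical debounce-style aggregator: events arriving within
// maxMsBetween of each other are batched and emitted together once
// the event stream goes quiet.
class Aggregator {
  constructor(emit, maxMsBetween) {
    this.emit = emit;               // (batch) => void
    this.maxMsBetween = maxMsBetween;
    this.pending = [];
    this.timer = null;
  }

  add(event) {
    this.pending.push(event);
    if (this.timer) clearTimeout(this.timer); // a new event restarts the window
    this.timer = setTimeout(() => {
      const batch = this.pending; // flush everything collected so far
      this.pending = [];
      this.timer = null;
      this.emit(batch);
    }, this.maxMsBetween);
  }
}
```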
kgilpin commented 1 year ago

Item completion may not be being reported properly.

Ask Dustin about it?

symwell commented 1 year ago

I created a possible fix that keeps the number of items in the queue to no more than 67. It's here: https://github.com/getappmap/appmap-js/compare/sw/fix_indexer_oom__version2?expand=1

It still produces an OOM.

Perhaps the issue is some other resource leaking memory, such as open file handles that are never closed.
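One cheap way to test the "some other resource is leaking" hypothesis is to log heap usage periodically while the indexer runs and watch whether it climbs across the create/delete cycles. A diagnostic sketch (not part of the indexer):

```javascript
// Diagnostic sketch: report V8 heap usage so a leak shows up as a
// steadily climbing number across indexing cycles.
function heapUsedMb() {
  return process.memoryUsage().heapUsed / (1024 * 1024);
}

console.log(`heap used: ${heapUsedMb().toFixed(1)} MB`);
// In a long-running process, sample on an interval instead:
// setInterval(() => console.log(`heap used: ${heapUsedMb().toFixed(1)} MB`), 5000);
```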

kgilpin commented 1 year ago

Items in the queue are just file names; they should not cause an OOM.
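A back-of-the-envelope check supports this: even at the observed peak of 381 queued entries, file-name strings account for only tens of kilobytes, nowhere near a ~2 GB heap. Illustrative paths below (the real names differ), assuming roughly 2 bytes per character as a V8 upper bound:

```javascript
// Rough estimate of the memory held by 381 queued file-name strings
// (illustrative paths; assumes ~2 bytes per character as an upper bound).
const filenames = Array.from(
  { length: 381 },
  (_, i) => `/home/test/src/sample_app_6th_ed/tmp/appmap/minitest/test_${i}.appmap.json`
);
const approxBytes = filenames.reduce((sum, f) => sum + f.length * 2, 0);
console.log(`~${Math.round(approxBytes / 1024)} KB for ${filenames.length} queued names`);
```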

symwell commented 1 year ago

Got an OOM on CI when I added a UI.prompt while installing with pip. Unsure how it's related to the OOM issue on the indexer.