Closed: ignazio-bovo closed this issue 5 months ago
2024-03-21 06:41:38:4138 StorageNodeApi error: Request timeout of 5000ms reached
{
"0": {
"endpoint": "https://sieemmastorage.com/storage/api/v1"
},
"timeoutMs": 5000,
"trace_id": "e189941ce41181ed61025e9e07b8e34c",
"span_id": "0a86f2d94e50bc5e",
"trace_flags": "01"
}
2024-03-21 06:41:38:4138 StorageNodeApi error: Unexpected error while requesting data object
{
"0": {
"endpoint": "https://sieemmastorage.com/storage/api/v1"
},
"objectId": "2513914",
"err": {
"message": "Request timeout"
},
"trace_id": "e189941ce41181ed61025e9e07b8e34c",
"span_id": "0a86f2d94e50bc5e",
"trace_flags": "01"
}
2024-03-21 06:41:38:4138 NetworkingManager error: Data object download failed
{
"err": {
"message": "Failed to download object 2513914 from any availablable storage provider",
"stack": "Error: Failed to download object 2513914 from any availablable storage provider\n at fail (/joystream/distributor-node/lib/services/networking/NetworkingService.js:224:24)\n at Queue.<anonymous> (/joystream/distributor-node/lib/services/networking/NetworkingService.js:265:21)\n at Queue.emit (node:events:517:28)\n at Queue.done (/joystream/node_modules/queue/index.js:194:8)\n at next (/joystream/node_modules/queue/index.js:118:16)\n at /joystream/node_modules/queue/index.js:150:14\n at processTicksAndRejections (node:internal/process/task_queues:95:5)\n at runNextTicks (node:internal/process/task_queues:64:3)\n at listOnTimeout (node:internal/timers:538:9)\n at process.processTimers (node:internal/timers:512:7)"
},
"trace_id": "e189941ce41181ed61025e9e07b8e34c",
"span_id": "0a86f2d94e50bc5e",
"trace_flags": "01"
}
2024-03-21 06:41:38:4138 PublicApi error: middlewareError
{
"err": {
"message": "Failed to download object 2513914 from any availablable storage provider",
"stack": "Error: Failed to download object 2513914 from any availablable storage provider\n at fail (/joystream/distributor-node/lib/services/networking/NetworkingService.js:223:25)\n at Queue.<anonymous> (/joystream/distributor-node/lib/services/networking/NetworkingService.js:265:21)\n at Queue.emit (node:events:517:28)\n at Queue.done (/joystream/node_modules/queue/index.js:194:8)\n at next (/joystream/node_modules/queue/index.js:118:16)\n at /joystream/node_modules/queue/index.js:150:14\n at processTicksAndRejections (node:internal/process/task_queues:95:5)\n at runNextTicks (node:internal/process/task_queues:64:3)\n at listOnTimeout (node:internal/timers:538:9)\n at process.processTimers (node:internal/timers:512:7)"
},
"req": {
"url": "/api/v1/assets/2513914",
"method": "GET",
"httpVersion": "1.1",
"originalUrl": "/api/v1/assets/2513914",
"query": {}
},
"trace_id": "e189941ce41181ed61025e9e07b8e34c",
"span_id": "0a86f2d94e50bc5e",
"trace_flags": "01"
}
2024-03-21 06:41:38:4138 PublicApi http: HTTP GET /api/v1/assets/2513914
{
"meta": {},
"trace_id": "e189941ce41181ed61025e9e07b8e34c",
"span_id": "0a86f2d94e50bc5e",
"trace_flags": "01"
}
<--- Last few GCs --->
[7:0x57ad870] 121787624 ms: Mark-sweep 4042.3 (4129.7) -> 4038.5 (4126.1) MB, 1341.4 / 0.0 ms (average mu = 0.242, current mu = 0.079) allocation failure; scavenge might not succeed
[7:0x57ad870] 121789771 ms: Mark-sweep 4056.1 (4127.9) -> 4054.2 (4157.8) MB, 2132.2 / 0.0 ms (average mu = 0.139, current mu = 0.007) allocation failure; scavenge might not succeed
<--- JS stacktrace --->
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
1: 0xb95b60 node::Abort() [node]
2: 0xa9a7f8 [node]
3: 0xd6f2f0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
4: 0xd6f697 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
5: 0xf4cba5 [node]
6: 0xf5f08d v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
7: 0xf3978e v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
8: 0xf3ab57 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
9: 0xf1bd2a v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [node]
10: 0x12e114f v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [node]
11: 0x170deb9 [node]
/joystream/distributor-node/runner.sh: line 8: 7 Aborted (core dumped) node --require @joystream/opentelemetry ./bin/run $*
Loaded Application Instrumentation: "Distributor Node"
Starting tracing..
There are hundreds of logs at the exact same second, all about not being able to download an object. I think there may be an infinite loop/recursion somewhere there and it just runs out of memory.
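To put a number on the burst described above, a small script can count log entries per second and component. This is only a sketch: it assumes each entry header has the shape `YYYY-MM-DD HH:MM:SS:<fraction> <Component> <level>: <message>` seen in the excerpt, and `countEntriesPerSecond` is a hypothetical helper name, not part of the distributor node.

```javascript
// Matches an entry header such as:
// "2024-03-21 06:41:38:4138 StorageNodeApi error: Request timeout of 5000ms reached"
const ENTRY_HEADER = /^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}):\d+ (\S+) (\w+): (.*)$/;

// Colour codes such as "[31m" / "[39m", with or without the leading ESC byte.
const COLOR_CODE = /\x1b?\[\d+m/g;

// Returns a Map keyed by "<second> <component> <level>" with the number of
// entries seen for that key. JSON payload lines and stack frames are skipped
// because they do not match the header pattern.
function countEntriesPerSecond(logText) {
  const counts = new Map();
  for (const rawLine of logText.split('\n')) {
    const line = rawLine.replace(COLOR_CODE, '').trim();
    const match = ENTRY_HEADER.exec(line);
    if (!match) continue;
    const [, second, component, level] = match;
    const key = `${second} ${component} ${level}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}
```

Running this over the full log and sorting the counts would show whether the failures cluster on a few object IDs and seconds, which would be consistent with a retry loop, or are spread more evenly.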
This looks very hard to reproduce. I also suspect that the download is failing because there isn't enough heap space for the file to be held in memory before it is saved to disk (or something along those lines), so the error might be somewhere else, as pointed out by Klaudiusz. I would leave this issue open and, if the error recurs often (at least once per week), proceed with a proper investigation. I won't do anything in the meantime, as this looks very time-consuming to reproduce and I am not the one who wrote the Argus code. Let me know what you think @kdembler @zeeshanakram3
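One low-cost way to check the heap-space hypothesis without a full reproduction is to log heap usage periodically and see whether it climbs towards the limit before a crash. The sketch below uses Node's built-in `v8.getHeapStatistics()`. The helper name `heapUsageSummary` is hypothetical and this is not code from the distributor node.

```javascript
const v8 = require('node:v8');

// Summarises current V8 heap usage against the configured heap limit.
// The GC trace in the crash log shows usage of roughly 4040 MB just before
// the abort, so a usedFraction that stays close to 1 would support the
// heap-exhaustion theory.
function heapUsageSummary() {
  const { heap_size_limit, used_heap_size } = v8.getHeapStatistics();
  const toMB = (bytes) => Math.round(bytes / (1024 * 1024));
  return {
    limitMB: toMB(heap_size_limit),
    usedMB: toMB(used_heap_size),
    usedFraction: used_heap_size / heap_size_limit,
  };
}
```

Calling this from a timer (for example once a minute) and logging the result would show whether memory grows steadily or spikes only around failed downloads.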
TLDR
A ping at 7:40 am CET on 2024-03-21 revealed that multiple nodes had crashed