dotCMS / core

Headless/Hybrid Content Management System for Enterprises
http://dotcms.com

Troubleshoot CubeJS error in production analytics infrastructure #30448

Closed victoralfaro-dotcms closed 3 weeks ago

victoralfaro-dotcms commented 1 month ago

Parent Issue

Investigate and resolve CubeJS error in production analytics infrastructure

User Story

As a quasi-DevOps engineer, I want to troubleshoot and resolve the CubeJS failure in the production analytics infrastructure, which logs an "Error removing key" message when it tries to remove a cache key, so that analytics can function correctly and without interruption.

Acceptance Criteria

dotCMS Version

master

Proposed Objective

Core Features

Proposed Priority

Priority 2 - Important

External Links

Assumptions & Initiation Needs

Quality Assurance Notes & Workarounds

Error message:

{"message":"Error removing key","cacheKey":["SELECT\n      `request`.what_am_i `request__what_am_i`, count(*) `request__count`\n    FROM\n      (SELECT request_id,\n               MAX(sessionid) as sessionid,\n               (MAX(sessionnew) == 1)::bool as isSessionNew,\n               MIN(utc_time) as createdAt,\n               MAX(source_ip) as source_ip,\n               MAX(language) as language,\n               MAX(useragent) as user_agent,\n               MAX(persona) as persona,\n               MAX(rendermode) as rendermode,\n               MAX(referer) as referer,\n               MAX(host) as host,\n               MAX(CASE WHEN event_type = 'PAGE_REQUEST' THEN object_id ELSE NULL END) as page_id,\n               MAX(CASE WHEN event_type = 'PAGE_REQUEST' THEN object_title ELSE NULL END) as page_title,\n               MAX(CASE WHEN event_type = 'FILE_REQUEST' THEN object_id ELSE NULL END) as file_id,\n               MAX(CASE WHEN event_type = 'FILE_REQUEST' THEN object_title ELSE NULL END) as file_title,\n               MAX(CASE WHEN event_type = 'VANITY_REQUEST' THEN object_id ELSE NULL END) as vanity_id,\n               MAX(CASE WHEN event_type = 'VANITY_REQUEST' THEN object_forward_to ELSE NULL END) as vanity_forward_to,\n               MAX(CASE WHEN event_type = 'VANITY_REQUEST' THEN object_response ELSE NULL END) as vanity_response,\n               (SUM(CASE WHEN event_type = 'VANITY_REQUEST' THEN 1 ELSE 0 END) > 0)::bool as was_vanity_url_hit,\n               MAX(CASE WHEN event_type = 'VANITY_REQUEST' THEN comefromvanityurl ELSE NULL END) as come_from_vanity_url,\n               (SUM(CASE WHEN event_type = 'URL_MAP' THEN 1 ELSE 0 END) > 0)::bool as url_map_match,\n               MAX(CASE WHEN event_type = 'URL_MAP' THEN object_id ELSE NULL END) as url_map_content_detail_id,\n               MAX(CASE WHEN event_type = 'URL_MAP' THEN object_title ELSE NULL END) as url_map_content_detail_title,\n               MAX(CASE WHEN event_type = 'URL_MAP' THEN object_content_type_id ELSE NULL END) as url_map_content_type_id,\n               MAX(CASE WHEN event_type = 'URL_MAP' THEN object_content_type_name ELSE NULL END) as url_map_content_type_name,\n               MAX(CASE WHEN event_type = 'URL_MAP' THEN object_content_type_var_name ELSE NULL END) as url_map_content_type_var_name,\n               MAX(object_detail_page_url) as url_map_detail_page_url,\n               MAX(url) AS url,\n               CASE\n                 WHEN MAX(CASE WHEN event_type = 'FILE_REQUEST' THEN 1 ELSE 0 END) = 1 THEN 'FILE'\n                 WHEN MAX(CASE WHEN event_type = 'PAGE_REQUEST' THEN 1 ELSE 0 END) = 1 THEN 'PAGE'\n                 WHEN MAX(CASE WHEN event_type = 'VANITY_REQUEST' AND object_response != '200' THEN 1 ELSE 0 END) = 1 THEN 'VANITY_REDIRECT'\n                 ELSE 'OTHER'\n               END  AS what_am_i\n        FROM events\n        GROUP BY request_id) AS `request`  GROUP BY `request__what_am_i` LIMIT 10000",[],[]],"spanId":"0af05d5b9cbaadb0a9541e89813c5acd","error":"Error: Internal: Unable to receive result for write task: channel closed\n    at WebSocket.<anonymous> (/cube/node_modules/@cubejs-backend/cubestore-driver/src/WebSocketConnection.ts:121:30)\n    at WebSocket.emit (node:events:519:28)\n    at Receiver.receiverOnMessage (/cube/node_modules/ws/lib/websocket.js:1008:20)\n    at Receiver.emit (node:events:519:28)\n    at Receiver.dataMessage (/cube/node_modules/ws/lib/receiver.js:502:14)\n    at Receiver.getData (/cube/node_modules/ws/lib/receiver.js:435:17)\n    at Receiver.startLoop (/cube/node_modules/ws/lib/receiver.js:143:22)\n    at Receiver._write (/cube/node_modules/ws/lib/receiver.js:78:10)\n    at writeOrBuffer (node:internal/streams/writable:570:12)\n    at _write (node:internal/streams/writable:499:10)\n    at Receiver.Writable.write (node:internal/streams/writable:508:10)\n    at Socket.socketOnData (/cube/node_modules/ws/lib/websocket.js:1102:35)\n    at Socket.emit (node:events:519:28)\n    at addChunk (node:internal/streams/readable:559:12)\n    at readableAddChunkPushByteMode (node:internal/streams/readable:510:3)\n    at Socket.Readable.push (node:internal/streams/readable:390:5)\n    at TCP.onStreamRead (node:internal/stream_base_commons:191:23)","requestId":"4708c2c3-6268-40f6-9b85-39a8f0bc6215-span-1"}
victoralfaro-dotcms commented 1 month ago

After investigating, we found that the data volume of a Cube Store pod (cubestore-router-0) has run out of space, which causes the pod to panic while opening its metastore:

2024-10-23T14:42:24.249Z INFO  [cubestore::metastore::rocks_fs] <pid:1> Using existing metastore in /.cubestore/data/metastore
thread 'main' panicked at /build/cubestore/cubestore/src/config/mod.rs:1985:34:
called `Result::unwrap()` on an `Err` value: CubeError { message: "DB::open error for metastore: IO error: No space left on device: While renaming a file to /.cubestore/data/metastore/LOG.old.1729694544249766: /.cubestore/data/metastore/LOG: No space left on device", backtrace: "", cause: Internal }

Free space on the pod (the volume mounted at /.cubestore/data is 100% full):

Filesystem      Size  Used Avail Use% Mounted on
overlay          50G   25G   26G  49% /
tmpfs            64M     0   64M   0% /dev
/dev/root       904M  609M  233M  73% /usr/local/sbin/modprobe
/dev/nvme2n1    9.7G  9.7G     0 100% /.cubestore/data
/dev/nvme1n1p1   50G   25G   26G  49% /etc/hosts
shm              64M     0   64M   0% /dev/shm
tmpfs            30G   12K   30G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs            30G  4.0K   30G   1% /run/secrets/eks.amazonaws.com/serviceaccount
tmpfs            16G     0   16G   0% /proc/acpi
tmpfs            16G     0   16G   0% /proc/scsi
tmpfs            16G     0   16G   0% /sys/firmware
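
A check like the one above could be automated. The following is a minimal Python sketch, not part of the dotCMS or Cube tooling: it parses free-space output in the format shown and lists mount points at or above a usage threshold. The function names `parse_df` and `full_mounts` and the 90% default threshold are assumptions made for illustration.

```python
# Hypothetical helper: flag mount points that are close to full,
# based on free-space output in the column layout shown above.

SAMPLE = """\
Filesystem      Size  Used Avail Use% Mounted on
overlay          50G   25G   26G  49% /
tmpfs            64M     0   64M   0% /dev
/dev/root       904M  609M  233M  73% /usr/local/sbin/modprobe
/dev/nvme2n1    9.7G  9.7G     0 100% /.cubestore/data
/dev/nvme1n1p1   50G   25G   26G  49% /etc/hosts
shm              64M     0   64M   0% /dev/shm
tmpfs            30G   12K   30G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs            30G  4.0K   30G   1% /run/secrets/eks.amazonaws.com/serviceaccount
tmpfs            16G     0   16G   0% /proc/acpi
tmpfs            16G     0   16G   0% /proc/scsi
tmpfs            16G     0   16G   0% /sys/firmware
"""


def parse_df(text):
    """Parse the table into a list of dicts, one per filesystem row."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        # Split into at most 6 fields so mount paths with spaces stay intact.
        parts = line.split(None, 5)
        if len(parts) != 6 or not parts[4].endswith("%"):
            continue  # ignore rows that do not match the expected layout
        filesystem, size, used, avail, use_pct, mount = parts
        rows.append({
            "filesystem": filesystem,
            "size": size,
            "used": used,
            "avail": avail,
            "use_pct": int(use_pct.rstrip("%")),
            "mount": mount,
        })
    return rows


def full_mounts(text, threshold=90):
    """Return mount points whose usage is at or above `threshold` percent."""
    return [r["mount"] for r in parse_df(text) if r["use_pct"] >= threshold]


if __name__ == "__main__":
    print(full_mounts(SAMPLE))  # prints ['/.cubestore/data']
```

Run against the output above, only /.cubestore/data is reported, which matches the "No space left on device" panic in the cubestore-router-0 log.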
victoralfaro-dotcms commented 1 month ago

Issue https://github.com/dotCMS/core/issues/30433 has been created to track increasing the volume size for the cubestore-router pod.