arangodb / arangodb

🥑 ArangoDB is a native multi-model database with flexible data models for documents, graphs, and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions.
https://www.arangodb.com

ArangoDB running into some I/O errors with kubernetes deployment and persistent volume #19455

Open sfialok31 opened 1 year ago

sfialok31 commented 1 year ago

My Environment

Component, Query & Data

Affected feature: Deployment

Size of your Dataset on disk: 200MB

Problem:

Hi,

I am running ArangoDB on Google Kubernetes Engine (GKE) as a Kubernetes Deployment with 1 replica. For data persistence, I am using a ReadWriteOnce volume mounted at /var/lib/arangodb3. The volume is provisioned using GKE's standard-rwo storage class.
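For readers who want to reproduce the setup, the description above can be sketched as a pair of manifests. This is a minimal sketch, not the poster's actual configuration: the resource name, the image tag (taken from the version in the logs), and the volume size are assumptions, and authentication settings and resource limits are omitted. Only the replica count, access mode, storage class, and mount path come from the post; the port comes from the endpoint line in the logs.

```python
import json

DATA_DIR = "/var/lib/arangodb3"  # mount path stated in the post


def build_manifests(name="arangodb", image="arangodb:3.9.6", size="10Gi"):
    """Return (pvc, deployment) manifests as plain dicts.

    name, image and size are illustrative assumptions; they are not
    given in the original report.
    """
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"{name}-data"},
        "spec": {
            "accessModes": ["ReadWriteOnce"],    # from the post
            "storageClassName": "standard-rwo",  # from the post
            "resources": {"requests": {"storage": size}},
        },
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,  # from the post
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": 8529}],
                        "volumeMounts": [
                            {"name": "data", "mountPath": DATA_DIR},
                        ],
                    }],
                    "volumes": [{
                        "name": "data",
                        "persistentVolumeClaim": {"claimName": f"{name}-data"},
                    }],
                },
            },
        },
    }
    return pvc, deployment


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so the output can be piped to it.
    for manifest in build_manifests():
        print(json.dumps(manifest, indent=2))
```

Each printed document can be saved to a file and applied with `kubectl apply -f`.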

Occasionally I run into I/O errors, and I am finding it hard to figure out their cause and a solution.

The logs look like:

2023-07-20T21:07:05Z [8] INFO [e52b0] {general} ArangoDB 3.9.6 [linux] 64bit, using jemalloc, build tags/v3.9.6-0-g581d711313c, VPack 0.1.35, RocksDB 6.27.0, ICU 64.2, V8 7.9.317, OpenSSL 1.1.1s 1 Nov 2022
2023-07-20T21:07:05Z [8] INFO [75ddc] {general} detected operating system: Linux version 5.15.107+ (builder@localhost) (Chromium OS 14.0_pre445002_p20220217-r3 clang version 14.0.0 (/var/tmp/portage/sys-devel/llvm-14.0_pre445002_p20220217-r3/work/llvm-14.0_pre445002_p20220217/clang 18308e171b5b1dd99627a4d88c7d6c5ff21b8c96), LLD 14.0.0) #1 SMP Thu Jun 15 09:51:46 UTC 2023
2023-07-20T21:07:05Z [8] INFO [25362] {memory} Available physical memory: 6442450944 bytes (overriden by environment variable), available cores: 4 (overriden by environment variable)
2023-07-20T21:07:05Z [8] WARNING [118b0] {memory} maximum number of memory mappings per process is 65530, which seems too low. it is recommended to set it to at least 256000
2023-07-20T21:07:05Z [8] WARNING [49528] {memory} execute 'sudo sysctl -w "vm.max_map_count=256000"'
2023-07-20T21:07:05Z [8] INFO [3bb7d] {cluster} Starting up with role SINGLE
2023-07-20T21:07:05Z [8] INFO [f6e0e] {aql} memory limit per AQL query automatically set to 3865470567 bytes. to modify this value, please adjust the startup option `--query.memory-limit`
2023-07-20T21:07:05Z [8] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 1048576, soft limit is 1048576
2023-07-20T21:07:05Z [8] WARNING [ad4b2] {general} found existing lockfile '/var/lib/arangodb3/LOCK' of previous process with pid 8, and that process seems to be still running
2023-07-20T21:07:06Z [8] INFO [e6460] {general} created base application directory '/var/lib/arangodb3-apps/_db'
2023-07-20T21:07:06Z [8] INFO [ecdbb] {engines} calculater/dependency: no index estimate found for index of type 'edge', id '1', recalculating...
2023-07-20T21:07:06Z [8] INFO [ecdbb] {engines} calculater/dependency: no index estimate found for index of type 'edge', id '2', recalculating...
2023-07-20T21:07:06Z [8] INFO [fe333] {engines} RocksDB recovery starting, scanning WAL starting from sequence number 283, latest sequence number: 348, files in archive: 0
2023-07-20T21:07:06Z [8] INFO [a4ec8] {engines} RocksDB recovery finished, WAL entries scanned: 89, recovery start sequence number: 283, latest WAL sequence number: 348, max tick value found in WAL: 273, last HLC value found in WAL: 1771975149341376513
2023-07-20T21:07:06Z [8] INFO [c1b63] {arangosearch} ArangoSearch maintenance: [1..1] commit thread(s), [1..1] consolidation thread(s)
2023-07-20T21:07:06Z [8] INFO [6ea38] {general} using endpoint 'http+tcp://0.0.0.0:8529' for non-encrypted requests
2023-07-20T21:07:07Z [8] INFO [cf3f4] {general} ArangoDB (version 3.9.6 [linux]) is ready for business. Have fun!
2023-07-20T21:07:14Z [8] ERROR [fae2c] {rocksdb} RocksDB encountered a background error during a write callback operation: IO error: While fdatasync: /var/lib/arangodb3/engine-rocksdb/journals/000120.log: I/O error; The database will be put in read-only mode, and subsequent write errors are likely. It is advised to shut down this instance, resolve the error offline and then restart it.
2023-07-20T21:07:14Z [8] WARNING [a3d0c] {engines} background settings sync failed: IO error: While fdatasync: /var/lib/arangodb3/engine-rocksdb/journals/000120.log: I/O error
2023-07-20T21:07:14Z [8] ERROR [be9ea] {rocksdb} rocksdb: [db/db_impl/db_impl.cc:1385] WAL Sync error IO error: While fdatasync: /var/lib/arangodb3/engine-rocksdb/journals/000120.log: I/O error
2023-07-20T21:07:14Z [8] WARNING [078ee] {engines} could not get WAL files: IO error: While readdir: /var/lib/arangodb3/engine-rocksdb/journals: I/O error
2023-07-20T21:07:14Z [8] ERROR [5e275] {engines} could not sync RocksDB WAL: IO error: While fdatasync: /var/lib/arangodb3/engine-rocksdb/journals/000120.log: I/O error
2023-07-20T21:07:14Z [8] ERROR [be9ea] {rocksdb} rocksdb: [db/db_impl/db_impl.cc:1385] WAL Sync error IO error: While fdatasync: /var/lib/arangodb3/engine-rocksdb/journals/000120.log: I/O error
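When triaging logs like the ones above, it can help to group the I/O failures by operation and path. The following is a small sketch that assumes the line layout seen in this report (timestamp, pid, level, log id, topic, message); the function name `summarize_io_errors` is hypothetical and not part of any ArangoDB tooling.

```python
import re
from collections import Counter

# Layout assumed from the log lines in this report:
# <timestamp> [<pid>] <LEVEL> [<log id>] {<topic>} <message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+) \[(?P<pid>\d+)\] (?P<level>[A-Z]+) "
    r"\[(?P<id>[0-9a-f]+)\] \{(?P<topic>[^}]+)\} (?P<msg>.*)$"
)

# RocksDB-style fragment: "IO error: While <op>: <path>: <reason>"
IO_RE = re.compile(
    r"IO error: While (?P<op>\w+): (?P<path>\S+): (?P<reason>.+?)(?:;|$)"
)


def summarize_io_errors(lines):
    """Count (operation, path, reason) triples in WARNING/ERROR/FATAL lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m["level"] not in ("WARNING", "ERROR", "FATAL"):
            continue
        io = IO_RE.search(m["msg"])
        if io:
            counts[(io["op"], io["path"], io["reason"].strip())] += 1
    return counts
```

Passing the lines of a log file to `summarize_io_errors` returns a `Counter` keyed by operation, path, and reason, which shows quickly whether the failures are confined to one file or span the whole volume.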

I added a liveness probe to restart the container if there is an issue with writes to the volume, but even after the container restarts, the problem is not fully fixed. Logs after the restart:

2023-07-20T21:09:40Z [8] INFO [e52b0] {general} ArangoDB 3.9.6 [linux] 64bit, using jemalloc, build tags/v3.9.6-0-g581d711313c, VPack 0.1.35, RocksDB 6.27.0, ICU 64.2, V8 7.9.317, OpenSSL 1.1.1s 1 Nov 2022
2023-07-20T21:09:40Z [8] INFO [75ddc] {general} detected operating system: Linux version 5.15.107+ (builder@localhost) (Chromium OS 14.0_pre445002_p20220217-r3 clang version 14.0.0 (/var/tmp/portage/sys-devel/llvm-14.0_pre445002_p20220217-r3/work/llvm-14.0_pre445002_p20220217/clang 18308e171b5b1dd99627a4d88c7d6c5ff21b8c96), LLD 14.0.0) #1 SMP Thu Jun 15 09:51:46 UTC 2023
2023-07-20T21:09:40Z [8] INFO [25362] {memory} Available physical memory: 6442450944 bytes (overriden by environment variable), available cores: 4 (overriden by environment variable)
2023-07-20T21:09:40Z [8] WARNING [118b0] {memory} maximum number of memory mappings per process is 65530, which seems too low. it is recommended to set it to at least 256000
2023-07-20T21:09:40Z [8] WARNING [49528] {memory} execute 'sudo sysctl -w "vm.max_map_count=256000"'
2023-07-20T21:09:40Z [8] FATAL [23ec1] {startup} unable to read content of 'ENGINE' file '/var/lib/arangodb3/ENGINE': read failed for file '/var/lib/arangodb3/ENGINE': I/O error. please make sure the file/directory is readable for the arangod process and user
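The post does not show how the liveness probe checks the volume. One possible shape for such a check, as a sketch only, is a script that writes, syncs, and deletes a small file in the data directory and reports failure through its exit code. It assumes a Python interpreter is available in the container, which the report does not state; the function names are hypothetical.

```python
import os
import tempfile


def volume_is_writable(directory):
    """Write, fsync and delete a small file; return False on any OSError."""
    try:
        fd, path = tempfile.mkstemp(prefix=".probe-", dir=directory)
        try:
            os.write(fd, b"ok")
            os.fsync(fd)  # force the write through to the device
        finally:
            os.close(fd)
            os.unlink(path)
        return True
    except OSError:
        return False


def main(argv):
    """Return 0 if the target directory accepts a synced write, else 1."""
    target = argv[1] if len(argv) > 1 else "/var/lib/arangodb3"
    return 0 if volume_is_writable(target) else 1

# Entry point when used as an exec probe: sys.exit(main(sys.argv))
```

A nonzero exit code makes the kubelet restart the container. As the log above shows, a container restart by itself did not clear the error in this case, so a probe like this detects the condition rather than repairing it.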

There should not be a permission issue, since ArangoDB is being run as the root user.

Could someone please help me fix this or figure out the cause? Thanks.

eadmaster commented 9 months ago

I get a similar error while running inserts/updates with Python:

arango.exceptions.DocumentInsertError: [HTTP 500][ERR 1305] IO error: While fdatasync: /data/engine-rocksdb/journals/000844.log: Socket not connected

Reading works fine, though.

Maybe this has something to do with the invalid certificate?

[2024-02-05 16:56:32] WARNING [py.warnings _showwarnmsg:109] /usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py:1061: InsecureRequestWarning: Unverified HTTPS request is being made to host 'gmcdev-cluster-ea.default'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings