dkincer87 opened this issue 5 years ago
If possible, please provide the logs of a terminating agent.
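When an agent pod is in a restart loop, the useful logs usually belong to the container instance that has already exited, and kubectl logs accepts --previous for exactly that. Below is a minimal Python sketch that builds (and can optionally run) that command. The helper names are hypothetical, and the pod name and namespace in the example are only placeholders taken from the describe output later in this thread.

```python
import subprocess
from typing import List, Optional


def previous_logs_cmd(pod: str, namespace: Optional[str] = None,
                      container: Optional[str] = None) -> List[str]:
    """Build a kubectl command that prints the logs of the previous
    (already terminated) instance of a container in `pod`."""
    cmd = ["kubectl", "logs", pod, "--previous"]
    if namespace:
        cmd += ["--namespace", namespace]
    if container:
        cmd += ["--container", container]
    return cmd


def fetch_previous_logs(pod: str, namespace: Optional[str] = None) -> str:
    """Run the command and return its stdout (raises on a non-zero exit)."""
    result = subprocess.run(previous_logs_cmd(pod, namespace),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Placeholder pod name copied from the describe output in this thread.
    print(" ".join(previous_logs_cmd("arango-cluster-agnt-bnq8tiee-128dfb",
                                     namespace="default")))
```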
I have a similar problem. I created a cluster on k8s (a cluster with 3 worker nodes), and one agent is stuck in a restart loop.
Operator logs:
2019-11-08T15:26:48Z DBG ...inspected deployment component=deployment deployment=arangodb-cluster interval=1s operator-id=jddlm
2019-11-08T15:26:49Z DBG Inspect deployment... component=deployment deployment=arangodb-cluster operator-id=jddlm
2019-11-08T15:26:50Z DBG Not all agents are ready error="Agent http://arangodb-cluster-agent-abtwb2ag.arangodb-cluster-int.test.svc:8529 is not responding" action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1
2019-11-08T15:26:50Z DBG Action CheckProgress completed abort=false action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1 ready=false
2019-11-08T15:26:51Z DBG ...inspected deployment component=deployment deployment=arangodb-cluster interval=1s operator-id=jddlm
2019-11-08T15:26:52Z DBG Inspect deployment... component=deployment deployment=arangodb-cluster operator-id=jddlm
2019-11-08T15:26:52Z DBG Not all agents are ready error="Agent http://arangodb-cluster-agent-abtwb2ag.arangodb-cluster-int.test.svc:8529 is not responding" action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1
2019-11-08T15:26:52Z DBG Action CheckProgress completed abort=false action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1 ready=false
2019-11-08T15:26:53Z DBG ...inspected deployment component=deployment deployment=arangodb-cluster interval=1s operator-id=jddlm
2019-11-08T15:26:54Z DBG Inspect deployment... component=deployment deployment=arangodb-cluster operator-id=jddlm
2019-11-08T15:26:55Z DBG Not all agents are ready error="Agent http://arangodb-cluster-agent-abtwb2ag.arangodb-cluster-int.test.svc:8529 is not responding" action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1
2019-11-08T15:26:55Z DBG Action CheckProgress completed abort=false action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1 ready=false
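The operator lines above are mostly key=value records, so the repeated "is not responding" errors can be tallied per member. Here is a small illustrative parser. It assumes every field is written as key=value or key="quoted value", as in the lines shown. The helper names are hypothetical, and this is not part of the operator itself.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Matches key=value or key="quoted value" pairs; free-text words are skipped.
FIELD_RE = re.compile(r'([A-Za-z][\w-]*)=("[^"]*"|\S+)')


def parse_fields(line: str) -> Dict[str, str]:
    """Return the key=value fields of one operator log line (quotes removed)."""
    return {key: value.strip('"') for key, value in FIELD_RE.findall(line)}


def not_responding(lines: Iterable[str]) -> Counter:
    """Count 'is not responding' errors per member-id."""
    counts: Counter = Counter()
    for line in lines:
        fields = parse_fields(line)
        if "is not responding" in fields.get("error", ""):
            counts[fields.get("member-id", "?")] += 1
    return counts
```

Feeding it the lines above would attribute every error to AGNT-abtwb2ag, which is the same member the arangod log below reports as candidating.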
Logs from the restarting pod:
2019-11-08T16:02:17Z [1] INFO [e52b0] ArangoDB 3.5.0 [linux] 64bit, using jemalloc, build tags/v3.5.0-0-gc42dbe8547, VPack 0.1.33, RocksDB 6.2.0, ICU 58.1, V8 7.1.302.28, OpenSSL 1.1.0k 28 May 2019
2019-11-08T16:02:17Z [1] INFO [75ddc] detected operating system: Linux version 3.10.0-1062.4.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Fri Oct 18 17:15:30 UTC 2019
2019-11-08T16:02:17Z [1] WARNING [118b0] {memory} maximum number of memory mappings per process is 262144, which seems too low. it is recommended to set it to at least 512000
2019-11-08T16:02:17Z [1] WARNING [49528] {memory} execute 'sudo sysctl -w "vm.max_map_count=512000"'
2019-11-08T16:02:17Z [1] WARNING [e8b68] {memory} /sys/kernel/mm/transparent_hugepage/enabled is set to 'always'. It is recommended to set it to a value of 'never' or 'madvise'
2019-11-08T16:02:17Z [1] WARNING [e8b68] {memory} /sys/kernel/mm/transparent_hugepage/defrag is set to 'always'. It is recommended to set it to a value of 'never' or 'madvise'
2019-11-08T16:02:17Z [1] WARNING [f3108] {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/enabled"'
2019-11-08T16:02:17Z [1] WARNING [f3108] {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/defrag"'
2019-11-08T16:02:17Z [1] DEBUG [63a7a] host ASLR is in use for shared libraries, stack, mmap, VDSO, heap and memory managed through brk()
2019-11-08T16:02:17Z [1] DEBUG [713c0] {authentication} Not creating user manager
2019-11-08T16:02:17Z [1] DEBUG [71a76] {authentication} Setting jwt secret of size 64
2019-11-08T16:02:17Z [1] INFO [144fe] using storage engine rocksdb
2019-11-08T16:02:17Z [1] INFO [3bb7d] {cluster} Starting up with role AGENT
2019-11-08T16:02:17Z [1] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 1048576, soft limit is 1048576
2019-11-08T16:02:17Z [1] DEBUG [f6e04] {config} using default language 'en_US'
2019-11-08T16:02:17Z [1] INFO [3844e] {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2019-11-08T16:02:17Z [1] DEBUG [f6e04] {config} using default language 'en_US'
2019-11-08T16:02:23Z [1] INFO [e6460] created base application directory '/var/lib/arangodb3-apps/_db'
2019-11-08T16:02:23Z [1] INFO [6ea38] using endpoint 'http+tcp://[::]:8529' for non-encrypted requests
2019-11-08T16:02:23Z [1] DEBUG [dc45a] bound to endpoint 'http+tcp://[::]:8529'
2019-11-08T16:02:23Z [1] INFO [cf3f4] ArangoDB (version 3.5.0 [linux]) is ready for business. Have fun!
2019-11-08T16:02:23Z [1] INFO [d7476] {agency} Restarting agent from persistence ...
2019-11-08T16:02:23Z [1] INFO [d96f6] {agency} Found active RAFTing agency lead by AGNT-axody2ec. Finishing startup sequence.
2019-11-08T16:02:23Z [1] INFO [fe299] {agency} Constituent::update: setting _leaderID to 'AGNT-axody2ec' in term 9
2019-11-08T16:02:23Z [1] INFO [79fd7] {agency} Activating agent.
2019-11-08T16:02:23Z [1] INFO [29175] {agency} Setting role to follower in term 9
2019-11-08T16:02:29Z [1] INFO [aefab] {agency} AGNT-abtwb2ag: candidating in term 9
2019-11-08T16:02:29Z [1] DEBUG [74339] accept failed: Operation canceled
2019-11-08T16:02:30Z [1] INFO [4bcb9] ArangoDB has been shut down
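The WARNING lines above point to two host settings. vm.max_map_count and transparent hugepages are kernel settings of the worker node, and a container sees the node's values, so they normally have to be changed on the node itself. This log ends with an orderly shutdown rather than a crash, so these settings may well be unrelated to the restart loop. Still, they are cheap to rule out. Below is a minimal read-only sketch that reproduces the two checks, using thresholds copied from the warnings. The helper names are hypothetical.

```python
from pathlib import Path
from typing import List, Optional

# Values quoted from the arangod WARNING lines.
MIN_MAX_MAP_COUNT = 512000
RECOMMENDED_THP = {"never", "madvise"}


def active_thp_mode(sysfs_text: str) -> Optional[str]:
    """Return the bracketed (active) entry, e.g. 'always' for '[always] madvise never'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def check_host(root: Path = Path("/")) -> List[str]:
    """Collect problems equivalent to the warnings arangod printed at startup."""
    problems = []
    map_count = root / "proc/sys/vm/max_map_count"
    if map_count.exists():
        value = int(map_count.read_text())
        if value < MIN_MAX_MAP_COUNT:
            problems.append(f"vm.max_map_count={value} < {MIN_MAX_MAP_COUNT}")
    for name in ("enabled", "defrag"):
        path = root / "sys/kernel/mm/transparent_hugepage" / name
        if path.exists():
            mode = active_thp_mode(path.read_text())
            if mode not in RECOMMENDED_THP:
                problems.append(f"transparent_hugepage/{name} is '{mode}'")
    return problems


if __name__ == "__main__":
    # Run on the worker node itself; it only reads, it never writes sysctls.
    for problem in check_host():
        print("WARNING:", problem)
```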
The memory limit for the agent pod is set to 2Gi.
Same issue. Has anyone found a solution for it?
It started when I tried to set dbservers.count to 0, and now it's stuck in a restart loop. No matter what I do, I can't get it to stop.
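A group count of 0 is easy to set by accident. As an illustration only, a pre-apply check on the parsed ArangoDeployment spec could refuse such values. The field names (mode, agents, dbservers, coordinators, count) match the spec shown in the describe output in this thread. The minimum of 1 per group, treating a missing mode as Cluster, and the helper name are assumptions for this sketch, not the operator's documented validation rules.

```python
from typing import Any, Dict, List

# Assumed lower bounds for spec.<group>.count in Cluster mode; the
# operator's own validation may differ. This is only a guard rail.
MIN_CLUSTER_COUNTS = {"agents": 1, "dbservers": 1, "coordinators": 1}


def check_cluster_counts(spec: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with the member counts in `spec`."""
    # Assumption: a spec without an explicit mode is treated as Cluster.
    if spec.get("mode", "Cluster") != "Cluster":
        return []  # the checks below only apply to Cluster mode
    problems = []
    for group, minimum in MIN_CLUSTER_COUNTS.items():
        count = spec.get(group, {}).get("count")
        if count is not None and count < minimum:
            problems.append(f"spec.{group}.count={count} is below {minimum}")
    return problems
```

The spec dict could come from yaml.safe_load on the manifest (PyYAML, which is not in the standard library). A check like this only guards future edits. It does not repair a deployment that is already looping.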
I'm attempting to run a Linux container in Docker for Windows with a Kubernetes ArangoDB cluster. It starts up, but the agents get stuck in an endless loop: starting in error, terminating, then initializing again. I also noticed that my load balancer is set to localhost instead of an IP that would let me reach the pod from an external source. I'm not quite sure what I'm doing wrong, because running the same commands on a Linux machine works fine. Any help would be appreciated; let me know if you need more info. Logs and screenshots are below.
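To see which address Kubernetes actually assigned to the external-access service, inspect the status.loadBalancer.ingress field of the Service, for example in the output of kubectl get service <name> -o json. As far as we know, the Kubernetes built into Docker Desktop publishes LoadBalancer services on localhost, which would explain the behavior described above. Treat that as an assumption to verify. A minimal sketch (helper names are hypothetical):

```python
from typing import Any, Dict, List

LOCAL_NAMES = {"localhost", "127.0.0.1", "::1"}


def lb_ingress(service: Dict[str, Any]) -> List[str]:
    """Return the IPs/hostnames listed in status.loadBalancer.ingress."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    return [entry.get("ip") or entry.get("hostname", "") for entry in ingress]


def is_local_only(service: Dict[str, Any]) -> bool:
    """True when every published address refers to the local machine."""
    addresses = lb_ingress(service)
    return bool(addresses) and all(a in LOCAL_NAMES for a in addresses)
```

Pass it the dict obtained from json.loads on the kubectl output. An empty list means no address has been assigned yet.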
Here is the YAML used to deploy the cluster:
Here is a screenshot of the termination:
Here they are re-initializing:
The load balancer service:
And here is the output of kubectl describe for the deployment:
Name: arango-cluster
Namespace: default
Labels:
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"database.arangodb.com/v1alpha","kind":"ArangoDeployment","metadata":{"annotations":{},"name":"arango-cluster","namespace":"...
API Version: database.arangodb.com/v1alpha
Kind: ArangoDeployment
Metadata:
Cluster Name:
Creation Timestamp: 2019-03-29T02:06:59Z
Finalizers:
database.arangodb.com/remove-child-finalizers
Generation: 1
Resource Version: 71178
Self Link: /apis/database.arangodb.com/v1alpha/namespaces/default/arangodeployments/arango-cluster
UID: 560ccde7-51c7-11e9-b4f7-00155d001410
Spec:
Agents:
Count: 3
Resources:
Requests:
Storage: 8Gi
Auth:
Jwt Secret Name: arango-cluster-jwt
Chaos:
Interval: 60000000000
Kill - Pod - Probability: 50
Coordinators:
Count: 3
Resources:
Dbservers:
Count: 3
Resources:
Requests:
Storage: 8Gi
Environment: Development
External Access:
Type: LoadBalancer
Image: arangodb/arangodb:3.4.4
Image Pull Policy: IfNotPresent
License:
Mode: Cluster
Rocksdb:
Encryption:
Single:
Resources:
Requests:
Storage: 8Gi
Storage Engine: RocksDB
Sync:
Auth:
Client CA Secret Name: arango-cluster-sync-client-auth-ca
Jwt Secret Name: arango-cluster-sync-jwt
External Access:
Monitoring:
Token Secret Name: arango-cluster-sync-mt
Tls:
Ca Secret Name: arango-cluster-sync-ca
Ttl: 2610h
Syncmasters:
Resources:
Syncworkers:
Resources:
Tls:
Ca Secret Name: arango-cluster-ca
Ttl: 2610h
Status:
Accepted - Spec:
Agents:
Count: 3
Resources:
Requests:
Storage: 8Gi
Auth:
Jwt Secret Name: arango-cluster-jwt
Chaos:
Interval: 60000000000
Kill - Pod - Probability: 50
Coordinators:
Count: 3
Resources:
Dbservers:
Count: 3
Resources:
Requests:
Storage: 8Gi
Environment: Development
External Access:
Type: LoadBalancer
Image: arangodb/arangodb:3.4.4
Image Pull Policy: IfNotPresent
License:
Mode: Cluster
Rocksdb:
Encryption:
Single:
Resources:
Requests:
Storage: 8Gi
Storage Engine: RocksDB
Sync:
Auth:
Client CA Secret Name: arango-cluster-sync-client-auth-ca
Jwt Secret Name: arango-cluster-sync-jwt
External Access:
Monitoring:
Token Secret Name: arango-cluster-sync-mt
Tls:
Ca Secret Name: arango-cluster-sync-ca
Ttl: 2610h
Syncmasters:
Resources:
Syncworkers:
Resources:
Tls:
Ca Secret Name: arango-cluster-ca
Ttl: 2610h
Arangodb - Images:
Arangodb - Version: 3.4.4
Image: arangodb/arangodb:3.4.4
Image - Id: arangodb/arangodb@sha256:56abd87cc340a29f9d60e61a8941afb962b18a22c072fe6b77397fa24d531c04
Conditions:
Last Transition Time: 2019-03-29T02:07:07Z
Last Update Time: 2019-03-29T02:07:07Z
Status: False
Type: Ready
Current - Image:
Arangodb - Version: 3.4.4
Image: arangodb/arangodb:3.4.4
Image - Id: arangodb/arangodb@sha256:56abd87cc340a29f9d60e61a8941afb962b18a22c072fe6b77397fa24d531c04
Members:
Agents:
Conditions:
Last Transition Time: 2019-03-29T02:14:38Z
Last Update Time: 2019-03-29T02:14:38Z
Reason: Pod Not Ready
Status: False
Type: Ready
Last Transition Time: 2019-03-29T02:15:05Z
Last Update Time: 2019-03-29T02:15:05Z
Reason: Pod Failed
Status: True
Type: Terminated
Last Transition Time: 2019-03-29T02:15:39Z
Last Update Time: 2019-03-29T02:15:39Z
Reason: Pod marked for deletion
Status: True
Type: Terminating
Created - At: 2019-03-29T02:07:01Z
Id: AGNT-bnq8tiee
Initialized: false
Persistent Volume Claim Name: arango-cluster-agent-bnq8tiee
Phase: Created
Pod Name: arango-cluster-agnt-bnq8tiee-128dfb
Recent - Terminations:
2019-03-29T02:08:04Z
2019-03-29T02:08:42Z
2019-03-29T02:09:20Z
2019-03-29T02:09:54Z
2019-03-29T02:10:29Z
2019-03-29T02:11:55Z
2019-03-29T02:13:07Z
2019-03-29T02:15:05Z
Conditions:
Last Transition Time: 2019-03-29T02:14:38Z
Last Update Time: 2019-03-29T02:14:38Z
Reason: Pod Not Ready
Status: False
Type: Ready
Last Transition Time: 2019-03-29T02:15:05Z
Last Update Time: 2019-03-29T02:15:05Z
Reason: Pod Failed
Status: True
Type: Terminated
Last Transition Time: 2019-03-29T02:15:39Z
Last Update Time: 2019-03-29T02:15:39Z
Reason: Pod marked for deletion
Status: True
Type: Terminating
Created - At: 2019-03-29T02:07:01Z
Id: AGNT-co8m3gqw
Initialized: true
Persistent Volume Claim Name: arango-cluster-agent-co8m3gqw
Phase: Created
Pod Name: arango-cluster-agnt-co8m3gqw-128dfb
Recent - Terminations:
2019-03-29T02:07:59Z
2019-03-29T02:08:38Z
2019-03-29T02:09:16Z
2019-03-29T02:09:49Z
2019-03-29T02:10:29Z
2019-03-29T02:11:55Z
2019-03-29T02:13:07Z
2019-03-29T02:15:05Z
Conditions:
Last Transition Time: 2019-03-29T02:14:38Z
Last Update Time: 2019-03-29T02:14:38Z
Reason: Pod Not Ready
Status: False
Type: Ready
Last Transition Time: 2019-03-29T02:15:05Z
Last Update Time: 2019-03-29T02:15:05Z
Reason: Pod Failed
Status: True
Type: Terminated
Last Transition Time: 2019-03-29T02:15:39Z
Last Update Time: 2019-03-29T02:15:39Z
Reason: Pod marked for deletion
Status: True
Type: Terminating
Created - At: 2019-03-29T02:07:01Z
Id: AGNT-z8htk4xo
Initialized: false
Persistent Volume Claim Name: arango-cluster-agent-z8htk4xo
Phase: Created
Pod Name: arango-cluster-agnt-z8htk4xo-128dfb
Recent - Terminations:
2019-03-29T02:08:05Z
2019-03-29T02:08:43Z
2019-03-29T02:09:21Z
2019-03-29T02:09:55Z
2019-03-29T02:10:29Z
2019-03-29T02:11:55Z
2019-03-29T02:13:07Z
2019-03-29T02:15:05Z
Coordinators:
Conditions:
Last Transition Time: 2019-03-29T02:15:05Z
Last Update Time: 2019-03-29T02:15:05Z
Reason: Pod Not Ready
Status: False
Type: Ready
Created - At: 2019-03-29T02:07:01Z
Id: CRDN-fgfmyrni
Initialized: false
Phase: Created
Pod Name: arango-cluster-crdn-fgfmyrni-128dfb
Recent - Terminations:
2019-03-29T02:14:38Z
Conditions:
Last Transition Time: 2019-03-29T02:15:05Z
Last Update Time: 2019-03-29T02:15:05Z
Reason: Pod Not Ready
Status: False
Type: Ready
Created - At: 2019-03-29T02:07:01Z
Id: CRDN-sxgd2w6y
Initialized: false
Phase: Created
Pod Name: arango-cluster-crdn-sxgd2w6y-128dfb
Recent - Terminations:
2019-03-29T02:14:38Z
Conditions:
Last Transition Time: 2019-03-29T02:15:05Z
Last Update Time: 2019-03-29T02:15:05Z
Reason: Pod Not Ready
Status: False
Type: Ready
Created - At: 2019-03-29T02:07:01Z
Id: CRDN-to1bm0c3
Initialized: false
Phase: Created
Pod Name: arango-cluster-crdn-to1bm0c3-128dfb
Recent - Terminations:
2019-03-29T02:14:38Z
Dbservers:
Conditions:
Last Transition Time: 2019-03-29T02:07:59Z
Last Update Time: 2019-03-29T02:07:59Z
Reason: Pod Ready
Status: True
Type: Ready
Created - At: 2019-03-29T02:07:01Z
Id: PRMR-32xzf09n
Initialized: true
Persistent Volume Claim Name: arango-cluster-dbserver-32xzf09n
Phase: Created
Pod Name: arango-cluster-prmr-32xzf09n-128dfb
Recent - Terminations:
Conditions:
Last Transition Time: 2019-03-29T02:07:50Z
Last Update Time: 2019-03-29T02:07:50Z
Reason: Pod Ready
Status: True
Type: Ready
Created - At: 2019-03-29T02:07:01Z
Id: PRMR-cdvjutsj
Initialized: true
Persistent Volume Claim Name: arango-cluster-dbserver-cdvjutsj
Phase: Created
Pod Name: arango-cluster-prmr-cdvjutsj-128dfb
Recent - Terminations:
Conditions:
Last Transition Time: 2019-03-29T02:08:11Z
Last Update Time: 2019-03-29T02:08:11Z
Reason: Pod Ready
Status: True
Type: Ready
Created - At: 2019-03-29T02:07:01Z
Id: PRMR-gdxuutwg
Initialized: true
Persistent Volume Claim Name: arango-cluster-dbserver-gdxuutwg
Phase: Created
Pod Name: arango-cluster-prmr-gdxuutwg-128dfb
Recent - Terminations:
Phase: Running
Plan:
Creation Time: 2019-03-29T02:12:45Z
Group: 4
Id: cJIhbLxsuV4pyc5n
Member ID: CRDN-fgfmyrni
Type: RemoveMember
Creation Time: 2019-03-29T02:12:45Z
Group: 4
Id: U1PGroR95ybUN1Km
Type: AddMember
Secret - Hashes:
Auth - Jwt: eff9b00914cc1a4511de887d056a5c4ef1b324f6b746456ca2443db98bdaeea1
Tls - Ca: d69ded4284c5f832388b3633c3db1453d91514300a07313ddf91f32b31653b45
Service Name: arango-cluster
Events:
Type  Reason  Age  From  Message
Normal  New Coordinator Added  36h  arango-deployment-operator-8579f476cc-fspdz  New coordinator CRDN-fgfmyrni added to deployment
Normal  New Agent Added  36h  arango-deployment-operator-8579f476cc-fspdz  New agent AGNT-co8m3gqw added to deployment
Normal  New Agent Added  36h  arango-deployment-operator-8579f476cc-fspdz  New agent AGNT-z8htk4xo added to deployment
Normal  New Dbserver Added  36h  arango-deployment-operator-8579f476cc-fspdz  New dbserver PRMR-32xzf09n added to deployment
Normal  New Dbserver Added  36h  arango-deployment-operator-8579f476cc-fspdz  New dbserver PRMR-cdvjutsj added to deployment
Normal  New Dbserver Added  36h  arango-deployment-operator-8579f476cc-fspdz  New dbserver PRMR-gdxuutwg added to deployment
Normal  New Coordinator Added  36h  arango-deployment-operator-8579f476cc-fspdz  New coordinator CRDN-sxgd2w6y added to deployment
Normal  New Coordinator Added  36h  arango-deployment-operator-8579f476cc-fspdz  New coordinator CRDN-to1bm0c3 added to deployment
Normal  New Agent Added  36h  arango-deployment-operator-8579f476cc-fspdz  New agent AGNT-bnq8tiee added to deployment
Normal  Pod Of Dbserver Created  36h  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-prmr-32xzf09n-128dfb of member dbserver is created
Normal  Pod Of Dbserver Created  36h  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-prmr-cdvjutsj-128dfb of member dbserver is created
Normal  Pod Of Dbserver Created  36h  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-prmr-gdxuutwg-128dfb of member dbserver is created
Normal  Pod Of Coordinator Created  36h  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-crdn-fgfmyrni-128dfb of member coordinator is created
Normal  Pod Of Coordinator Created  36h  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-crdn-sxgd2w6y-128dfb of member coordinator is created
Normal  Pod Of Coordinator Created  36h  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-crdn-to1bm0c3-128dfb of member coordinator is created
Normal  Pod Of Agent Created  36h (x2 over 36h)  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-agnt-co8m3gqw-128dfb of member agent is created
Normal  Pod Of Agent Gone  36h  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-agnt-z8htk4xo-128dfb of member agent is gone
Normal  Pod Of Agent Created  36h (x2 over 36h)  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-agnt-bnq8tiee-128dfb of member agent is created
Normal  Pod Of Agent Created  36h (x2 over 36h)  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-agnt-z8htk4xo-128dfb of member agent is created
Normal  Pod Of Agent Gone  36h (x2 over 36h)  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-agnt-co8m3gqw-128dfb of member agent is gone
Normal  Pod Of Agent Gone  36h (x6 over 36h)  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-agnt-bnq8tiee-128dfb of member agent is gone
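The Recent - Terminations lists in the describe output make the restart cadence easy to quantify. Here is a short sketch that computes the gaps between consecutive terminations. The timestamps are copied from member AGNT-bnq8tiee, and the helper name is hypothetical.

```python
from datetime import datetime
from typing import List


def restart_gaps(timestamps: List[str]) -> List[int]:
    """Seconds between consecutive 'YYYY-MM-DDTHH:MM:SSZ' timestamps."""
    parsed = [datetime.strptime(t, "%Y-%m-%dT%H:%M:%SZ") for t in timestamps]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(parsed, parsed[1:])]


# Recent - Terminations of member AGNT-bnq8tiee, copied from the describe output.
BNQ8TIEE = [
    "2019-03-29T02:08:04Z", "2019-03-29T02:08:42Z", "2019-03-29T02:09:20Z",
    "2019-03-29T02:09:54Z", "2019-03-29T02:10:29Z", "2019-03-29T02:11:55Z",
    "2019-03-29T02:13:07Z", "2019-03-29T02:15:05Z",
]

print(restart_gaps(BNQ8TIEE))  # → [38, 38, 34, 35, 86, 72, 118]
```

The agent was terminated every 34 to 118 seconds. The last four terminations (02:10:29, 02:11:55, 02:13:07, 02:15:05) are identical for all three agents. This suggests the agents were taken down together rather than failing independently.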