arangodb / kube-arangodb

ArangoDB Kubernetes Operator - Start ArangoDB on Kubernetes in 5min
https://arangodb.github.io/kube-arangodb/
Apache License 2.0

Agents Stuck in loop on Docker For Windows #366

Open dkincer87 opened 5 years ago

dkincer87 commented 5 years ago

I'm attempting to run a Linux container in Docker for Windows with a Kubernetes ArangoDB cluster. It starts up, but the agents get stuck in an endless loop of starting in error, terminating, then initializing. I also noticed that the load balancer I have is set to localhost instead of an IP that can reach the pods from an external source. I'm not quite sure what I'm doing wrong; running the same commands on a Linux machine works fine. Any help would be appreciated, and let me know if you need more info. Logs and screenshots are below.

Here is the Yaml to deploy the cluster: clusterYaml
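(The manifest itself is attached as an image; for readers who can't view it, here is a sketch of what it likely contained, reconstructed from the `kubectl describe` output below. Field names follow the kube-arangodb `ArangoDeployment` CRD, and anything not visible in the describe output is omitted, so treat this as an approximation rather than the exact file.)

```yaml
apiVersion: "database.arangodb.com/v1alpha"
kind: "ArangoDeployment"
metadata:
  name: "arango-cluster"
spec:
  mode: Cluster
  environment: Development
  image: arangodb/arangodb:3.4.4
  storageEngine: RocksDB
  externalAccess:
    type: LoadBalancer
  agents:
    count: 3
    resources:
      requests:
        storage: 8Gi
  coordinators:
    count: 3
  dbservers:
    count: 3
    resources:
      requests:
        storage: 8Gi
```

Note that the describe output below also shows a `Chaos` section (interval 60s, kill-pod probability 50). If chaos testing was actually enabled in the original manifest, the operator would be killing pods on purpose, which would be worth ruling out as a cause of the restart loop.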

Here is a screenshot of the termination: AgentTerminating

Here they are re-initializing: AgentInit

The load balancer service: LoadBalancer
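(One note on the load balancer: Docker Desktop's built-in Kubernetes is a single-node cluster, and it publishes `LoadBalancer` services on the host itself, so an EXTERNAL-IP of `localhost` is expected there rather than a misconfiguration; the coordinators should be reachable via `localhost` and the service port. A quick way to check, assuming the operator's usual `-ea` naming for the external-access service:)

```sh
# Inspect the external-access service and its published port:
kubectl get service arango-cluster-ea

# With auth and TLS enabled, the API answers 401 without credentials,
# but any HTTP response at all confirms the route works:
curl -k https://localhost:8529/_api/version
```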

And here is the output of `kubectl describe` on the deployment:

```
Name:         arango-cluster
Namespace:    default
Labels:
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"database.arangodb.com/v1alpha","kind":"ArangoDeployment","metadata":{"annotations":{},"name":"arango-cluster","namespace":"...
API Version:  database.arangodb.com/v1alpha
Kind:         ArangoDeployment
Metadata:
  Cluster Name:
  Creation Timestamp:  2019-03-29T02:06:59Z
  Finalizers:
    database.arangodb.com/remove-child-finalizers
  Generation:        1
  Resource Version:  71178
  Self Link:         /apis/database.arangodb.com/v1alpha/namespaces/default/arangodeployments/arango-cluster
  UID:               560ccde7-51c7-11e9-b4f7-00155d001410
Spec:
  Agents:
    Count:  3
    Resources:
      Requests:
        Storage:  8Gi
  Auth:
    Jwt Secret Name:  arango-cluster-jwt
  Chaos:
    Interval:                  60000000000
    Kill - Pod - Probability:  50
  Coordinators:
    Count:  3
    Resources:
  Dbservers:
    Count:  3
    Resources:
      Requests:
        Storage:  8Gi
  Environment:  Development
  External Access:
    Type:  LoadBalancer
  Image:              arangodb/arangodb:3.4.4
  Image Pull Policy:  IfNotPresent
  License:
  Mode:  Cluster
  Rocksdb:
    Encryption:
  Single:
    Resources:
      Requests:
        Storage:  8Gi
  Storage Engine:  RocksDB
  Sync:
    Auth:
      Client CA Secret Name:  arango-cluster-sync-client-auth-ca
      Jwt Secret Name:        arango-cluster-sync-jwt
    External Access:
    Monitoring:
      Token Secret Name:  arango-cluster-sync-mt
    Tls:
      Ca Secret Name:  arango-cluster-sync-ca
      Ttl:             2610h
  Syncmasters:
    Resources:
  Syncworkers:
    Resources:
  Tls:
    Ca Secret Name:  arango-cluster-ca
    Ttl:             2610h
Status:
  Accepted - Spec:
    Agents:
      Count:  3
      Resources:
        Requests:
          Storage:  8Gi
    Auth:
      Jwt Secret Name:  arango-cluster-jwt
    Chaos:
      Interval:                  60000000000
      Kill - Pod - Probability:  50
    Coordinators:
      Count:  3
      Resources:
    Dbservers:
      Count:  3
      Resources:
        Requests:
          Storage:  8Gi
    Environment:  Development
    External Access:
      Type:  LoadBalancer
    Image:              arangodb/arangodb:3.4.4
    Image Pull Policy:  IfNotPresent
    License:
    Mode:  Cluster
    Rocksdb:
      Encryption:
    Single:
      Resources:
        Requests:
          Storage:  8Gi
    Storage Engine:  RocksDB
    Sync:
      Auth:
        Client CA Secret Name:  arango-cluster-sync-client-auth-ca
        Jwt Secret Name:        arango-cluster-sync-jwt
      External Access:
      Monitoring:
        Token Secret Name:  arango-cluster-sync-mt
      Tls:
        Ca Secret Name:  arango-cluster-sync-ca
        Ttl:             2610h
    Syncmasters:
      Resources:
    Syncworkers:
      Resources:
    Tls:
      Ca Secret Name:  arango-cluster-ca
      Ttl:             2610h
  Arangodb - Images:
    Arangodb - Version:  3.4.4
    Image:               arangodb/arangodb:3.4.4
    Image - Id:          arangodb/arangodb@sha256:56abd87cc340a29f9d60e61a8941afb962b18a22c072fe6b77397fa24d531c04
  Conditions:
    Last Transition Time:  2019-03-29T02:07:07Z
    Last Update Time:      2019-03-29T02:07:07Z
    Status:                False
    Type:                  Ready
  Current - Image:
    Arangodb - Version:  3.4.4
    Image:               arangodb/arangodb:3.4.4
    Image - Id:          arangodb/arangodb@sha256:56abd87cc340a29f9d60e61a8941afb962b18a22c072fe6b77397fa24d531c04
  Members:
    Agents:
      Conditions:
        Last Transition Time:  2019-03-29T02:14:38Z
        Last Update Time:      2019-03-29T02:14:38Z
        Reason:                Pod Not Ready
        Status:                False
        Type:                  Ready
        Last Transition Time:  2019-03-29T02:15:05Z
        Last Update Time:      2019-03-29T02:15:05Z
        Reason:                Pod Failed
        Status:                True
        Type:                  Terminated
        Last Transition Time:  2019-03-29T02:15:39Z
        Last Update Time:      2019-03-29T02:15:39Z
        Reason:                Pod marked for deletion
        Status:                True
        Type:                  Terminating
      Created - At:                  2019-03-29T02:07:01Z
      Id:                            AGNT-bnq8tiee
      Initialized:                   false
      Persistent Volume Claim Name:  arango-cluster-agent-bnq8tiee
      Phase:                         Created
      Pod Name:                      arango-cluster-agnt-bnq8tiee-128dfb
      Recent - Terminations:  2019-03-29T02:08:04Z 2019-03-29T02:08:42Z 2019-03-29T02:09:20Z 2019-03-29T02:09:54Z 2019-03-29T02:10:29Z 2019-03-29T02:11:55Z 2019-03-29T02:13:07Z 2019-03-29T02:15:05Z
      Conditions:
        Last Transition Time:  2019-03-29T02:14:38Z
        Last Update Time:      2019-03-29T02:14:38Z
        Reason:                Pod Not Ready
        Status:                False
        Type:                  Ready
        Last Transition Time:  2019-03-29T02:15:05Z
        Last Update Time:      2019-03-29T02:15:05Z
        Reason:                Pod Failed
        Status:                True
        Type:                  Terminated
        Last Transition Time:  2019-03-29T02:15:39Z
        Last Update Time:      2019-03-29T02:15:39Z
        Reason:                Pod marked for deletion
        Status:                True
        Type:                  Terminating
      Created - At:                  2019-03-29T02:07:01Z
      Id:                            AGNT-co8m3gqw
      Initialized:                   true
      Persistent Volume Claim Name:  arango-cluster-agent-co8m3gqw
      Phase:                         Created
      Pod Name:                      arango-cluster-agnt-co8m3gqw-128dfb
      Recent - Terminations:  2019-03-29T02:07:59Z 2019-03-29T02:08:38Z 2019-03-29T02:09:16Z 2019-03-29T02:09:49Z 2019-03-29T02:10:29Z 2019-03-29T02:11:55Z 2019-03-29T02:13:07Z 2019-03-29T02:15:05Z
      Conditions:
        Last Transition Time:  2019-03-29T02:14:38Z
        Last Update Time:      2019-03-29T02:14:38Z
        Reason:                Pod Not Ready
        Status:                False
        Type:                  Ready
        Last Transition Time:  2019-03-29T02:15:05Z
        Last Update Time:      2019-03-29T02:15:05Z
        Reason:                Pod Failed
        Status:                True
        Type:                  Terminated
        Last Transition Time:  2019-03-29T02:15:39Z
        Last Update Time:      2019-03-29T02:15:39Z
        Reason:                Pod marked for deletion
        Status:                True
        Type:                  Terminating
      Created - At:                  2019-03-29T02:07:01Z
      Id:                            AGNT-z8htk4xo
      Initialized:                   false
      Persistent Volume Claim Name:  arango-cluster-agent-z8htk4xo
      Phase:                         Created
      Pod Name:                      arango-cluster-agnt-z8htk4xo-128dfb
      Recent - Terminations:  2019-03-29T02:08:05Z 2019-03-29T02:08:43Z 2019-03-29T02:09:21Z 2019-03-29T02:09:55Z 2019-03-29T02:10:29Z 2019-03-29T02:11:55Z 2019-03-29T02:13:07Z 2019-03-29T02:15:05Z
    Coordinators:
      Conditions:
        Last Transition Time:  2019-03-29T02:15:05Z
        Last Update Time:      2019-03-29T02:15:05Z
        Reason:                Pod Not Ready
        Status:                False
        Type:                  Ready
      Created - At:           2019-03-29T02:07:01Z
      Id:                     CRDN-fgfmyrni
      Initialized:            false
      Phase:                  Created
      Pod Name:               arango-cluster-crdn-fgfmyrni-128dfb
      Recent - Terminations:  2019-03-29T02:14:38Z
      Conditions:
        Last Transition Time:  2019-03-29T02:15:05Z
        Last Update Time:      2019-03-29T02:15:05Z
        Reason:                Pod Not Ready
        Status:                False
        Type:                  Ready
      Created - At:           2019-03-29T02:07:01Z
      Id:                     CRDN-sxgd2w6y
      Initialized:            false
      Phase:                  Created
      Pod Name:               arango-cluster-crdn-sxgd2w6y-128dfb
      Recent - Terminations:  2019-03-29T02:14:38Z
      Conditions:
        Last Transition Time:  2019-03-29T02:15:05Z
        Last Update Time:      2019-03-29T02:15:05Z
        Reason:                Pod Not Ready
        Status:                False
        Type:                  Ready
      Created - At:           2019-03-29T02:07:01Z
      Id:                     CRDN-to1bm0c3
      Initialized:            false
      Phase:                  Created
      Pod Name:               arango-cluster-crdn-to1bm0c3-128dfb
      Recent - Terminations:  2019-03-29T02:14:38Z
    Dbservers:
      Conditions:
        Last Transition Time:  2019-03-29T02:07:59Z
        Last Update Time:      2019-03-29T02:07:59Z
        Reason:                Pod Ready
        Status:                True
        Type:                  Ready
      Created - At:                  2019-03-29T02:07:01Z
      Id:                            PRMR-32xzf09n
      Initialized:                   true
      Persistent Volume Claim Name:  arango-cluster-dbserver-32xzf09n
      Phase:                         Created
      Pod Name:                      arango-cluster-prmr-32xzf09n-128dfb
      Recent - Terminations:
      Conditions:
        Last Transition Time:  2019-03-29T02:07:50Z
        Last Update Time:      2019-03-29T02:07:50Z
        Reason:                Pod Ready
        Status:                True
        Type:                  Ready
      Created - At:                  2019-03-29T02:07:01Z
      Id:                            PRMR-cdvjutsj
      Initialized:                   true
      Persistent Volume Claim Name:  arango-cluster-dbserver-cdvjutsj
      Phase:                         Created
      Pod Name:                      arango-cluster-prmr-cdvjutsj-128dfb
      Recent - Terminations:
      Conditions:
        Last Transition Time:  2019-03-29T02:08:11Z
        Last Update Time:      2019-03-29T02:08:11Z
        Reason:                Pod Ready
        Status:                True
        Type:                  Ready
      Created - At:                  2019-03-29T02:07:01Z
      Id:                            PRMR-gdxuutwg
      Initialized:                   true
      Persistent Volume Claim Name:  arango-cluster-dbserver-gdxuutwg
      Phase:                         Created
      Pod Name:                      arango-cluster-prmr-gdxuutwg-128dfb
      Recent - Terminations:
  Phase:  Running
  Plan:
    Creation Time:  2019-03-29T02:12:45Z
    Group:          4
    Id:             cJIhbLxsuV4pyc5n
    Member ID:      CRDN-fgfmyrni
    Type:           RemoveMember
    Creation Time:  2019-03-29T02:12:45Z
    Group:          4
    Id:             U1PGroR95ybUN1Km
    Type:           AddMember
  Secret - Hashes:
    Auth - Jwt:  eff9b00914cc1a4511de887d056a5c4ef1b324f6b746456ca2443db98bdaeea1
    Tls - Ca:    d69ded4284c5f832388b3633c3db1453d91514300a07313ddf91f32b31653b45
  Service Name:  arango-cluster
Events:
  Type    Reason                      Age                From                                         Message
  ----    ------                      ----               ----                                         -------
  Normal  New Coordinator Added       36h                arango-deployment-operator-8579f476cc-fspdz  New coordinator CRDN-fgfmyrni added to deployment
  Normal  New Agent Added             36h                arango-deployment-operator-8579f476cc-fspdz  New agent AGNT-co8m3gqw added to deployment
  Normal  New Agent Added             36h                arango-deployment-operator-8579f476cc-fspdz  New agent AGNT-z8htk4xo added to deployment
  Normal  New Dbserver Added          36h                arango-deployment-operator-8579f476cc-fspdz  New dbserver PRMR-32xzf09n added to deployment
  Normal  New Dbserver Added          36h                arango-deployment-operator-8579f476cc-fspdz  New dbserver PRMR-cdvjutsj added to deployment
  Normal  New Dbserver Added          36h                arango-deployment-operator-8579f476cc-fspdz  New dbserver PRMR-gdxuutwg added to deployment
  Normal  New Coordinator Added       36h                arango-deployment-operator-8579f476cc-fspdz  New coordinator CRDN-sxgd2w6y added to deployment
  Normal  New Coordinator Added       36h                arango-deployment-operator-8579f476cc-fspdz  New coordinator CRDN-to1bm0c3 added to deployment
  Normal  New Agent Added             36h                arango-deployment-operator-8579f476cc-fspdz  New agent AGNT-bnq8tiee added to deployment
  Normal  Pod Of Dbserver Created     36h                arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-prmr-32xzf09n-128dfb of member dbserver is created
  Normal  Pod Of Dbserver Created     36h                arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-prmr-cdvjutsj-128dfb of member dbserver is created
  Normal  Pod Of Dbserver Created     36h                arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-prmr-gdxuutwg-128dfb of member dbserver is created
  Normal  Pod Of Coordinator Created  36h                arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-crdn-fgfmyrni-128dfb of member coordinator is created
  Normal  Pod Of Coordinator Created  36h                arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-crdn-sxgd2w6y-128dfb of member coordinator is created
  Normal  Pod Of Coordinator Created  36h                arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-crdn-to1bm0c3-128dfb of member coordinator is created
  Normal  Pod Of Agent Created        36h (x2 over 36h)  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-agnt-co8m3gqw-128dfb of member agent is created
  Normal  Pod Of Agent Gone           36h                arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-agnt-z8htk4xo-128dfb of member agent is gone
  Normal  Pod Of Agent Created        36h (x2 over 36h)  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-agnt-bnq8tiee-128dfb of member agent is created
  Normal  Pod Of Agent Created        36h (x2 over 36h)  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-agnt-z8htk4xo-128dfb of member agent is created
  Normal  Pod Of Agent Gone           36h (x2 over 36h)  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-agnt-co8m3gqw-128dfb of member agent is gone
  Normal  Pod Of Agent Gone           36h (x6 over 36h)  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-agnt-bnq8tiee-128dfb of member agent is gone
```

maierlars commented 5 years ago

If possible, please provide the logs of a terminating agent.
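(For anyone reading along, a quick sketch of how to capture those logs before the pod is recycled. The pod name is taken from the describe output above; the label selector is an assumption based on the operator's usual pod labels.)

```sh
# Current container log, then the log of the previous (crashed) run:
kubectl logs arango-cluster-agnt-bnq8tiee-128dfb
kubectl logs arango-cluster-agnt-bnq8tiee-128dfb --previous

# If the pod is deleted and recreated rather than restarted in place,
# follow all agent pods by label (assumed label: role=agent):
kubectl logs -f -l role=agent --max-log-requests=3
```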

xelik commented 5 years ago

I have a similar problem. I created a cluster on k8s (a cluster with 3 worker nodes) and one agent is in a restart loop.

Screenshot_20191108_160504

Operator logs:

```
2019-11-08T15:26:48Z DBG ...inspected deployment component=deployment deployment=arangodb-cluster interval=1s operator-id=jddlm
2019-11-08T15:26:49Z DBG Inspect deployment... component=deployment deployment=arangodb-cluster operator-id=jddlm
2019-11-08T15:26:50Z DBG Not all agents are ready error="Agent http://arangodb-cluster-agent-abtwb2ag.arangodb-cluster-int.test.svc:8529 is not responding" action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1
2019-11-08T15:26:50Z DBG Action CheckProgress completed abort=false action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1 ready=false
2019-11-08T15:26:51Z DBG ...inspected deployment component=deployment deployment=arangodb-cluster interval=1s operator-id=jddlm
2019-11-08T15:26:52Z DBG Inspect deployment... component=deployment deployment=arangodb-cluster operator-id=jddlm
2019-11-08T15:26:52Z DBG Not all agents are ready error="Agent http://arangodb-cluster-agent-abtwb2ag.arangodb-cluster-int.test.svc:8529 is not responding" action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1
2019-11-08T15:26:52Z DBG Action CheckProgress completed abort=false action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1 ready=false
2019-11-08T15:26:53Z DBG ...inspected deployment component=deployment deployment=arangodb-cluster interval=1s operator-id=jddlm
2019-11-08T15:26:54Z DBG Inspect deployment... component=deployment deployment=arangodb-cluster operator-id=jddlm
2019-11-08T15:26:55Z DBG Not all agents are ready error="Agent http://arangodb-cluster-agent-abtwb2ag.arangodb-cluster-int.test.svc:8529 is not responding" action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1
2019-11-08T15:26:55Z DBG Action CheckProgress completed abort=false action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1 ready=false
```
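(The operator keeps failing its `WaitForMemberUp` check against the agent's in-cluster DNS name. A one-off probe of that exact endpoint, with the URL copied from the log above, can at least separate a network problem from a crashed process: with JWT auth enabled, an HTTP 401 reply still proves the agent is reachable, while a timeout matches the operator's "not responding" error.)

```sh
# Spawn a throwaway pod and curl the agent endpoint from inside the cluster:
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -sv http://arangodb-cluster-agent-abtwb2ag.arangodb-cluster-int.test.svc:8529/_api/version
```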

logs from restarting Pod:

```
2019-11-08T16:02:17Z [1] INFO [e52b0] ArangoDB 3.5.0 [linux] 64bit, using jemalloc, build tags/v3.5.0-0-gc42dbe8547, VPack 0.1.33, RocksDB 6.2.0, ICU 58.1, V8 7.1.302.28, OpenSSL 1.1.0k 28 May 2019
2019-11-08T16:02:17Z [1] INFO [75ddc] detected operating system: Linux version 3.10.0-1062.4.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Fri Oct 18 17:15:30 UTC 2019
2019-11-08T16:02:17Z [1] WARNING [118b0] {memory} maximum number of memory mappings per process is 262144, which seems too low. it is recommended to set it to at least 512000
2019-11-08T16:02:17Z [1] WARNING [49528] {memory} execute 'sudo sysctl -w "vm.max_map_count=512000"'
2019-11-08T16:02:17Z [1] WARNING [e8b68] {memory} /sys/kernel/mm/transparent_hugepage/enabled is set to 'always'. It is recommended to set it to a value of 'never' or 'madvise'
2019-11-08T16:02:17Z [1] WARNING [e8b68] {memory} /sys/kernel/mm/transparent_hugepage/defrag is set to 'always'. It is recommended to set it to a value of 'never' or 'madvise'
2019-11-08T16:02:17Z [1] WARNING [f3108] {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/enabled"'
2019-11-08T16:02:17Z [1] WARNING [f3108] {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/defrag"'
2019-11-08T16:02:17Z [1] DEBUG [63a7a] host ASLR is in use for shared libraries, stack, mmap, VDSO, heap and memory managed through brk()
2019-11-08T16:02:17Z [1] DEBUG [713c0] {authentication} Not creating user manager
2019-11-08T16:02:17Z [1] DEBUG [71a76] {authentication} Setting jwt secret of size 64
2019-11-08T16:02:17Z [1] INFO [144fe] using storage engine rocksdb
2019-11-08T16:02:17Z [1] INFO [3bb7d] {cluster} Starting up with role AGENT
2019-11-08T16:02:17Z [1] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 1048576, soft limit is 1048576
2019-11-08T16:02:17Z [1] DEBUG [f6e04] {config} using default language 'en_US'
2019-11-08T16:02:17Z [1] INFO [3844e] {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2019-11-08T16:02:17Z [1] DEBUG [f6e04] {config} using default language 'en_US'
2019-11-08T16:02:23Z [1] INFO [e6460] created base application directory '/var/lib/arangodb3-apps/_db'
2019-11-08T16:02:23Z [1] INFO [6ea38] using endpoint 'http+tcp://[::]:8529' for non-encrypted requests
2019-11-08T16:02:23Z [1] DEBUG [dc45a] bound to endpoint 'http+tcp://[::]:8529'
2019-11-08T16:02:23Z [1] INFO [cf3f4] ArangoDB (version 3.5.0 [linux]) is ready for business. Have fun!
2019-11-08T16:02:23Z [1] INFO [d7476] {agency} Restarting agent from persistence ...
2019-11-08T16:02:23Z [1] INFO [d96f6] {agency} Found active RAFTing agency lead by AGNT-axody2ec. Finishing startup sequence.
2019-11-08T16:02:23Z [1] INFO [fe299] {agency} Constituent::update: setting _leaderID to 'AGNT-axody2ec' in term 9
2019-11-08T16:02:23Z [1] INFO [79fd7] {agency} Activating agent.
2019-11-08T16:02:23Z [1] INFO [29175] {agency} Setting role to follower in term 9
2019-11-08T16:02:29Z [1] INFO [aefab] {agency} AGNT-abtwb2ag: candidating in term 9
2019-11-08T16:02:29Z [1] DEBUG [74339] accept failed: Operation canceled
2019-11-08T16:02:30Z [1] INFO [4bcb9] ArangoDB has been shut down
```
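(The startup warnings come with their own suggested remedies; for completeness, here they are as runnable commands, taken verbatim from the log. They must be applied on the worker node that hosts the pod, not inside the container, and they only address the memory warnings, which may be unrelated to the shutdown at the end of the log.)

```sh
# On the k8s worker node (commands per the agent log's own advice):
sudo sysctl -w "vm.max_map_count=512000"
sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/enabled"
sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/defrag"
```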

The memory limit for the agent pods is set to 2Gi.
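(For reference, a sketch of where such a limit would live in the deployment spec, assuming the operator accepts standard Kubernetes resource requirements per member group; the earlier `kubectl describe` output shows `resources.requests` under each group, so `limits` should sit alongside it.)

```yaml
spec:
  agents:
    count: 3
    resources:
      requests:
        memory: 2Gi
      limits:
        memory: 2Gi
```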

iMoses commented 4 years ago

Same issue. Has anyone found a solution for it? It started when I tried to set `dbservers.count` to 0, and now it's stuck in a restart loop; no matter what I do, I can't get it to stop.
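(If the spec still carries `dbservers.count: 0`, a first thing to try is simply setting the count back and letting the operator reconcile. A minimal sketch; `<deployment-name>` is a placeholder for your own ArangoDeployment resource name:)

```sh
# Merge-patch the ArangoDeployment custom resource to restore the count:
kubectl patch arangodeployment <deployment-name> --type merge \
  -p '{"spec":{"dbservers":{"count":3}}}'
```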