cloudfoundry-incubator / quarks-operator

BOSH releases deployed on Kubernetes
https://www.cloudfoundry.org/project-quarks/
Apache License 2.0

Single operator for multiple namespace support feature not working as expected #1227

Closed divyaaswath closed 3 years ago

divyaaswath commented 3 years ago

Describe the bug I followed the documentation to set up multiple namespaces for kubecf v2.6.1 using a single operator, but the results are not as expected. All environments except the last one fail, with the scheduler-0 pod going into CrashLoopBackOff status. The pod's first container, cloud-controller-clock, fails with the following error:

rake aborted!
NoMethodError: undefined method `[]' for nil:NilClass

To Reproduce Set up single-operator to support multiple namespaces as documented at https://quarks.suse.dev/docs/quarks-operator/install/

Expected behavior All environments should be up and running for kubecf v2.6.1 and managed by a single cf-operator

Environment

Additional context Installation is done on OpenShift version 4.4

cf-gitbot commented 3 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/175611515

The labels on this github issue will be updated when the story is started.

manno commented 3 years ago

This is being discussed on slack: https://cloudfoundry.slack.com/archives/C1BQKKNP4/p1604340015095600

divyaaswath commented 3 years ago

Yes @manno, I am aware of that. I have provided the logs and issue details in Slack as well but have not received a solution yet. If there is a workaround available for us to use, that would help too. Please let me know.

mudler commented 3 years ago

I can't reproduce the issue here. I am running a k3s cluster with multiple KubeCF deployments just fine. I had to work around some issues present in KubeCF, like https://github.com/cloudfoundry-incubator/kubecf/issues/1582 , but here is my cluster state:

NAMESPACE       NAME                                                      READY   STATUS      RESTARTS   AGE
kube-system     metrics-server-7b4f8b595-hh8qg                            1/1     Running     0          20h
nginx-ingress   svclb-nginx-ingress-ingress-nginx-controller-hm5ss        2/2     Running     0          20h
nginx-ingress   svclb-nginx-ingress-ingress-nginx-controller-zq5x4        2/2     Running     0          20h
kube-system     local-path-provisioner-7ff9579c6-fmbzg                    1/1     Running     2          20h
nginx-ingress   nginx-ingress-ingress-nginx-controller-77d97c57b6-qhn8n   1/1     Running     0          20h
kube-system     coredns-66c464876b-942kq                                  1/1     Running     0          20h
nginx-ingress   svclb-nginx-ingress-ingress-nginx-controller-mr8hg        2/2     Running     2          20h
cf-operator     cf-operator-quarks-job-556455b9ff-x85xc                   1/1     Running     0          30m
cf-operator     cf-operator-quarks-secret-66856b4648-lbglv                1/1     Running     0          30m
cf-operator     cf-operator-6db597568b-vcvrz                              1/1     Running     0          30m
kubecf          bosh-dns-7b59bdd66d-w4488                                 1/1     Running     0          27m
kubecf          bosh-dns-7b59bdd66d-grlv8                                 1/1     Running     0          27m
kubecf          cf-apps-dns-dcb9687ff-f4stn                               1/1     Running     0          29m
kubecf          database-0                                                2/2     Running     0          27m
kubecf          database-seeder-35e960a317320783-vjthw                    0/2     Completed   0          27m
kubecf          doppler-0                                                 6/6     Running     0          25m
kubecf          nats-0                                                    7/7     Running     0          25m
kubecf          diego-api-0                                               9/9     Running     2          25m
kubecf          log-api-0                                                 9/9     Running     0          25m
kubecf          auctioneer-0                                              6/6     Running     1          25m
kubecf          singleton-blobstore-0                                     8/8     Running     0          25m
kubecf          uaa-0                                                     9/9     Running     0          25m
kubecf          tcp-router-0                                              7/7     Running     0          25m
kubecf          routing-api-0                                             6/6     Running     0          25m
kubecf          log-cache-0                                               10/10   Running     0          25m
kubecf          api-0                                                     17/17   Running     1          25m
kubecf          router-0                                                  7/7     Running     1          25m
kubecf          cc-worker-0                                               6/6     Running     0          25m
kubecf          credhub-0                                                 8/8     Running     0          25m
kubecf          scheduler-0                                               13/13   Running     1          25m
kubecf          diego-cell-0                                              12/12   Running     2          25m
foo             bosh-dns-7b59bdd66d-slzgr                                 1/1     Running     0          17m
foo             cf-apps-dns-6497db99c5-2w5j2                              1/1     Running     0          19m
foo             bosh-dns-7b59bdd66d-hrhzw                                 1/1     Running     0          17m
foo             database-0                                                2/2     Running     0          17m
foo             database-seeder-3f3ba967274250e9-q7vj7                    0/2     Completed   0          17m
foo             doppler-0                                                 6/6     Running     0          15m
foo             nats-0                                                    7/7     Running     0          15m
foo             diego-api-0                                               9/9     Running     2          15m
foo             log-api-0                                                 9/9     Running     0          15m
foo             singleton-blobstore-0                                     8/8     Running     0          15m
foo             auctioneer-0                                              6/6     Running     1          15m
foo             uaa-0                                                     9/9     Running     0          15m
foo             routing-api-0                                             6/6     Running     0          15m
foo             tcp-router-0                                              7/7     Running     0          15m
foo             log-cache-0                                               10/10   Running     0          15m
foo             router-0                                                  7/7     Running     2          15m
foo             credhub-0                                                 8/8     Running     0          15m
foo             api-0                                                     17/17   Running     1          15m
foo             cc-worker-0                                               6/6     Running     0          15m
foo             scheduler-0                                               13/13   Running     1          15m
foo             diego-cell-0                                              12/12   Running     13         15m

@divyaaswath could this be related to the RoleBinding setup? Can you show how you are deploying KubeCF in the different namespaces?

divyaaswath commented 3 years ago

@mudler I also hit the ClusterRole issue you linked above, but I just overrode the ClusterRole annotation for the subsequent environment and proceeded. Also, as per the documentation, for each namespace where we want kubecf deployed we need to create the namespace with specific labels, a service account in that namespace, and a role binding. Here is what gets run for every environment (xxx changes for every env):

cat <<_EOF_ | oc create -f -
apiVersion: v1
kind: Namespace
metadata:
  name: xxx
  labels:
    quarks.cloudfoundry.org/monitored: cfo
    quarks.cloudfoundry.org/qjob-service-account: qjob-persist-output
spec:
  finalizers:
  - kubernetes
_EOF_

cat <<_EOF_ | oc create -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: qjob-persist-output
  namespace: xxx
_EOF_

cat <<_EOF_ | oc create -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: qjob-persist-output-xxx
  namespace: xxx
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: qjob-persist-output
subjects:
- kind: ServiceAccount
  name: qjob-persist-output
_EOF_

Let me know if there is an issue with these settings. Thanks! Also, can you push an app to both environments?
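The three manifests above differ only in the namespace name, so they can be generated from a single template. The following is a minimal sketch, not something taken from the quarks-operator docs: `render_quarks_namespace` is a hypothetical helper name, and the function only prints the manifests so they can be reviewed before being applied.

```shell
#!/bin/sh
# Hypothetical helper (not part of quarks-operator or kubecf): prints the
# Namespace, ServiceAccount and RoleBinding from the comment above for one
# namespace. It does not talk to the cluster.
render_quarks_namespace() {
  ns="$1"
  if [ -z "$ns" ]; then
    echo "usage: render_quarks_namespace <namespace>" >&2
    return 1
  fi
  # Unquoted heredoc delimiter so that ${ns} is expanded by the shell.
  cat <<_EOF_
apiVersion: v1
kind: Namespace
metadata:
  name: ${ns}
  labels:
    quarks.cloudfoundry.org/monitored: cfo
    quarks.cloudfoundry.org/qjob-service-account: qjob-persist-output
spec:
  finalizers:
  - kubernetes
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: qjob-persist-output
  namespace: ${ns}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: qjob-persist-output-${ns}
  namespace: ${ns}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: qjob-persist-output
subjects:
- kind: ServiceAccount
  name: qjob-persist-output
_EOF_
}
```

To apply it for one environment, the output can be piped into the same command used above, for example `render_quarks_namespace foo | oc create -f -`. Keeping the rendering separate from the `oc` call makes it easy to diff the manifests for two environments when only one of them misbehaves.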

mudler commented 3 years ago

I can reproduce the issue with Eirini enabled. Diego is not affected as far as I can tell. Note that in my case it was enough to set up the quarks-operator with more than one namespace and deploy on the first one. The Ruby stack trace points to https://github.com/cloudfoundry/cloud_controller_ng/blob/master/lib/cloud_controller/opi/apps_client.rb#L23 , but I have inspected the cloud-controller-clock container and it seems to have the correct opi configuration endpoints.

I've also tried to contact opi from a different container in the same pod, and DNS was working as intended.

I'm debugging further now and checking what differs from a deployment on a single namespace, but from the quarks-operator perspective it shouldn't matter. So I am starting to suspect that something is not tuned correctly on the KubeCF side.

mudler commented 3 years ago

Here is the full stack trace:

/:/var/vcap/jobs/cloud_controller_clock# /var/vcap/jobs/cloud_controller_clock/bin/cloud_controller_clock                                                                                                                                                                      
I, [2020-11-18T11:58:36.656928 #939]  INFO -- : Starting clock for 17 events: [ app_usage_events.job audit_events.job failed_jobs.job service_usage_events.job completed_tasks.job expired_blob_cleanup.job expired_resource_cleanup.job expired_orphaned_blob_cleanup.job orphaned_blobs_cleanup.job pollable_job_cleanup.job request_counts_cleanup.job prune_completed_deployments.job prune_completed_builds.job prune_excess_app_revisions.job pending_droplets.job pending_builds.job diego_sync.job ]
I, [2020-11-18T11:58:36.657512 #939]  INFO -- : Triggering 'pending_droplets.job'                                                                                                                                                                                              
I, [2020-11-18T11:58:36.663295 #939]  INFO -- : Triggering 'pending_builds.job'                                                                                                                                                                                                
I, [2020-11-18T11:58:36.668294 #939]  INFO -- : Triggering 'diego_sync.job'                                                                                                                                                                                                    
#<HTTP::Message:0x000056307b4d66b8>                                                                                                                                                                                                                                            
E, [2020-11-18T11:58:36.882341 #939] ERROR -- : undefined method `[]' for nil:NilClass (NoMethodError)                                                                                                                                                                         
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/opi/apps_client.rb:26:in `fetch_scheduling_infos'                                                                                                                                              
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/diego/processes_sync.rb:22:in `sync'                                                                                                                                                           
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/diego/sync.rb:17:in `block in perform'                                                                                                                                                                     
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/statsd-ruby-1.4.0/lib/statsd.rb:412:in `time'                                                                                                                                                                  
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/diego/sync.rb:16:in `perform'                                                                                                                                                                              
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/wrapping_job.rb:11:in `perform'                                                                                                                                                                            
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/timeout_job.rb:13:in `block in perform'                                                                                                                                                                    
/var/vcap/packages/ruby-2.5.5-r0.10.0/lib/ruby/2.5.0/timeout.rb:93:in `block in timeout'                                        
/var/vcap/packages/ruby-2.5.5-r0.10.0/lib/ruby/2.5.0/timeout.rb:33:in `block in catch'                                               
/var/vcap/packages/ruby-2.5.5-r0.10.0/lib/ruby/2.5.0/timeout.rb:33:in `catch'                                                                                                                                                                                                  
/var/vcap/packages/ruby-2.5.5-r0.10.0/lib/ruby/2.5.0/timeout.rb:33:in `catch'                                                          
/var/vcap/packages/ruby-2.5.5-r0.10.0/lib/ruby/2.5.0/timeout.rb:108:in `timeout'                                                                                                                                                                                               
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/timeout_job.rb:12:in `perform'                                    
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:81:in `block in invoke_job'                                                                                                                                      
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:61:in `block in initialize'
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:66:in `execute'       
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:40:in `run_callbacks'                                                                                                                                               
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:78:in `invoke_job'      
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:19:in `block (2 levels) in enqueue_job'                                                                                                                          
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:61:in `block in initialize'                                                                                                                                         
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:66:in `execute'                                                                                                                                                     
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:40:in `run_callbacks'       
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:17:in `block in enqueue_job'                                                                                                                                     
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:16:in `tap'        
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:16:in `enqueue_job'                                                                                                                                              
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:12:in `enqueue'                                                                                                                                                  
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/enqueuer.rb:31:in `block in run_inline'                           
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/enqueuer.rb:56:in `run_immediately'                                                                                                                                                                        
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/enqueuer.rb:30:in `run_inline'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/clock/clock.rb:51:in `block in schedule_frequent_inline_job'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/clock/clock.rb:58:in `block in schedule_job'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/clock/distributed_scheduler.rb:12:in `block (2 levels) in schedule_periodic_job'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/clock/distributed_executor.rb:30:in `execute_job'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/clock/distributed_scheduler.rb:12:in `block in schedule_periodic_job'
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/clockwork-2.0.4/lib/clockwork/event.rb:58:in `execute'
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/clockwork-2.0.4/lib/clockwork/event.rb:41:in `block in run'
#<Thread:0x00007f029001dbc0@/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/clockwork-2.0.4/lib/clockwork/event.rb:40 run> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
        35: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/clockwork-2.0.4/lib/clockwork/event.rb:41:in `block in run'
        34: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/clockwork-2.0.4/lib/clockwork/event.rb:58:in `execute' 
        33: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/clock/distributed_scheduler.rb:12:in `block in schedule_periodic_job'
        32: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/clock/distributed_executor.rb:30:in `execute_job'
        31: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/clock/distributed_scheduler.rb:12:in `block (2 levels) in schedule_periodic_job'
        30: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/clock/clock.rb:58:in `block in schedule_job'
        29: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/clock/clock.rb:51:in `block in schedule_frequent_inline_job'
        28: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/enqueuer.rb:30:in `run_inline'
        27: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/enqueuer.rb:56:in `run_immediately'
        26: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/enqueuer.rb:31:in `block in run_inline'
        25: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:12:in `enqueue'
        24: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:16:in `enqueue_job'
        23: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:16:in `tap'
        22: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:17:in `block in enqueue_job'
        21: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:40:in `run_callbacks'
        20: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:66:in `execute'
        19: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:61:in `block in initialize'
        18: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:19:in `block (2 levels) in enqueue_job'
        17: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:78:in `invoke_job'
        16: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:40:in `run_callbacks'
        15: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:66:in `execute'
        14: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:61:in `block in initialize'
        13: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:81:in `block in invoke_job'
        12: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/timeout_job.rb:12:in `perform'
        11: from /var/vcap/packages/ruby-2.5.5-r0.10.0/lib/ruby/2.5.0/timeout.rb:108:in `timeout'
        10: from /var/vcap/packages/ruby-2.5.5-r0.10.0/lib/ruby/2.5.0/timeout.rb:33:in `catch'
         9: from /var/vcap/packages/ruby-2.5.5-r0.10.0/lib/ruby/2.5.0/timeout.rb:33:in `catch'
         8: from /var/vcap/packages/ruby-2.5.5-r0.10.0/lib/ruby/2.5.0/timeout.rb:33:in `block in catch'
         7: from /var/vcap/packages/ruby-2.5.5-r0.10.0/lib/ruby/2.5.0/timeout.rb:93:in `block in timeout'
         6: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/timeout_job.rb:13:in `block in perform'
         5: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/wrapping_job.rb:11:in `perform'
         4: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/diego/sync.rb:16:in `perform'
         3: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/statsd-ruby-1.4.0/lib/statsd.rb:412:in `time'
         2: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/diego/sync.rb:17:in `block in perform'
         1: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/diego/processes_sync.rb:22:in `sync'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/opi/apps_client.rb:26:in `fetch_scheduling_infos': undefined method `[]' for nil:NilClass (NoMethodError)
rake aborted!
NoMethodError: undefined method `[]' for nil:NilClass
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/opi/apps_client.rb:26:in `fetch_scheduling_infos'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/diego/processes_sync.rb:22:in `sync'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/diego/sync.rb:17:in `block in perform'
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/statsd-ruby-1.4.0/lib/statsd.rb:412:in `time'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/diego/sync.rb:16:in `perform'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/wrapping_job.rb:11:in `perform'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/timeout_job.rb:13:in `block in perform'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/timeout_job.rb:12:in `perform'
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:81:in `block in invoke_job'
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:61:in `block in initialize'
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:66:in `execute'
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:40:in `run_callbacks'
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:78:in `invoke_job'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/enqueuer.rb:56:in `run_immediately'                                                                                                                                                                        
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/enqueuer.rb:30:in `run_inline'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/clock/clock.rb:51:in `block in schedule_frequent_inline_job'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/clock/clock.rb:58:in `block in schedule_job'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/clock/distributed_scheduler.rb:12:in `block (2 levels) in schedule_periodic_job'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/clock/distributed_executor.rb:30:in `execute_job'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/clock/distributed_scheduler.rb:12:in `block in schedule_periodic_job'
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/clockwork-2.0.4/lib/clockwork/event.rb:58:in `execute'
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/clockwork-2.0.4/lib/clockwork/event.rb:41:in `block in run'
#<Thread:0x00007f029001dbc0@/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/clockwork-2.0.4/lib/clockwork/event.rb:40 run> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
        35: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/clockwork-2.0.4/lib/clockwork/event.rb:41:in `block in run'
        34: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/clockwork-2.0.4/lib/clockwork/event.rb:58:in `execute' 
        33: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/clock/distributed_scheduler.rb:12:in `block in schedule_periodic_job'
        32: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/clock/distributed_executor.rb:30:in `execute_job'
        31: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/clock/distributed_scheduler.rb:12:in `block (2 levels) in schedule_periodic_job'
        30: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/clock/clock.rb:58:in `block in schedule_job'
        29: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/clock/clock.rb:51:in `block in schedule_frequent_inline_job'
        28: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/enqueuer.rb:30:in `run_inline'
        27: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/enqueuer.rb:56:in `run_immediately'
        26: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/enqueuer.rb:31:in `block in run_inline'
        25: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:12:in `enqueue'
        24: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:16:in `enqueue_job'
        23: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:16:in `tap'
        22: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:17:in `block in enqueue_job'
        21: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:40:in `run_callbacks'
        20: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:66:in `execute'
        19: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:61:in `block in initialize'
        18: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:19:in `block (2 levels) in enqueue_job'
        17: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:78:in `invoke_job'
        16: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:40:in `run_callbacks'
        15: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:66:in `execute'
        14: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:61:in `block in initialize'
        13: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:81:in `block in invoke_job'
        12: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/timeout_job.rb:12:in `perform'
        11: from /var/vcap/packages/ruby-2.5.5-r0.10.0/lib/ruby/2.5.0/timeout.rb:108:in `timeout'
        10: from /var/vcap/packages/ruby-2.5.5-r0.10.0/lib/ruby/2.5.0/timeout.rb:33:in `catch'
         9: from /var/vcap/packages/ruby-2.5.5-r0.10.0/lib/ruby/2.5.0/timeout.rb:33:in `catch'
         8: from /var/vcap/packages/ruby-2.5.5-r0.10.0/lib/ruby/2.5.0/timeout.rb:33:in `block in catch'
         7: from /var/vcap/packages/ruby-2.5.5-r0.10.0/lib/ruby/2.5.0/timeout.rb:93:in `block in timeout'
         6: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/timeout_job.rb:13:in `block in perform'
         5: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/wrapping_job.rb:11:in `perform'
         4: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/diego/sync.rb:16:in `perform'
         3: from /var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/statsd-ruby-1.4.0/lib/statsd.rb:412:in `time'
         2: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/diego/sync.rb:17:in `block in perform'
         1: from /var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/diego/processes_sync.rb:22:in `sync'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/opi/apps_client.rb:26:in `fetch_scheduling_infos': undefined method `[]' for nil:NilClass (NoMethodError)
rake aborted!
NoMethodError: undefined method `[]' for nil:NilClass
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/opi/apps_client.rb:26:in `fetch_scheduling_infos'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/lib/cloud_controller/diego/processes_sync.rb:22:in `sync'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/diego/sync.rb:17:in `block in perform'
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/statsd-ruby-1.4.0/lib/statsd.rb:412:in `time'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/diego/sync.rb:16:in `perform'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/wrapping_job.rb:11:in `perform'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/timeout_job.rb:13:in `block in perform'
/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/app/jobs/timeout_job.rb:12:in `perform'
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:81:in `block in invoke_job'
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:61:in `block in initialize'
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:66:in `execute'
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/lifecycle.rb:40:in `run_callbacks'
/var/vcap/packages/cloud_controller_ng/gem_home/ruby/2.5.0/gems/delayed_job-4.1.8/lib/delayed/backend/base.rb:78:in `invoke_job'

mudler commented 3 years ago

After debugging with @manno, we found that the cc-worker receives an internal server error from OPI. When this happens, the Eirini pod logs show:

W1118 12:45:49.023665       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
{"timestamp":"2020-11-18T12:45:49.026286043Z","level":"info","source":"handler","message":"handler.opi-connected","data":{}}
{"timestamp":"2020-11-18T12:47:27.541348858Z","level":"debug","source":"handler","message":"handler.list-apps.requested","data":{"session":"2"}}
{"timestamp":"2020-11-18T12:47:27.690538039Z","level":"error","source":"desirer","message":"desirer.list.failed-to-list-statefulsets","data":{"error":"statefulsets.apps is forbidden: User \"system:serviceaccount:kubecf:opi\" cannot list resource \"statefulsets\" in API group \"apps\" at the cluster scope","session":"1"}}
{"timestamp":"2020-11-18T12:47:27.690665343Z","level":"error","source":"handler","message":"handler.list-apps.bifrost-failed","data":{"error":"failed to list desired LRPs: failed to list statefulsets: statefulsets.apps is forbidden: User \"system:serviceaccount:kubecf:opi\" cannot list resource \"statefulsets\" in API group \"apps\" at the cluster scope","session":"2"}}

It looks like the required cluster role configuration is in the kubecf namespace instead of the eirini one; see: https://github.com/cloudfoundry-incubator/kubecf/blob/master/mixins/eirini/templates/eirini-cluster-role.yaml#L66 .
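
For anyone scanning their own Eirini logs for the same symptom, here is a minimal sketch that pulls the RBAC denial out of JSON log lines like the ones above. The function name `rbac_denials` and the regular expression are illustrative assumptions, not part of Eirini or KubeCF; the pattern only covers the "forbidden" wording shown in this log.

```python
import json
import re

# Matches the Kubernetes "forbidden" wording seen in the Eirini log above.
# Assumption: other denial messages may be phrased differently.
FORBIDDEN = re.compile(
    r'User "system:serviceaccount:(?P<namespace>[^:"]+):(?P<account>[^"]+)" '
    r'cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)"'
)


def rbac_denials(log_lines):
    """Return the distinct RBAC denials found in JSON-formatted log lines."""
    denials = set()
    for line in log_lines:
        try:
            entry = json.loads(line)
        except ValueError:
            continue  # skip non-JSON lines such as the client_config.go warning
        if not isinstance(entry, dict) or entry.get("level") != "error":
            continue
        data = entry.get("data")
        message = data.get("error", "") if isinstance(data, dict) else ""
        match = FORBIDDEN.search(message)
        if match:
            denials.add((
                match["namespace"],
                match["account"],
                match["verb"],
                match["resource"],
                match["group"],
            ))
    return sorted(denials)


# Two lines copied from the log above: one plain-text warning, one JSON error.
sample = [
    "W1118 12:45:49.023665       1 client_config.go:552] Neither --kubeconfig nor --master was specified.",
    r'{"timestamp":"2020-11-18T12:47:27.690538039Z","level":"error","source":"desirer","message":"desirer.list.failed-to-list-statefulsets","data":{"error":"statefulsets.apps is forbidden: User \"system:serviceaccount:kubecf:opi\" cannot list resource \"statefulsets\" in API group \"apps\" at the cluster scope","session":"1"}}',
]

for namespace, account, verb, resource, group in rbac_denials(sample):
    print(f"{namespace}/{account} cannot {verb} {resource}.{group}")
```

Each tuple names the service account namespace, the account, the denied verb, the resource, and the API group, which is the information needed to check the cluster role and its binding.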

@divyaaswath I've opened https://github.com/cloudfoundry-incubator/kubecf/issues/1602 to track the bug in KubeCF, and will close this issue, as this sounds like a configuration issue rather than a Quarks bug.

divyaaswath commented 3 years ago

Thanks @mudler !!