apache / openwhisk

Apache OpenWhisk is an open source serverless cloud platform
https://openwhisk.apache.org/
Apache License 2.0

Scheduler "stops" after a while #5485

Open markretallack opened 3 months ago

markretallack commented 3 months ago

Environment details:

Steps to reproduce the issue:

  1. Deploy OpenWhisk using the scheduler, etc.
  2. Deploy some functions (cron-based, etc.).

Provide the expected results and outputs:

The system works normally.

Provide the actual results and outputs:

After a while (about an hour), the system becomes unstable.

In the controller log:

[2024-05-21T13:00:09.119Z] [ERROR] [#tid_kDcLqa1uLqzR7GhLLdejqhGeS3mbeHQo] [] Failed to recreate queue for dataspace/ncarpark/carpark@0.0.1, no scheduler endpoint available

Also seeing this in the controller log:

[2024-05-21T13:00:06.174Z] [WARN] [#tid_kDcLqa1uLqzR7GhLLdejqhGeS3mbeHQo] [] The whisk/queue/dataspace/dataspace/carpark/carpark/leader is deleted from ETCD, but there are still unhandled activations for this action, try to create a new queue

In the scheduler log I am seeing this:

[2024-05-21T13:00:09.876Z] [WARN] [#tid_sid_unknown] [EtcdWorker] a lease is expired while registering an initial data whisk/queue/dataspace/dataspace/carpark/carpark/leader, reissue it: io.grpc.StatusRuntimeException: NOT_FOUND: etcdserver: requested lease not found

And also:

[2024-05-21T13:00:10.195Z] [WARN] [#tid_sid_unknown] [EtcdWorker] a lease is expired while registering an initial data whisk/scheduler/0, reissue it: io.grpc.StatusRuntimeException: NOT_FOUND: etcdserver: requested lease not found

I am not sure where to look to diagnose this issue.
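
One way to start narrowing this down is to put the controller and scheduler log lines on a single timeline. The following is a minimal sketch, not part of OpenWhisk: it assumes every line follows the `[timestamp] [LEVEL] [#tid] [component] message` layout seen in the excerpts above, and the names `parse_line` and `merge_logs` are hypothetical helpers introduced only for illustration.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpts above:
# [timestamp] [LEVEL] [#tid] [component] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<tid>#[^\]]*)\] "
    r"\[(?P<component>[^\]]*)\] (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict for one log line, or None if it does not match the assumed layout."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.strptime(record["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return record


def merge_logs(*logs):
    """Merge (source_name, lines) pairs into one list sorted by timestamp."""
    records = []
    for source, lines in logs:
        for line in lines:
            record = parse_line(line)
            if record is not None:
                record["source"] = source
                records.append(record)
    return sorted(records, key=lambda r: r["ts"])


if __name__ == "__main__":
    controller = [
        "[2024-05-21T13:00:09.119Z] [ERROR] [#tid_kDcLqa1uLqzR7GhLLdejqhGeS3mbeHQo] [] "
        "Failed to recreate queue for dataspace/ncarpark/carpark@0.0.1, "
        "no scheduler endpoint available",
    ]
    scheduler = [
        "[2024-05-21T13:00:10.195Z] [WARN] [#tid_sid_unknown] [EtcdWorker] "
        "a lease is expired while registering an initial data whisk/scheduler/0, "
        "reissue it: io.grpc.StatusRuntimeException: NOT_FOUND: "
        "etcdserver: requested lease not found",
    ]
    for rec in merge_logs(("controller", controller), ("scheduler", scheduler)):
        print(rec["ts"].isoformat(), rec["source"], rec["level"], rec["message"][:60])
```

Reading both logs in timestamp order makes it easier to see whether the lease-expiry warnings in the scheduler come before or after the controller loses its scheduler endpoint.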

markretallack commented 3 months ago

My current workaround is to disable the new scheduler until I can find the cause.