Open · succa opened this issue 3 weeks ago
@succa This will happen if the scheduler instance is force deleted. It can also occur if the manager service is unavailable when the scheduler is deleted.
@gaius-qi Thanks for the very quick answer! Is there a fix for it? My scheduler pods are not long-lived because of cluster node rotation.
@succa It is necessary to ensure that there are active manager instances during the scheduler upgrade process.
@gaius-qi I have 10 running instances all the time. I ended up creating a cron job to clean up the database, but this is something you might want to consider adding directly to the code as a safety check performed by the manager.
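A minimal sketch of that kind of cleanup job (not the actual cron job from this thread), assuming the manager stores scheduler records in a MySQL `schedulers` table with `id`, `host_name`, `state`, and `updated_at` columns; the table and column names are assumptions and may differ from the real manager schema:

```go
// cleanup.go: deactivate stale duplicate scheduler rows, keeping only the
// most recently updated "active" row per host_name.
// NOTE: table/column names and the DSN below are assumptions for illustration.
package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "user:password@tcp(manager-mysql:3306)/manager")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// For every host_name that has more than one "active" row, keep the row
	// with the newest updated_at and flip the older duplicates to "inactive".
	res, err := db.Exec(`
		UPDATE schedulers s
		JOIN (
			SELECT host_name, MAX(updated_at) AS latest
			FROM schedulers
			WHERE state = 'active'
			GROUP BY host_name
			HAVING COUNT(*) > 1
		) d ON s.host_name = d.host_name
		SET s.state = 'inactive'
		WHERE s.state = 'active' AND s.updated_at < d.latest`)
	if err != nil {
		log.Fatal(err)
	}
	n, _ := res.RowsAffected()
	log.Printf("deactivated %d stale scheduler rows", n)
}
```

Run periodically (for example from a Kubernetes CronJob) against the manager's database, this keeps at most one active record per hostname between scheduler restarts.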
Bug report:
The scheduler database contains multiple entries with the same hostname in the "active" state.
Notice also the strange jump in time between an old entry erroneously left in the "active" state and the subsequent entry. This prevents peers from using this scheduler pod, because they end up using the wrong IP.
Expected behavior:
There should be only one active entry per host_name at any point in time.
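One way the manager itself could enforce this invariant (the safety check suggested above) is to deactivate any other "active" rows for the same host_name whenever a scheduler registers or refreshes its state. A hypothetical GORM-style sketch; the `Scheduler` model and field names are assumptions for illustration, not the actual Dragonfly manager code:

```go
package example

import "gorm.io/gorm"

// Scheduler is a simplified stand-in for the manager's scheduler record.
type Scheduler struct {
	ID       uint
	HostName string
	IP       string
	State    string
}

// deactivateDuplicates marks every other "active" row with the same host_name
// as "inactive", so only the record that just registered stays active.
func deactivateDuplicates(db *gorm.DB, current *Scheduler) error {
	return db.Model(&Scheduler{}).
		Where("host_name = ? AND id <> ? AND state = ?", current.HostName, current.ID, "active").
		Update("state", "inactive").Error
}
```

Calling something like this on every scheduler registration would keep the database consistent even when a pod is force deleted or the manager misses the deletion.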
How to reproduce it:
Not able to reproduce it reliably; I guess it happens while the scheduler database is being updated.
Environment: