Fix savepoint problems

shashken commented 3 years ago

I found 2 problems related to savepoints:

1. When upgrading a job, there was no option to take a savepoint before the upgrade and then restore from it. I added a flag to cover this case.
2. When a cluster starts, it tries to take a savepoint, but the savepoint status only updates once the savepoint completes. As a result, a new savepoint gets triggered while the previous one is still running, and this repeats indefinitely if your savepoints don't finish quickly. I solved this with another value that holds the savepoint trigger time, plus an increased savepoint timeout, so a new savepoint is not triggered while one is still running.

@functicons I'd love to get your feedback on this. We might want to create a stronger solution later on, but for now savepoints are impossible to use with this operator if they take some time.

functicons commented 3 years ago

/gcbrun

functicons commented 3 years ago

Thanks for the PR, will review as soon as I get a chance.

functicons commented 3 years ago

/gcbrun

GoogleCloudPlatform / flink-on-k8s-operator

Fix savepoint problems #392