Alluxio / k8s-operator

An operator for managing the Alluxio system on a Kubernetes cluster
https://www.alluxio.io/
Apache License 2.0

Delete pod before restart #26

Closed ssz1997 closed 1 year ago

ssz1997 commented 1 year ago

Use the Recreate update strategy for worker restarts instead of the default RollingUpdate. The reasoning is:

When users back worker storage with hostPath, which is the most common case because SSDs are usually local, we do not want two workers deployed on the same node: sharing the same configuration, they would both use the same hostPath as storage. We therefore need pod anti-affinity to ensure that no two workers run on the same host machine.
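For illustration, the constraint could look roughly like the pod template fragment below. This is a sketch, not the operator's actual template: the `app: alluxio-worker` label, the volume name, and the hostPath are hypothetical placeholders.

```yaml
# Fragment of a worker pod spec (spec.template.spec of the workload).
# Label, volume name, and path are assumptions made for this example.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: alluxio-worker            # assumed label carried by every worker pod
        topologyKey: kubernetes.io/hostname  # at most one matching pod per node
volumes:
  - name: worker-storage                   # hypothetical volume name
    hostPath:
      path: /mnt/alluxio/worker            # assumed local SSD directory, identical for any worker on the node
      type: DirectoryOrCreate
```

Because the rule is `required`, the scheduler leaves a second worker pod unscheduled rather than placing it next to an existing one.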

With RollingUpdate, a new worker pod is created before an old one is deleted. The new pod cannot start successfully because the old one still exists and we prohibit both on the same machine, unless there are spare machines with no old worker on them, which is not a safe assumption.

Therefore, we have to use Recreate, which deletes the old pods before creating new ones. Note that for an all-read use case, losing a worker is not expected to cause a failure on the client side, so this is safe.
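As a rough sketch of the change, assuming the workers are managed by a Deployment (the resource name and replica count here are hypothetical), the strategy would be set like this:

```yaml
# Fragment of a worker Deployment; selector and pod template omitted.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alluxio-worker   # hypothetical name
spec:
  replicas: 3            # assumed worker count
  strategy:
    type: Recreate       # terminate all existing pods before creating replacements
```

If `strategy` is omitted, a Deployment defaults to `RollingUpdate`, which surges new pods before removing old ones and so runs into the anti-affinity rule described above.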

ssz1997 commented 1 year ago

@Kai-Zhang PTAL. Thanks!

ssz1997 commented 1 year ago

Exactly. Can't just kill it while writing is in progress.