Open karaguo opened 2 years ago
A follow up (status update):
Even without the additional changes mentioned under "Steps to reproduce the problem", this issue occurred again. So it happens with low probability, but it still makes the system unstable.
My guess is that the postgresql database readiness check at https://github.com/goharbor/harbor-operator/blob/544d6737c197a5fefb9c729ae8785614f77ab005/pkg/cluster/controllers/database/readiness.go#L54 decides that the database is not ready because its status is still creating. This might be fixed by adding a retry, or by waiting for the status to change from creating to running.
Can someone help confirm?
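To make the retry idea concrete, here is a minimal sketch in Go. It is not taken from harbor-operator: `waitForRunning`, `errNotReady`, the `getStatus` callback, and the status strings "creating" and "running" are all assumptions made for illustration. It only shows the general shape of polling a status a bounded number of times before reporting the database as not ready.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// errNotReady is a hypothetical sentinel error, not an identifier from harbor-operator.
var errNotReady = errors.New("database not ready")

// waitForRunning polls getStatus up to attempts times, sleeping interval
// between polls. It returns nil as soon as the status equals "running"
// (case-insensitive), and an error wrapping errNotReady otherwise.
func waitForRunning(getStatus func() (string, error), attempts int, interval time.Duration) error {
	last := ""
	for i := 0; i < attempts; i++ {
		status, err := getStatus()
		if err != nil {
			return err // a failed lookup is returned as-is, not retried
		}
		if strings.EqualFold(status, "running") {
			return nil
		}
		last = status
		if i < attempts-1 {
			time.Sleep(interval)
		}
	}
	return fmt.Errorf("%w: last status %q after %d attempts", errNotReady, last, attempts)
}

func main() {
	calls := 0
	// Simulated status source: reports "creating" twice, then "running".
	getStatus := func() (string, error) {
		calls++
		if calls < 3 {
			return "creating", nil
		}
		return "running", nil
	}
	err := waitForRunning(getStatus, 5, 10*time.Millisecond)
	fmt.Println("error:", err, "polls:", calls)
}
```

Inside a controller it is probably better to return an error or requeue the reconcile than to block in a loop like this; the sketch only illustrates the "wait for creating to become running" logic.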
Update:
Throwing the error at the following place should fix the issue: https://github.com/goharbor/harbor-operator/blob/544d6737c197a5fefb9c729ae8785614f77ab005/pkg/cluster/controllers/database/readiness.go#L58
Expected behavior and actual behavior: There is a flaky issue during harbor deployment. The HarborCluster became unhealthy (at this point the postgresql and redis pods are ready, but the harbor components' pods have not been started), and the only abnormal status when we check the harborCluster is
However, the postgresql status is RUNNING, and the harbor operator pod logged the following:
So it appears that there is nothing wrong with the postgresql DB.
Restarting the harbor operator pod fixed the issue. After the restart, the harbor components were deployed immediately and the harborCluster became healthy. It is therefore possible that the harbor operator does not track the database status correctly and reports an unhealthy status that is a false positive. Hi team, can you please give more insight into this issue? Our team also found a recent change to the pg status check: https://github.com/goharbor/harbor-operator/pull/476/files. Can you please take a look and see if this is a regression? Thanks!
Steps to reproduce the problem:
We have not found a connection between the storage class/pvc change and this harbor health issue. However, this flaky issue appeared after the change was merged, and did not occur after it was reverted.
Versions: Please specify the versions of following systems.
Additional context: