Open abinet opened 2 months ago
@abinet Thanks for writing it up! I've transferred this issue to the "harbor" repo since it's not Helm chart specific.
If I check the long running queries on Postgres side, it seems like the queries from different core-replicas just blocking each other
This looks like a deadlock and should never happen; we'll handle it with priority.
hi @abinet
I am trying to understand the problem, and I've some questions:
hi @wy65701436 , thank you for caring about the issue.
I am using Harbor 2.10.2. Yes, the images are from the same registry but with different tags. The blocked queries are selects from the blob/artifact tables. Sometimes it's blob, sometimes artifact. Unfortunately I can only reproduce this issue in our production instance, so I cannot easily enable debugging there.
Will try to gather more details and update the ticket.
@wy65701436 @reasonerjt, btw, this also happens on single-pod harbor-core instances. It is easy to reproduce: set the DB max connections to fewer than 10 (e.g. 5) and try to delete 100 images in the UI.
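To illustrate why a small connection limit can freeze a service, here is a minimal, self-contained sketch of pool starvation. It is not Harbor code: the function `simulate` and the access pattern (each request holds one connection and then asks for a second) are assumptions made purely for illustration, and the thread does not confirm that this is the actual cause. Timeouts are used only so that the simulation terminates; a pool that waits indefinitely would hang instead.

```python
import threading


def simulate(pool_size: int, requests: int, timeout: float = 0.2) -> int:
    """Return how many simulated requests completed.

    Hypothetical pattern: every request holds one connection and then
    needs a second one before it can finish.
    """
    pool = threading.Semaphore(pool_size)   # stand-in for the DB connection limit
    in_step = threading.Barrier(requests)   # keeps the requests overlapping
    completed = []                          # list.append is thread-safe in CPython

    def handle_request() -> None:
        got_first = pool.acquire(timeout=timeout)
        in_step.wait()  # every request has now tried for its first connection
        if got_first and pool.acquire(timeout=timeout):
            pool.release()  # hand the second connection back
            completed.append(True)
        in_step.wait()  # nobody frees a first connection before all have tried
        if got_first:
            pool.release()

    threads = [threading.Thread(target=handle_request) for _ in range(requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(completed)


if __name__ == "__main__":
    print("limit 5, 5 overlapping requests:", simulate(5, 5), "completed")
    print("limit 10, 5 overlapping requests:", simulate(10, 5), "completed")
```

With a limit of 5 and 5 overlapping requests, every connection is held by a request that is itself waiting for another one, so none of them can make progress. Doubling the limit lets all of them finish.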
We have some updates on the issue:
It seems that when the core runs out of DB connections, it freezes.
This was observed with 2.9.4, but it should also happen with the latest Harbor version.
In the diagrams one can see the time before and after the restart of Harbor.
We are running a Harbor installation with the internal PostgreSQL database and 3 harbor-core replicas.
When we try to delete multiple (>10) images via the Harbor UI with this configuration, the page just hangs and keeps reloading. If I check the long-running queries on the Postgres side, it seems that the queries from different core replicas are blocking each other, and terminating the backends does not unblock them (https://www.shanelynn.ie/postgresql-find-slow-long-running-and-blocked-queries/).
I am not very experienced with operating PostgreSQL, so for now we have to restart the database pod when this happens.
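For anyone who wants to check the blocked queries themselves, the following is a rough sketch of how they could be listed. It relies on the `pg_stat_activity` view and the `pg_blocking_pids()` function, which exist in PostgreSQL 9.6 and later. The helper name `find_blocked_queries` is hypothetical, and it assumes a DB-API style cursor (for example from psycopg2) that is already connected to the Harbor database.

```python
# Pairs each waiting backend with the backend(s) holding the lock it needs.
BLOCKED_QUERIES_SQL = """
SELECT blocked.pid                 AS blocked_pid,
       blocked.query               AS blocked_query,
       blocking.pid                AS blocking_pid,
       blocking.query              AS blocking_query,
       now() - blocked.query_start AS waiting_for
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocking
  ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
ORDER BY waiting_for DESC
"""


def find_blocked_queries(cursor):
    """Run the query on a DB-API cursor and return one dict per blocked/blocking pair."""
    cursor.execute(BLOCKED_QUERIES_SQL)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

If the result is empty while requests still hang, the sessions are probably not waiting on each other's locks, which would point away from a lock-level deadlock in the database.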
Scaling the harbor-core deployment down to 1 replica solves the problem. With one replica running, multiple images can be deleted from the UI without any issues.
I am wondering if anybody is experiencing the same issue with an external Postgres database as well.