Closed · bitsf closed this 4 years ago
Hi @danielpacak, I'm submitting this PR mainly for your reference, to show how I enabled Redis Sentinel support and to speed up the fix, since I'm not very familiar with the Clair and Trivy adapter code, and I don't know how to test this either. I suggest you implement it however you see fit; that would be more effective.
The main thing to confirm is whether we can agree on the Redis URL style that I will put into the Harbor configuration file:
```
redis://<hostname>:<port>/<dbnum>
redis+sentinel://<hostname1>:<port>,<hostname2>:<port>,<hostname3>:<port>/<monitor_name>/<dbnum>
```
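For illustration only (this is not the code in this PR), a URL of the second form could be parsed with nothing more than the standard library; the `sentinelSettings` struct and `parseSentinelURL` function below are hypothetical names:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// sentinelSettings is a hypothetical struct showing what the
// redis+sentinel:// scheme above would have to carry.
type sentinelSettings struct {
	Addrs       []string // sentinel host:port pairs
	MonitorName string   // the master set name the sentinels monitor
	DB          string   // redis database number
}

// parseSentinelURL is a minimal sketch, not the parser from this PR.
func parseSentinelURL(raw string) (*sentinelSettings, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, err
	}
	if u.Scheme != "redis+sentinel" {
		return nil, fmt.Errorf("unexpected scheme: %s", u.Scheme)
	}
	// net/url keeps the comma-separated endpoints in u.Host.
	addrs := strings.Split(u.Host, ",")
	// The path looks like /<monitor_name>/<dbnum>.
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	if len(parts) < 1 || parts[0] == "" {
		return nil, fmt.Errorf("missing monitor name in %s", raw)
	}
	s := &sentinelSettings{Addrs: addrs, MonitorName: parts[0], DB: "0"}
	if len(parts) > 1 {
		s.DB = parts[1]
	}
	return s, nil
}

func main() {
	s, err := parseSentinelURL("redis+sentinel://host1:26379,host2:26379,host3:26379/mymaster/5")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", s)
}
```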
Hey @bitsf, I thought the code was ready to merge and had been tested against a real Redis Sentinel cluster; I reviewed it from that standpoint. Anyway, now that I understand the Redis URL schemes we want to support, I can follow up and test it properly.
Merging #20 into master will decrease coverage by 2.88%. The diff coverage is 26.92%.
```diff
@@            Coverage Diff             @@
##           master      #20      +/-   ##
==========================================
- Coverage   50.35%   47.47%   -2.89%
==========================================
  Files          13       14       +1
  Lines         556      634      +78
==========================================
+ Hits          280      301      +21
- Misses        259      315      +56
- Partials       17       18       +1
```
| Impacted Files | Coverage Δ |
|---|---|
| `pkg/etc/config.go` | 44.44% <ø> (ø) |
| `pkg/redisx/pool.go` | 26.92% <26.92%> (ø) |
Continue to review the full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5daec06...be00071. Read the comment docs.
@bitsf @steven-zou @heww
I refactored the code initially submitted by @bitsf a bit. I tested the changes with a real Sentinel deployment and confirmed that the failover scenario is handled by the connection pool as expected.
I used the Bitnami Helm chart for Redis Sentinel to deploy Redis Sentinel on Kubernetes.
One thing I noticed is that the redigo lib does not let us declare any connection retry mechanism, which is necessary if we want the adapter service to automatically recover from a master failover.
Do we want to implement such a retry strategy in the adapter service, or do we just fail and wait for the Harbor scan job to retry? This is all about retrying connections to Redis and backing off while the slaves elect a new master.
If we want to go for retry / circuit breaker logic in the adapter service, I'd consider replacing redigo with https://github.com/go-redis/redis
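For comparison, a hedged sketch of what the Sentinel connection could look like with go-redis, whose `FailoverOptions` include built-in retry knobs (`MaxRetries`, `MinRetryBackoff`, `MaxRetryBackoff`) that redigo lacks; the addresses, master name, and values below are placeholders, not this project's config:

```go
package main

import (
	"context"
	"time"

	"github.com/go-redis/redis/v8"
)

func main() {
	// Placeholder addresses and master name; the retry options are
	// go-redis built-ins, shown here only to illustrate the point.
	rdb := redis.NewFailoverClient(&redis.FailoverOptions{
		MasterName:      "mymaster",
		SentinelAddrs:   []string{"host1:26379", "host2:26379", "host3:26379"},
		DB:              0,
		MaxRetries:      5,
		MinRetryBackoff: 100 * time.Millisecond,
		MaxRetryBackoff: 2 * time.Second,
	})
	defer rdb.Close()

	// Commands are retried with backoff, which helps ride out a
	// master election window during failover.
	if err := rdb.Ping(context.Background()).Err(); err != nil {
		panic(err)
	}
}
```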
There are some Redis clusters I'm using for testing.
@danielpacak I'm not sure what you mean about the redigo problem. In my test, when I shut down the master, our code can fail over and connect to the slave after a short delay. Can you give a more detailed test scenario?
@bitsf I do confirm that redigo and the sentinel pool implemented in this PR work fine. We also handle the failover case, i.e. the adapter service can reconnect without restarting the container.
I was referring more to the case when the master is down and the sentinels are electing a new master. There is a time window during which this line of code might be executed: https://github.com/goharbor/harbor-scanner-clair/blob/master/pkg/persistence/redis/store.go#L52
If that happens, since we do not currently have any retry code, the scan job would fail.
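To make the gap concrete, here is a minimal sketch of the kind of retry wrapper being discussed, assuming the redigo pool from this PR; the function name, attempt count, and backoff policy are all illustrative, not code from this repository:

```go
package redisretry

import (
	"time"

	"github.com/gomodule/redigo/redis"
)

// doWithRetry retries a command while the sentinels are still electing
// a new master. A hedged sketch only; the adapter currently has no such
// code, and the attempts/backoff values are made up.
func doWithRetry(pool *redis.Pool, attempts int, backoff time.Duration,
	cmd string, args ...interface{}) (reply interface{}, err error) {
	for i := 0; i < attempts; i++ {
		conn := pool.Get()
		reply, err = conn.Do(cmd, args...)
		conn.Close()
		if err == nil {
			return reply, nil
		}
		// Linear backoff to ride out the master election window.
		time.Sleep(time.Duration(i+1) * backoff)
	}
	return nil, err
}
```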
@danielpacak Do you mean that when a user pushes an image and Redis happens to fail over at that moment, the Clair scan will fail and the image will never be scanned? Can the user scan manually later, or will the periodic scan fix this? If yes, I think that would be enough, as failover is just a corner case; we can't make it work perfectly all the time. Our priority is to make the system fully recover after a short period of unavailability. We could also schedule a Zoom meeting if this needs further discussion.
@bitsf Yes, that's the case I was thinking about. I also agree that a manual / scheduled rescan after recovery is an acceptable solution. What's more, IIRC Harbor itself does automatic retries, which might fix a failing job caused by a Redis failover.
Hi @danielpacak, will you add parsing of the optional param `?sentinel_password=xxx` in the URL in this PR?
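For illustration, assuming `net/url` is used for parsing (this is not the PR's actual parser), picking up the optional parameter could look like the sketch below; the URL is a made-up example:

```go
package main

import (
	"fmt"
	"net/url"
)

func main() {
	// Hypothetical URL: the sentinel password travels as an optional
	// query parameter, separate from the redis password in the userinfo.
	raw := "redis+sentinel://:Redis12345@redis.redis:26379/mymaster?sentinel_password=xxx"

	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	redisPassword, _ := u.User.Password()
	sentinelPassword := u.Query().Get("sentinel_password")
	fmt.Println(redisPassword, sentinelPassword) // Redis12345 xxx
}
```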
Hey @bitsf. Yes, I can do it in this PR.
@bitsf I've updated the code and the PR is ready for review. Beyond that, I ran the following tests:
Installed Sentinel with the password for master/worker nodes disabled / enabled:
```
$ helm install redis bitnami/redis -n redis \
    --set sentinel.enabled=true \
    --set sentinel.usePassword=false \
    --set usePassword=true \
    --set password=Redis12345
```
I used the following URLs in the cases where the password was disabled / enabled:
```
redis+sentinel://:@redis.redis:26379/mymaster
redis+sentinel://:Redis12345@redis.redis:26379/mymaster
```
In both cases I tested failover by scaling down the master node with the `kubectl -n redis scale sts redis-master --replicas 0` command.
Everything worked as expected and the workers were able to elect a new master. In the failover case, after a few seconds the scanner adapter was able to connect through Sentinel to the new master node without restarting the Pods.
@danielpacak There are too many commits; they should be squashed into one.
Fix #19
The full raw URL is:
Reference: https://github.com/goharbor/harbor/pull/11155