goharbor / harbor-scanner-clair

Use Clair as a plug-in vulnerability scanner in the Harbor registry
https://goharbor.io/
Apache License 2.0

feature(redis) support redis sentinel #20

Closed bitsf closed 4 years ago

bitsf commented 4 years ago

fix #19

the full raw URL formats are:

  redis://<hostname>:<port>/<dbnum>
  redis+sentinel://<hostname1>:<port>,<hostname2>:<port>,<hostname3>:<port>/<monitor_name>/<dbnum>

reference https://github.com/goharbor/harbor/pull/11155
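As a rough illustration of the URL scheme proposed above (not the adapter's actual implementation; the function name parseSentinelURL and its shape are assumptions), the redis+sentinel form could be parsed in Go like this:

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// parseSentinelURL splits a URL of the form
//   redis+sentinel://<host1>:<port>,<host2>:<port>/<monitor_name>/<dbnum>
// into its Sentinel addresses, monitor name, and database number.
// Illustrative sketch only, not code from this PR.
func parseSentinelURL(raw string) (addrs []string, monitor string, db int, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, "", 0, err
	}
	if u.Scheme != "redis+sentinel" {
		return nil, "", 0, fmt.Errorf("unexpected scheme %q", u.Scheme)
	}
	// net/url keeps the comma-separated authority intact in u.Host,
	// so the Sentinel addresses can be recovered with a simple split.
	addrs = strings.Split(u.Host, ",")
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	if len(parts) == 0 || parts[0] == "" {
		return nil, "", 0, fmt.Errorf("missing monitor name in %q", raw)
	}
	monitor = parts[0]
	if len(parts) > 1 {
		if db, err = strconv.Atoi(parts[1]); err != nil {
			return nil, "", 0, fmt.Errorf("invalid dbnum: %w", err)
		}
	}
	return addrs, monitor, db, nil
}

func main() {
	addrs, monitor, db, err := parseSentinelURL("redis+sentinel://h1:26379,h2:26379,h3:26379/mymaster/1")
	fmt.Println(addrs, monitor, db, err)
}
```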

bitsf commented 4 years ago

hi @danielpacak Actually I submitted this PR mainly for your reference, to show how I enabled Redis Sentinel support and to speed up the fix, as I'm not very familiar with the Clair and Trivy adapter code. I also don't know how to test it. I suggest you implement it in whatever way you prefer; that would be more effective.

The main thing I'd like you to confirm is whether we can agree on the Redis URL style, which I will then update in the Harbor configuration file:

 redis://<hostname>:<port>/<dbnum>
 redis+sentinel://<hostname1>:<port>,<hostname2>:<port>,<hostname3>:<port>/<monitor_name>/<dbnum>
danielpacak commented 4 years ago

> hi @danielpacak Actually I submitted this PR mainly for your reference, to show how I enabled Redis Sentinel support and to speed up the fix, as I'm not very familiar with the Clair and Trivy adapter code. [...]
>
> The main thing I'd like you to confirm is whether we can agree on the Redis URL style, which I will then update in the Harbor configuration file.

Hey @bitsf I thought this was ready-to-merge code tested against a real Redis Sentinel cluster, and I reviewed it from that standpoint. Anyway, now that I understand the Redis URL schemes we want to support, I can follow up and test it properly.

codecov[bot] commented 4 years ago

Codecov Report

Merging #20 into master will decrease coverage by 2.88%. The diff coverage is 26.92%.


@@            Coverage Diff             @@
##           master      #20      +/-   ##
==========================================
- Coverage   50.35%   47.47%   -2.89%     
==========================================
  Files          13       14       +1     
  Lines         556      634      +78     
==========================================
+ Hits          280      301      +21     
- Misses        259      315      +56     
- Partials       17       18       +1     
Impacted Files Coverage Δ
pkg/etc/config.go 44.44% <ø> (ø)
pkg/redisx/pool.go 26.92% <26.92%> (ø)

Last update 5daec06...be00071.

danielpacak commented 4 years ago

@bitsf @steven-zou @heww

I refactored the code initially submitted by @bitsf a bit. I tested the changes against a real Sentinel deployment and confirmed that the failover scenario is handled by the connection pool as expected.

I used the Bitnami Helm chart for Redis Sentinel to deploy Redis Sentinel on Kubernetes.

One thing I noticed is that the redigo library does not let us declare a connection retry mechanism, which is necessary if we want the adapter service to recover automatically from a master failover.

Do we want to implement such a retry strategy in the adapter service, or do we just fail and wait for the Harbor scan job to retry? This is all about retrying connections to Redis and backing off while the slaves elect a new master.

If we want to go for a retry / circuit breaker in the adapter service, I'd consider replacing redigo with https://github.com/go-redis/redis

bitsf commented 4 years ago

Here are some Redis clusters I'm using for testing:

  1. for docker https://github.com/AliyunContainerService/redis-cluster.git
  2. for helm helm install test1 bitnami/redis --set sentinel.enabled=true --set password=Test123
  3. for operator https://github.com/spotahome/redis-operator
bitsf commented 4 years ago

> One thing I noticed is that the redigo library does not let us declare a connection retry mechanism, which is necessary if we want the adapter service to recover automatically from a master failover. Do we want to implement such a retry strategy in the adapter service, or do we just fail and wait for the Harbor scan job to retry? [...]

@danielpacak I'm not sure what you mean about the redigo problem. In my test, when I shut down the master, our code failed over to the slave after a short delay. Can you give a more detailed test scenario?

danielpacak commented 4 years ago

> @danielpacak I'm not sure what you mean about the redigo problem. In my test, when I shut down the master, our code failed over to the slave after a short delay. Can you give a more detailed test scenario?

@bitsf I confirm that redigo and the Sentinel pool implemented in this PR work fine. We also handle the failover case, i.e. the adapter service can reconnect without restarting the container.

I was referring to the window when the master is down and the Sentinels are electing a new one. During that window this line of code might be executed: https://github.com/goharbor/harbor-scanner-clair/blob/master/pkg/persistence/redis/store.go#L52

If that happens, since we currently have no retry code, the scan job would fail.

bitsf commented 4 years ago

@danielpacak Do you mean that when a user pushes an image exactly during a Redis failover, the Clair scan will fail and the image will never be scanned? Can the user manually scan later, or does the periodic scan fix this? If so, I think that would be enough: failover is just a corner case, and we can't make it work perfectly all the time; our priority is that the system fully recovers after a short period of unavailability. We could also schedule a Zoom meeting if this needs further discussion.

danielpacak commented 4 years ago

> @danielpacak Do you mean that when a user pushes an image exactly during a Redis failover, the Clair scan will fail and the image will never be scanned? Can the user manually scan later, or does the periodic scan fix this? [...]

@bitsf Yes, that's the case I was thinking about. I also agree that a manual / scheduled rescan after recovery is an acceptable solution. What's more, IIRC Harbor itself does an automatic retry, which might fix a job that failed because of a Redis failover.

bitsf commented 4 years ago

hi @danielpacak will you add parsing of the optional ?sentinel_password=xxx parameter to the URL in this PR?

danielpacak commented 4 years ago

> hi @danielpacak will you add parsing of the optional ?sentinel_password=xxx parameter to the URL in this PR?

Hey @bitsf. Yes, I can do it in this PR.

danielpacak commented 4 years ago

@bitsf I've updated the code and the PR is ready for review. Beyond that, I ran the following tests:

  1. Installed Sentinel with password for master/worker nodes disabled / enabled

    $ helm install redis bitnami/redis -n redis \
      --set sentinel.enabled=true \
      --set sentinel.usePassword=false \
      --set usePassword=true \
      --set password=Redis12345
  2. I used the following URLs for the password-disabled / password-enabled cases

    • redis+sentinel://:@redis.redis:26379/mymaster
    • redis+sentinel://:Redis12345@redis.redis:26379/mymaster
  3. In both cases I tested the failover by scaling down the master node with the kubectl -n redis scale sts redis-master --replicas 0 command

  4. Everything worked as expected, and the workers were able to elect a new master. In the failover case, after a few seconds the scanner adapter was able to connect through Sentinel to the new master node without restarting Pods
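To illustrate the two credential sources exercised in the tests above, here is a hedged sketch (a hypothetical helper, not the PR's actual code; the S3cret value is made up) of reading the Redis password from the userinfo part of the URL and the Sentinel password from the optional sentinel_password query parameter:

```go
package main

import (
	"fmt"
	"net/url"
)

// credentialsFromURL extracts the Redis password from the userinfo part
// of a redis+sentinel URL and the Sentinel password from the optional
// ?sentinel_password=... query parameter. Illustrative sketch only.
func credentialsFromURL(raw string) (redisPassword, sentinelPassword string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	if u.User != nil {
		// redis+sentinel://:<password>@host:port/... carries the Redis
		// password in the userinfo section with an empty username.
		redisPassword, _ = u.User.Password()
	}
	sentinelPassword = u.Query().Get("sentinel_password")
	return redisPassword, sentinelPassword, nil
}

func main() {
	rp, sp, err := credentialsFromURL("redis+sentinel://:Redis12345@redis.redis:26379/mymaster?sentinel_password=S3cret")
	fmt.Println(rp, sp, err)
}
```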

bitsf commented 4 years ago

@danielpacak There are too many commits; please squash them into one.