dragonflyoss / Dragonfly2

Dragonfly is an open source P2P-based file distribution and image acceleration system. It is hosted by the Cloud Native Computing Foundation (CNCF) as an Incubating Level Project.
https://d7y.io
Apache License 2.0

Broker in scheduler unable to connect to Redis #2496

Open PKizzle opened 1 year ago

PKizzle commented 1 year ago

Bug report:

The broker in the scheduler logs a connection error indicating that it cannot authenticate with the Redis server. The following error message is repeated continuously:

v1/worker.go:84 Broker failed with error: ERR AUTH <password> called without any password configured for the default user. Are you sure your configuration is correct?

Expected behavior:

Authentication succeeds and no error message is logged.

How to reproduce it:

  1. Run a Redis server behind a sentinel where the server has a password set for the default user but there is no password set for the sentinel
  2. Configure dragonfly using the official helm chart and specify the Redis password (set no username)
  3. Check the scheduler logs (you might need to modify the helm chart to make the scheduler run with the --console argument)

Environment:

gaius-qi commented 1 year ago

@PKizzle Please send me the launch configuration for helm charts, thx.

PKizzle commented 1 year ago

These are all the changes I have made to the values.yaml included with the original helm chart. I have replaced all sensitive information with shell-style variables.

scheduler:
  config:
    seedPeer:
      enable: false
  metrics:
    prometheusRule:
      enable: true
seedPeer:
  enable: false
dfdaemon:
  console: true
  download:
    totalRateLimit: 40Mi
    perPeerRateLimit: 20Mi
  upload:
    rateLimit: 20Mi
  objectStorage:
    enable: true
    maxReplicas: 1
  storage: 
    taskExpireTime: 3h
    strategy: io.d7y.storage.v2.advance
    diskGCThreshold: 4Gi
  network:
    enableIPv6: true
manager:
  ingress:
    enable: true
    className: "haproxy-internal"
    annotations:
      haproxy.org/ssl-redirect: "true"
    hosts:
      - "${DRAGONFLY_DOMAIN}"
  config:
    auth:
      jwt:
        key: "${JWT_KEY}"
    objectStorage:
      enable: true
      endpoint: "${S3_DOMAIN}"
      accessKey: "${AWS_ACCESS_KEY_ID}"
      secretKey: "${AWS_SECRET_ACCESS_KEY}"
    network:
      enableIPv6: true
    console: true
mysql:
  enable: false

# Custom addition to helm chart for postgres support
externalPostgres:
  migrate: true
  host: "${PGHOST}"
  username: "${PGUSER}"
  password: "${PGPASSWORD}"
  database: "dragonfly"
  port: 5432
  sslMode: require

redis:
  enable: false

externalRedis:
  addrs:
    - "rfs-dragonfly-redis.kube-system.svc.cluster.local:26379"
  masterName: "mymaster"
  username: null
  password: "${REDIS_PASSWORD}"
  db: 0
  brokerDB: 0
  backendDB: 0
  networkTopologyDB: 0

PKizzle commented 1 year ago

Also, the dependency used for the async job queue is no longer maintained (see https://github.com/RichardKnop/machinery/issues/790). I suspect this bug is related to the use of this abandoned project.

PKizzle commented 1 year ago

After taking a closer look at the machinery source code, I found that sentinel support is only provided by the go-redis implementation. Whether redigo or go-redis is used is decided by the number of broker addresses: two or more addresses lead to the go-redis implementation being used. Thus, adding an additional empty address in the values.yaml file fixes the issue.
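
For illustration, the workaround could look roughly like this in values.yaml. This is only a sketch: it assumes the extra entry belongs in the externalRedis.addrs list shown earlier and that an empty string works as the placeholder address. Neither detail is spelled out above, so adjust it to whatever your chart version accepts.

```yaml
externalRedis:
  addrs:
    - "rfs-dragonfly-redis.kube-system.svc.cluster.local:26379"
    # Assumed form of the "additional empty address": a second entry whose only
    # purpose is to bring the broker address count to two, so that machinery
    # selects the go-redis implementation (which supports sentinel).
    - ""
  masterName: "mymaster"
  username: null
  password: "${REDIS_PASSWORD}"
```

The remaining keys are unchanged from the configuration posted above; only the second addrs entry is new.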

I do not see any reason why machinery uses two different Redis implementations, and I therefore strongly recommend switching to a different job queue dependency that does not introduce this kind of complexity.