helm deployments: in some cases not referencing correct docker image version

oliversommer commented 2 years ago

Quick summary Affected: the Helm chart and deployments based on it. Issue: the Docker image tag is not set correctly out of the box when installing an older Helm chart version.

Bug description In the deployment templates of the Helm chart, the Docker image declarations take their tag from a Helm value. Example here. This value is set statically to the string literal "latest" here. As a result, installing an older chart version (that is, any version other than the latest) pulls the latest Docker images instead of the images that belong to that chart version.
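
To make the mechanism concrete, here is a minimal sketch of the pattern described above. It is not copied from the chart: the value names `tag` and `repositoryPrefix`, the container name, and the image name are assumptions used only for illustration.

```yaml
# values.yaml (illustrative; key names are assumptions, not copied from the chart)
repositoryPrefix: defectdojo
tag: latest   # static default, identical in every chart version

# templates/<some-deployment>.yaml (illustrative excerpt)
# The tag is read from the value above, so an older chart version
# still resolves to whatever "latest" points at in the registry.
containers:
  - name: uwsgi
    image: "{{ .Values.repositoryPrefix }}/defectdojo-django:{{ .Values.tag }}"
```

Because the default never changes between chart releases, the chart version and the image version are only coupled if the user overrides the value.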

This can become severe in conjunction with #5993. However, #5993 also describes a workaround for those affected (e.g. reinstalling via Helm with an older version).

Mitigations / Workarounds

upgrade to the latest version of defectdojo
set path as described in #5993
if you want to run an older version of the DefectDojo chart: set the Docker image tag explicitly via the Helm value
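
The last workaround could look roughly like the following values override. This is a sketch under assumptions: the value is assumed to be a top-level key named `tag`, the file name `values-pinned.yaml` is hypothetical, and the pairing of chart 1.6.27 with DefectDojo 2.7.1 is taken from the environment information in this report, so please verify it against the release you actually want.

```yaml
# values-pinned.yaml (hypothetical file name)
# Assumption: the chart reads the image tag from a top-level `tag` value
# that defaults to "latest".
# Pin it to the DefectDojo release that matches the chart version being installed.
tag: "2.7.1"
```

It would then be passed at install time with something like `helm install <release> <chart> --version 1.6.27 -f values-pinned.yaml`, where `<release>` and `<chart>` are placeholders for your own release name and chart reference. Passing `--set tag=2.7.1` instead of the file should have the same effect.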

Steps to reproduce Install older version of helm chart (chart version <= 1.6.28) with rabbitmq enabled (default).

Steps to reproduce the behavior:

install helm chart version 1.6.27
wait for containers to come up
take a look into log output of celery-beat, celery-worker or rabbitmq
See the login errors; examples are provided below (similar to #5993).

Expected behavior System should come up with celery working.

Deployment method (select with an X)

[ ] Docker Compose
[x] Kubernetes
[ ] GoDojo

Environment information

Kubernetes System: k8s v1.21.5-eks-bc4871b
Helm version 3.x
DefectDojo helm chart version: 1.6.27 (but versions prior to 1.6.29 likely to be affected)
DefectDojo version: 2.7.1

Logs celery-beat:

LocalTime -> 2022-03-10 11:52:50
Configuration ->
    . broker -> amqp://user:**@defectdojo-rabbitmq:5672/dojo.celerydb.sqlite
    . loader -> celery.loaders.app.AppLoader
    . scheduler -> celery.beat.PersistentScheduler
    . db -> /var/run/defectdojo/celerybeat-schedule
    . logfile -> [stderr]@%WARNING
    . maxinterval -> 5.00 minutes (300s)
Traceback (most recent call last):
  File "/usr/local/bin/celery", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/celery/__main__.py", line 15, in main
    sys.exit(_main())
  File "/usr/local/lib/python3.8/site-packages/celery/bin/celery.py", line 213, in main
    return celery(auto_envvar_prefix="CELERY")
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))

....

  File "/usr/local/lib/python3.8/site-packages/amqp/method_framing.py", line 53, in on_frame
    callback(channel, method_sig, buf, None)
  File "/usr/local/lib/python3.8/site-packages/amqp/connection.py", line 534, in on_inbound_method
    return self.channels[channel_id].dispatch_method(
  File "/usr/local/lib/python3.8/site-packages/amqp/abstract_channel.py", line 143, in dispatch_method
    listener(*args)
  File "/usr/local/lib/python3.8/site-packages/amqp/connection.py", line 664, in _on_close
    raise error_for_code(reply_code, reply_text,
amqp.exceptions.NotAllowed: Connection.open: (530) NOT_ALLOWED - vhost dojo.celerydb.sqlite not found

celery-worker:

[10/Mar/2022 13:07:40] ERROR [celery.worker.consumer.consumer:344] consumer: Cannot connect to amqp://user:**@defectdojo-rabbitmq:5672/dojo.celerydb.sqlite: Connection.open: (530) NOT_ALLOWED - vhost dojo.celerydb.sqlite not found.
Trying to reconnect...

rabbit-mq:

2022-03-10 13:19:32.720145+00:00 [info] <0.1636.103> accepting AMQP connection <0.1636.103> (10.x.x.51:33852 -> 10.x.x.246:5672)
2022-03-10 13:19:32.721250+00:00 [erro] <0.1636.103> Error on AMQP connection <0.1636.103> (10.x.x.51:33852 -> 10.x.x.246:5672, user: 'user', state: opening):
2022-03-10 13:19:32.721250+00:00 [erro] <0.1636.103> vhost dojo.celerydb.sqlite not found
2022-03-10 13:19:32.721564+00:00 [info] <0.1636.103> closing AMQP connection <0.1636.103> (10.x.x.51:33852 -> 10.x.x.246:5672, vhost: 'none', user: 'user')

valentijnscholten commented 2 years ago

I wonder if we also should pin the container tags in the docker-compose.yml file as this could lead to the same issue?

mtesauro commented 2 years ago

FWIW, I always pin to a specific version (aka Docker tag) for all containers, docker-compose, docker or k8s - I like to know exactly what I'm running. Certainly for any "PROD" work.

The "latest" tag is very handy in some situations, but it can have effects like the one described here that aren't that great.

I wonder if this is something that could be addressed in the docs - a heads up that prod best practice is to pin to the version you want to run. I think how we have compose in the repo 'just works' for dev work so I don't see a reason to change that.

valentijnscholten commented 2 years ago

For docker compose in dev I think if we pin to for example 2.9.0-dev it will "just work" the same as it is now?
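
As a rough illustration of what pinning in compose might look like, here is a minimal override fragment. The service names, the image names, and the override file name are assumptions, and `2.9.0-dev` is simply the example tag from the comment above; it has not been checked against the registry.

```yaml
# docker-compose.override.yml (hypothetical file name)
# Assumption: the stack defines services named `uwsgi` and `nginx`
# that run the DefectDojo django and nginx images.
services:
  uwsgi:
    image: "defectdojo/defectdojo-django:2.9.0-dev"  # fixed tag instead of "latest"
  nginx:
    image: "defectdojo/defectdojo-nginx:2.9.0-dev"   # keep both images on the same release
```

Any other service that runs a DefectDojo image would need the same tag, so that all containers in the stack come from one release.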

Maffooch commented 2 years ago

I think Matt's suggestion to address this in the docs would be best. We already state that the supplied docker-compose.yml file is not intended for production out of the box and should be modified before deploying. The same should also apply to the Helm charts.

dsever commented 2 years ago

From the official Helm docs:

IMAGES

'A container image should use a fixed tag or the SHA of the image. It should not use the tags latest, head, canary, or other tags that are designed to be “floating”.'

oliversommer commented 2 years ago

added section "Mitigations / Workarounds"

DefectDojo / django-DefectDojo

helm deployments: in some cases not referencing correct docker image version #6014