Netflix / lemur

Repository for the Lemur Certificate Manager
Apache License 2.0

Celery support not documented well-enough to setup #3249

Closed johnkeates closed 3 years ago

johnkeates commented 3 years ago

We're trying to set up Celery to process the recurring tasks, but it's a struggle to get anywhere. For example, without REDIS_HOST set it doesn't even try to start; instead you get an AttributeError indicating that lemur.common.celery doesn't exist (technically, of course, it says celery isn't an attribute of lemur.common).

This is of course something you can find in the stack traces (which error out a lot when you point the logs to /dev/stdout instead of a seekable file); looking at the Redis helper code, you'll then find that it reads REDIS_HOST, which defaults to localhost.
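
For illustration, a minimal standard-library sketch of logging to a stream rather than to a file path. It assumes the traces come from a file-based handler that expects a seekable target, which the thread does not confirm; `build_stream_logger` and the logger name are hypothetical and this is not Lemur's own logging setup.

```python
import logging
import sys


def build_stream_logger(name="lemur-sketch", stream=None):
    """Return a logger that writes to a stream (stdout by default), not a file path.

    Hypothetical helper for illustration; not part of Lemur.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False   # keep output on this handler only
    logger.handlers.clear()    # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    build_stream_logger().info("celery worker starting")
```

A stream handler never seeks or rotates, so it should not matter whether stdout is a pipe, a terminal, or a container log driver.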

So far we're up to the point where the celery startup gets close but again throws a red herring:

[2020-11-12 22:59:22,687: CRITICAL/MainProcess] Unrecoverable error: ModuleNotFoundError("No module named 'redis-lemur-node-001'")
Traceback (most recent call last):
  File "/app/venv/lib/python3.7/site-packages/kombu/utils/objects.py", line 42, in __get__
    return obj.__dict__[self.__name__]
KeyError: 'backend'

Something is obviously wrong: it's using the first part of the domain AWS generated for an ElastiCache Redis instance and trying to load it as a module somewhere down the chain. It's unlikely to be kombu doing that by itself, so I'm assuming it happens somewhere in Celery (in lemur.common) or in the Flask app configuration...
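
One plausible reading, which the thread does not confirm: if the backend setting reaches Celery as a bare hostname without the `redis://` scheme, it may be treated as a dotted Python path and imported. The sketch below only uses `importlib` to show that importing such a string produces the same message as in the log; `try_import_as_dotted_path` is a hypothetical helper, not a Celery or Lemur function.

```python
import importlib


def try_import_as_dotted_path(value):
    """Treat `value` as 'package.module.attr' and try to import the module part.

    Returns the error message if the import fails, otherwise None.
    Hypothetical helper for illustration only.
    """
    module_path, _, _attr = value.rpartition(".")
    if not module_path:
        return None  # nothing that looks like a dotted path
    try:
        importlib.import_module(module_path)
    except ModuleNotFoundError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Hostname from the config in this thread, with the redis:// scheme missing.
    bare_host = "redis-lemur-node-001.asfd.0001.euw1.cache.amazonaws.com"
    print(try_import_as_dotted_path(bare_host))
```

The import machinery resolves parent packages first, so the failure is reported for the leftmost label of the hostname.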

Anyhow, not trying to dump on lemur, but for such a great idea (and an implementation that actually makes it happen) it's odd to see so little documentation and so few posts or shared setup experiences, especially when it's not all that trivial once you want to do more than run it in a VM and kick the tires a little.

When we get it running we'll happily contribute docs, maybe put it on the company tech blog or something to make it more visible to others looking to do the same, but for now, any help would be appreciated.

Currently, we run it in a container (python:3.7) with added packages to support all the make targets. Lemur does run (when we don't enable Celery), albeit with errors about some of the entrust and digicert plugins not having the required configuration (which is fine, we don't use those).

The config used for lemur to enable celery is a combination of the generated example CELERYBEAT_SCHEDULE dict and:

REDIS_HOST = 'redis-lemur-node-001.asfd.0001.euw1.cache.amazonaws.com'

CELERY_RESULT_BACKEND = 'redis://redis-lemur-node-001.asfd.0001.euw1.cache.amazonaws.com:6379'
CELERY_BROKER_URL = 'redis://redis-lemur-node-001.asfd.0001.euw1.cache.amazonaws.com:6379'
CELERY_IMPORTS = ('lemur.common.celery')
CELERY_TIMEZONE = 'UTC'

At the top of lemur.conf.py we also imported crontab: from celery.task.schedules import crontab

Without it, Lemur and Celery are unhappy because the crontab references in the config cannot be resolved to a module.
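
A small sanity check for settings like the ones above, sketched with the standard library only. It assumes that both Celery URLs should carry a `redis://` (or `rediss://`) scheme and point at the same host as REDIS_HOST; that matches the values shown here but is an assumption, and `check_redis_settings` is a hypothetical helper rather than anything Lemur provides.

```python
from urllib.parse import urlparse


def check_redis_settings(settings):
    """Return a list of human-readable problems found in a settings dict.

    Hypothetical helper: checks only the three keys quoted in this thread.
    """
    problems = []
    redis_host = settings.get("REDIS_HOST")
    for key in ("CELERY_BROKER_URL", "CELERY_RESULT_BACKEND"):
        value = settings.get(key, "")
        parsed = urlparse(value)
        if parsed.scheme not in ("redis", "rediss"):
            problems.append(f"{key} has no redis:// scheme: {value!r}")
        elif parsed.hostname != redis_host:
            problems.append(f"{key} host {parsed.hostname!r} differs from REDIS_HOST")
    return problems


if __name__ == "__main__":
    host = "redis-lemur-node-001.asfd.0001.euw1.cache.amazonaws.com"
    settings = {
        "REDIS_HOST": host,
        "CELERY_BROKER_URL": f"redis://{host}:6379",
        "CELERY_RESULT_BACKEND": f"redis://{host}:6379",
    }
    for problem in check_redis_settings(settings):
        print(problem)
```

Running a check like this at the bottom of the config file would surface a missing scheme before Celery starts.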

johnkeates commented 3 years ago

Replying to myself: it's working! Turns out we do need that REDIS_HOST but we don't need env vars for Celery, only the configuration file ones. Still gives a very large number of stack traces, but at least the tasks are executing.

johnkeates commented 3 years ago

Looking at the master branch, the docs do seem to reflect most of that; I suppose I should have started with the source docs and not the release docs that are rather outdated by now.

hosseinsh commented 3 years ago

Thanks @johnkeates for raising this point. I agree with you: we should do a better job of keeping the docs updated rather than relying only on the source code.

johnkeates commented 3 years ago

Looking at the docs, I'm trying to figure out how to help out. But some of it seems to be related to the release functionality which (I'm assuming) updates readthedocs as well.

I'm currently working on getting logging sorted a bit better; the logger is likely to support plenty of destinations other than local files, and I'll write docs for that when it's working. At the same time I spotted an easy-to-make mistake: the environment variables we set for containers weren't prefixed (we read them back in lemur.conf.py), and we used the same variable names that Celery likes to read, which overwrites what we set in the config file.
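
A minimal sketch of the prefixing idea for a config file that reads container environment variables. The `LEMUR_` prefix and the `read_prefixed` helper are hypothetical names chosen for illustration; the thread does not say which prefix was used.

```python
import os

# Hypothetical prefix; any name that other tools do not read would do.
ENV_PREFIX = "LEMUR_"


def read_prefixed(name, default=None, environ=None):
    """Read PREFIX + name from the environment, ignoring the unprefixed name."""
    environ = os.environ if environ is None else environ
    return environ.get(ENV_PREFIX + name, default)


if __name__ == "__main__":
    # In the config file, settings would then be assigned from prefixed variables.
    CELERY_BROKER_URL = read_prefixed("CELERY_BROKER_URL", "redis://localhost:6379")
    print(CELERY_BROKER_URL)
```

With this layout the container sets `LEMUR_CELERY_BROKER_URL`, so there is no unprefixed variable of the same name left for another library to pick up.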

So far it's been an interesting journey, but I'm happy to see (after reading the code) how easy it is to actually extend all of this for different use cases (i.e. updating Palo Alto Panorama and F5 TMUI automatically) where the targets are rather "classical" devices.

jtschladen commented 3 years ago

@johnkeates we've actually discovered today that doc publishing is currently not working. I'm looking into it now - the docs are intended to be published on every commit, not just releases.

johnkeates commented 3 years ago

@jtschladen I suppose that would explain the old-ish commit hash on the current/latest publication -- 746e314a

Perhaps a semver and release bot would help out, not sure if that is something you guys enjoy but I've seen it help out in projects like saltstack-formulas to make releases (and their changelogs) available in automatic and reasonably sized chunks.