MetaCell / cloud-harness

Listener to services should be on a separate thread and resilient by default #682

Open filippomc opened 1 year ago

filippomc commented 1 year ago

Generated applications for django have a default behaviour to sync users, which requires kafka and accounts to be fully operative at startup. If they're not fully operative, pods go into a crash loop. This is not ideal because the user-facing part of the application does not necessarily require user login functionality.

Ideally we should run these listeners in a separate pod/container.

A quicker proposal is the following:

import os

if os.environ.get('KUBERNETES_SERVICE_HOST'):
    # Running in/for k8s: start the auth service and the kafka event listener
    # on background threads so a missing dependency does not block startup.
    import threading
    import time
    from cloudharness import log
    from cloudharness_django.services import init_services

    def start_auth_service():
        # Retry in a loop (not recursively) until the services initialize.
        while True:
            try:
                init_services()
                return
            except Exception:
                log.exception("Error initializing services. Retrying in 5 seconds...")
                time.sleep(5)

    def start_event_listener():
        # Importing the events module is what starts the user sync listener.
        while True:
            try:
                import cloudharness_django.services.events
                log.info("User sync events listener started")
                return
            except Exception:
                log.exception("Error initializing event queue. Retrying in 5 seconds...")
                time.sleep(5)

    # Daemon threads, so a retry loop never keeps the process alive on shutdown.
    threading.Thread(target=start_auth_service, daemon=True).start()
    threading.Thread(target=start_event_listener, daemon=True).start()

zsinnema commented 1 year ago

@filippomc the listener is already on a separate thread (using async_consume), see https://github.com/MetaCell/cloud-harness/blob/5e5052f3748dc48d954507f339e5b77b1eadd60a/infrastructure/common-images/cloudharness-django/libraries/cloudharness-django/cloudharness_django/services/events.py#L61

filippomc commented 1 year ago

Yes, the consume is on a different thread, but the initialization of the client isn't, so the main application thread fails by default if kafka is not there. That's not necessarily wrong, but it shouldn't be the default behaviour in my opinion, as kafka events are usually related to specific features and are preventing/delaying the application startup this way.
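
To make the point concrete, here is a rough sketch of how initialization (not just consumption) could be moved off the main thread with a retry loop. It uses only the standard library; `start_with_retry`, its parameters and the `stop_event` shutdown hook are hypothetical names for illustration and not part of cloudharness.

```python
import logging
import threading
import time

log = logging.getLogger(__name__)


def start_with_retry(init, name, delay=5.0, stop_event=None):
    """Run `init` on a daemon thread, retrying until it succeeds.

    `init` is any zero-argument callable that raises while its
    dependency (e.g. kafka or accounts) is not reachable yet.
    """
    # Hypothetical shutdown hook: setting the event stops further retries.
    stop_event = stop_event or threading.Event()

    def worker():
        attempt = 0
        while not stop_event.is_set():
            attempt += 1
            try:
                init()
                log.info("%s initialized after %d attempt(s)", name, attempt)
                return
            except Exception:
                log.exception(
                    "Error initializing %s. Retrying in %s seconds...", name, delay
                )
                # wait() returns early if a shutdown is requested meanwhile.
                stop_event.wait(delay)

    thread = threading.Thread(target=worker, name=name, daemon=True)
    thread.start()
    return thread
```

With something like this, the startup code would reduce to a call such as `start_with_retry(init_services, "auth-service")`, and the main application thread would come up whether or not kafka and accounts are ready.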