aws / amazon-mwaa-docker-images

Apache License 2.0
24 stars 11 forks source link

Consider using the 'identifier' parameter in TaskLogHandler #57

Open rafidka opened 4 months ago

rafidka commented 4 months ago

Overview

Currently, we are not making use of the identifier parameter in our TaskLogHandler:

def set_context(self, ti: TaskInstance, *, identifier: str | None = None) -> None:
    """
    Provide context to the logger.

    This method is called by Airflow to provide the necessary context to configure
    the handler. In this case, Airflow is passing us the task instance the logs
    are for.

    :param ti: The task instance generating the logs.
    :param identifier: Airflow uses this when relaying exceptional messages to task
    logs from a context other than task itself. We ignore this parameter in this
    handler, i.e. those exceptional messages will go to the same log stream.
    """

    logs_client: CloudWatchLogsClient = boto3.client("logs")  # type: ignore

    if self.enabled:
        # identical to open-source implementation, except create_log_group set to False
        self.handler = watchtower.CloudWatchLogHandler(
            log_group_name=self.log_group_name,
            log_stream_name=self._render_filename(ti, ti.try_number),  # type: ignore
            boto3_client=logs_client,
            use_queues=True,
            create_log_group=False,
        )

        if self.formatter:
            self.handler.setFormatter(self.formatter)
    else:
        self.handler = None

We can potentially use it to log those exceptional messages into a separate task log, making them easier to discover. Searching Airflow code base, I identified the following logs:

It is probably useful to have those messages in a separate log stream so they can be easily detected, but I would like us to a do some investigation on this before making a decision.

Acceptance Criteria

Additional Info

N/A