boto / botocore

The low-level, core functionality of boto3 and the AWS CLI.
Apache License 2.0

start_live_tail raising an error when executed in a Lambda - Header not supported – X-Amzn-Trace-Id #3168

Closed adrianicv closed 6 months ago

adrianicv commented 6 months ago

Describe the bug

I am trying to execute start_live_tail (CloudWatch Logs) within an AWS Lambda, and I'm getting the following error:

An error occurred (MalformedHttpRequestException) when calling the StartLiveTail operation: Header not supported – X-Amzn-Trace-Id
Traceback (most recent call last):
  File "/var/task/pyverless/api_gateway_handler/api_gateway_handler.py", line 162, in execute_lambda_code
    response_body = self.perform_action()
  File "/var/task/src/endpoints/handlers.py", line 34, in perform_action
    cloudwatch.start_live_tail(api_manager=ws_api)
  File "/var/task/src/endpoints/handlers.py", line 67, in start_live_tail
    response = self.client.start_live_tail(
  File "/var/task/botocore/client.py", line 565, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/task/botocore/client.py", line 1021, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (MalformedHttpRequestException) when calling the StartLiveTail operation: Header not supported – X-Amzn-Trace-Id

I have tried running the same code locally with the same botocore and boto3 versions, and it works without raising this error.

The lambda has the following permissions:

        - logs:FilterLogEvents
        - logs:StartQuery
        - logs:GetQueryResults
        - logs:StartLiveTail
        - logs:StopLiveTail

Expected Behavior

The call should not raise this error when live tailing logs from within a Lambda.

Current Behavior

When live tailing is used inside the Lambda, the error described above is raised.

Reproduction Steps

The script is:

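from datetime import datetime

import boto3

# `settings` and `APIGatewayManagementAPIService` are project-specific and not shown here.


class CloudWatchLogService:
    def __init__(self, service: str):
        self.client = boto3.client("logs", region_name=settings.REGION)
        self.group_name = f"/aws/sagemaker/Endpoints/{service}"
        # No placeholders in this name, so a plain string is enough.
        self.group_name_inference = "/aws/lambda/sample-log-inference"
        # StartLiveTail takes log group ARNs as identifiers.
        self.group_name_arn = (
            f"arn:aws:logs:{settings.REGION}:{settings.ACCOUNT_ID}:log-group:{self.group_name}"
        )
        self.group_name_inference_arn = (
            f"arn:aws:logs:{settings.REGION}:{settings.ACCOUNT_ID}:log-group:{self.group_name_inference}"
        )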

    def start_live_tail(self, api_manager: APIGatewayManagementAPIService):
        log_groups = [self.group_name_arn, self.group_name_inference_arn]
        response = self.client.start_live_tail(
            logGroupIdentifiers=log_groups,
        )
        event_stream = response["responseStream"]
        for event in event_stream:
            if "sessionStart" in event:
                session_start_event = event["sessionStart"]
                print(session_start_event)

            elif "sessionUpdate" in event:
                log_events = event["sessionUpdate"]["sessionResults"]
                for log_event in log_events:
                    print(
                        "[{date}] {log}".format(
                            date=datetime.fromtimestamp(log_event["timestamp"] / 1000),
                            log=log_event["message"],
                        )
                    )
                    api_manager.send_to_connection(
                        data={
                            "message": log_event["message"],
                            "timestamp": log_event["timestamp"],
                        }
                    )
            else:
                raise RuntimeError(str(event))

The code has been extracted and adapted from https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/example_cloudwatch-logs_StartLiveTail_section.html

Possible Solution

No response

Additional Information/Context

Relevant requirement versions:

boto3 = "==1.34.90"
botocore = "==1.34.90"

SDK version used

1.34.90

Environment details (OS name and version, etc.)

Lambda Runtime: Python3.11

adrianicv commented 6 months ago

I added some tracing to _make_api_call in client.py and compared the request made locally with the one made in the Lambda. The Lambda request does include the X-Amzn-Trace-Id header, while the local one does not.

Local request_dict["headers"]:

{
    "X-Amz-Target": "Logs_20140328.StartLiveTail",
    "Content-Type": "application/x-amz-json-1.1",
    "User-Agent": "Boto3/1.34.90 md/Botocore#1.34.90 ua/2.0 os/linux#6.5.0-28-generic md/arch#x86_64 lang/python#3.11.9 md/pyimpl#CPython cfg/retry-mode#legacy Botocore/1.34.90"
}

Lambda:

{
    "X-Amz-Target": "Logs_20140328.StartLiveTail",
    "Content-Type": "application/x-amz-json-1.1",
    "User-Agent": "Boto3/1.34.90 md/Botocore#1.34.90 ua/2.0 os/linux#5.10.210-220.855.amzn2.x86_64 md/arch#x86_64 lang/python#3.11.9 md/pyimpl#CPython exec-env/AWS_Lambda_python3.11 cfg/retry-mode#legacy Botocore/1.34.90",
    "X-Amzn-Trace-Id": "Root=1-662a3481-efba1490ff83e13eba5c4a88;Parent=30359a3e11333b80;Sampled=0;Lineage=bcaff313:0"
}

adrianicv commented 6 months ago

I have found where the header is added. It is in this function:

def add_recursion_detection_header(params, **kwargs):
    has_lambda_name = 'AWS_LAMBDA_FUNCTION_NAME' in os.environ
    trace_id = os.environ.get('_X_AMZN_TRACE_ID')
    if has_lambda_name and trace_id:
        headers = params['headers']
        if 'X-Amzn-Trace-Id' not in headers:
            headers['X-Amzn-Trace-Id'] = quote(trace_id, safe='-=;:+&[]{}"\',')

And in handlers.py it is registered as: ('before-call', add_recursion_detection_header),

So the header is included whenever we call any service from a Lambda. As far as I understand, this is a bug in the service itself, which should accept the X-Amzn-Trace-Id header. Am I correct?
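
To illustrate the difference between the two environments, here is a minimal standalone sketch that copies the function quoted above and runs it against a simulated environment. The build_headers helper and the function name used for AWS_LAMBDA_FUNCTION_NAME are made up for the illustration; the trace ID is the one from the Lambda request above. It does not call botocore or AWS.

```python
import os
from unittest import mock
from urllib.parse import quote


def add_recursion_detection_header(params, **kwargs):
    # Copy of the handler quoted above, with the imports it needs.
    has_lambda_name = 'AWS_LAMBDA_FUNCTION_NAME' in os.environ
    trace_id = os.environ.get('_X_AMZN_TRACE_ID')
    if has_lambda_name and trace_id:
        headers = params['headers']
        if 'X-Amzn-Trace-Id' not in headers:
            headers['X-Amzn-Trace-Id'] = quote(trace_id, safe='-=;:+&[]{}"\',')


def build_headers(env):
    # Hypothetical helper: run the handler with os.environ replaced by `env`.
    params = {"headers": {}}
    with mock.patch.dict(os.environ, env, clear=True):
        add_recursion_detection_header(params)
    return params["headers"]


TRACE_ID = (
    "Root=1-662a3481-efba1490ff83e13eba5c4a88;"
    "Parent=30359a3e11333b80;Sampled=0;Lineage=bcaff313:0"
)

# Local run: neither variable is set, so no header is added.
local_headers = build_headers({})

# Lambda run: both variables are set, so the header is added.
lambda_headers = build_headers({
    "AWS_LAMBDA_FUNCTION_NAME": "sample-function",  # made-up name
    "_X_AMZN_TRACE_ID": TRACE_ID,
})

# An empty trace ID is falsy, so the header is skipped even inside a Lambda.
empty_trace_headers = build_headers({
    "AWS_LAMBDA_FUNCTION_NAME": "sample-function",
    "_X_AMZN_TRACE_ID": "",
})

print(local_headers)
print(lambda_headers)
print(empty_trace_headers)
```

The last case is why clearing the environment variable avoids the error: the handler only adds the header when the trace ID is non-empty.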

adrianicv commented 6 months ago

I finally got it working with a workaround, although it is not very elegant: I set the _X_AMZN_TRACE_ID environment variable to "" in my code before calling the start_live_tail operation.

        os.environ["_X_AMZN_TRACE_ID"] = ""

I still think that this should be fixed in the service.
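
A slightly tidier variant of the workaround above, as a sketch only: clear the variable just for the duration of the call and restore it afterwards, so other AWS calls in the same invocation still get the header. The trace_id_suppressed name is made up for this example and is not part of boto3 or botocore. It assumes the header is only added when the request is built, so the restore can happen as soon as start_live_tail returns. Note that os.environ is process-wide, so this is not safe if other threads make AWS calls at the same time.

```python
import os
from contextlib import contextmanager

TRACE_ENV_VAR = "_X_AMZN_TRACE_ID"


@contextmanager
def trace_id_suppressed():
    """Temporarily blank the trace ID variable, then restore its old value."""
    previous = os.environ.get(TRACE_ENV_VAR)
    os.environ[TRACE_ENV_VAR] = ""
    try:
        yield
    finally:
        if previous is None:
            # The variable did not exist before, so remove it again.
            os.environ.pop(TRACE_ENV_VAR, None)
        else:
            os.environ[TRACE_ENV_VAR] = previous


# Intended use inside the Lambda (not executed here):
#
#     with trace_id_suppressed():
#         response = self.client.start_live_tail(logGroupIdentifiers=log_groups)
#     for event in response["responseStream"]:
#         ...
```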

tim-finnigan commented 6 months ago

Hi @adrianicv, thanks for reporting this issue. I agree that this seems like an issue with the service, specifically the CloudWatchLogs StartLiveTail API. Since APIs like this are used across AWS SDKs, I created a tracking issue for this in our cross-SDK repository: https://github.com/aws/aws-sdk/issues/731. I'll reach out to the CloudWatchLogs team and try to find out more about supporting that header. Please refer to that issue for updates going forward, and thanks again.

github-actions[bot] commented 6 months ago

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.