aws-samples / graceful-shutdown-with-aws-lambda

MIT No Attribution
74 stars 13 forks source link

AWS Lambda graceful shutdown mechanism only works for Python 3.7, not 3.8 or 3.9 #2

Closed handlerbot closed 9 months ago

handlerbot commented 1 year ago

Hi! I implemented this pattern as described in a Lambda function using the Python 3.9 runtime on the ARM architecture in us-west-2, and while I'm getting the EXTENSION Name: cloudwatch_lambda_agent State: Ready Events: [SHUTDOWN,INVOKE] logs, my shutdown handler is never being run.

Any suggestions for debugging I can do? Does anyone have a live version of this working for Python (any version, any architecture, any region) that I could try reproducing by matching the successfully working characteristics?

Thanks for any help anyone can provide!

bnusunny commented 1 year ago

Python3.8 and Python3.9 runtimes have this issue. Python3.7 is not affected. I'll forward your feedback to Lambda service team.

handlerbot commented 1 year ago

Boo for that, but good to know, and thank you for the quick response and sending the feedback in!

I have updated the issue title to help others who come in and look, please let me know if you want it changed to something else, or feel free to change it yourself if you have the permissions.

andidev commented 1 year ago

really? come on! @bnusunny can you please add it very clearly to the documentation.

bnusunny commented 1 year ago

Yes. I added a note in the README. The Lambda team is aware of this issue and has a plan to improve this.

andidev commented 1 year ago

@bnusunny and what about new python 3.10? there support added for it?

bnusunny commented 1 year ago

Not yet. I have one workaround if you are building an API: use Lambda Web Adapter to run your Python API on Lambda. It receives SIGTERM with that tool. Here are examples for running Flask and FastAPI with LWA.

jodylent commented 1 year ago

aws-lambda-python-runtime-interface-client#105 has an update with details which may be relevant here.

bnusunny commented 1 year ago

Great!

bnusunny commented 9 months ago

Good news! This is fixed in Python 3.12 runtime.

https://aws.amazon.com/blogs/compute/python-3-12-runtime-now-available-in-aws-lambda/

jodylent commented 9 months ago

Thanks for the update -- VERY glad to see progress here!

Is there any plan to backport the fix to prior Python versions?

This remains an issue for anyone using 3.8 <= x <= 3.11, which is going to be the vast majority of organizations, as 3.12 isn't even 3 months into GA

eschwartz commented 9 months ago

Please make a note of this issue on the documentation page. I lost many hours trying to get the examples here to work, before finding this (closed) GitHub issue. I'd say this warrants a large warning message on top of the python readme, considering that the examples in this repo will not currently work for many users.

I made a PR with the documentation change: https://github.com/aws-samples/graceful-shutdown-with-aws-lambda/pull/9

jodylent commented 9 months ago

Additional info:

Test Handler (collapsed) ```py # Set Function handler -> $this_file.my_func import atexit import signal def _atexit_hook(message): """Register twice, with different messages, to distinguish Lambda INIT/Handler behavior""" print("==== ATEXIT HOOK ====") print(f"Hook received message: {message}") print("==== ATEXIT HOOK ====") def _signal_hook_1(sig, frame): """Signal module accepts a single handler per-signal""" print("==== SIGNAL HOOK (1) ====") print(f"Caught SIGTERM with sig={sig}, frame={frame}") print("==== SIGNAL HOOK (1) ====") def _signal_hook_2(sig, frame): """This func provides a way to test OVERWRITE behavior due to re-registration for a given SIGFOO""" print("==== SIGNAL HOOK (2) ====") print(f"Caught SIGTERM with sig={sig}, frame={frame}") print("==== SIGNAL HOOK (2) ====") hook_msg = "INIT hook" msg = f"Register atexit hook: {_atexit_hook} in INIT with message={hook_msg}" print(f"ATTEMPT: {msg}") atexit.register(_atexit_hook, hook_msg) print(f"SUCCESS: {msg}") msg = f"Register signal hook: {_signal_hook_1} in INIT" print(f"ATTEMPT: {msg}") signal.signal(signal.SIGTERM, _signal_hook_1) print(f"SUCCESS: {msg}") def my_func(event_records=None, tools=None, *args, **kwargs): hook_msg = "ENTRYPOINT hook" msg = f"Register atexit hook: {_atexit_hook} in ENTRYPOINT with message={hook_msg}" print(f"ATTEMPT: {msg}") atexit.register(_atexit_hook, hook_msg) print(f"SUCCESS: {msg}") msg = f"Register signal hook: {_signal_hook_2} in ENTRYPOINT" print(f"ATTEMPT: {msg}") signal.signal(signal.SIGTERM, _signal_hook_2) print(f"SUCCESS: {msg}") ```
Test Handler (collapsed) ```py INIT_START Runtime Version: python:3.12.v16 Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:c9875014cbcc77e3455765804516f064d18fe7b27ae7bdb2b1d84ab01ba784f5 ATTEMPT: Register atexit hook: in INIT with message=INIT hook SUCCESS: Register atexit hook: in INIT with message=INIT hook ATTEMPT: Register signal hook: in INIT SUCCESS: Register signal hook: in INIT # Any extension will do -- I used DD_EXTENSION because it logs at INFO during both INIT and SHUTDOWN lifecycle stages 2023-12-15 16:10:33 UTC | DD_EXTENSION | INFO | 0 Features detected from environment: 2023-12-15 16:10:33 UTC | DD_EXTENSION | INFO | Retry queue storage on disk is disabled 2023-12-15 16:10:33 UTC | DD_EXTENSION | INFO | Creating TimeSampler #0 2023-12-15 16:10:33 UTC | DD_EXTENSION | INFO | dogstatsd-udp: starting to listen on 127.0.0.1:8125 2023-12-15 16:10:33 UTC | DD_EXTENSION | INFO | 0 Features detected from environment: 2023-12-15 16:10:33 UTC | DD_EXTENSION | INFO | Loaded configuration: /var/task/datadog.yaml TELEMETRY Name: datadog-agent State: Subscribed Types: [Platform, Function, Extension] 2023-12-15 16:10:33 UTC | DD_EXTENSION | INFO | Starting logs-agent... 2023-12-15 16:10:33 UTC | DD_EXTENSION | INFO | logs-agent started EXTENSION Name: datadog-agent State: Ready Events: [SHUTDOWN, INVOKE] # BEGIN ACTUAL INVOCATION START RequestId: f72ef0ec-2e57-4fc3-aa2e-06553484c757 Version: $LATEST ATTEMPT: Register atexit hook: in ENTRYPOINT with message=ENTRYPOINT hook SUCCESS: Register atexit hook: in ENTRYPOINT with message=ENTRYPOINT hook ATTEMPT: Register signal hook: in ENTRYPOINT SUCCESS: Register signal hook: in ENTRYPOINT 2023-12-15 16:10:33 UTC | DD_EXTENSION | INFO | Triggering a flush in the logs-agent 2023-12-15 16:10:33 UTC | DD_EXTENSION | INFO | Successfully posted payload to \"https://7-46-0-app.agent.datadoghq.com/api/v1/series\", the agent will only log transaction success every 500 transactions END RequestId: f72ef0ec-2e57-4fc3-aa2e-06553484c757 REPORT RequestId: f72ef0ec-2e57-4fc3-aa2e-06553484c757 Duration: 42.64 ms Billed Duration: 43 ms Memory Size: 1792 MB Max Memory Used: 88 MB Init Duration: 636.30 ms # END ACTUAL INVOCATION 2023-12-15 16:17:16 UTC | DD_EXTENSION | WARN | Could not send payload: Post "https://http-intake.logs.datadoghq.com/api/v2/logs": context deadline exceeded (Client.Timeout exceeded while awaiting headers) ==== SIGNAL HOOK (2) ==== Caught SIGTERM with sig=15, frame= ==== SIGNAL HOOK (2) ==== 2023-12-15 16:17:18 UTC | DD_EXTENSION | INFO | Triggering a flush in the logs-agent 2023-12-15 16:17:18 UTC | DD_EXTENSION | INFO | Stopping logs-agent 2023-12-15 16:17:18 UTC | DD_EXTENSION | INFO | logs-agent stopped ```
bnusunny commented 9 months ago

@jodylent

At the moment, the change in 3.12 will not be back-ported to 3.8 to 3.11. People need to use Python 3.12 to get this feature.

Python atexit module does not trigger on SIGTERM. But you can leverage a signal handler to trigger them. Here is an example.


def sigterm_handler(signum, frame):
    print("Received SIGTERM signal")
    atexit._run_exitfuncs()

# Register the signal handler for SIGTERM
signal.signal(signal.SIGTERM, sigterm_handler)
jodylent commented 9 months ago

At the moment, the change in 3.12 will not be back-ported to 3.8 to 3.11

... [one] can leverage signal handler to trigger atexit.

Thank you for the updates -- I'll advertise them internally.