DAGWorks-Inc / hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
https://hamilton.dagworks.io/en/latest/
BSD 3-Clause Clear License

Stop Python process from exiting too early while the async tracker still has data to send #1214

Open · skrawcz opened this issue 3 weeks ago


Current behavior

If you're running in AWS Lambda, the Lambda runtime can kill the event loop before the tracker has sent all of its events. The run itself completes, but tracking data is missing for some nodes/functions.

Steps to replicate behavior

This is hard to reproduce locally unless you kill the process immediately after the driver finishes executing.

  1. Run the async driver with logging to the tracker enabled.
  2. Kill the process immediately after the driver finishes.
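The data loss can be simulated locally without Lambda. This sketch assumes, purely for illustration, that the tracker hands events to a background asyncio task that sends them one at a time; `send_events` and `run_dataflow` are hypothetical stand-ins, not Hamilton's tracker code. When the coroutine passed to `asyncio.run` returns, `asyncio.run` cancels any tasks that are still pending, so queued events never get sent.

```python
import asyncio
from typing import List


async def send_events(queue: "asyncio.Queue[str]", sent: List[str]) -> None:
    # Hypothetical background sender: one simulated network call per event.
    while True:
        event = await queue.get()
        await asyncio.sleep(0.01)  # stand-in for an HTTP request
        sent.append(event)


async def run_dataflow(sent: List[str]) -> None:
    queue: "asyncio.Queue[str]" = asyncio.Queue()
    # Not awaited: this is the bug being simulated.
    sender = asyncio.create_task(send_events(queue, sent))
    for node in ["load", "clean", "features", "model", "report"]:
        queue.put_nowait(f"{node} finished")
    # The "driver" returns here without waiting for the queue to drain.


sent: List[str] = []
asyncio.run(run_dataflow(sent))  # cancels the still-pending sender task on the way out
print(f"sent {len(sent)} of 5 events")
```

Running it reports that fewer than 5 events were sent, which mirrors "things complete, but tracking data is missing".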

Library & System Information

latest

Expected behavior

The tracker should be able to register a hook with Python that says "run this before the process exits" so that the remaining events get sent. We already do this for the sync client by registering stop().
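For reference, the sync pattern boils down to something like the minimal sketch below. `HypotheticalSyncTracker` and its methods are illustrative names, not Hamilton's actual client: events go onto a `queue.Queue`, a worker thread sends them, and `atexit.register(self.stop)` makes the interpreter call `stop()` at exit, which pushes a sentinel and joins the worker so everything already queued is flushed.

```python
import atexit
import queue
import threading
from typing import Any, Dict, List, Optional


class HypotheticalSyncTracker:
    """Hypothetical sync tracker: a worker thread drains a queue of events."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[Optional[Dict[str, Any]]]" = queue.Queue()
        self.sent: List[Dict[str, Any]] = []
        self._stopped = False
        # Daemon thread, so the interpreter does not wait on it before atexit runs.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()
        # Ask Python to call stop() when the interpreter exits normally.
        atexit.register(self.stop)

    def log(self, event: Dict[str, Any]) -> None:
        self._queue.put(event)

    def _run(self) -> None:
        while True:
            event = self._queue.get()
            if event is None:  # sentinel pushed by stop()
                break
            self.sent.append(event)  # stand-in for a blocking HTTP call

    def stop(self) -> None:
        # Idempotent: may be called explicitly and again by atexit.
        if self._stopped:
            return
        self._stopped = True
        self._queue.put(None)  # FIFO, so every earlier event is sent first
        self._worker.join()
```

The worker is a daemon thread on purpose: the interpreter waits for non-daemon threads before it runs `atexit` handlers, so a non-daemon worker blocked on `queue.get()` would keep the process from ever reaching `stop()`.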

Additional context

Asyncio and atexit don't seem to play well together. There seems to be a library that could help. This could be easy, or it could require digging into Python's internals to figure out.
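One possible direction, sketched under stated assumptions: buffer events in a plain `collections.deque` (not bound to any event loop), give the tracker an async `flush()` that callers await before returning (for example at the end of a Lambda handler), and keep an `atexit` hook only as a fallback. By the time `atexit` handlers run, the loop created by `asyncio.run` has already been closed, so the hook starts a fresh loop with `asyncio.run` to drain whatever is left. All names here are hypothetical; this is not Hamilton's API and does not use the library the issue alludes to.

```python
import asyncio
import atexit
import collections
from typing import Any, Deque, Dict, List


class HypotheticalAsyncTracker:
    """Hypothetical async tracker that buffers events and sends them later."""

    def __init__(self) -> None:
        self._buffer: Deque[Dict[str, Any]] = collections.deque()
        self.sent: List[Dict[str, Any]] = []
        # Fallback only: drain leftovers when the interpreter exits normally.
        atexit.register(self._flush_at_exit)

    def log(self, event: Dict[str, Any]) -> None:
        # Buffer synchronously so logging never depends on a live event loop.
        self._buffer.append(event)

    async def _send(self, event: Dict[str, Any]) -> None:
        await asyncio.sleep(0)  # stand-in for an async HTTP request
        self.sent.append(event)

    async def flush(self) -> None:
        # Primary path: the caller awaits this before its handler returns.
        while self._buffer:
            await self._send(self._buffer.popleft())

    def _flush_at_exit(self) -> None:
        if not self._buffer:
            return
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No loop running in this thread (the usual case at exit):
            # drain the buffer on a brand-new loop.
            asyncio.run(self.flush())
            return
        # A loop is still running here; we cannot block on it from inside,
        # so the remaining events stay buffered.


async def main(tracker: HypotheticalAsyncTracker) -> None:
    for node in ["load", "clean", "model"]:
        tracker.log({"node": node, "status": "done"})
    await tracker.flush()  # explicit flush before returning
```

The explicit `await tracker.flush()` is the part to rely on; whether `atexit` hooks run at all depends on how the host stops the process (a hard kill such as `SIGKILL` skips them entirely).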