Netflix / metaflow

:rocket: Build and manage real-life ML, AI, and data science projects with ease!
https://metaflow.org
Apache License 2.0

Argo Events: trigger sensors are not deleted #1870

Closed: gabriel-rp closed this issue 2 weeks ago

gabriel-rp commented 1 month ago

Description

I'm using Metaflow with Argo Workflows. When I run argo-workflows delete, the Workflow Template is deleted, but none of its sensors are; they are left behind, consuming resources. I've only seen this happen when using the @project decorator.

Root cause

When Metaflow registers the sensor, it derives the sensor name from the workflow name by replacing dots with hyphens:

...
            # Register sensor. Unfortunately, Argo Events Sensor names don't allow for
            # dots (sensors run into an error) which rules out self.name :(
            # Metaflow will overwrite any existing sensor.
            sensor_name = self.name.replace(".", "-")
...

However, the same replacement is not applied when the sensor is deleted:

...
sensor_deleted = client.delete_sensor(name)
...

In this context, name is the unmodified workflow name, so any dots it contains are still present.
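One way to keep registration and deletion consistent is to derive the sensor name in a single place and call it from both code paths. The sketch below is hypothetical: sensor_name_for, InMemorySensorClient, register_sensor, create and delete are illustrative stand-ins rather than Metaflow's actual code. Only the replace(".", "-") rule and the delete_sensor(name) call come from the snippets quoted above.

```python
from typing import Dict


def sensor_name_for(workflow_name: str) -> str:
    """Hypothetical shared helper: Argo Events sensor names can't contain
    dots, so map the workflow name to a dot-free sensor name."""
    return workflow_name.replace(".", "-")


class InMemorySensorClient:
    """Hypothetical stand-in for the cluster client; sensors live in a dict."""

    def __init__(self) -> None:
        self.sensors: Dict[str, dict] = {}

    def register_sensor(self, name: str, spec: dict) -> None:
        # Overwrite any existing sensor with the same name.
        self.sensors[name] = spec

    def delete_sensor(self, name: str) -> bool:
        # True only if a sensor with exactly this name existed.
        return self.sensors.pop(name, None) is not None


def create(client: InMemorySensorClient, workflow_name: str) -> None:
    client.register_sensor(sensor_name_for(workflow_name), {"workflow": workflow_name})


def delete(client: InMemorySensorClient, workflow_name: str) -> bool:
    # Use the same helper as create() instead of the raw workflow name.
    return client.delete_sensor(sensor_name_for(workflow_name))


if __name__ == "__main__":
    client = InMemorySensorClient()
    name = "mytestproject.user.g.pereira.3.mytestflow"
    create(client, name)
    # Current behaviour: deleting by the dotted workflow name finds nothing.
    print(client.delete_sensor(name))
    # Shared helper: the hyphenated name used at registration is found.
    print(delete(client, name))
    print(client.sensors)
```

Run as a script, this prints False, then True, then an empty dict. Funnelling both paths through one helper means any future change to the naming rule cannot drift between create and delete.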

I confirmed this behavior in my Kubernetes cluster's logs: there is an attempt to delete a sensor named mytestproject.user.g.pereira.3.mytestflow, but the actual sensor is called mytestproject-user-g-pereira-3-mytestflow.

Here's the flow used for that example:

from metaflow import FlowSpec, step, project, trigger

@project(name="my_test_project")
@trigger(event="test_airflow_trigger")
class MyTestFlow(FlowSpec):
    @step
    def start(self):
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    MyTestFlow()

This also explains why the problem only occurs with the @project decorator: without it, the workflow name contains no dots, so the workflow and sensor names are identical and the sensor is deleted as expected.
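A quick check of the naming rule makes the @project dependence concrete. The dotted name is the one seen in the cluster logs above. The name "mytestflow" for the same flow without @project is an assumption for illustration, as is the helper name sensor_name_for.

```python
def sensor_name_for(workflow_name: str) -> str:
    # Hypothetical helper applying the same rule used at sensor registration.
    return workflow_name.replace(".", "-")


# Workflow name from the cluster logs (flow deployed with @project).
with_project = "mytestproject.user.g.pereira.3.mytestflow"
# Assumed workflow name for the same flow without @project (contains no dots).
without_project = "mytestflow"

for workflow_name in (with_project, without_project):
    sensor_name = sensor_name_for(workflow_name)
    # delete_sensor(workflow_name) can only succeed when the two names match.
    print(f"{workflow_name!r} -> {sensor_name!r}, match={workflow_name == sensor_name}")
```

Only the dotted name produces a mismatch, which is exactly the case where the sensor is left behind.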