Open obaltian opened 1 month ago
This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io
Describe the bug It's impossible to delete DataProcessInstance objects in bulk (by filtering on entity type). The delete command either raises an error or finds nothing, depending on whether an additional filter (e.g. --platform=airflow) is provided. Only deleting by --urn works, which isn't convenient for managing DataHub content.
To Reproduce
Deploy datahub locally:
Ingest sample job & its "start" event:
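from datahub.api.entities.datajob import DataFlow, DataJob
from datahub.api.entities.dataprocess.dataprocess_instance import DataProcessInstance
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

# Connect to the locally deployed DataHub instance.
graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

# Emit a flow, a job in that flow, and one run (DataProcessInstance) of the job.
flow = DataFlow(env="prod", orchestrator="airflow", id="flow_api_simple")
flow.emit(graph)
job = DataJob(flow_urn=flow.urn, id="job1", name="My Job 1")
job.emit(graph)
run = DataProcessInstance.from_datajob(datajob=job, id=f"{flow.id}-1")
run.emit(graph)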
Optionally, emit the "start" event (the DataProcessInstance is created even without it):
import time
run.emit_process_start(graph, int(time.time() * 1000))
Expected behavior Step 3 from the section above should find and delete relevant DataProcessInstance objects.
Additional context We tried to work around this problem by passing additional arguments and by using GraphQL directly, but had no luck. Here is a thread from DataHub's Slack: https://datahubspace.slack.com/archives/C029A3M079U/p1715184338329459
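Since deleting by --urn is the only path reported to work here, one possible stopgap is to record the URNs at emit time (for example `str(run.urn)` in the script above) and delete them one at a time. The snippet below is only a rough sketch under that assumption: `build_delete_commands` and `delete_by_urn` are hypothetical helper names, not part of DataHub, and the only CLI option used is the `--urn` flag mentioned in this report. Depending on the CLI version, a confirmation prompt may still need to be handled separately.

```python
import subprocess
from typing import Callable, Iterable, List


def build_delete_commands(urns: Iterable[str]) -> List[List[str]]:
    """Build one `datahub delete --urn <urn>` argv list per unique, non-blank URN."""
    seen = set()
    commands: List[List[str]] = []
    for urn in urns:
        urn = urn.strip()
        if not urn or urn in seen:
            continue  # skip blank entries and duplicates
        seen.add(urn)
        commands.append(["datahub", "delete", "--urn", urn])
    return commands


def delete_by_urn(urns: Iterable[str], run: Callable = subprocess.run) -> int:
    """Run the delete command for each URN and return how many commands were issued.

    `run` is injectable so the loop can be exercised without a DataHub instance.
    """
    count = 0
    for cmd in build_delete_commands(urns):
        run(cmd, check=True)  # raises CalledProcessError if the CLI exits non-zero
        count += 1
    return count
```

Passing a stub as `run` makes it possible to check which commands would be issued before pointing the helper at a real deployment.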