NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

Keep separate per-pipeline operator counters. Error out when "stealing" subgraphs from other pipelines results in duplicate names. #5506

Closed: mzient closed this pull request 3 months ago

mzient commented 3 months ago

Category:

New feature (non-breaking change which adds functionality)
Refactoring (Redesign of existing code that doesn't affect functionality)

Description:

Prior to this change, constructing exactly the same pipeline multiple times (e.g. by calling a function decorated with @pipeline_def repeatedly) produced pipelines whose operator instances and DataNodes had different names. This PR changes that so that pipelines with the same structure have the same node names. This is achieved by:

  1. Using a separate operator counter in each pipeline. When all nodes are instantiated within the pipeline scope, no further action is required. This is the case for the vast majority of pipelines.
  2. If a name collision occurs (only possible when "stealing" a subgraph from another pipeline), an error is raised.

Pipelines whose operators are defined without a "current" pipeline (i.e. outside of any pipeline scope) still get distinct operator instance names.

Additionally, operator discovery had some problems, so I rewrote it as a much simpler depth-first search (DFS).
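The PR does not show the new discovery code itself, so the following is only a generic sketch of depth-first operator discovery. It assumes a hypothetical `Op` node that holds references to the operators producing its inputs. `discover_ops` starts from the pipeline outputs, visits each operator once (tracked by object identity) and appends it only after all of its inputs, so producers always precede consumers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Op:
    """Hypothetical graph node: an operator and the operators feeding its inputs."""
    name: str
    inputs: List["Op"] = field(default_factory=list)


def discover_ops(outputs: List[Op]) -> List[Op]:
    """Return every operator reachable from `outputs`, producers before consumers."""
    visited = set()
    order: List[Op] = []

    def visit(op: Op) -> None:
        if id(op) in visited:
            return
        visited.add(id(op))
        for inp in op.inputs:
            visit(inp)
        order.append(op)  # post-order: all of op's inputs are already in `order`

    for out in outputs:
        visit(out)
    return order


reader = Op("reader")
decode = Op("decode", [reader])
resize = Op("resize", [decode])
flip = Op("flip", [decode])
print([op.name for op in discover_ops([resize, flip])])
# ['reader', 'decode', 'resize', 'flip']
```

Recursion keeps the sketch short; a very deep graph would need an explicit stack to stay under Python's recursion limit.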

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

pipeline_test.py: test_dangling_subgraph

Checklist

Documentation

DALI team only

Requirements

REQ IDs: N/A

JIRA TASK: N/A

dali-automaton commented 3 months ago

CI MESSAGE: [15624008]: BUILD STARTED

dali-automaton commented 3 months ago

CI MESSAGE: [15624354]: BUILD STARTED

dali-automaton commented 3 months ago

CI MESSAGE: [15624997]: BUILD STARTED

dali-automaton commented 3 months ago

CI MESSAGE: [15624997]: BUILD FAILED

dali-automaton commented 3 months ago

CI MESSAGE: [15633718]: BUILD STARTED

dali-automaton commented 3 months ago

CI MESSAGE: [15633718]: BUILD FAILED

dali-automaton commented 3 months ago

CI MESSAGE: [15654084]: BUILD STARTED

dali-automaton commented 3 months ago

CI MESSAGE: [15654084]: BUILD FAILED

dali-automaton commented 3 months ago

CI MESSAGE: [15654586]: BUILD STARTED

dali-automaton commented 3 months ago

CI MESSAGE: [15709123]: BUILD FAILED

dali-automaton commented 3 months ago

CI MESSAGE: [15734267]: BUILD STARTED

dali-automaton commented 3 months ago

CI MESSAGE: [15734267]: BUILD PASSED