NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

Logic based dynamic graph computation of DALI Pipelines #4590

Closed keshavvinayak01 closed 8 months ago

keshavvinayak01 commented 1 year ago

Hello, I have a Pipeline object for which I write the define_graph function. The computations I need to perform on the input data also depend on the data itself; for example, I may choose to apply color_space_conversion based on the metadata of the image. The code looks something like this:

import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali.pipeline import Pipeline

class AutoencoderPipe(Pipeline):
    def __init__(self, num_threads, device_id, samples):
        super(AutoencoderPipe, self).__init__(
            batch_size=1,
            num_threads=num_threads,
            device_id=device_id,
            seed=32,
            prefetch_queue_depth=2,
            py_start_method='spawn',
            py_num_workers=0,
        )
        # Add operations here that you would use in define_graph.
        self.input = ExternalInputCallable(32, 1, samples)

    def define_graph(self):

        image, label, normalized = fn.external_source(
            source=self.input, num_outputs=3, batch=False, parallel=True,
            dtype=[types.FLOAT, types.INT32, types.INT32])

        image = fn.cast(image, dtype=types.UINT8)
        image = fn.decoders.image(image, device="cpu")

        # `normalized` is a DataNode (a symbolic graph node), so this Python-level
        # check cannot see its runtime value -- this is the problem described below.
        if not normalized:
            image = fn.cast(image, dtype=types.UINT8)
            image = fn.color_space_conversion(image, image_type=types.RGB, output_type=types.YCbCr)

        return [image, label]

Note that batch_size=1, so I'll only obtain a single image along with a single value for the normalized variable. However, I can see that this variable is actually a DataNode object, so I won't be able to access its value to apply a conditional check.

My question is: how can I actually get the value associated with this object?

Also, if I were to extend this to a higher batch size, could I get this normalized variable as an array of values on which I can apply indexed checks?

klecki commented 1 year ago

Hi @keshavvinayak01, we are currently working on the exact functionality that you need. Base support for if statements was merged yesterday: https://github.com/NVIDIA/DALI/pull/4561 (currently as an experimental feature).

There is a PR with documentation in review: https://github.com/NVIDIA/DALI/pull/4589 - here is a direct link to tutorial with more detailed explanation: https://github.com/klecki/DALI/blob/cond-intro-tutorial/docs/examples/general/conditionals.ipynb

Those changes allow DALI to detect if statements that use a DataNode as the condition and trace the source code of both branches, so that they can be represented in the execution graph and the pipeline can run them conditionally for every sample.

It works automatically for batch_size > 1; you can still think about the code as if it were running one sample at a time (while DALI processes the whole batch in parallel).

To enable it in your code, you would need to switch from the Pipeline class to the pipeline decorator, specifically the experimental one, and set enable_conditionals=True. Here is the adjusted code:

import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali.pipeline import experimental

@experimental.pipeline_def(enable_conditionals=True, batch_size=16, num_threads=num_threads,
                           device_id=device_id, seed=32, prefetch_queue_depth=2,
                           py_start_method='spawn', py_num_workers=0)
def autoencoder_pipeline(samples):

    image, label, normalized = fn.external_source(
        source=ExternalInputCallable(32, 1, samples),
        num_outputs=3,
        batch=False,
        parallel=True,
        dtype=[types.FLOAT, types.INT32, types.INT32])

    image = fn.cast(image, dtype=types.UINT8)
    image = fn.decoders.image(image, device="cpu")

    if normalized == 0:
        image = fn.color_space_conversion(image, image_type=types.RGB, output_type=types.YCbCr)

    return image, label

pipe = autoencoder_pipeline(samples)
pipe.build()
pipe.run()

There are some limitations though; for example, we don't support logical expressions (not, and, or) in DALI yet - you need to replace them with mathematical operations: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/math.html. A rough sketch of such a replacement follows below.
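
For instance, a minimal sketch of such a replacement (flag_a and flag_b are hypothetical 0/1 integer DataNodes standing in for your own flags, not code from your pipeline):

both_set = (flag_a * flag_b) > 0     # plays the role of `flag_a and flag_b`
either_set = (flag_a + flag_b) > 0   # plays the role of `flag_a or flag_b`
not_set = flag_a == 0                # plays the role of `not flag_a`

if both_set:
    image = fn.color_space_conversion(image, image_type=types.RGB, output_type=types.YCbCr)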

Here are the details explaining the usage of pipeline decorator: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/pipeline.html#pipeline-decorator

Let me know if you have any further questions.

klecki commented 1 year ago

I forgot to add: this feature will be part of the next release (DALI 1.23) and should be available in the nightly builds next week - we are currently facing some issues with CI, so I will let you know when it's there. Here are the installation instructions for nightly builds: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/installation.html#pip-nightly-and-weekly-releases
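
For reference, installing a nightly build boils down to pointing pip at the nightly index, roughly like this (please check the linked page for the exact package name matching your CUDA version):

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/nightly \
            --upgrade nvidia-dali-nightly-cuda110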

klecki commented 1 year ago

My question is how would I actually be able to get the value associated with this object? Also, If i were to extend this to a higher batch size, can I get this normalized variable as an array of values upon which I can apply indexed checks?

In the pipeline definition, DALI operators return DataNode objects, which are just symbolic representations of nodes in the compute graph. The graph is built in the backend (somewhat similarly to how TensorFlow operates) and executed there. The Python code is no longer used at that point; it is only used to define how the data flows between operators in the backend.
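
As a minimal sketch of where concrete values do become available (reusing the hypothetical autoencoder_pipeline from above): only the outputs returned by pipe.run() hold real data that you can inspect from Python:

pipe = autoencoder_pipeline(samples)
pipe.build()
images, labels = pipe.run()   # TensorList outputs holding actual data
first_label = labels.at(0)    # a concrete value, available only outside the graph definition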

klecki commented 1 year ago

Hi @keshavvinayak01, the nightly build containing the initial support for conditional operations in DALI has been published; you can install it following the instructions I posted earlier if you want to test it.

keshavvinayak01 commented 1 year ago

Hey @klecki! Would it be possible to add this support to the Pipeline class (within the define_graph function)? I prefer writing classes over the functional paradigm.

klecki commented 1 year ago

Hi @keshavvinayak01, could you tell me a bit more about what you prefer in the class approach as opposed to the functional approach, and why? What is easier or better for you there, in your opinion? Maybe we can work on improving our APIs based on that.

There are some limitations with the class approach; mainly, there may be problems with detecting variables that are propagated through class state as member variables. That is why we first put this feature in the functional API - the functional paradigm makes it easier to trace the code.
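
As a hypothetical illustration (not code from this thread) of why member-variable state is problematic for tracing - the condition and intermediate results are hidden in self rather than passed and returned explicitly:

class MyPipe(Pipeline):
    def define_graph(self):
        # self.input would be set up in __init__ (omitted here)
        images, flags = fn.external_source(source=self.input, num_outputs=2, batch=False)
        self.flags = flags                      # condition hidden in object state
        return self.maybe_convert(images)

    def maybe_convert(self, images):
        if self.flags == 0:                     # harder for source-code tracing to follow
            images = fn.color_space_conversion(images, image_type=types.RGB,
                                                output_type=types.YCbCr)
        return images

In the functional style the condition is an explicit argument and the result an explicit return value, which keeps the data flow visible to the tracer.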

We will take a look into supporting conditionals with define_graph, but we have some other features planned for conditional execution first, so it won't make it into 1.23.

keshavvinayak01 commented 1 year ago

@klecki It's more of a personal choice than a performance/feature limitation. Classes encapsulate data specific to an object of that class, and I like the flexibility this offers when I want to initialise several objects with varying parameters. Given the way I organise my projects, I like to couple related functions within the same class; it seems more intuitive (IMO) to think of class member functions as properties of the object itself rather than defining a separate function.

All this being said, it doesn't seem like a critical feature unless more people offer better reasoning :)

klecki commented 1 year ago

@keshavvinayak01 I see, thanks for the explanation. We will try to handle the class approach as one of the next steps. I will let you know when we have any updates.

keshavvinayak01 commented 1 year ago

@klecki So I tried using the experimental pipeline with conditional execution on the nightly build, and there are a few constraints/design decisions which are bugging me.

  1. Apparently, at the join of an if condition, all the data points need to have the same data type and the same dimensionality, and this holds even across batches. So within the same batch, if I only want to process half of the data items based on a condition, they still need to come out of the conditional with the same data type. This is quite constraining, as I have to artificially augment the data (fn.cast) to make it work. And it seems to hold when I pull in different batches as well: for example, I have three conditionals in a sequence, and batch-0 will pass the check for all three conditionals, but batch-1 will skip the first one. Even in this case, the data type and the dimensionality of the batch need to be the same at all conditional join points (see the sketch after this list).

  2. Also, the data-providing function/class used in an external_source operator needs to produce data of the same dimensionality across batches. My use case is a bit different, but if this is a fixable/relaxable constraint, let me know.
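
Here is a minimal sketch of what I mean in point 1 (hypothetical code, not my actual pipeline); the branch that does no real work still needs an artificial cast so both branches join with the same data type:

if normalized == 0:
    # this branch genuinely changes the sample type
    image = fn.cast(image, dtype=types.FLOAT)
    image = image / 255.0
else:
    # artificial cast, added only so the join sees a single dtype
    image = fn.cast(image, dtype=types.FLOAT)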

If the above are deliberate optimizations, kindly help me understand the rationale. If this is a bug or something fixable, please point me to the relevant files that I could modify to bypass these constraints.

Thanks!

klecki commented 1 year ago

@keshavvinayak01, reading your description, I think some of the problems you have can probably be worked around by adjusting the branching and possibly introducing empty samples to pad the batches. Can you share a bit more about your processing? Even pseudocode would help. Also, could you explain how you envision processing where the type/dimensionality of the input data changes? I would like to know more about it, so we can consider such an approach in our development or maybe suggest an alternative one.

Point 1 is indeed a constraint introduced by DALI's approach to processing. The main unit of processing in DALI is a batch; to process a batch efficiently, DALI assumes that all samples within a batch have the same type and dimensionality. The thing that is allowed to vary is the shape of each individual sample. Variable sample shape is a generalization of how deep learning frameworks typically operate, where (within the layers of the network) the batch is just an additional dimension of the processed tensor, which enforces uniform samples within the batch.
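
For example, a hypothetical external_source callback like the one below is fine: the samples differ in shape, but they all keep the same dtype and number of dimensions:

import numpy as np

def varying_shapes(sample_info):
    # height varies per sample (allowed); dtype and ndim stay fixed (required)
    height = 100 + sample_info.idx_in_batch
    return np.random.randint(0, 256, size=(height, 64, 3), dtype=np.uint8)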

To introduce non-uniform batches (in type or dimensionality), we would need to work against the design decisions behind the main DALI data type, TensorList, which represents a batch. That would probably be a very big undertaking, and I still think that for most cases the conditionals should be enough.

As for point 2, the initial DALI design didn't anticipate the type or dimensionality changing between iterations when DALI is used for data processing during training. Some operator implementations do additional setup on their first run, and thanks to that they do not need to repeat that step or reconfigure later. This is mostly a simplification of the implementation and, to some degree, an optimization. I expect it could be generalized, but it is not easy. Most of the operators assume that their inputs have a fixed type and dimensionality, so we would need to reinitialize them on every iteration at the executor level or adjust their implementations. Enabling this only at the External Source operator level would just cause DALI to break in other places.

keshavvinayak01 commented 1 year ago

@klecki I understand, thanks!

klecki commented 8 months ago

Closing this issue: DALI won't provide support for the ops API/define_graph approach in conditional mode. That approach is not compatible with the underlying implementation (the fn API is required, and a functional style of writing code is recommended: functions should accept parameters and return values rather than share them via state), and the ops API is considered a legacy/maintenance feature.