NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

Creating a custom operator #410

Closed addisonklinke closed 5 years ago

addisonklinke commented 5 years ago

In the example here, I see how we can determine the shape of tensors in the tensor list returned from a pipeline.

I am interested in knowing the shape of an image inside the pipeline. The image is decoded onto the GPU using ops.nvJPEGDecoder, and preferably the shape (width, height, channels) would be accessible with something like numpy.ndarray.shape. However, the output from the decoder is of type nvidia.dali.tensor.TensorReference and does not have a shape attribute. Is there a way to determine the shape information from a TensorReference object?

Kh4L commented 5 years ago

Hi Addison,

Could you provide a code example of what you want to do here?

TensorReference is a Python object used by the internal Pipeline implementation to symbolically construct the graph. The name can be ambiguous, but it's just a reference to an instantiated op, used to construct the graph in define_graph.

Hence, it doesn't contain any runtime info (such as the image dimensions) and is not meant to be accessed by the user.

addisonklinke commented 5 years ago

Hi Serge,

Thank you for the quick response. Here's one of the documentation examples modified with what I'd like to achieve (see the comment in define_graph):

import nvidia.dali.ops as ops
import nvidia.dali.types as types
from nvidia.dali.pipeline import Pipeline

image_dir = "images"
batch_size = 8

class SimplePipeline(Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        super(SimplePipeline, self).__init__(batch_size, num_threads, device_id, seed = 12)
        self.input = ops.FileReader(file_root = image_dir)
        self.decode = ops.HostDecoder(output_type = types.RGB)

    def define_graph(self):
        jpegs, labels = self.input()
        images = self.decode(jpegs)
        height, width, channels = images.shape  # Determine image dimensions inside the pipeline
        return (images, labels)

The reason I want to know the image shape is for calculating the location of bounding box corners after ops.Rotate is applied (as described here). I'm using ops.ExternalSource to define a custom iterator, so I could determine the shape there and pass it to the pipeline instead. However, this seems redundant since I already have the image in the pipeline, which contains the same information.

Does this help clarify?

JanuszL commented 5 years ago

Hi, If I remember correctly you can obtain this kind of information with something like:

def define_graph(self):
        jpegs, labels = self.input()
        images = self.decode(jpegs)
        return (images, labels)

(images, labels) = pipe.run()
for i in range(batch_size):
     print(images.at(i).shape)

As I understand you want to do this processing outside DALI. If you want to do it inside, you need to develop a custom operator for bounding boxes - something like https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/supported_ops.html#nvidia.dali.ops.BbFlip.

addisonklinke commented 5 years ago

@JanuszL Correct me if I'm wrong, but for my use case I think I have to do the processing inside the DALI pipeline? Since I want to randomly generate the augmentation hyperparameters (i.e. rotation angle) with ops.Uniform, I can only access their values inside the pipeline. Rotating the bounding box requires knowing the angle, so that also has to happen inside the pipeline.

For developing a custom operator, are you suggesting something like bb_rotate.cc that would implement rotation and bounding box transform together in one operation?

JanuszL commented 5 years ago

@addisonklinke - now it is clear; yes, you need an operator for that. Regarding bb_rotate - I would suggest keeping the image and bbox operations separate, so you can reuse the existing rotation for images. BbFlip works like that - it flips only bboxes, while for images we use another operator - please check the https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/examples/detection_pipeline.html example. If you don't want to build DALI from scratch you can use the new plugin API - https://github.com/NVIDIA/DALI/blob/master/docs/examples/extend/create_a_custom_operator.ipynb. It will be available starting from the next release.

addisonklinke commented 5 years ago

@JanuszL The custom operator Jupyter notebook has been very helpful - exactly what I was looking for! I successfully modified Ops.CustomDummy() to add 1 to each element of the input tensor instead of simply copying. However, I am having difficulty adding a second input to the operator.

Looking at examples like box_encoder.cc, it seems like the appropriate pattern to use inside MyOp::RunImpl() is:

  const auto& input_1 = ws->Input<CPUBackend>(0);
  const auto& input_2 = ws->Input<CPUBackend>(1);

I've tried that for my custom operator and also added DALI_SCHEMA(MyOp).NumInput(2), but I cannot access the second input parameter. My code compiles, but at runtime I receive: RuntimeError: [/home/addison/Documents/git/dali/dali/pipeline/pipeline.cc:164] Assert on "!it->second.is_support" failed: Argument input can only be used as regular input by support ops. (op: 'BbRotate', input: 'Uniform_id_1_output_0'). I see the corresponding source code for the error in pipeline.cc, but am having difficulty understanding how to solve it. Do I need to use additional methods for DALI_SCHEMA such as .AddArg() or .AllowMultipleInputSets()? I have seen those in the source code of some other multi-parameter operators. Please advise or point me to a canonical example for writing an operator with multiple input parameters.

Kh4L commented 5 years ago

@addisonklinke Indeed, specifying DALI_SCHEMA(MyOp).NumInput(2) and accessing it as in box_encoder.cc should be enough. How are you calling MyOp in the graph definition? This error implies that the support operator Uniform's output has been passed at a regular input position in the call.

Since you set NumInput to 2, your op call in define_graph has to be:

output = self.myop(input1, input2, ...)

where input1 and input2 have to be the first arguments (and ... represents all the other MyOp arguments, such as the Uniform output).

addisonklinke commented 5 years ago

@Kh4L In my pipeline, I define a random generator for the rotation angle (with Ops.Uniform) and want to use that as the second input to MyOp (the first input is a tensor of bounding box corners)

class MyPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        super(MyPipe, self).__init__(batch_size, num_threads, device_id, seed=12)
        self.input = ...
        self.generator = ops.Uniform()
        self.myop = ops.MyOp()

    def define_graph(self):
        corners = self.input()
        angle = self.generator()
        output = self.myop(corners, angle)
        return output

Are you saying that the output from ops.Uniform() should not be counted as one of the two inputs? If so, how do we distinguish input tensors for which that exception applies?

Kh4L commented 5 years ago

That's right, you have to use Uniform's output as a named argument: DALI's graph builder and executor will take care of this for you. Then you can either specify your angle argument manually, or pass it a support op's output.

You can refer to https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/examples/getting%20started.html#Tensors-as-arguments-and-Random-Number-Generation and see how the angle argument is used there.
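
To make that concrete, here is a minimal sketch of the corrected graph definition, mirroring the MyPipe snippet above. It assumes MyOp's schema is changed to NumInput(1) and declares a tensor argument named "angle", as discussed later in this thread:

class MyPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        super(MyPipe, self).__init__(batch_size, num_threads, device_id, seed=12)
        self.input = ...
        self.generator = ops.Uniform()
        self.myop = ops.MyOp()

    def define_graph(self):
        corners = self.input()
        # the support op's output is passed by name, not as a positional input
        output = self.myop(corners, angle=self.generator())
        return output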

addisonklinke commented 5 years ago

@Kh4L I was able to successfully add an angle argument from ops.Uniform to MyOp.cc using DALI_SCHEMA.AddArg(); however, I am unable to access this value within MyOp::RunImpl. I tried using const auto& angle = ws->Input<::dali::CPUBackend>(1), but I get the error that index 1 is out of range.

It seems that the operator inputs can be accessed this way, but an operator argument does not count as one of the indices. What is the appropriate way to access an argument inside the operator?

Kh4L commented 5 years ago

@addisonklinke When you specify angle as an argument by adding it with OpSchema::AddArg(), you have to access it through the OpSpec; you cannot access it with Workspace::Input, which is used only for regular (non-named-argument) inputs.

To access this argument in MyOp::RunImpl, you can use the Operator base class member variable spec_, like so:

spec_.GetArgument<float>("angle", ws, data_idx)

You can see it in practice here: https://github.com/NVIDIA/DALI/blob/8ca3ce09c170ef93265ac3b529db7fb21d83863e/dali/pipeline/operators/displacement/rotate.h#L33
The caller class keeps a reference to spec_: https://github.com/NVIDIA/DALI/blob/8ca3ce09c170ef93265ac3b529db7fb21d83863e/dali/pipeline/operators/displacement/displacement_filter_impl_cpu.h#L96

NB: all the workspaces (SampleWorkspace, DeviceWorkspace) are derived from ArgumentWorkspace.

addisonklinke commented 5 years ago

@Kh4L @JanuszL thank you for all your help! I've been able to compile my own ops.BbRotate and import it into DALI with the plugin manager. One remaining question - does DALI currently support (or plan to support) conditional compilation of the augmentation graph at runtime?

For instance, I would like to check the average pixel intensity of an image, and apply a brightness augmentation only if it's above a certain threshold. Otherwise, if the condition fails, I would skip that augmentation entirely. In other cases, I want to tell the pipeline how many times to repeat a certain augmentation (with different random parameters, of course) on the same image

JanuszL commented 5 years ago

Hi, we haven't considered that yet, but it sounds like an interesting idea. We are about to rework some parts of the architecture, and we will discuss if and how we could support such a use case.

addisonklinke commented 5 years ago

Excellent, I'll be curious to hear what you come up with

Kh4L commented 5 years ago

Glad to hear that 🙂 And indeed, this is an interesting idea worth exploring.

addisonklinke commented 5 years ago

One option I've considered to create a conditional augmentation graph is defining my own "meta operator" which conditionally applies existing Dali operators based on characteristics of the input image and/or the values of named operator inputs.

The difficulty I've found with this approach is that the RunImpl() methods of existing Dali operators are protected members. This means I can initialize an instance of dali::Crop inside my operator's RunImpl(), but cannot access the Crop instance's RunImpl() (see example my_operator.cpp below). Is there another way to achieve this? Right now I've just made my operator's class inherit from whatever Dali operator class(es) I'd like to use.

A related question is how to access the output of each DALI operator's RunImpl() so that I can return it from my own operator. Currently, all the implementations have a void return and just modify the tensors directly.

#include "my_operator.h"
#include "crop.h"

namespace dali
{
    template<>
    void MyOperator<CPUBackend>::RunImpl(SampleWorkspace *ws, const int idx)
    {
        Crop<CPUBackend> *dali_crop = new Crop<CPUBackend>(spec_);
        dali_crop->RunImpl(ws, idx);  // compiler error from protected member

        // Instantiate additional Dali operators like above
    }

    DALI_REGISTER_OPERATOR(MyOperator, MyOperator<CPUBackend>, CPU);

    DALI_SCHEMA(MyOperator)
        .DocStr("Meta operator to wrap the existing dali::Crop operator")
        .NumInput(1)
        .NumOutput(1);
} 

If you need it to test compiling, my_operator.h is based on the plugin manager example:

#ifndef MY_OPERATOR_H
#define MY_OPERATOR_H

#include "dali/pipeline/operators/operator.h"

namespace dali
{
    template <typename Backend>
    class MyOperator : public Operator<Backend>
    {
        public:
            inline explicit MyOperator(const OpSpec &spec) :
              Operator<Backend>(spec) {}

            virtual inline ~MyOperator() = default;

            MyOperator(const MyOperator&) = delete;
            MyOperator& operator=(const MyOperator&) = delete;
            MyOperator(MyOperator&&) = delete;
            MyOperator& operator=(MyOperator&&) = delete;

        protected:
            void RunImpl(Workspace<Backend> *ws, const int idx) override;
    };
}

#endif

Kh4L commented 5 years ago

The RunImpl virtual function is public in operator.h, but most of the implementations that override it actually change the access modifier to protected.

Anyway, something you can do here is override Operator::Run, which is guaranteed to stay public. You can check the reader_op.h implementation, where we are doing that.

addisonklinke commented 5 years ago

@Kh4L That's good to know. I was able to call dali_crop->Run(ws) without getting an error about the protected member.

However, even though the libmyoperator.so file compiles, I am unable to load it into Python using plugin_manager.load_library('/path/to/so/file'). There is a runtime error about undefined symbols related to the crop operator (in one case it is _ZTIN4dali8CropAttrE, although I have gotten others as well). If I use nm /path/to/libdali.so | grep unknown_symbol, I see that the unknown symbol(s) are indeed present in the linked object. However, if I include the --extern-only flag with the nm command, they are no longer present (which, based on this StackOverflow thread, seems to indicate that they are only available for internal use). Is this behavior intentional? I don't observe it consistently with other operators like rotate.

jantonguirao commented 5 years ago

@addisonklinke CropAttr, Crop, and operator classes in general are not part of the public API (not exported to the so file). Rotate is not exported either. To be able to instantiate operators inside a custom operator we would need to expose their APIs. Let me discuss the consequences with the team and we'll come back to you.

addisonklinke commented 5 years ago

@jantonguirao Ok I see. The main reason I'd like to instantiate operators inside a custom operator is as a workaround to allow conditional augmentations (as per my comment above), as long as DALI doesn't support this feature natively.

jantonguirao commented 5 years ago

@addisonklinke https://github.com/NVIDIA/DALI/pull/487 should allow you to instantiate operators:

#include "dali/pipeline/operators/operator.h"
// In your operator
OpSpec op_spec("crop")
    .AddArg("num_threads", 1)
    .AddArg("batch_size", 32) 
    .AddArg("device", "cpu"); // you can also fetch and forward your operator's arguments
auto op_ptr = InstantiateOperator(op_spec);
op_ptr->Run(ws);

I hope it helps

addisonklinke commented 5 years ago

@jantonguirao Thank you for adding that, it looks like it should help! As a simple example, let's say I'd like to make ops.MyRotate(), a custom operator that (for now) does exactly the same augmentation as the standard ops.Rotate(). Is the following approach correct for MyRotate.cc?

#include "MyRotate.h"
#include "dali/pipeline/operators/operator.h"

namespace dali
{
    template<>
    void MyRotate<CPUBackend>::RunImpl(SampleWorkspace *ws, const int idx)
    {
        OpSpec op_spec("rotate")
            .AddArg("angle", R"code(Rotation angle)code", DALIDataType::DALI_FLOAT, true);
        auto rotate_ptr = InstantiateOperator(op_spec);
        rotate_ptr->Run(ws);
    }

    DALI_REGISTER_OPERATOR(MyRotate, OpenALPRMeta<CPUBackend>, CPU);

    DALI_SCHEMA(MyRotate)
        .DocStr("Custom operator that wraps the existing Dali Rotate")
        .NumInput(1)  // the image
        .NumOutput(1) // rotated image
        .AddArg("angle", R"code(Rotation angle)code", DALIDataType::DALI_FLOAT, true);
}

A few things I'm unclear about:

  1. Do I need .AddArg("angle") for both op_spec and the DALI_SCHEMA related to MyRotate? I think ws should already contain the named argument, so it seems redundant to specify it in both places
  2. Once I call rotate_ptr->Run(ws), is there anything special I need to add to be able to access the output (rotated image) from Python if ops.MyRotate() is in the pipeline?
  3. If we change the custom operator to apply both Dali's rotate and crop operators, how do we get 2 images (i.e. rotated and cropped) back in Python? My guess is DALI_SCHEMA.NumOutput(2), but then I don't know how I would determine which operator works on which output
  4. Ultimately, my hope is to have my custom operator generate a single, 4-dimensional output that is a batch of augmented images with shape (n_imgs, n_channels, width, height). That seems to be a good workaround for conditional augmentations since my operator will return a consistently shaped output to the Python pipeline, but the number of images per batch could change each time. If this sounds like a reasonable approach to you, then I need help concatenating the 3D output of multiple instantiated operators into the 4D batch that will be returned to Python

jantonguirao commented 5 years ago

@addisonklinke

  1. About AddArg: in the DALI_SCHEMA you define the argument; in the OpSpec you set the value:

MyRotate.h:

#ifndef EXAMPLE_MY_ROTATE_H_
#define EXAMPLE_MY_ROTATE_H_

#include "dali/pipeline/operators/operator.h"

namespace other_ns {

template <typename Backend>
class MyRotate : public ::dali::Operator<Backend> {
 public:
  inline explicit MyRotate(const ::dali::OpSpec &spec)
      : ::dali::Operator<Backend>(spec) {
    angle_ = spec.GetArgument<float>("angle");
  }

  virtual inline ~MyRotate() = default;

  MyRotate(const MyRotate&) = delete;
  MyRotate& operator=(const MyRotate&) = delete;
  MyRotate(MyRotate&&) = delete;
  MyRotate& operator=(MyRotate&&) = delete;

 protected:
  void RunImpl(::dali::Workspace<Backend> *ws, const int idx) override;

 private:
  float angle_;
};

}  // namespace other_ns

#endif  // EXAMPLE_MY_ROTATE_H_

MyRotate.cc:

#include "MyRotate.h"
#include "dali/pipeline/operators/operator.h"
#include "dali/pipeline/data/types.h"

namespace other_ns {

template<>
void MyRotate<::dali::CPUBackend>::RunImpl(::dali::SampleWorkspace *ws, const int idx) {
  auto rotate_ptr = ::dali::InstantiateOperator(
      ::dali::OpSpec("rotate")
          .AddArg("angle", angle_));
  rotate_ptr->Run(ws);
}

}  // namespace other_ns

DALI_REGISTER_OPERATOR(MyRotate, ::other_ns::MyRotate<::dali::CPUBackend>, ::dali::CPU);

DALI_SCHEMA(MyRotate)
    .DocStr("MyRotate example")
    .NumInput(1)
    .NumOutput(1)
    .AllowMultipleInputSets()
    .AddArg("angle",
        R"code(Rotation angle)code", ::dali::DALIDataType::DALI_FLOAT, true);

2. In this case, MyRotate is just wrapping Rotate, so it would be equivalent to using Rotate directly.

3, 4. Current DALI design doesn't allow changing the batch_size, so this approach would not work.

What you could do is to create all the augmentations in your pipeline (by using Rotate, Crop, etc) and then create a custom operator that processes the samples and outputs a mask determining which samples should be used (e.g. [0, 0, 1, 0, 1]). Then you could write your own iterator in python that would receive your augmentations and the mask and would visit only the samples that were marked as valid. 

Would that be a valid solution for your use case? The downside of this approach is that you will perform unnecessary computations and use unnecessary memory.
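
A rough Python-side sketch of that masking idea, purely illustrative: it assumes the pipeline returns CPU outputs (images, mask) with one 0/1 flag per sample; the names here are not an existing DALI API.

import numpy as np

def valid_samples(pipe, batch_size):
    """Yield only the samples whose mask entry is non-zero."""
    images, mask = pipe.run()
    for i in range(batch_size):
        if np.any(np.array(mask.at(i))):
            yield np.array(images.at(i))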

addisonklinke commented 5 years ago

@jantonguirao Thanks for the details on the Rotate wrapper. Your suggestion for conditional augmentations would provide the correct result, but for my use case speed is of primary importance so I would prefer a different implementation. I'm training a lightweight network where CPU augmentations are by far the bottleneck in training, so we'd like to move as much of that computation as possible to the GPU.

For now, I created a "no-operation" version of each augmentation I want to use (i.e. still running the image through ops.Rotate, but setting the rotation angle to 0 so that nothing changes). Then in define_graph, I use randomly generated numbers to determine whether to apply the real or "no-operation" version of the augmentation. Even though I have to send images through operators without modifying them, this approach seems to waste less computation because every image that comes out of the pipeline can be used in training (vs. only a subset if I use a masking approach).

Feature Suggestion - Implement a "short circuit" flag in operator.h that causes the operator to immediately return the image without modification. Every other operator already inherits from this class, so the behavior would be available even for user-defined operators. This permits "conditional" augmentations (by randomly or deterministically short circuiting an operator) while still maintaining a static graph (since each node uses the same operator across different runs). Hopefully that design change wouldn't interfere too much with other ongoing efforts

addisonklinke commented 5 years ago

Related to my custom ops.BbRotate, I'd like to inquire about some details of ops.Rotate:

1) Is the image rotated about its center pixel? I have been assuming so, but the docstring doesn't explicitly state one way or another

2) In ops.BbRotate I use the following logic (borrowed from here and here) to transform the bounding box coordinates:

x_center = x - 0.5
y_center = y - 0.5
rotated_x = x_center*cos(angle) - y_center*sin(angle) + 0.5
rotated_y = y_center*cos(angle) + x_center*sin(angle) + 0.5

The incoming (x, y) pairs are given in image coordinates (i.e. 0-1 range), so I start by computing their coordinates relative to the center pixel. What's odd is that my rotated coordinates (green points) almost match up with the license plate they're supposed to be bounding but not quite (see screenshots below). Oddly enough, the only time the bounding boxes line up precisely with the image is for a 180° rotation. I don't think my rotation formula could be so far off since it's consistently close to the license plate for a variety of angles. Maybe there is a detail about how ops.Rotate is implemented that I'm missing?

[screenshot] Zoom of the first image

[screenshot] Multiple images from a batch showing the same issue

JanuszL commented 5 years ago

Hi,
1) It is supposed to be rotated about the center of the image.
2) It looks like the issue fixed in https://github.com/NVIDIA/DALI/pull/435 (we are developing a GPU variant of resize and then we want to merge that together). You can try to grab this PR and check if it helps with the bbox alignment. Is my guess correct @mzient?

jantonguirao commented 5 years ago

@addisonklinke

Feature Suggestion - Implement a "short circuit" flag in operator.h that causes the operator to immediately return the image without modification. Every other operator already inherits from this class, so the behavior would be available even for user-defined operators. This permits "conditional" augmentations (by randomly or deterministically short circuiting an operator) while still maintaining a static graph (since each node uses the same operator across different runs). Hopefully that design change wouldn't interfere too much with other ongoing efforts

Thanks for the suggestion. It sounds like a doable feature! We have added it to our TODO list (DALI-515) but we cannot promise any schedule for it at the moment.

addisonklinke commented 5 years ago

@JanuszL Thank you for the suggestion. I've solved the offset in the bounding boxes - it was due to doing the rotation in normalized image coordinates instead of raw pixel coordinates. Because the rotation mixes the x and y axes through sin and cos, it doesn't commute with the per-axis normalization, so the result is slightly off unless the math is done in pixel coordinates.
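
For reference, a small sketch of the fix described above - pure math, independent of DALI, assuming the angle is in radians and the incoming (x, y) corner is normalized to the 0-1 range:

import math

def rotate_corner(x_norm, y_norm, width, height, angle):
    # convert the normalized corner to pixel coordinates relative to the image center
    x = x_norm * width - width / 2.0
    y = y_norm * height - height / 2.0
    # standard 2D rotation about the center (same matrix as the formula earlier in the thread)
    x_rot = x * math.cos(angle) - y * math.sin(angle)
    y_rot = x * math.sin(angle) + y * math.cos(angle)
    # shift back so the result is in pixel coordinates of the full image
    return x_rot + width / 2.0, y_rot + height / 2.0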

addisonklinke commented 5 years ago

@jantonguirao Thanks for considering the feature! I tried your ops.MyRotate within a pipeline (built locally from commit 2aef923880a4e686c9dfabd54927616cfc2793a4), and get an error when fetching the angle:

About to fetch argument
Traceback (most recent call last):
  File "tmp.py", line 51, in <module>
    pipe.build()
  File "/home/addison/miniconda3/envs/openalpr/lib/python3.6/nvidia/dali/pipeline.py", line 204, in build
    self._pipe.Build(self._names_and_devices)
RuntimeError: [/home/addison/miniconda3/envs/openalpr/lib/python3.6/nvidia/dali/include/dali/pipeline/operators/op_spec.h:351] 
Assert on "ws != nullptr" failed: Tensor value is unexpected for argument "angle".

Here is the code for the pipeline I'm running:

import os
import numpy as np
import nvidia.dali.ops as ops
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.plugin_manager as plugin_manager

plugin_manager.load_library('/path/to/libmyrotate.so')

class MyLoader():
    """Load images from filepath."""
    def __init__(self, img_dir):
        self.img_dir = img_dir
        self.img_paths = [os.path.join(self.img_dir, f) for f in os.listdir(self.img_dir)]
        self.imgs = []
        for path in self.img_paths:
            with open(path, 'rb') as f:
                self.imgs.append(np.frombuffer(f.read(), dtype=np.uint8))
        self.batch_size = len(self.imgs)

    def __call__(self, norm=False):
        return self.imgs

class MyPipe(Pipeline):
    """Pipeline to return output from MyLoader."""
    def __init__(self, loader, num_threads=1, device_id=0):
        super(MyPipe, self).__init__(loader.batch_size, num_threads, device_id, seed=1)
        self.loader = loader
        self.input = ops.ExternalSource()
        self.rotate = ops.MyRotate()
        self.rgn = ops.Uniform(range=(-15., 15.))

    def define_graph(self):
        self.imgs = self.input()
        rotated = self.rotate(self.imgs, angle=self.rgn())
        return rotated

    def iter_setup(self):
        imgs = self.loader()
        self.feed_input(self.imgs, imgs)

pipe = MyPipe(MyLoader('/path/to/imgs'))
pipe.build()
pipe_out = pipe.run()

I am confused because MyPipe.rgn() uses ops.Uniform, which is the suggested approach here for passing named arguments to operators. In MyRotate.h I added a couple of print statements which confirm that the error occurs when trying to fetch the "angle" argument from the OpSpec.

#ifndef EXAMPLE_MY_ROTATE_H_
#define EXAMPLE_MY_ROTATE_H_

#include "dali/pipeline/operators/operator.h"

namespace dali {

    template <typename Backend>
    class MyRotate : public Operator<Backend> {
        public:
            inline explicit MyRotate(const OpSpec &spec) :
            Operator<Backend>(spec) {
                std::cout << "About to fetch argument" << std::endl;
                angle_ = spec.GetArgument<float>("angle");  // line that triggers error
                std::cout << "Received argument" << std::endl;
            }

            virtual inline ~MyRotate() = default;

            MyRotate(const MyRotate&) = delete;
            MyRotate& operator=(const MyRotate&) = delete;
            MyRotate(MyRotate&&) = delete;
            MyRotate& operator=(MyRotate&&) = delete;

        protected:
            void RunImpl(Workspace<Backend> *ws, const int idx) override;

        private:
            float angle_;
    };
}

#endif  // EXAMPLE_MY_ROTATE_H_

If I replace the rotate attribute of MyPipe with ops.MyRotate(angle=15.) to avoid using ops.Uniform, I get a different error. The _angle argument is successfully fetched based on the print statements, but afterwards you get:

About to fetch argument
Received argument
Traceback (most recent call last):
  File "tmp.py", line 52, in <module>
    pipe_out = pipe.run()
  File "/home/addison/miniconda3/envs/openalpr/lib/python3.6/nvidia/dali/pipeline.py", line 280, in run
    return self.outputs()
  File "/home/addison/miniconda3/envs/openalpr/lib/python3.6/nvidia/dali/pipeline.py", line 250, in outputs
    return self._share_outputs()
  File "/home/addison/miniconda3/envs/openalpr/lib/python3.6/nvidia/dali/pipeline.py", line 259, in _share_outputs
    return self._pipe.ShareOutputs()
RuntimeError: Critical error in pipeline: Error in thread 0: [/home/addison/Documents/git/dali/dali/pipeline/operators/op_schema.h:355] 
Assert on "it != schema_map.end()" failed: Schema for operator 'rotate' not registered
Current pipeline object is no longer valid.

ops.Rotate should already be registered by default, and MyRotate.cpp has the DALI_REGISTER_OPERATOR command from your example, so I am confused how the schema is not registered.

jantonguirao commented 5 years ago

@addisonklinke Here are a couple of points necessary for your example to work as expected:

  1. The operator is registered as "Rotate" (not "rotate")
  2. Since you are passing an argument input, you need to fetch the angles per sample id.

Here I am sharing with you an example that works, I hope it's useful for you:

#ifndef EXAMPLE_MY_ROTATE_H_
#define EXAMPLE_MY_ROTATE_H_

#include "dali/pipeline/operators/operator.h"

namespace other_ns {

    template <typename Backend>
        class MyRotate : public ::dali::Operator<Backend> {
    public:
        inline explicit MyRotate(const ::dali::OpSpec &spec)
            : ::dali::Operator<Backend>(spec) {
        }

        virtual inline ~MyRotate() = default;

        MyRotate(const MyRotate&) = delete;
        MyRotate& operator=(const MyRotate&) = delete;
        MyRotate(MyRotate&&) = delete;
        MyRotate& operator=(MyRotate&&) = delete;

    protected:
        void RunImpl(::dali::Workspace<Backend> *ws, const int idx) override;
    };

}  // namespace other_ns                                                                                                                                                                                    

#endif  // EXAMPLE_MY_ROTATE_H_
#include "MyRotate.h"
#include "dali/pipeline/operators/operator.h"

namespace other_ns {

template<>
void MyRotate<::dali::CPUBackend>::RunImpl(::dali::SampleWorkspace *ws, const int idx) {
    std::cout << "About to fetch argument " << ws->data_idx() << std::endl;
    float angle = spec_.GetArgument<float>("angle", ws, ws->data_idx());
    std::cout << "Received argument " << angle << std::endl;
    auto rotate_ptr = ::dali::InstantiateOperator(
        ::dali::OpSpec("Rotate")
        .AddArg("angle", angle)
    .AddArg("num_threads", spec_.GetArgument<int>("num_threads"))
        .AddArg("batch_size", spec_.GetArgument<int>("batch_size"))
    );
    rotate_ptr->Run(ws);
}

}  // namespace other_ns                                                                                                                                                                                    

DALI_REGISTER_OPERATOR(MyRotate, ::other_ns::MyRotate<::dali::CPUBackend>, ::dali::CPU);

DALI_SCHEMA(MyRotate)
.DocStr("MyRotate example")
.NumInput(1)
.NumOutput(1)
.AllowMultipleInputSets()
.AddArg("angle",
    R"code(Rotation angle)code", ::dali::DALIDataType::DALI_FLOAT, true);

from __future__ import division
import os
import numpy as np
import nvidia.dali.ops as ops
import nvidia.dali.types as types
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.plugin_manager as plugin_manager
import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt
#%matplotlib inline                                                                                                                                                                                         

def show_images(image_batch):
    columns = 4
    rows = (batch_size + 1) // (columns)
    fig = plt.figure(figsize = (32,(32 // columns) * rows))
    gs = gridspec.GridSpec(rows, columns)
    for j in range(rows*columns):
        plt.subplot(gs[j])
        plt.axis("off")
        plt.imshow(image_batch.at(j))
    plt.show()

path_to_plugin = 'path/to/your/plugin.so'
path_to_images = '/path/to/image/folder' # subfolders with label name are expected
batch_size = 8

plugin_manager.load_library(path_to_plugin)

class MyPipe(Pipeline):
    def __init__(self, path_to_images, batch_size, num_threads=1, device_id=0):
        super(MyPipe, self).__init__(batch_size, num_threads, device_id, seed=12)
        self.input = ops.FileReader(file_root = path_to_images)
        self.decode = ops.HostDecoder(output_type = types.RGB)
        self.rotate = ops.MyRotate()
        self.rgn = ops.Uniform(range=(-15., 15.))

    def define_graph(self):
        jpegs, _ = self.input()
        images = self.decode(jpegs)
        rotated = self.rotate(images, angle=self.rgn())
        return rotated

pipe = MyPipe(path_to_images=path_to_images, batch_size=batch_size)
pipe.build()
pipe_out = pipe.run()
show_images(pipe_out[0])

addisonklinke commented 5 years ago

@jantonguirao Excellent, fetching angles by sample ID is what I needed!

To implement ops.MyRotate on GPU, I tried compiling with the following MyRotate.cu file:

#include <cuda_runtime_api.h>
#include "MyRotate.h"

namespace dali {
    template<>
    void MyRotate<GPUBackend>::RunImpl(DeviceWorkspace *ws, const int idx) {
        float angle = spec_.GetArgument<float>("angle", ws, ws->data_idx());
        auto rotate_ptr = InstantiateOperator(
            OpSpec("Rotate")
                .AddArg("angle", angle)
                .AddArg("num_threads", spec_.GetArgument<int>("num_threads"))
                .AddArg("batch_size", spec_.GetArgument<int>("batch_size")));
        CUDA_CALL(rotate_ptr->Run(ws));
    }

    DALI_REGISTER_OPERATOR(CustomMyRotate, MyRotate<GPUBackend>, GPU);
}

However, I get the following errors from cmake:

MyRotate.cu(7): error: class "dali::DeviceWorkspace" has no member "data_idx"
MyRotate.cu(13): error: incomplete type is not allowed
MyRotate.cu(13): error: no instance of function template "dali::cudaResultCheck" matches the argument list
            argument types are: (<error-type>)

What is the correct way to implement a combined CPU, GPU wrapper?

jantonguirao commented 5 years ago

@addisonklinke I found a simpler way to forward the argument input:

template<>
void MyRotate<::dali::CPUBackend>::RunImpl(::dali::SampleWorkspace *ws, const int idx) {
    auto rotate_ptr = ::dali::InstantiateOperator(
        ::dali::OpSpec("Rotate")
          .AddArgumentInput("angle", "angle")
          .AddArg("device", "cpu")
          .AddArg("num_threads", spec_.GetArgument<int>("num_threads"))
          .AddArg("batch_size", spec_.GetArgument<int>("batch_size"))
        );
    rotate_ptr->Run(ws);
}

It should also work in your GPU implementation (just change the "device" argument). You could also unify them into one implementation.

addisonklinke commented 5 years ago

@jantonguirao Great, that works for both the CPU and GPU implementations.

In my pipeline, I load images onto the GPU with ops.nvJPEGDecoder and bounding boxes to the CPU with ops.ExternalSource. I'd like to be able to pass both as input to an operator, but am running into limitations with CPU/mixed/GPU arguments and operators. For instance, ops.ExternalSource can't be registered as a mixed operator and ops.Copy can only take CPU input. If I put ops.ExternalSource on the GPU, then the operator complains that it cannot accept non-contiguous data.

It is important for me to do all image augmentations on GPU, but the bounding box calculations can happen on either CPU or GPU (whichever is more convenient). However, both inputs need to go to the same operator since I want to repeat the image augmentation until the bounding box is completely within the augmented image before returning the operator's output back to the Python pipeline. Is there a suggested approach to accomplish this combination of CPU/GPU work?

jantonguirao commented 5 years ago

@addisonklinke You can use .gpu() when passing a CPU output to a GPU operator:

def __init__(...):
    ...
    self.decode = ops.nvJPEGDecoder(device="mixed", output_type=types.RGB)
    self.other_input = ops.ExternalSource()

def define_graph(self):
    ...
    images = self.decode(...)
    other_input = self.other_input()
    rotated = self.gpu_op(images, other_input.gpu(), ...)

addisonklinke commented 5 years ago

@jantonguirao I am able to run the GPU implementation without errors, but for my specific operator the bounding boxes are not correct when plotted. I use the same formula as the CPU implementation of my operator (which plots perfectly).

Therefore, the error seems to come from using the argument values for the first sample across the entire batch. For instance, I randomly generate the rotation angle and use each image's height/width dimensions (which are not constant for the entire batch). This means the bounding box for the first sample in the batch is always correct, but subsequent samples are off. Using

float angle = spec_.GetArgument<float>("angle", ws, ws->data_idx());

works great in the CPU implementation to get the angle unique to each sample. However, for GPU I will get an error because dali::DeviceWorkspace has no member data_idx. If I stick with using .AddArgumentInput("angle", "angle"), it will incorrectly use the angle for the first sample for all samples. Is it possible to use a different angle for each sample in the batch on GPU, and if so how?

EDIT I'm not sure if this is the proper way to get angles, but I realized the following works:

for (int i=0; i < batch_size_; i++) 
{
    float angle = spec_.GetArgument<float>("angle", ws, i);
    std::cout << "Angle for sample " << i << ": " << angle << std::endl;
}

However, I am still stuck on how to get the image dimensions for each sample in the batch. I have two custom operators: one that rotates bounding boxes (called ops.BbRotate) and one that wraps the former together with Dali's existing ops.Rotate (called ops.AlprRotate). Below is the pseudo code for my current approach

AlprRotate.cu

void AlprRotate<GPUBackend>::RunImpl(DeviceWorkspace *ws, const int idx)
{
    // Instantiate and run custom BbRotate operator
    auto bbrotate_ptr = InstantiateOperator(            
        OpSpec("BbRotate").AddArgumentInput("angle", "angle"))
    bbrotate_ptr->Run(ws);

    // Instantiate and run Dali's Rotate operator
    auto rotate_ptr = InstantiateOperator(
        OpSpec("Rotate").AddArgumentInput("angle", "angle"))
    rotate_ptr->Run(ws);
}

BbRotate.cu

template<>
void BbRotate<GPUBackend>::RunImpl(DeviceWorkspace *ws, const int idx)
{
    const auto &image = ws->Input<GPUBackend>(0);
    // How do you get (height, width) for each image in the batch?
    const auto &corners = ws->Input<GPUBackend>(1);
    auto &rotated = ws->Output<GPUBackend>(1);

    // Do the rotation math for each pair of points in corners
    // This part works fine
}

Both BbRotate and AlprRotate take 2 inputs and return 2 outputs (namely the image and the bounding box coordinates). In BbRotate.cu, I would expect const auto shape = image.shape() to work, but it doesn't return the same values as when it's used inside AlprRotate.cu. If I print shape values in BbRotate, they are mostly nonsense (i.e. >10,000), but the printed shapes inside AlprRotate.cu are correct. Why would they differ since they are accessing the same workspace?

jantonguirao commented 5 years ago

I tried to reproduce your scenario by writing two custom operators:

Here you can see how to get the shapes of each sample

#ifndef EXAMPLE_MY_ROTATE_H_
#define EXAMPLE_MY_ROTATE_H_

#include <string>
#include "dali/pipeline/operators/operator.h"
#include "dali/pipeline/workspace/workspace.h"
#include "dali/pipeline/workspace/sample_workspace.h"
#include "dali/pipeline/workspace/device_workspace.h"

namespace other_ns {

    template <typename Backend>
        class MyRotate : public ::dali::Operator<Backend> {
    public:
        inline explicit MyRotate(const ::dali::OpSpec &spec)
            : ::dali::Operator<Backend>(spec)
            , spec_(spec) {
        }

        virtual inline ~MyRotate() = default;

        MyRotate(const MyRotate&) = delete;
        MyRotate& operator=(const MyRotate&) = delete;
        MyRotate(MyRotate&&) = delete;
        MyRotate& operator=(MyRotate&&) = delete;

        void print_shape(::dali::SampleWorkspace *ws) {
            const auto& image = ws->Input<Backend>(0);
            ::dali::Dims shape = image.shape();
            std::cout << "MyRotate CPU says shape for sample #" << ws->data_idx()
                      << " is: " << shape[0] << " " << shape[1] << " " << shape[2] << std::endl;
        }

        void print_shape(::dali::DeviceWorkspace *ws) {
            const auto& images = ws->Input<Backend>(0);
            std::vector<::dali::Dims> shapes = images.shape();
            for (int i = 0; i < shapes.size(); i++) {
                const auto& shape = shapes[i];
                std::cout << "MyRotate GPU says shape for sample #" << i
                          << " is: " << shape[0] << " " << shape[1] << " " << shape[2] << std::endl;
            }
        }

    protected:
        void RunImpl(::dali::Workspace<Backend> *ws, const int idx) override {
            print_shape(ws);
            auto rotate_ptr = ::dali::InstantiateOperator(
                ::dali::OpSpec("Rotate")
                .AddArgumentInput("angle", "angle")
                .AddArg("device", spec_.GetArgument<std::string>("device"))
                .AddArg("num_threads", spec_.GetArgument<int>("num_threads"))
                .AddArg("batch_size", spec_.GetArgument<int>("batch_size")) );
            rotate_ptr->Run(ws);
        }

        ::dali::OpSpec spec_;
    };

}  // namespace other_ns                                                                                                                                                                                    

#endif  // EXAMPLE_MY_ROTATE_H_ 

#ifndef EXAMPLE_MY_ROTATE_WRAP_H_
#define EXAMPLE_MY_ROTATE_WRAP_H_

#include <string>
#include "dali/pipeline/operators/operator.h"

namespace other_ns {

    template <typename Backend>
        class MyRotateWrap : public ::dali::Operator<Backend> {
    public:
        inline explicit MyRotateWrap(const ::dali::OpSpec &spec)
            : ::dali::Operator<Backend>(spec)
            , spec_(spec) {
        }

        virtual inline ~MyRotateWrap() = default;

    MyRotateWrap(const MyRotateWrap&) = delete;
        MyRotateWrap& operator=(const MyRotateWrap&) = delete;
    MyRotateWrap(MyRotateWrap&&) = delete;
        MyRotateWrap& operator=(MyRotateWrap&&) = delete;

        void print_shape(::dali::SampleWorkspace *ws) {
            const auto& image = ws->Input<Backend>(0);
            auto shape = image.shape();
            std::cout << "MyRotateWrap CPU says shape for sample #" << ws->data_idx()
                      << " is: " << shape[0] << " " << shape[1] << " " << shape[2] << std::endl;
        }

        void print_shape(::dali::DeviceWorkspace *ws) {
            const auto& images = ws->Input<Backend>(0);
            auto shapes = images.shape();
            for (int i = 0; i < shapes.size(); i++) {
                const auto& shape = shapes[i];
                std::cout << "MyRotateWrap GPU says shape for sample #" << i
                          << " is: " << shape[0] << " " << shape[1] << " " << shape[2] << std::endl;
            }
        }

    protected:
        void RunImpl(::dali::Workspace<Backend> *ws, const int idx) override {
            print_shape(ws);
            auto rotate_ptr = ::dali::InstantiateOperator(
                ::dali::OpSpec("MyRotate")
                .AddArgumentInput("angle", "angle")
                .AddArg("device", spec_.GetArgument<std::string>("device"))
                .AddArg("num_threads", spec_.GetArgument<int>("num_threads"))
                .AddArg("batch_size", spec_.GetArgument<int>("batch_size")) );
            rotate_ptr->Run(ws);
        }

        ::dali::OpSpec spec_;
    };

}  // namespace other_ns                                                                                                                                                                                    

#endif  // EXAMPLE_MY_ROTATE_WRAP_H_

class MyPipe(Pipeline):
    """Pipeline to return output from MyLoader."""
    def __init__(self, path_to_images, batch_size, num_threads=1, device_id=0):
        super(MyPipe, self).__init__(batch_size, num_threads, device_id, seed=12)
        self.input = ops.FileReader(file_root = path_to_images)
        self.decode = ops.HostDecoder(output_type = types.RGB)
        self.rotate_gpu = ops.MyRotateWrap(device="gpu")
        self.rotate_cpu = ops.MyRotateWrap(device="cpu")
        self.rgn = ops.Uniform(range=(-15., 15.))

    def define_graph(self):
        jpegs, _ = self.input()
        images = self.decode(jpegs)
        angles = self.rgn()
        rotated_gpu = self.rotate_gpu(images.gpu(), angle=angles)
        rotated_cpu = self.rotate_cpu(images, angle=angles)
        return rotated_gpu, rotated_cpu

addisonklinke commented 5 years ago

@jantonguirao Your example above is working great for getting the shape of each image in the batch. I'm curious if it's possible to directly modify the angle values contained in the spec from the outer operator MyRotateWrap?

In the trivial example below, I only increment each sample's angle by 1. However in my use case, I want to check each randomly generated rotation angle from Python to ensure that it doesn't push the bounding box out of frame, and if it does, modify the angle value in the spec before running the inner operator. For a CPU implementation, this is straightforward since each sample is handled individually. However, when running on GPU the operator modifies all its samples at once so it makes sense to change the spec ahead of time.

MyRotateWrap.cu

void MyRotateWrap<GPUBackend>::RunImpl(DeviceWorkspace *ws, const int idx) {
    std::cout << "Angles received by MyRotateWrap from Python" << std::endl;
    for (int i=0; i < shapes.size(); i++) {
        float angle = spec_.GetArgument<float>("angle", ws, i);
        std::cout << "  " << i << ") " << angle << std::endl;
    }

    // Modify the arguments before sending to the inner operator
    for (int i=0; i < shapes.size(); i++) {
        spec_.angle[i] += 1;  // Magic command I wish existed
    }

    auto rotate_ptr = InstantiateOperator(
        OpSpec("MyRotate")
            .AddArgumentInput("angle", "angle")  // Forward to MyRotate after modifying
            .AddArg("device", spec_.GetArgument<std::string>("device"))
            .AddArg("num_threads", spec_.GetArgument<int>("num_threads"))
            .AddArg("batch_size", spec_.GetArgument<int>("batch_size")) );
    rotate_ptr->Run(ws);
}

MyRotate.cu

void MyRotate<GPUBackend>::RunImpl(DeviceWorkspace *ws, const int idx) {
    std::cout << "Angles received by MyRotate from MyRotateWrap" << std::endl;
    for (int i=0; i < shapes.size(); i++) {
        float angle = spec_.GetArgument<float>("angle", ws, i);
        std::cout << "  " << i << ") " << angle << std::endl;  // Should be incremented 1
    }
}

jantonguirao commented 5 years ago

@addisonklinke You can read all the angles at once, before the operator runs, with SetupSharedSampleParams:

// define a new member in your operator
std::vector<float> angles_;

// override SetupSharedSampleParams
void SetupSharedSampleParams(::dali::Workspace<Backend> *ws) override {
    auto batch_size = spec_.GetArgument<int>("batch_size");
    for (int data_idx = 0; data_idx < batch_size; data_idx++) {
        auto angle = spec_.GetArgument<float>("angle", ws, data_idx);
        angle++;
        std::cout << "angle is " << angle << std::endl;
        angles_.push_back(angle);
    }
}

Then just use angles_ in your operator

addisonklinke commented 5 years ago

@jantonguirao That works in the outer operator MyRotateWrap, but is there a way to pass the modified angles to MyRotate? In MyRotateWrap.cu, I tried adding

auto myrotate_ptr = InstantiateOperator(
    OpSpec("MyRotate")
        .AddArg("angle", angles_)  // Pass modified angles_ to inner operator 
        .AddArg("device", "gpu")
        .AddArg("num_threads", spec_.GetArgument<int>("num_threads"))
        .AddArg("batch_size", spec_.GetArgument<int>("batch_size")));

This will compile, but at runtime MyRotate will error because it is expecting "angle" to be type float based on the schema, but MyRotateWrap passed it a std::vector<float> instead. Overriding SetupSharedSampleParams for MyRotate seems like it would accomplish what I need, but it would be nice to only have to do this once for the outer operator since the logic for modifying the angles will be the same.

jantonguirao commented 5 years ago

@addisonklinke Modifying the argument inputs in the workspace is not something we support. However, you could still do it (a little hacky solution though):

        void SetupSharedSampleParams(::dali::Workspace<Backend> *ws) override {
            auto& angles_tensor = ws->ArgumentInput("angle");
            const auto size = angles_tensor.size();
            for (int data_idx = 0; data_idx < size; data_idx++) {
                float *angles = const_cast<float*>(
                    angles_tensor.template data<float>());

                // e.g. Modify the angles when a condition occurs                                                                                                                                           
                if ( data_idx % 2 == 0 )
                    angles[data_idx] = 0.0f;

                std::cout << "angle is " << angles[data_idx] << std::endl;
            }
        }

addisonklinke commented 5 years ago

@jantonguirao That hack worked well for arguments like angle which are a single value.

I tried the same approach for in-place modifications to the crop argument for ops.ResizeCropMirror (which is a vector of two float values). First, I updated the schema of my operator to mimic the optional crop argument as used in your crop.cc

DALI_SCHEMA(MyOperator)
    .AddOptionalArg("crop", "Internal use only, will be forwarded to Dali's "
        "ResizeCropMirror operator after modification", std::vector<float>{0.f, 0.f});

Then, inside MyOperator<GPUBackend>::SetupSharedSampleParams I tried

auto &dali_crops_tensor = ws->ArgumentInput("crop");
float *dali_crops = const_cast<float*>(dali_crops_tensor.template data<float>());

This compiles, but creates a runtime error during the call to pipe.run()

RuntimeError: Critical error in pipeline: [.../dali/pipeline/workspace/workspace.h:56] 
Assert on "argument_inputs_.find(arg_name) != argument_inputs_.end()" failed
Argument "crop" not found.
Current pipeline object is no longer valid.

It seems that either ws->ArgumentInput does not work for optional arguments, or there is some difficulty fetching an argument that is a vector as opposed to a single value for each sample in the batch. Is there a different workspace member function I need to use to access optional vector arguments?

EDIT Alternatively, it might work to create a new crop_ variable like you suggested above for angles_

std::vector<float> angles; angles.push_back(angle);

Would it then be possible to do this:

auto resize_ptr = InstantiateOperator(
    OpSpec("ResizeCropMirror")
        .AddArg("crop", crop_));

My problem is that I don't know what type crop_ should be in order to pass the arguments for an entire batch at once (as opposed to just a single sample).

jantonguirao commented 5 years ago

@addisonklinke Crop operator (from where ResizeCropMirror inherits) exposes the argument crop as a fixed argument (as opposed to an argument input which has values per sample). So, if you use Crop the cropping window dimensions are fixed (the window position is a variable though).

If you want to have per sample cropping window dimensions you should use Slice operator instead. Please note that Slice operator takes crop parameters as regular inputs (not argument inputs). We have in mind to unify those two operators into something that can accept either fixed or per sample crop window sizes, but we are not there yet.

By the way, I found a better way to change the angle values in your rotate wrapper, that doesn't involve const_cast:

        void SetupSharedSampleParams(::dali::Workspace<Backend> *ws) override {
            std::shared_ptr<::dali::Tensor<::dali::CPUBackend>> angle2(
                new ::dali::Tensor<::dali::CPUBackend>());
            angle2->Copy(ws->ArgumentInput("angle"), 0);
            // now we have a copy of angle                                                                                                                                                                  
            const auto size = angle2->shape()[0];
            for (int data_idx = 0; data_idx < size; data_idx++) {
                float *angles = angle2->template mutable_data<float>();
                if ( data_idx % 2 == 0 )
                    angles[data_idx] = 0;
                std::cout << "new angle is " << angles[data_idx] << std::endl;
            }
            ws->SetArgumentInput(angle2, "angle");
        }

addisonklinke commented 5 years ago

@jantonguirao Ahh that makes more sense. In the CPU implementation of MyOperator, I was able to generate per sample crop dimensions because I was wrapping ResizeCropMirror and calling it individually for each sample. That's why I'd thought crop was an argument input. Is there a way I can tell from the documentation whether an operator uses fixed arguments vs. argument inputs?

Are all the parameters to ResizeCropMirror (such as resize_x and resize_y) also fixed arguments? I get the same "Argument not found" error if I try ws->ArgumentInput("resize_x") inside MyOperator::SetupSharedSampleParams, so that would seem to indicate that resize_x is also a fixed argument.

Based on your feedback, it sounds like wrapping the Resize and Slice operators together would allow me to achieve per sample resize/crop arguments on GPU. Do either of those operators have any fixed arguments I need to worry about?

When an operator takes an argument input with a single value (i.e. angle, resize_x, resize_y, etc) as opposed to a vector value like crop, how is this stored internally for the whole batch? For instance, as a Tensor with [angle1, angle2, angle3, ... angleN] for N samples in the batch

jantonguirao commented 5 years ago

In the documentation you'll see something like float or float tensor, meaning that you can provide a tensor specifying the values per sample, or a single value.

Example:

resize_x : float or float tensor, optional, default = 0.0
    The length of the X dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_y is left at 0, then the op will keep the aspect ratio of the original image.
resize_y : float or float tensor, optional, default = 0.0

So, resize_x, resize_y are argument inputs.
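
As a small illustration of that distinction, here is a minimal pipeline sketch following the same pattern this thread already uses for angle (the class name, image directory, and Uniform range are only examples):

import nvidia.dali.ops as ops
import nvidia.dali.types as types
from nvidia.dali.pipeline import Pipeline

class ResizePipe(Pipeline):
    def __init__(self, image_dir, batch_size, num_threads=1, device_id=0):
        super(ResizePipe, self).__init__(batch_size, num_threads, device_id, seed=12)
        self.input = ops.FileReader(file_root=image_dir)
        self.decode = ops.HostDecoder(output_type=types.RGB)
        self.resize = ops.Resize()
        self.rng = ops.Uniform(range=(256., 480.))

    def define_graph(self):
        jpegs, labels = self.input()
        images = self.decode(jpegs)
        # fixed argument: every sample would get the same width
        # resized = self.resize(images, resize_x=300.)
        # argument input: a per-sample tensor from ops.Uniform drives the resize
        resized = self.resize(images, resize_x=self.rng())
        return resized, labels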

Argument inputs are stored as a Tensor<CPUBackend>. You can see in my previous snippet how I created a new Tensor and set it as an argument input.

I just noticed that the documentation of slice is a little bit misleading since it is showing the arguments for Crop as well. Please ignore crop, crop_pos_x, crop_pos_y for Slice. The inputs you need to provide are regular inputs. I'll fix the documentation soon.

addisonklinke commented 5 years ago

That's good to know for the documentation.

The current goal with MyOperator is to take only three argument inputs: crop_scale (0, 1], resize_scale [1, inf), and aspect_ratio (0, inf). These will tell me the percent of the image width to crop, scaling factor for the width, and aspect ratio between width and height, respectively. This is much simpler for my dataset since the image dimensions are highly variable. Inside MyOperator I will check the random generated values for each of these argument inputs, modify them to ensure the bounding box is within the resulting crop, and finally translate these to the crop, crop_pos_x, crop_pos_y, resize_x, and resize_y arguments expected by ResizeCropMirror (or the combination of Resize and Slice you suggested).
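
One way the translation described above could look, as plain per-sample math (purely illustrative; the interpretation of the three factors is only an assumption based on the description):

def translate_params(img_w, crop_scale, resize_scale, aspect_ratio):
    """Translate (crop_scale, resize_scale, aspect_ratio) into crop/resize values."""
    crop_w = img_w * crop_scale        # crop_scale in (0, 1]: fraction of the width to keep
    crop_h = crop_w / aspect_ratio     # aspect_ratio assumed to be crop width / crop height
    resize_x = crop_w * resize_scale   # resize_scale in [1, inf) scales the crop width
    resize_y = crop_h * resize_scale   # keep the same aspect ratio after resizing
    return crop_w, crop_h, resize_x, resize_y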

The difficulty is that I cannot do resize_x_2 = Copy(ws->ArgumentInput("resize_x"), 0) since resize_x is not in the schema of MyOperator. As a workaround, I tried

.AddOptionalArg(
    "resize_x", 
    "Not for use by MyOperator, but only for Dali's ResizeCropMirror", 
    0.f, 
    true)

thinking this would add it to the workspace but not require me to specify a value from the Python pipeline. However, in that case ws->ArgumentInput returns the "Argument not found" error. Maybe there is an OpSpec method I could use inside MyOperator::SetupSharedSampleParams to add resize_x to the workspace, or do I have to have it in the schema of MyOperator?

jantonguirao commented 5 years ago

You don't need to do the Copy. You can allocate the tensor from scratch, then use AddArgumentInput to add a new argument input to your workspace:

            int batch_size = spec_.GetArgument<int>("batch_size");
            std::shared_ptr<::dali::Tensor<::dali::CPUBackend>> angle2(
                new ::dali::Tensor<::dali::CPUBackend>());
            angle2->set_type(::dali::TypeInfo::Create<float>());
            angle2->Resize({batch_size});
            for (int data_idx = 0; data_idx < batch_size; data_idx++) {
                float *angles = angle2->template mutable_data<float>();
                if ( data_idx % 2 == 0 )
                    angles[data_idx] = 0.0f;
                else
                    angles[data_idx] = 15.0f;
                std::cout << "new angle is " << angles[data_idx] << std::endl;
            }
            // Adding a non-existing argument input                                                                                                                                                         
            ws->AddArgumentInput(angle2, "angle2");
            // Overriding an existing argument input                                                                                                                                                        
            ws->SetArgumentInput(angle2, "angle");

addisonklinke commented 5 years ago

@jantonguirao Is there a way to access Tensor<GPUBackend> data via a pointer? For instance, I've tried

TensorList<GPUBackend> &output = ws->Output<GPUBackend>(0);
const TensorList<GPUBackend> &input = ws->Input<GPUBackend>(0);
const double *input_data = input.data<double>();

for (int i=0; i < input.size(); i++) {
    std::cout << "Input[" << i << "] = " << input_data[i] << std::endl;
}

But this causes a segmentation fault. As a temporary workaround, I've been transferring the data to the CPU, performing calculations there, and then copying back. However, this is really inefficient:

// Copy input to output
output.set_type(input.type());
output.ResizeLike(input);
double *output_data = output.mutable_data<double>();

// Transfer to CPU
double input_cpu[input.nbytes()];
CUDA_CALL(cudaMemcpy(
    input_cpu,
    input.raw_data(),
    input.nbytes(),
    cudaMemcpyDeviceToHost));

// Do computation (currently on CPU, but preferably on GPU)
for (int i=0; i < output.size(); i++) {
    output_data[i] += 1;
}

// Transfer back to GPU for output
CUDA_CALL(cudaMemcpy(
    output.raw_mutable_data(),
    input_cpu,
    input.nbytes(),
    cudaMemcpyHostToDevice));

JanuszL commented 5 years ago

Hi, it is not possible on the CPU as the data is on the GPU. You need to develop a GPU kernel that can access this data directly on the GPU.

addisonklinke commented 5 years ago

@jantonguirao @JanuszL @Kh4L Thank you all for your help! My team has been able to run our training over 2x faster by using DALI :+1:

The accuracy with the new augmentations is slightly lower, so we will be making some minor tweaks to the pipeline to address this. However, I think the core issue of creating our own operators has been solved, so I am closing this issue. You have provided some excellent examples in this thread, and I hope they help people with similar questions in the future.

jantonguirao commented 5 years ago

@addisonklinke That is awesome! We are very happy to hear that you managed to complete your custom pipeline and that DALI has provided such a performance boost.