Open gartnera opened 4 years ago
Hi @gartnera, this looks really cool; we would be happy to review and merge such a contribution. It has great potential for bridging the gap between custom Python Ops and native C++ Operators. Regards, Krzysztof.
I agree, it looks super cool and would be a fine extension to the Python function functionality.
Hello @gartnera! That's an excellent idea for a contribution to DALI. Our current solutions for custom operators lack either simplicity (writing a plugin) or performance (all kinds of PythonFunction operators). I imagine that in some cases, when a user requires some very specific data augmentations in their pipelines, this feature might be game-changing.
It's great to hear you are interested in developing it further. I thought about some things that should be addressed to make it a full-blown DALI operator:
Universality
Such an operator should handle a varying number of inputs and outputs of different data types and shapes. It's not straightforward how to extend your prototype to support that. One thing is handling multiple inputs - that can be probably done by passing the data as void** (array of pointers to input samples). The same goes for passing shapes and dtypes. All of these could maybe be packed into some convenient structure.
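As a rough illustration of the "convenient structure" idea, the sketch below packs each input's data pointer, shape and dtype into one descriptor and hands an array of descriptors to a callback. Everything here is hypothetical: `SampleDesc`, the dtype codes and the `RunFn` signature are invented for the example and are not DALI or Numba API, and a ctypes callback stands in for a compiled cfunc so the snippet runs with the standard library alone.

```python
import ctypes

DTYPE_UINT8, DTYPE_INT64 = 0, 1  # hypothetical dtype codes


class SampleDesc(ctypes.Structure):
    """Hypothetical per-sample descriptor: data pointer, shape, dtype."""
    _fields_ = [
        ("data", ctypes.c_void_p),
        ("shape", ctypes.POINTER(ctypes.c_int64)),
        ("ndim", ctypes.c_int32),
        ("dtype", ctypes.c_int32),
    ]


# Assumed C signature:
#   void run(const SampleDesc *inputs, int32_t num_inputs, int64_t *result)
RunFn = ctypes.CFUNCTYPE(
    None, ctypes.POINTER(SampleDesc), ctypes.c_int32,
    ctypes.POINTER(ctypes.c_int64))


def num_elements(desc):
    """Product of all extents in the descriptor's shape."""
    n = 1
    for i in range(desc.ndim):
        n *= desc.shape[i]
    return n


@RunFn
def sum_image_plus_label(inputs, num_inputs, result):
    # Input 0 is the image, input 1 is the label (a convention of this sketch).
    image, label = inputs[0], inputs[1]
    pixels = ctypes.cast(image.data, ctypes.POINTER(ctypes.c_uint8))
    label_value = ctypes.cast(label.data, ctypes.POINTER(ctypes.c_int64))[0]
    result[0] = sum(pixels[i] for i in range(num_elements(image))) + label_value


# Host side: build the descriptors and call through the function pointer.
image_buf = (ctypes.c_uint8 * 6)(1, 2, 3, 4, 5, 6)
image_shape = (ctypes.c_int64 * 3)(1, 2, 3)
label_buf = (ctypes.c_int64 * 1)(7)
label_shape = (ctypes.c_int64 * 1)(1)
inputs = (SampleDesc * 2)(
    SampleDesc(ctypes.cast(image_buf, ctypes.c_void_p), image_shape, 3, DTYPE_UINT8),
    SampleDesc(ctypes.cast(label_buf, ctypes.c_void_p), label_shape, 1, DTYPE_INT64),
)
result = ctypes.c_int64(0)
sum_image_plus_label(inputs, 2, ctypes.byref(result))
```

A descriptor array keeps the callback signature fixed no matter how many inputs are passed, at the cost of one extra indirection per input compared with separate void**, shape and dtype arrays.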
Receiving outputs from the user defined cfunc is another story, though. In the prototype, an output is assumed to have the same size as an input and is preallocated. We would probably like to lift that assumption, so the question arises - how much memory should we allocate for the outputs of a custom cfunc? We could force a user to pass an argument that says how big are the outputs, but we perhaps can avoid that somehow. Maybe you have some ideas how to approach this? Let us know.
Although, we don't have to start with an operator that has all the features we can imagine. We can have an operator that covers only some of the use-cases but with a design that allows extending it in the future.
Simplicity
This ought to be an operator that gives quite good performance, so we can sacrifice some of the straightforwardness compared to the PythonFunction operators, but we can still make it as user-friendly as possible. For example, if we pass so many parameters to the custom function (data, shapes, dtypes), then maybe we can provide some helper functions that extract carrays from the raw arguments.
And again, making it super easy to use might not be the first priority, and it's ok to start with something more cumbersome to show that it's feasible to have such a feature in DALI.
Anyway, even though it still needs further development before we can be sure how it fits into DALI, it seems very promising. I will be happy to help you with making this operator (it should have a name - you can propose something) a part of DALI. Feel free to share any thoughts or questions.
Regards, Rafał
Thanks for the thoughts/feedback. Lots of stuff I'm not sure how to do, so it will take a bit of fiddling with. Ultimately I'd like to see arbitrary dtype + shape input and output.
One thing is handling multiple inputs - that can be probably done by passing the data as void** (array of pointers to input samples).
Yeah I specifically want to reference both the data and the label in the function so I'll be trying to figure this out.
Receiving outputs from the user defined cfunc is another story, though. In the prototype, an output is assumed to have the same size as an input and is preallocated. We would probably like to lift that assumption, so the question arises - how much memory should we allocate for the outputs of a custom cfunc? We could force a user to pass an argument that says how big are the outputs, but we perhaps can avoid that somehow. Maybe you have some ideas how to approach this? Let us know.
It would be helpful if/when you want to convert precision. Maybe require the user to provide their c_sig to the operator for inspection. But that doesn't help if the num_elements changes. Maybe just some static factor (growth_multiplier=4 when converting from uint8 to uint64, growth_multiplier=3 when grayscale to RGB). But I'd also like the ability to reduce the size too, maybe a negative growth_multiplier (growth_multiplier=-3 when RGB to grayscale) or maybe a different variable (reduction_multiplier).
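One possible reading of the signed multiplier idea, sketched as a small helper. The function name and the rule "positive multiplies, negative divides" are assumptions made for illustration; the sketch only tracks the element count, so the byte size of the output would additionally depend on the element width of the output dtype.

```python
def output_num_elements(num_elements, growth_multiplier):
    """Hypothetical sizing rule: a positive multiplier grows the element
    count, a negative one shrinks it by the same factor."""
    if growth_multiplier == 0:
        raise ValueError("growth_multiplier must be non-zero")
    if growth_multiplier > 0:
        return num_elements * growth_multiplier
    divisor = -growth_multiplier
    if num_elements % divisor != 0:
        raise ValueError("input size is not divisible by the reduction factor")
    return num_elements // divisor


gray_pixels = 4                                # a 2x2 grayscale image
rgb_elements = output_num_elements(gray_pixels, 3)     # grayscale to RGB
back_to_gray = output_num_elements(rgb_elements, -3)   # RGB to grayscale
```

A single static factor cannot express outputs whose size depends on the data itself, which is where a user-supplied sizing function becomes the more general option.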
I don't see any way to malloc in a cfunc, which would probably be the most robust option, as the user could just calculate the size themselves. Another thought is to have the user provide another cfunc which calculates the expected size.
it should have a name - you can propose something
Not sure if it should really be called NumbaOp because all it really does is call an arbitrary function pointer. But maybe if it becomes more dependent on numba features.
I like the idea of a separate cfunc to determine output sizes. It's actually what our operators do in SetupImpl. Such a cfunc could set the output shapes and dtypes and actually be called in SetupImpl. Also, if we would like to have an output that preserves the input shape but has another type, an argument for output_dtype would be enough. It's probably a good idea to have multiple options for setting the output size, from the simple and quite specific (output shape and/or output dtype parameters) to the more convoluted but generic, like a separate function.
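The two-function approach could look roughly like the sketch below: a setup function reports the output shape, the host allocates the output from it, and a run function fills the buffer. The signatures `SetupFn` and `RunFn` and the helper `call_operator` are hypothetical, chosen only for this example, and ctypes callbacks again stand in for compiled cfuncs so the snippet runs without Numba or DALI.

```python
import ctypes

Int64Ptr = ctypes.POINTER(ctypes.c_int64)
UInt8Ptr = ctypes.POINTER(ctypes.c_uint8)

# Assumed C signatures:
#   void setup(const int64_t *in_shape, int32_t ndim, int64_t *out_shape)
#   void run(const uint8_t *in, const int64_t *in_shape,
#            uint8_t *out, const int64_t *out_shape, int32_t ndim)
SetupFn = ctypes.CFUNCTYPE(None, Int64Ptr, ctypes.c_int32, Int64Ptr)
RunFn = ctypes.CFUNCTYPE(None, UInt8Ptr, Int64Ptr, UInt8Ptr, Int64Ptr,
                         ctypes.c_int32)


@SetupFn
def rgb_to_gray_setup(in_shape, ndim, out_shape):
    # HWC input with 3 channels becomes HWC output with 1 channel.
    out_shape[0] = in_shape[0]
    out_shape[1] = in_shape[1]
    out_shape[2] = 1


@RunFn
def rgb_to_gray_run(src, in_shape, dst, out_shape, ndim):
    # Integer mean of the three channels for every pixel.
    for p in range(out_shape[0] * out_shape[1]):
        dst[p] = (src[3 * p] + src[3 * p + 1] + src[3 * p + 2]) // 3


def call_operator(setup_fn, run_fn, data, shape):
    """Host side: ask for the output shape, allocate, then run."""
    ndim = len(shape)
    in_shape = (ctypes.c_int64 * ndim)(*shape)
    out_shape = (ctypes.c_int64 * ndim)()
    setup_fn(in_shape, ndim, out_shape)

    out_size = 1
    for extent in out_shape:
        out_size *= extent

    src = (ctypes.c_uint8 * len(data))(*data)
    dst = (ctypes.c_uint8 * out_size)()
    run_fn(src, in_shape, dst, out_shape, ndim)
    return list(out_shape), list(dst)


out_shape, out_data = call_operator(
    rgb_to_gray_setup, rgb_to_gray_run, [10, 20, 30, 0, 0, 3], (1, 2, 3))
```

Because the host performs the allocation between the two calls, the user function never needs to allocate memory itself.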
Also, as you say, this operator's implementation is actually very generic because it just calls a function through the pointer it receives. If you define an API that such a function should conform to, the use of this operator might go beyond just Numba cfuncs.
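To make the "just a function pointer" point concrete, here is a minimal sketch in which the host side sees nothing but an integer address and an agreed signature. The signature `SampleFn` and the helper `run_by_address` are hypothetical. A ctypes callback is used as a stand-in for a compiled function; a Numba cfunc object exposes its pointer through its `address` attribute, and any other compiled function with the same signature could be passed in the same way.

```python
import ctypes

# Assumed C signature shared by the host and the user function:
#   void fn(const uint8_t *in, uint8_t *out, int64_t n)
SampleFn = ctypes.CFUNCTYPE(
    None, ctypes.POINTER(ctypes.c_uint8), ctypes.POINTER(ctypes.c_uint8),
    ctypes.c_int64)


@SampleFn
def invert(src, dst, n):
    # User-defined per-sample function (stand-in for a compiled cfunc).
    for i in range(n):
        dst[i] = 255 - src[i]


def run_by_address(address, values):
    """Host side: rebuild a callable from a raw address and invoke it."""
    fn = SampleFn(address)
    n = len(values)
    src = (ctypes.c_uint8 * n)(*values)
    dst = (ctypes.c_uint8 * n)()
    fn(src, dst, n)
    return list(dst)


# The only thing handed to the host is an integer.
address = ctypes.cast(invert, ctypes.c_void_p).value
inverted = run_by_address(address, [0, 100, 255])
```

The object that owns the compiled code (here `invert`) has to stay alive for as long as the host may call the address.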
Hi @gartnera.
We wanted to let you know that we think this feature will be very useful to our users, and we have decided to start working on it. You can take a look at the work in progress in this PR, which is based on your original proposal. If you have any comments or suggestions, we'd be very interested to hear them.
I've hacked together a quick prototype which demonstrates how you could use a numba cfunc to process data.
numba is a JIT compiler that translates Python to native code. The cfunc feature allows you to compile a function, get a pointer to it, and share that pointer with C/C++.
Here's an overview:
Change the function definition and see how min/max/std change.
gartnera/dali-numba-plugin
I plan on developing this further, but I'm curious whether this is something you'd be interested in merging into the DALI repo once it is more mature.