KhronosGroup / NNEF-Tools

The NNEF Tools repository contains tools to generate and consume NNEF documents
https://www.khronos.org/nnef

dilation default empty array #36

Closed jnorwood closed 6 years ago

jnorwood commented 6 years ago

The descriptions of dilation in the spec indicate that the values should be strictly positive, and should be 1 for the relevant extents to signify no dilation. However, several prototypes in the spec designate an empty array as the default value, and the flattened graph isn't expanding this to fill in ones to match the extents. Is the intention that an implementation should substitute a dilation Extent with the same number of dimensions as the shape, filled with all 1s?

> The parameters size, padding, stride and dilation must contain one entry for each relevant dimension (except where stated otherwise). The unspecified trailing values are considered to be 1, ...
>
> Items in dilation must be strictly positive.

gyenesvi commented 6 years ago

Indeed, this is a bit vague in the provisional spec, and the final spec will be clearer about it. The dilation array should either contain one item for each relevant dimension, or be empty, in which case it is understood to be 1 in all dimensions. When it's empty, the implementation should substitute the 1s. When it's not empty, its length is not allowed to differ from the number of dimensions dictated by the operation; so for a 2D convolution, exactly 2 items are required.

The default value is the empty array so that the same notation works for all dimensionalities, for example 1D, 2D and 3D convolutions. The number of dimensions of the operation is decided by the rank of the input tensor (a 4D input implies a 2D convolution). In the case of pooling, a 4D input tensor implies 4D pooling (it can be performed in all dimensions, as opposed to convolution), so 4 dilation values are required.
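For illustration, a minimal C sketch (mine, not from the NNEF parser) of that substitution, assuming the rank has already been derived from the input tensor as described above:

    #include <assert.h>

    // sketch: expand an empty dilation array to all 1s for the given rank
    void expand_dilation(const int *dilation, int count, int *out, int rank)
    {
        if (count == 0) {
            for (int i = 0; i < rank; ++i)
                out[i] = 1;                 // empty array means no dilation in any dimension
        } else {
            assert(count == rank);          // a non-empty array must match the operation's rank
            for (int i = 0; i < rank; ++i)
                out[i] = dilation[i];
        }
    }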

The rank specification of tensors is also something that is going to be made clearer in the final spec.

jnorwood commented 6 years ago

OK, thanks, I'll close this. Just a comment, though. The C code:

    int xx[4] = {};    // does not compile on msvc, but I see notes that it may produce an all-zeros array on gcc
    int xx[4] = {1};   // does compile on msvc, but only the first member is set to 1

So, at least for a C implementation, the correspondence between the flattened NNEF graph and the code is going to be a bit strained.

It would be more helpful if the flattened graph included the default dilation values explicitly, which would be consistent with what is done for shapes, padding, size and strides. Is there some reason for handling dilation defaults differently?

gyenesvi commented 6 years ago

The default values should be empty arrays for padding, stride and dilation, so dilation is not meant to be treated differently from stride and padding (size can never be empty). For stride and dilation, the empty array means all 1s; for padding, it means auto-padding (basically like same-padding in TensorFlow).
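As a rough sketch of what that auto-padding means for one dimension, under the assumption that it follows the TensorFlow SAME convention (output extent = ceil(input / stride)); the function name and the front/back split are illustrative, not taken from the spec:

    // total padding needed so that output == ceil(input / stride),
    // given the effective (dilated) kernel extent
    int same_padding_total(int input, int kernel, int stride, int dilation)
    {
        int dilated = (kernel - 1) * dilation + 1;
        int output = (input + stride - 1) / stride;    // ceil(input / stride)
        int total = (output - 1) * stride + dilated - input;
        return total > 0 ? total : 0;
    }
    // e.g. input = 14, kernel = 3, stride = 1, dilation = 1 gives total = 2,
    // which a consumer might split as (1,1) front/back padding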

However, there are cases in the compound operations in the provisional spec where non-empty arrays were substituted for those arguments, in a somewhat inconsistent manner (not matching the rank of the input). That may have resulted in non-empty arrays when flattening. This will be changed in the final spec, and the parser will work accordingly.

Of course it is possible to substitute the appropriate items when flattening the graph, although I am not sure it is worth it if a non-empty array is not mandated by the spec, because then a flat parser still needs to be prepared for empty arrays for those parameters.

Basically, the empty array is just a default value so that the parameter does not need to be provided, in which case the operation behaves as if it did not even have such a parameter: for example, no striding or no dilation, and padding such that the result is the same size as the input.

jnorwood commented 6 years ago

I agree that it is good to have defaults if the user is manually editing the code, but if you are doing code conversions, or flattening a composite via the parser output, it would be useful to have all the defaults output explicitly. Otherwise, the code to keep track of the defaults and reinsert their values has to be recreated by each user.

I'm showing some lines from the validator output below. It appears to me that the padding, stride, groups and border default values are being output explicitly, which is helpful to me, while the dilation default value is not. This is what I mean by the defaults not being output consistently and explicitly. Perhaps it is just that the original code specified the other parameters, and the translator passed them through explicitly. Anyway, my point is that it would be very useful to have the defaults output explicitly into whatever the flattened persisted state is; otherwise, the backend has to be that much smarter.

This is from an NNEF conversion of the Caffe model of the SqueezeNet v1.1 code at https://github.com/DeepScale/SqueezeNet/tree/master/SqueezeNet_v1.1

    fire9_expand1x1 = conv(fire9_squeeze1x1_tmp, fire9_expand1x1_filter, fire9_expand1x1_bias, border = 'constant', padding = [(0,0),(0,0)], stride = [1,1], dilation = [], groups = 1)
    fire9_expand1x1_tmp = relu(fire9_expand1x1)
    fire9_expand3x3 = conv(fire9_squeeze1x1_tmp, fire9_expand3x3_filter, fire9_expand3x3_bias, border = 'constant', padding = [(1,1),(1,1)], stride = [1,1], dilation = [], groups = 1)
    fire9_expand3x3_tmp = relu(fire9_expand3x3)
    fire9_concat = concat([fire9_expand1x1_tmp,fire9_expand3x3_tmp], axis = 1)
    conv10 = conv(fire9_concat, conv10_filter, conv10_bias, border = 'constant', padding = [(0,0),(0,0)], stride = [1,1], dilation = [], groups = 1)
    conv10_tmp = relu(conv10)
    pool10 = avg_pool(conv10_tmp, size = [1,1,0,0], border = 'constant', padding = [(0,0),(0,0),(0,-14),(0,-14)], stride = [1,1,1,1], dilation = [])
    prob = softmax(pool10, axes = [1])

jnorwood commented 6 years ago

I should explain further that my point of view is that of consuming the flattened output of the validator so that I can reconstruct network graphs created in Caffe or TensorFlow. When trying to construct a graph from the flattened output, it would be very helpful to have explicitly and fully expanded parameters coming from the central exchange utilities, rather than making each consuming end responsible for handling default values.

gyenesvi commented 6 years ago

I think the above inconsistency is the result of the exporter filling in some parameters, but not the dilation.

Anyway, I understand your idea of a 'normalized' flat description, where every default value has already been substituted and the backend's job is simpler. I did consider introducing such a notion, but there was one problematic point with it: what happens if we introduce a new parameter to an operation in a newer version of the spec? Currently, we can just give it a default value, resulting in backward compatible behavior; that is, if it is not filled in, the operation works as before. But that mechanism relies on the default being substituted automatically; a 'normalized' description would become invalid once a new parameter is added (that is, a newer parser could not read an older file, because it would be missing a parameter value).

Let me know if you have some suggestions to fix this problem!

By the way, the defaults are something a parser can fill in; it is independent of how the NNEF document is consumed. Right now, the parser actually does fill in the defaults; the problem may just be that the default for dilation is the empty array.

jnorwood commented 6 years ago

If you want to support backward compatibility, you have the version number in the graph's version line. I think people expect that APIs can change at this stage, and perhaps on major version number changes. If you want to reduce the effort required for others to track the changes, you could extract the API definitions out of the spec into a single text file that contains only the API, and put that on GitHub. That can be diff'd against the prior version to see the changes easily, rather than having to wade through all the unrelated updates to the spec manual.

jnorwood commented 6 years ago

One other thing that could help those consuming the flattened output is a guarantee that the parameters are output in the order they are listed in the prototypes, or in some other specified fixed order. In my case, I'm generating code using C99 struct designated initializers, which accept the parameters in any order. The clang++ compiler also accepts struct designated initializers in any order, compatible with C99. However, g++ doesn't support out-of-order designators, and the C++20 working draft also does not accept them out of order. So, if consuming in C++, the consuming end might have to rearrange struct parameter list initialization in the generated code if NNEF does not guarantee a consistent output order. I'm not aware of anything in the current spec guaranteeing this.
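To make the ordering issue concrete, here is a small C99 example with a hypothetical attribute struct (the names are illustrative, not from NNEF-Tools):

    // hypothetical struct for generated conv attributes
    struct conv_attrs {
        const char *border;
        int stride[2];
        int dilation[2];
        int groups;
    };

    // fine in C99 (and accepted by clang++ as an extension), even though
    // .groups precedes .stride; rejected by g++ and by C++20, which require
    // designators to appear in declaration order
    struct conv_attrs a = {
        .border = "constant",
        .groups = 1,
        .stride = {1, 1},
        .dilation = {1, 1},
    };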

gyenesvi commented 6 years ago

Honestly, I would not go as far as changing the major version number when a new parameter is added to an operation. We would like to reserve that for genuinely non-backward-compatible changes, and use minor version increments for new features. One of the main reasons for using a keyword-based list of attributes in the invocation syntax was to allow flexibility in ordering and default values.

Another thing to note is that NNEF is not designed as a deployment format but as an exchange format, and as such, it is not really optimized for fast and very easy consumption; rather, it aims to retain all information for conversion. Of course, if it is possible to have easy consumption without sacrificing flexibility, we should do it. Alternatively, I could imagine an extension which allows for super-easy consumption by requiring a 'normalized' description (fixed order of parameters, without defaults).

Furthermore, I do not really understand why it is beneficial for the syntax to be optimized for conversion to C++. Such a conversion is an offline process, I guess, and it is not really performance critical. It is quite easy to convert into whatever order you wish (actually, the flattener outputs parameters in declaration order, which is possible because the declaration (prototype) is passed to the callback). Of course, your implementation may be quite different. However, when I thought about consumers, I assumed they'd have to do some consistency checking anyway, which is almost as much work as substituting in default values, for example. Are you working on a single-pass solution? Are you building on the existing parser, or writing your own from the ground up?

jnorwood commented 6 years ago

If you change a prototype, the consuming end needs to know, since the producing end of the exchange format may be using the new feature/option. So I would vote for emitting the full parameter list. At least the consumer will then have a clear indication of the change.

If you emit flattened output in a consistent order from the central exchange point (in the prototype order, you are saying), then that potentially reduces the work that has to be done by every consumer application. But I think you need to state somewhere that flattening to the prototype order is the policy, so that the consumer can count on it.

I'm replacing the parser's main with code that generates from the flattened description. It reads all the shape info at the end callback and uses that to insert allocation and initialization statements at the start of the generated code, so I guess it's not strictly single pass.

gyenesvi commented 6 years ago

Of course, if a prototype changes in the spec, the version number would change, but only the minor version, not the major one.

I agree that fixing the order would reduce the work on the consumer side, but it would also limit the format and maybe make the lives of other tools harder (such as converters and graph optimizers). That's why a possible solution might be an indicator at the beginning of the document (usually called an 'extension' in Khronos standards) signalling that prototype order is used without default values; if that extension flag is present, the consumer can count on it.

Typical consumers would perform some possibly complex graph optimizations, so a single pass is usually not possible anyway. The design philosophy targets use cases where a consumer of NNEF converts it to a format optimized for a single target inference engine in an offline manner. Afterwards, loading the already optimized graph from its native format can be fast on the target inference engine.