chunseoklee opened this issue 4 years ago
cc @ragmani
One item for upstream is CLReductionOperation. Path: compute/ARMComputeEx/src/runtime/CL/functions/CLReduceOperation.cpp
The upstream version of ARMCL doesn't support multiple axis values for this operation, so we cannot run inference for our in-house model (hand gesture) on the GPU without modification. I planned to fix this operation and upstream it, but I found that ONE has already fixed it so that multiple axis values are handled.
I'm not sure the update handles all the operations covered by CLReductionOperation correctly, such as ARG_IDX_MAX/MIN, MEAN_SUM, PROD, SUM_SQUARE, SUM, MIN and MAX, but if this update could be upstreamed to the ARMCL mainline it would be very helpful for reducing maintenance cost.
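For reference, a minimal sketch of the idea (this is not the ArmComputeEx code, and it assumes the `configure(input, output, axis, op, keep_dims)` signature of `CLReductionOperation` around v20.05): a multi-axis ReduceSum can be emulated by chaining single-axis `CLReductionOperation` runs with `keep_dims=true`, one run per axis.

```cpp
// Sketch only: emulate a multi-axis ReduceSum over axes {0, 2} of a 3D tensor
// by chaining single-axis CLReductionOperation runs. Not the ArmComputeEx
// implementation; the configure(..., keep_dims) signature is assumed from
// Compute Library around v20.05.
#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/core/TensorShape.h"
#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/CL/CLScheduler.h"
#include "arm_compute/runtime/CL/CLTensor.h"
#include "arm_compute/runtime/CL/functions/CLReductionOperation.h"

using namespace arm_compute;

int main()
{
    CLScheduler::get().default_init(); // create CL context and queue

    // Input: 8 x 4 x 2 float tensor; reduce over axes {0, 2} with keep_dims.
    CLTensor input, tmp, output;
    input.allocator()->init(TensorInfo(TensorShape(8U, 4U, 2U), 1, DataType::F32));
    tmp.allocator()->init(TensorInfo(TensorShape(1U, 4U, 2U), 1, DataType::F32));
    output.allocator()->init(TensorInfo(TensorShape(1U, 4U, 1U), 1, DataType::F32));

    // One CLReductionOperation per axis; each run reduces a single axis only.
    CLReductionOperation reduce_axis0, reduce_axis2;
    reduce_axis0.configure(&input, &tmp, /*axis=*/0, ReductionOperation::SUM, /*keep_dims=*/true);
    reduce_axis2.configure(&tmp, &output, /*axis=*/2, ReductionOperation::SUM, /*keep_dims=*/true);

    input.allocator()->allocate();
    tmp.allocator()->allocate();
    output.allocator()->allocate();

    // ... map `input`, fill it with data, unmap ...

    reduce_axis0.run();
    reduce_axis2.run();
    CLScheduler::get().sync();
    return 0;
}
```

Because `keep_dims=true` preserves the rank between the chained runs, the axis number in the second `configure()` call does not have to be remapped.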
Thanks.
This is a simple explanation of the ArmComputeEx kernels (in the table below, O means the problem is resolved upstream and X means it is not).
| Layer | The reason why it was introduced (v20.05) | Is the problem resolved on upstream Compute Library? | Additional explanation | Author |
|---|---|---|---|---|
| CLArgMinMaxLayerEx | To fix the output type being set incorrectly | X | | @dr-venkman |
| CLBinaryLogicalOp, NEBinaryLogicalOp | To support broadcasting; CLLogicalAnd and NELogicalAnd don't support broadcasting | X | | sec@prasannar |
| CLCastBool, NECastBool | To fix results that differ from the TFLite interpreter | X | The TFLite interpreter converts the bool type to 1, but CLCast and NECast of Compute Library convert it to 255 (illustrated after the table) | @ragmani |
| CLEmbeddingLookup, CLHashtableLookup, NEEmbeddingLookup, NEHashtableLookup | There is no kernel for these operations | X | I don't know where these operations are needed | @ragmani |
| CLFullyConnectedLayerEx, NEFullyConnectedLayerEx | CLFullyConnectedLayer and NEFullyConnectedLayer don't support weights as inputs | X | CLFullyConnectedLayerEx and NEFullyConnectedLayerEx just disable weight caching | @ragmani |
| CLFullyConnectedHybridLayer, NEFullyConnectedHybridLayer | To support hybrid quantization | X | | @ragmani |
| CLGatherEx, NEGatherEx | To support 2D/3D indices | X | | CLGatherEx: @jyoungyun, NEGatherEx: @ragmani |
| CLInstanceNormalizationLayerKernelEx, NEInstanceNormalizationLayerKernelEx | To support the InstanceNormalization operation | O | | @ragmani |
| CLNeg | To support the int32 type; CLElementWiseUnaryLayer does not support int32 | X | | sec@prasannar |
| CLOneHot, NEOneHot | To support the OneHot operation | X | | @ragmani |
| CLReduceOperation, NEReduceOperation, NEReduceSum | To support multiple axis values | X | | @ragmani |
| CLSplitVEx | To support the SplitV operation | X | | @dr-venkman |
| CLTopKV2 | To support the TopKV2 operation | X | This is not a CL kernel | - |
| CLTransposeConvLayer, NETransposeConvLayer | To support invalid padding | X | CLDeconvolutionLayer and NEDeconvolutionLayer do not support invalid padding, e.g. negative padding | @hseok-oh, @ragmani |
| NEActivationLayerEx | To fix a bug in the Logistic operation | O (v20.05) | NEActivationLayer can produce erroneous results because of vexpq_f32(): the NEON function returns NaN instead of INF for values outside the representable range of float, and the NaN then propagates into the result (illustrated after the table) | @ragmani |
| CLReduceOperationKernel | To fix a bug in the ReduceSum operation | - | | @ragmani |
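On the CLCastBool / NECastBool row, a hypothetical plain-C++ sketch of the difference (the helper names and the 0xFF encoding of `true` are assumptions for illustration; this is not Compute Library code):

```cpp
// Hypothetical illustration of the CLCastBool / NECastBool row above, not
// Compute Library code. Assumption: `true` is stored as a non-zero byte
// (0xFF here). A byte-preserving cast then yields 255, whereas the TFLite
// interpreter expects any non-zero value to be normalized to exactly 1, which
// is what the Ex kernels have to produce.
#include <cstdint>
#include <iostream>

// Behaviour reported for stock CLCast/NECast: the stored byte is kept as-is.
uint8_t raw_cast(uint8_t stored_bool) { return stored_bool; }

// Behaviour expected by the TFLite interpreter: non-zero becomes exactly 1.
uint8_t normalized_cast(uint8_t stored_bool) { return stored_bool != 0 ? 1 : 0; }

int main()
{
    const uint8_t stored_true = 0xFF; // assumed encoding of `true`
    std::cout << int(raw_cast(stored_true)) << "\n";        // prints 255
    std::cout << int(normalized_cast(stored_true)) << "\n"; // prints 1
    return 0;
}
```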
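And on the NEActivationLayerEx row, a plain-C++ sketch of why a NaN-returning exp() breaks Logistic (the function names are hypothetical and this is not the Compute Library code; it only emulates the reported vexpq_f32() behaviour):

```cpp
// Logistic is 1 / (1 + exp(-x)). With IEEE semantics, exp() of a huge
// argument overflows to +INF and the result cleanly collapses to 0, but if
// exp() returned NaN instead (the reported vexpq_f32() behaviour), the NaN
// would propagate into the final result.
#include <cmath>
#include <iostream>

// Logistic using standard exp(): exp(100) overflows to +INF, 1/(1+INF) == 0.
float logistic_ieee(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Same formula, but emulating an exp() that yields NaN on overflow.
float logistic_nan_exp(float x)
{
    float e = std::exp(-x);
    if (!std::isfinite(e))
        e = std::nanf(""); // emulate the reported out-of-range behaviour
    return 1.0f / (1.0f + e);
}

int main()
{
    std::cout << logistic_ieee(-100.0f) << "\n";    // prints 0
    std::cout << logistic_nan_exp(-100.0f) << "\n"; // prints nan
    return 0;
}
```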
@ragmani Thanks for sharing this.
As of now, ArmComputeEx of ONE has a big diff compared to mainline, and the diff will only grow over time, which in turn results in a high update cost. So we may need to reduce the diff by settling these changes on mainline from now on.
> As of now, ArmComputeEx of ONE has a big diff compared to mainline, and the diff will only grow over time, which in turn results in a high update cost. So we may need to reduce the diff by settling these changes on mainline from now on.
@daeinki One reason ArmComputeEx of ONE has such big diffs is that clang-format was applied to ArmComputeEx. I'm not sure, but how about trying to apply https://github.com/ARM-software/ComputeLibrary/blob/master/.clang-format? It may reduce the diffs. In any case, I will try to contribute some kernels to the upstream repository, though I'm not sure whether I will succeed in merging the changes upstream.
What I meant is not just a coding-style difference: the changes in ArmComputeEx cannot be reused for upstream as they are. I actually looked into a reduction layer of ArmComputeEx, and I think you may need to re-implement that layer from scratch on top of mainline ARMCL, because the two versions have totally different code, not just different coding style.
Just to share my own upstream work: I have already started upstreaming support for several ops to ARMNN, and int32/int64 ArgMinMax op support has been merged to ARMNN mainline as of now. I'm also preparing another patch set that adds a new CL layer to ARMCL.
Upstreaming won't be easy and will take a lot of time, but it is needed. TPL strongly recommends upstreaming changes whenever possible, because in the past we decided to clone several open source frameworks - Enlightenment, systemd, and so on - due to the upstream overhead: we didn't upstream changes in time, and that is one of the big stains on Tizen.
@daeinki Thanks for sharing; that is useful information for upstreaming. Could you tell me why you think I may need to re-implement the reduction layer? I would like to fix the shortcomings of that layer. Also, could you share the upstream process and the upstream overhead of Compute Library based on your experience? It would reduce the overhead when I try to upstream.
@ragmani
The reason is that your changes touch the mainline code too much, so I'm pretty sure the patch would be rejected as-is. If you want to upstream it as-is, you would also have to post another patch for ARMNN, because ARMNN is the primary user of ARMCL and, with the as-is patch, ARMNN would no longer work. Even for upstreaming, you would need to discuss such a patch with both sides - the ARMCL and ARMNN engineers. Simply put, your patch breaks compatibility.
I'd recommend keeping the original form (function parameters, class members) and coding style as much as you can when you make a patch for upstream.
The upstream process for ARMCL is pretty simple because they use the Gerrit review system, the same as Tizen. First of all, sign up for the Gerrit review system at https://review.mlplatform.org.
Then just git clone https://git.mlplatform.org/ml/ComputeLibrary.git, make a new patch, and post it to the Gerrit review system using the standard Gerrit push commands.
As for review, I will back you up.
Please feel free to request me anytime if you need some help.
@daeinki Thanks for the information. I will make a simple patch first, after completing our own process.
How about making PRs for the newly added ArmComputeEx kernels to the upstream Compute Library? I don't know much about today's ArmComputeEx kernels. Let's discuss here.