Status: Closed (nth-BYTE closed this pull request 1 month ago)
Thanks for your contribution and we appreciate it a lot. :rocket::rocket:
Provides a 10%+ performance boost: adds static branch prediction hints in logging.h and optimizes the deque and desc logic.
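For readers unfamiliar with static branch prediction hints, the sketch below shows one common way to express them in C++ with the GCC/Clang builtin `__builtin_expect`. It is a minimal illustration only: the macro names `SKETCH_LIKELY`, `SKETCH_UNLIKELY`, and `SKETCH_CHECK`, the counter `g_failed_checks`, and the function `clamp_non_negative` are hypothetical and are not taken from core/logging.h.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical macros for illustration; not the actual contents of core/logging.h.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_LIKELY(x) (__builtin_expect(!!(x), 1))
#define SKETCH_UNLIKELY(x) (__builtin_expect(!!(x), 0))
#else
#define SKETCH_LIKELY(x) (!!(x))
#define SKETCH_UNLIKELY(x) (!!(x))
#endif

// Counts failed checks so the behavior can be observed from a caller.
static int g_failed_checks = 0;

// A check macro whose failure path is marked as unlikely, so the compiler
// can lay out the common (passing) path as the fall-through branch.
#define SKETCH_CHECK(cond)                                \
  do {                                                    \
    if (SKETCH_UNLIKELY(!(cond))) {                       \
      ++g_failed_checks;                                  \
      std::fprintf(stderr, "check failed: %s\n", #cond);  \
    }                                                     \
  } while (0)

// Example caller: reports negative inputs, then clamps them to zero.
int clamp_non_negative(int v) {
  SKETCH_CHECK(v >= 0);
  int result = v < 0 ? 0 : v;
  assert(result >= 0);  // postcondition of the clamp
  return result;
}
```

The hint does not change program results; it only tells the compiler which branch to treat as the hot path, which tends to matter most for checks and log statements that almost never fire.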
modified: core/logging.h
modified: core/tensor.cpp
modified: core/tensor.h
modified: core/type.cpp
modified: core/type.h
modified: kernels/ball_query/ball_query.cpp
modified: kernels/sparse_conv/get_indice_pairs/normal_get_indice_pairs.cpp
If you want to know how to do operator testing, you can see GTest-User-Guide-zh.
For static threshold standard details, see: MLU-OPS™ Accuracy Acceptance Standard.
If you have checked the following items, please tick the relevant box.
Test Point-1: When a new operator is submitted, the test points are given and the test results are stated. Acceptance Standard: Normal error.
Please fill your test results(Error Message) in here, ...
Test Point-2: Whether illegal parameters are passed. Acceptance Standard: Normal error.
Test results...
See MLU-OPS™ Performance Acceptance Standard for details.
Platform:MLU370
Note: Google Test filter = *ball_query*
[==========] Running 2 test cases from 1 test suite.
[----------] Global test environment set-up.
[2024-6-4 15:42:55] [MLUOP] [Warning]:mluOpInternalGetCommitId not found, use fallback method
[2024-6-4 15:42:55] [MLUOP] [Warning]:mluOpInternalGetBranchInfo not found, use fallback method
[date ]: 2024_06_04_15_42_55
[mluop_version ]: 1.2.0
[mlu_platform ]: MLU370-X4[mtp_372.42]
[job_limit ]:
[cluster_limit ]:
[commit_id ]: commit b9489c048f8515ec4ca8c4a012b17aa77498d6a9
[mluop_branch ]: nl18421
[driver_version ]: 5.10.33
[cnrt_version ]: 6.11.0
[ip ]: 172.18.0.1
[repeat_count ]: 1
[----------] 2 tests from ball_query/TestSuite
[ RUN ] ball_query/TestSuite.mluOp/0
[MLU Hardware Time ]: 80 (us)
[MLU Interface Time ]: 6856.14 (us)
[MLU IO Efficiency ]: 0.0476667
[MLU Compute Efficiency ]: 0.128
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 1.04858e+07 (Ops)
[MLU TheoryIOs ]: 1.17146e+06 (Bytes)
[MLU ComputeForce ]: 1.024e+12 (op/s)
[MLU IoBandWidth ]: 307.2 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output] DIFF1: 0.000000e+00 DIFF2: 0.000000e+00
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/ball_query/test_case/case_0.prototxt
[ OK ] ball_query/TestSuite.mluOp/0 (98 ms)
[ RUN ] ball_query/TestSuite.mluOp/1
[MLU Hardware Time ]: 12 (us)
[MLU Interface Time ]: 8.751 (us)
[MLU IO Efficiency ]: 0.000434028
[MLU Compute Efficiency ]: 0.000416667
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 10240 (Ops)
[MLU TheoryIOs ]: 1600 (Bytes)
[MLU ComputeForce ]: 2.048e+12 (op/s)
[MLU IoBandWidth ]: 307.2 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output] DIFF1: 0.000000e+00 DIFF2: 0.000000e+00
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/ball_query/test_case/case_1.prototxt
[ OK ] ball_query/TestSuite.mluOp/1 (18 ms)
[----------] 2 tests from ball_query/TestSuite (116 ms total)
[----------] Global test environment tear-down
[ SUMMARY ] Total 2 cases of 1 op(s). ALL PASSED.
[==========] 2 test cases from 1 test suite ran. (23651 ms total)
[ PASSED ] 2 test cases.
Note: Google Test filter = *get_indice_pairs*
[==========] Running 1 test case from 1 test suite.
[----------] Global test environment set-up.
[2024-6-4 16:8:31] [MLUOP] [Warning]:mluOpInternalGetCommitId not found, use fallback method
[2024-6-4 16:8:31] [MLUOP] [Warning]:mluOpInternalGetBranchInfo not found, use fallback method
[date ]: 2024_06_04_16_08_31
[mluop_version ]: 1.2.0
[mlu_platform ]: MLU370-X4[mtp_372.42]
[job_limit ]:
[cluster_limit ]:
[commit_id ]: commit b9489c048f8515ec4ca8c4a012b17aa77498d6a9
[mluop_branch ]: nl18421
[driver_version ]: 5.10.33
[cnrt_version ]: 6.11.0
[ip ]: 172.18.0.1
[repeat_count ]: 1
[----------] 1 test from get_indice_pairs/TestSuite
[ RUN ] get_indice_pairs/TestSuite.mluOp/0
[MLU Hardware Time ]: 6060 (us)
[MLU Interface Time ]: 57758.9 (us)
[MLU IO Efficiency ]: 0.0149645
[MLU Compute Efficiency ]: 0.00111015
[MLU Workspace Size ]: 5.50499e+07 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 6.88899e+06 (Ops)
[MLU TheoryIOs ]: 2.78584e+07 (Bytes)
[MLU ComputeForce ]: 1.024e+12 (op/s)
[MLU IoBandWidth ]: 307.2 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output1] DIFF3: 0.000000e+00
[output2] DIFF3: 0.000000e+00
[output3] DIFF3: 0.000000e+00
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/get_indice_pairs/test_cases/case_1.prototxt
[ OK ] get_indice_pairs/TestSuite.mluOp/0 (604 ms)
[----------] 1 test from get_indice_pairs/TestSuite (604 ms total)
[----------] Global test environment tear-down
[ SUMMARY ] Total 1 cases of 1 op(s). ALL PASSED.
[==========] 1 test case from 1 test suite ran. (4670 ms total)
[ PASSED ] 1 test case.
Platform:MLU590
Note: Google Test filter = *ball_query*
[==========] Running 2 test cases from 1 test suite.
[----------] Global test environment set-up.
[2024-9-20 16:21:55] [MLUOP] [Warning]:mluOpInternalGetCommitId not found, use fallback method
[2024-9-20 16:21:55] [MLUOP] [Warning]:mluOpInternalGetBranchInfo not found, use fallback method
sh: 1: ifconfig: not found
[date ]: 2024_09_20_16_21_55
[mluop_version ]: 1.3.0
[mlu_platform ]: MLU590-H8
[job_limit ]:
[cluster_limit ]:
[commit_id ]: commit 6f20e9dd8cc95b80d5befb8e4632aeff62e26285
[mluop_branch ]: * nl18421_3
[driver_version ]: 6.2.3
[cnrt_version ]: 6.13.0
[ip ]:
[repeat_count ]: 1
[----------] 2 tests from ball_query/TestSuite
[ RUN ] ball_query/TestSuite.mluOp/0
[MLU Hardware Time ]: 87 (us)
[MLU Interface Time ]: 47.404 (us)
[MLU IO Efficiency ]: 0.00657471
[MLU Compute Efficiency ]: 0.0523116
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 1.04858e+07 (Ops)
[MLU TheoryIOs ]: 1.17146e+06 (Bytes)
[MLU ComputeForce ]: 2.304e+12 (op/s)
[MLU IoBandWidth ]: 2048 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output] DIFF1: 0.000000e+00 DIFF2: 0.000000e+00
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/ball_query/test_case/case_0.prototxt
[ OK ] ball_query/TestSuite.mluOp/0 (19 ms)
[ RUN ] ball_query/TestSuite.mluOp/1
[MLU Hardware Time ]: 47 (us)
[MLU Interface Time ]: 6.253 (us)
[MLU IO Efficiency ]: 1.66223e-05
[MLU Compute Efficiency ]: 4.72813e-05
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 10240 (Ops)
[MLU TheoryIOs ]: 1600 (Bytes)
[MLU ComputeForce ]: 4.608e+12 (op/s)
[MLU IoBandWidth ]: 2048 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output] DIFF1: 0.000000e+00 DIFF2: 0.000000e+00
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/ball_query/test_case/case_1.prototxt
[ OK ] ball_query/TestSuite.mluOp/1 (17 ms)
[----------] 2 tests from ball_query/TestSuite (36 ms total)
[----------] Global test environment tear-down
[ SUMMARY ] Total 2 cases of 1 op(s). ALL PASSED.
[==========] 2 test cases from 1 test suite ran. (3289 ms total)
[ PASSED ] 2 test cases.
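As a reading aid for the logs above: the printed efficiency figures are numerically consistent with the theoretical amount of work divided by the product of hardware time and the peak rate. This relation is inferred from the logged numbers only and is not a statement about how the test framework computes them. The helper names `efficiency` and `approx_equal` below are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Inferred relation (an assumption, checked only against the logged values):
//   efficiency = theory_amount / (hardware_time_in_seconds * peak_rate_per_second)
// theory_amount is TheoryOps (ops) or TheoryIOs (bytes); peak_rate_per_s is
// ComputeForce (op/s) or IoBandWidth converted from GB/s to bytes/s.
double efficiency(double theory_amount, double hardware_time_us,
                  double peak_rate_per_s) {
  assert(hardware_time_us > 0.0 && peak_rate_per_s > 0.0);
  return theory_amount / (hardware_time_us * 1e-6 * peak_rate_per_s);
}

// Relative comparison, since the log prints values with about six significant digits.
bool approx_equal(double a, double b, double rel_tol) {
  return std::fabs(a - b) <= rel_tol * std::fabs(b);
}
```

With the MLU370 ball_query case 0 figures (80 us, 1.04858e+07 ops, 1.024e+12 op/s, 1.17146e+06 bytes, 307.2 GB/s), `efficiency(1.04858e7, 80.0, 1.024e12)` agrees with the logged compute efficiency of 0.128, and `efficiency(1.17146e6, 80.0, 307.2e9)` agrees with the logged IO efficiency of 0.0476667.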
Please give a brief overview here, if you want to note and summarize the content.
jira:http://jira.cambricon.com/browse/CNNLCORE-20578
The Modification list includes an extra file, logging.h, which does not appear in the "Files changed" tab.
The id in the jira link should be 20578, not 18421.