Cambricon / mlu-ops

Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU).
MIT License

[Feature](mlu-ops): optimization for descriptor #1083

Closed: nth-BYTE closed this pull request 1 month ago

nth-BYTE commented 1 month ago

Thanks for your contribution and we appreciate it a lot. :rocket::rocket:

1. Motivation

Provides a performance boost of 10% or more. Adds static branch prediction in logging.h and optimizes the deque and descriptor (desc) logic.
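For readers unfamiliar with static branch prediction in logging code, here is a minimal sketch of the general pattern. It assumes a GCC or Clang toolchain, where `__builtin_expect` is available. The macro and function names (`SKETCH_LIKELY`, `SKETCH_UNLIKELY`, `LogIfEnabled`, `g_min_level`) are hypothetical and are not taken from core/logging.h; the actual change in this PR may differ.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical macro names for illustration only.
// __builtin_expect is a GCC/Clang builtin; other compilers get a plain fallback.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_LIKELY(x) (__builtin_expect(!!(x), 1))
#define SKETCH_UNLIKELY(x) (__builtin_expect(!!(x), 0))
#else
#define SKETCH_LIKELY(x) (x)
#define SKETCH_UNLIKELY(x) (x)
#endif

enum LogLevel { kInfo = 0, kWarning = 1, kError = 2 };

// Assumption: most log calls are filtered out, so emitting is the rare path.
static int g_min_level = kWarning;
static int g_emitted = 0;  // counts messages that were actually written

inline bool ShouldLog(int level) { return level >= g_min_level; }

// Returns true when the message was written to stderr.
inline bool LogIfEnabled(int level, const char* msg) {
  // Hint to the compiler that this branch is usually not taken, so the
  // fall-through (no logging) path can be laid out as the hot path.
  if (SKETCH_UNLIKELY(ShouldLog(level))) {
    std::fprintf(stderr, "[%d] %s\n", level, msg);
    ++g_emitted;
    return true;
  }
  return false;
}
```

With `g_min_level` set to `kWarning`, a call such as `LogIfEnabled(kInfo, "...")` takes the predicted path and returns without formatting anything, while `LogIfEnabled(kError, "...")` takes the cold path and writes the message. The hint affects only code layout, not behavior.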

2. Modification

modified: core/logging.h
modified: core/tensor.cpp
modified: core/tensor.h
modified: core/type.cpp
modified: core/type.h
modified: kernels/ball_query/ball_query.cpp
modified: kernels/sparse_conv/get_indice_pairs/normal_get_indice_pairs.cpp

3. Test Report

If you want to know how to do operator testing, you can see GTest-User-Guide-zh.

3.1 Modification Details

3.1.1 Accuracy Acceptance Standard

For static threshold standard details, see: MLU-OPS™ Accuracy Acceptance Standard.

3.1.2 Operator Scheme checklist

3.2 Accuracy Test

3.2.1 Accuracy Test

If you have checked the following items, please tick the relevant box.

3.2.2 Parameter Check

Test Point-1: When a new operator is submitted, the test points are given and the test results are stated. Acceptance Standard: Normal error.

Please fill your test results(Error Message) in here, ...

Test Point-2: Whether illegal parameters are passed. Acceptance Standard: Normal error.

Test results...

3.3 Performance Test

See MLU-OPS™ Performance Acceptance Standard for details.

Platform: MLU370

Note: Google Test filter = *ball_query*
[==========] Running 2 test cases from 1 test suite.
[----------] Global test environment set-up.
[2024-6-4 15:42:55] [MLUOP] [Warning]:mluOpInternalGetCommitId not found, use fallback method
[2024-6-4 15:42:55] [MLUOP] [Warning]:mluOpInternalGetBranchInfo not found, use fallback method
[date                   ]: 2024_06_04_15_42_55
[mluop_version           ]: 1.2.0
[mlu_platform           ]: MLU370-X4[mtp_372.42]
[job_limit              ]: 
[cluster_limit          ]: 
[commit_id              ]: commit b9489c048f8515ec4ca8c4a012b17aa77498d6a9
[mluop_branch            ]:   nl18421
[driver_version         ]: 5.10.33
[cnrt_version           ]: 6.11.0
[ip                     ]: 172.18.0.1
[repeat_count           ]: 1
[----------] 2 tests from ball_query/TestSuite
[ RUN      ] ball_query/TestSuite.mluOp/0
[MLU Hardware Time      ]: 80 (us)
[MLU Interface Time     ]: 6856.14 (us)
[MLU IO Efficiency      ]: 0.0476667
[MLU Compute Efficiency ]: 0.128
[MLU Workspace Size     ]: -1 (Bytes)
[MLU Kernel Name(s)     ]: {}
[MLU TheoryOps          ]: 1.04858e+07 (Ops)
[MLU TheoryIOs          ]: 1.17146e+06 (Bytes)
[MLU ComputeForce       ]: 1.024e+12 (op/s)
[MLU IoBandWidth        ]: 307.2 (GB/s)
[GPU Hardware Time      ]: -1 (us)
[GPU IO Efficiency      ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size     ]: -1 (Bytes)
[Diffs]:
[output]
DIFF1: 0.000000e+00
DIFF2: 0.000000e+00
[^      OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/ball_query/test_case/case_0.prototxt
[       OK ] ball_query/TestSuite.mluOp/0 (98 ms)
[ RUN      ] ball_query/TestSuite.mluOp/1
[MLU Hardware Time      ]: 12 (us)
[MLU Interface Time     ]: 8.751 (us)
[MLU IO Efficiency      ]: 0.000434028
[MLU Compute Efficiency ]: 0.000416667
[MLU Workspace Size     ]: -1 (Bytes)
[MLU Kernel Name(s)     ]: {}
[MLU TheoryOps          ]: 10240 (Ops)
[MLU TheoryIOs          ]: 1600 (Bytes)
[MLU ComputeForce       ]: 2.048e+12 (op/s)
[MLU IoBandWidth        ]: 307.2 (GB/s)
[GPU Hardware Time      ]: -1 (us)
[GPU IO Efficiency      ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size     ]: -1 (Bytes)
[Diffs]:
[output]
DIFF1: 0.000000e+00
DIFF2: 0.000000e+00
[^      OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/ball_query/test_case/case_1.prototxt
[       OK ] ball_query/TestSuite.mluOp/1 (18 ms)
[----------] 2 tests from ball_query/TestSuite (116 ms total)

[----------] Global test environment tear-down
[ SUMMARY  ] Total 2 cases of 1 op(s).
ALL PASSED.
[==========] 2 test cases from 1 test suite ran. (23651 ms total)
[  PASSED  ] 2 test cases.
Note: Google Test filter = *get_indice_pairs*
[==========] Running 1 test case from 1 test suite.
[----------] Global test environment set-up.
[2024-6-4 16:8:31] [MLUOP] [Warning]:mluOpInternalGetCommitId not found, use fallback method
[2024-6-4 16:8:31] [MLUOP] [Warning]:mluOpInternalGetBranchInfo not found, use fallback method
[date                   ]: 2024_06_04_16_08_31
[mluop_version           ]: 1.2.0
[mlu_platform           ]: MLU370-X4[mtp_372.42]
[job_limit              ]: 
[cluster_limit          ]: 
[commit_id              ]: commit b9489c048f8515ec4ca8c4a012b17aa77498d6a9
[mluop_branch            ]:   nl18421
[driver_version         ]: 5.10.33
[cnrt_version           ]: 6.11.0
[ip                     ]: 172.18.0.1
[repeat_count           ]: 1
[----------] 1 test from get_indice_pairs/TestSuite
[ RUN      ] get_indice_pairs/TestSuite.mluOp/0
[MLU Hardware Time      ]: 6060 (us)
[MLU Interface Time     ]: 57758.9 (us)
[MLU IO Efficiency      ]: 0.0149645
[MLU Compute Efficiency ]: 0.00111015
[MLU Workspace Size     ]: 5.50499e+07 (Bytes)
[MLU Kernel Name(s)     ]: {}
[MLU TheoryOps          ]: 6.88899e+06 (Ops)
[MLU TheoryIOs          ]: 2.78584e+07 (Bytes)
[MLU ComputeForce       ]: 1.024e+12 (op/s)
[MLU IoBandWidth        ]: 307.2 (GB/s)
[GPU Hardware Time      ]: -1 (us)
[GPU IO Efficiency      ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size     ]: -1 (Bytes)
[Diffs]:
[output1]
DIFF3: 0.000000e+00
[output2]
DIFF3: 0.000000e+00
[output3]
DIFF3: 0.000000e+00
[^      OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/get_indice_pairs/test_cases/case_1.prototxt
[       OK ] get_indice_pairs/TestSuite.mluOp/0 (604 ms)
[----------] 1 test from get_indice_pairs/TestSuite (604 ms total)

[----------] Global test environment tear-down
[ SUMMARY  ] Total 1 cases of 1 op(s).
ALL PASSED.
[==========] 1 test case from 1 test suite ran. (4670 ms total)
[  PASSED  ] 1 test case.

Platform: MLU590

Note: Google Test filter = *ball_query*
[==========] Running 2 test cases from 1 test suite.
[----------] Global test environment set-up.
[2024-9-20 16:21:55] [MLUOP] [Warning]:mluOpInternalGetCommitId not found, use fallback method
[2024-9-20 16:21:55] [MLUOP] [Warning]:mluOpInternalGetBranchInfo not found, use fallback method
sh: 1: ifconfig: not found
[date                   ]: 2024_09_20_16_21_55
[mluop_version           ]: 1.3.0
[mlu_platform           ]: MLU590-H8
[job_limit              ]: 
[cluster_limit          ]: 
[commit_id              ]: commit 6f20e9dd8cc95b80d5befb8e4632aeff62e26285
[mluop_branch            ]: * nl18421_3
[driver_version         ]: 6.2.3
[cnrt_version           ]: 6.13.0
[ip                     ]: 
[repeat_count           ]: 1
[----------] 2 tests from ball_query/TestSuite
[ RUN      ] ball_query/TestSuite.mluOp/0
[MLU Hardware Time      ]: 87 (us)
[MLU Interface Time     ]: 47.404 (us)
[MLU IO Efficiency      ]: 0.00657471
[MLU Compute Efficiency ]: 0.0523116
[MLU Workspace Size     ]: -1 (Bytes)
[MLU Kernel Name(s)     ]: {}
[MLU TheoryOps          ]: 1.04858e+07 (Ops)
[MLU TheoryIOs          ]: 1.17146e+06 (Bytes)
[MLU ComputeForce       ]: 2.304e+12 (op/s)
[MLU IoBandWidth        ]: 2048 (GB/s)
[GPU Hardware Time      ]: -1 (us)
[GPU IO Efficiency      ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size     ]: -1 (Bytes)
[Diffs]:
[output]
DIFF1: 0.000000e+00
DIFF2: 0.000000e+00
[^      OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/ball_query/test_case/case_0.prototxt
[       OK ] ball_query/TestSuite.mluOp/0 (19 ms)
[ RUN      ] ball_query/TestSuite.mluOp/1
[MLU Hardware Time      ]: 47 (us)
[MLU Interface Time     ]: 6.253 (us)
[MLU IO Efficiency      ]: 1.66223e-05
[MLU Compute Efficiency ]: 4.72813e-05
[MLU Workspace Size     ]: -1 (Bytes)
[MLU Kernel Name(s)     ]: {}
[MLU TheoryOps          ]: 10240 (Ops)
[MLU TheoryIOs          ]: 1600 (Bytes)
[MLU ComputeForce       ]: 4.608e+12 (op/s)
[MLU IoBandWidth        ]: 2048 (GB/s)
[GPU Hardware Time      ]: -1 (us)
[GPU IO Efficiency      ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size     ]: -1 (Bytes)
[Diffs]:
[output]
DIFF1: 0.000000e+00
DIFF2: 0.000000e+00
[^      OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/ball_query/test_case/case_1.prototxt
[       OK ] ball_query/TestSuite.mluOp/1 (17 ms)
[----------] 2 tests from ball_query/TestSuite (36 ms total)

[----------] Global test environment tear-down
[ SUMMARY  ] Total 2 cases of 1 op(s).
ALL PASSED.
[==========] 2 test cases from 1 test suite ran. (3289 ms total)
[  PASSED  ] 2 test cases.
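
The efficiency figures in the logs above appear to follow directly from the other reported fields: compute efficiency matches TheoryOps / (Hardware Time × ComputeForce), and IO efficiency matches TheoryIOs / (Hardware Time × IoBandWidth). This relationship is inferred from the logged numbers, not taken from the MLU-OPS documentation. The struct and function names below are hypothetical, and 1 GB is assumed to mean 1e9 bytes.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical container for the fields printed in the gtest log.
struct PerfSample {
  double theory_ops;        // [MLU TheoryOps], in Ops
  double theory_io_bytes;   // [MLU TheoryIOs], in Bytes
  double hardware_time_us;  // [MLU Hardware Time], in microseconds
  double compute_force;     // [MLU ComputeForce], in op/s
  double io_bandwidth_gbs;  // [MLU IoBandWidth], in GB/s (assumed 1e9 bytes/s)
};

// Fraction of peak compute throughput achieved during the kernel run.
inline double ComputeEfficiency(const PerfSample& s) {
  const double seconds = s.hardware_time_us * 1e-6;
  return s.theory_ops / (seconds * s.compute_force);
}

// Fraction of peak memory bandwidth achieved during the kernel run.
inline double IoEfficiency(const PerfSample& s) {
  const double seconds = s.hardware_time_us * 1e-6;
  return s.theory_io_bytes / (seconds * s.io_bandwidth_gbs * 1e9);
}
```

Plugging in the MLU370 ball_query case_0 values (1.04858e+07 Ops, 1.17146e+06 Bytes, 80 us, 1.024e+12 op/s, 307.2 GB/s) reproduces the logged 0.128 and 0.0476667 to within rounding of the printed inputs.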

3.4 Summary Analysis

Please give a brief overview here, if you want to note and summarize the content.

nth-BYTE commented 1 month ago

jira: http://jira.cambricon.com/browse/CNNLCORE-20578

zhouyue624 commented 1 month ago

The Modification list includes an extra file, logging.h, which does not appear in the "Files changed" tab.

zhouyue624 commented 1 month ago

The id in the jira link should be 20578, not 18421.