Cambricon / mlu-ops

Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU).
MIT License
103 stars, 102 forks

[Fix](mlu-ops) fix mem leak in core-mode #1094

Closed: nth-BYTE closed this pull request 2 weeks ago.

nth-BYTE commented 1 month ago

Thanks for your contribution and we appreciate it a lot. 🚀🚀

1. Motivation

Please describe your motivation and the goal you want to achieve through this pull request.

2. Modification

Please briefly describe the modifications made in this pull request and indicate where they are made.

Have new test cases been added? If so, please post the corresponding generator-PR link here.

3. Test Report

For instructions on operator testing, see GTest-User-Guide-zh.

3.1 Modification Details

3.1.1 Accuracy Acceptance Standard

For static threshold standard details, see: MLU-OPS™ Accuracy Acceptance Standard.
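The test log in 3.3 reports two accuracy metrics per output, DIFF1 and DIFF2. As a minimal sketch, assuming the definitions commonly used by the MLU-OPS accuracy standard (DIFF1 as the relative mean absolute error and DIFF2 as the relative root-mean-square error; the linked standard is authoritative for the exact formulas and thresholds), they can be computed from flattened MLU and reference outputs like this. The function names `diff1`/`diff2` and the sample data are illustrative only.

```python
import math


def _check(out, ref):
    # Both tensors must be flattened to the same number of elements.
    if len(out) != len(ref):
        raise ValueError("out and ref must have the same length")


def diff1(out, ref):
    """Assumed DIFF1: relative mean absolute error, sum|out - ref| / sum|ref|."""
    _check(out, ref)
    den = sum(abs(r) for r in ref)
    if den == 0.0:
        raise ValueError("reference is all zeros; relative error is undefined")
    return sum(abs(o - r) for o, r in zip(out, ref)) / den


def diff2(out, ref):
    """Assumed DIFF2: relative RMS error, sqrt(sum (out - ref)^2 / sum ref^2)."""
    _check(out, ref)
    den = sum(r * r for r in ref)
    if den == 0.0:
        raise ValueError("reference is all zeros; relative error is undefined")
    return math.sqrt(sum((o - r) ** 2 for o, r in zip(out, ref)) / den)


if __name__ == "__main__":
    ref = [1.0, -2.0, 3.0, -4.0]  # hypothetical reference output
    print(diff1(ref, ref), diff2(ref, ref))  # identical tensors give 0.0 0.0
    out = [1.0, -2.0, 3.0, -4.1]  # hypothetical MLU output, one element perturbed
    print(diff1(out, ref), diff2(out, ref))
```

A bit-exact result yields zero for both metrics, which matches the `DIFF1: 0.000000e+00` / `DIFF2: 0.000000e+00` lines reported for the non-complex cases in the log.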

3.1.2 Operator Scheme checklist

3.2 Accuracy Test

3.2.1 Accuracy Test

If you have checked the following items, please tick the relevant box.

3.2.2 Parameter Check

Test Point-1: When a new operator is submitted, the test points are given and the test results are stated. Acceptance Standard: Normal error.

Please fill in your test results (error message) here, ...

Test Point-2: Whether illegal parameters are passed. Acceptance Standard: Normal error.

Test results...

3.3 Performance Test

See MLU-OPS™ Performance Acceptance Standard for details.

Platform: MLU370

# The test results should contain Op name, Shape, Data type,
#   MLU Hardware Time (us), MLU Interface Time (us), MLU IO Efficiency,
#   MLU Compute Efficiency, and MLU Workspace Size (Bytes)
# 
# for example:
#
# ----------- case0 -----------
# case0
# [Op name                ]: abs
# [Shape                  ]: input.shape=[1024,1024,3,4], output.shape=[1024,1024,3,4]
# [Data type              ]: float32
# [MLU Hardware Time      ]: 15728 (us)
# [MLU Interface Time     ]: 369.008 (us)
# [MLU IO Efficiency      ]: 0.23275
# [MLU Compute Efficiency ]: 0.5
# [MLU Workspace Size     ]: -1 (Bytes)
# 
# ----------- case1 -----------
# ...

Platform: MLU590

Note: Google Test filter = *abs*
[==========] Running 6 test cases from 1 test suite.
[----------] Global test environment set-up.
[2024-10-10 07:47:43.914939][MLUOP][WARNING][17377][Card:5]: mluOpInternalGetCommitId not found, use fallback method
[2024-10-10 07:47:43.931486][MLUOP][WARNING][17377][Card:5]: mluOpInternalGetBranchInfo not found, use fallback method
sh: 1: ifconfig: not found
[date                   ]: 2024_10_10_15_47_43
[mluop_version           ]: 1.3.0
[mlu_platform           ]: MLU590-H8
[job_limit              ]: 
[cluster_limit          ]: 
[commit_id              ]: commit 1d2f1145a99c28415cd12df5ead4b50a7f6c0c3c
[mluop_branch            ]: * nl20715
[driver_version         ]: 6.2.4
[cnrt_version           ]: 6.13.0
[ip                     ]: 
[repeat_count           ]: 1
[----------] 6 tests from abs/TestSuite
[ RUN      ] abs/TestSuite.mluOp/0
[MLU Hardware Time      ]: 2 (us)
[MLU Interface Time     ]: 11119.6 (us)
[MLU IO Efficiency      ]: 0.0195312
[MLU Compute Efficiency ]: 0.00217014
[MLU Workspace Size     ]: -1 (Bytes)
[MLU Kernel Name(s)     ]: {}
[MLU TheoryOps          ]: 10000 (Ops)
[MLU TheoryIOs          ]: 80000 (Bytes)
[MLU ComputeForce       ]: 2.304e+12 (op/s)
[MLU IoBandWidth        ]: 2048 (GB/s)
[GPU Hardware Time      ]: -1 (us)
[GPU IO Efficiency      ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size     ]: -1 (Bytes)
[Diffs]:
[output]
DIFF1: 0.000000e+00
DIFF2: 0.000000e+00
[^      OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/abs/test_case/case_0.prototxt
[       OK ] abs/TestSuite.mluOp/0 (18 ms)
[ RUN      ] abs/TestSuite.mluOp/1
[MLU Hardware Time      ]: 2 (us)
[MLU Interface Time     ]: 35.496 (us)
[MLU IO Efficiency      ]: 0.0195312
[MLU Compute Efficiency ]: 0.00217014
[MLU Workspace Size     ]: -1 (Bytes)
[MLU Kernel Name(s)     ]: {}
[MLU TheoryOps          ]: 10000 (Ops)
[MLU TheoryIOs          ]: 80000 (Bytes)
[MLU ComputeForce       ]: 2.304e+12 (op/s)
[MLU IoBandWidth        ]: 2048 (GB/s)
[GPU Hardware Time      ]: -1 (us)
[GPU IO Efficiency      ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size     ]: -1 (Bytes)
[Diffs]:
[output]
DIFF1: 0.000000e+00
DIFF2: 0.000000e+00
[^      OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/abs/test_case/case_1_with_stride.prototxt
[       OK ] abs/TestSuite.mluOp/1 (2 ms)
[ RUN      ] abs/TestSuite.mluOp/2
[MLU Hardware Time      ]: 295 (us)
[MLU Interface Time     ]: 22.637 (us)
[MLU IO Efficiency      ]: 0.000132415
[MLU Compute Efficiency ]: 9.4162e-08
[MLU Workspace Size     ]: -1 (Bytes)
[MLU Kernel Name(s)     ]: {}
[MLU TheoryOps          ]: 64 (Ops)
[MLU TheoryIOs          ]: 80000 (Bytes)
[MLU ComputeForce       ]: 2.304e+12 (op/s)
[MLU IoBandWidth        ]: 2048 (GB/s)
[GPU Hardware Time      ]: -1 (us)
[GPU IO Efficiency      ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size     ]: -1 (Bytes)
[Diffs]:
[output]
DIFF1: 0.000000e+00
DIFF2: 0.000000e+00
[^      OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/abs/test_case/case_2_with_stride.prototxt
[       OK ] abs/TestSuite.mluOp/2 (2 ms)
[ RUN      ] abs/TestSuite.mluOp/3
[MLU Hardware Time      ]: 17 (us)
[MLU Interface Time     ]: 19.647 (us)
[MLU IO Efficiency      ]: 0.00229779
[MLU Compute Efficiency ]: 0.00025531
[MLU Workspace Size     ]: -1 (Bytes)
[MLU Kernel Name(s)     ]: {}
[MLU TheoryOps          ]: 10000 (Ops)
[MLU TheoryIOs          ]: 80000 (Bytes)
[MLU ComputeForce       ]: 2.304e+12 (op/s)
[MLU IoBandWidth        ]: 2048 (GB/s)
[GPU Hardware Time      ]: -1 (us)
[GPU IO Efficiency      ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size     ]: -1 (Bytes)
[Diffs]:
[output]
DIFF1: 0.000000e+00
DIFF2: 0.000000e+00
[^      OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/abs/test_case/case_3_with_stride.prototxt
[       OK ] abs/TestSuite.mluOp/3 (1 ms)
[ RUN      ] abs/TestSuite.mluOp/4
[MLU Hardware Time      ]: 4 (us)
[MLU Interface Time     ]: 6.009 (us)
[MLU IO Efficiency      ]: 0.0146484
[MLU Compute Efficiency ]: 0.00108507
[MLU Workspace Size     ]: -1 (Bytes)
[MLU Kernel Name(s)     ]: {}
[MLU TheoryOps          ]: 10000 (Ops)
[MLU TheoryIOs          ]: 120000 (Bytes)
[MLU ComputeForce       ]: 2.304e+12 (op/s)
[MLU IoBandWidth        ]: 2048 (GB/s)
[GPU Hardware Time      ]: -1 (us)
[GPU IO Efficiency      ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size     ]: -1 (Bytes)
[Diffs]:
[output]
DIFF1: 1.710338e-08
DIFF2: 3.779302e-08
[^      OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/abs/test_case/case_4_complex.prototxt
[       OK ] abs/TestSuite.mluOp/4 (2 ms)
[ RUN      ] abs/TestSuite.mluOp/5
[MLU Hardware Time      ]: 3 (us)
[MLU Interface Time     ]: 16.833 (us)
[MLU IO Efficiency      ]: 0.0195312
[MLU Compute Efficiency ]: 0.00144676
[MLU Workspace Size     ]: -1 (Bytes)
[MLU Kernel Name(s)     ]: {}
[MLU TheoryOps          ]: 10000 (Ops)
[MLU TheoryIOs          ]: 120000 (Bytes)
[MLU ComputeForce       ]: 2.304e+12 (op/s)
[MLU IoBandWidth        ]: 2048 (GB/s)
[GPU Hardware Time      ]: -1 (us)
[GPU IO Efficiency      ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size     ]: -1 (Bytes)
[Diffs]:
[output]
DIFF1: 1.710338e-08
DIFF2: 3.779302e-08
[^      OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/abs/test_case/case_5_complex_with_stride.prototxt
[       OK ] abs/TestSuite.mluOp/5 (1 ms)
[----------] 6 tests from abs/TestSuite (26 ms total)

[----------] Global test environment tear-down
[ SUMMARY  ] Total 6 cases of 1 op(s).
ALL PASSED.
[==========] 6 test cases from 1 test suite ran. (9547 ms total)
[  PASSED  ] 6 test cases.
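The two efficiency columns in the log can be reproduced from the other printed fields. The sketch below assumes IO Efficiency = TheoryIOs / (Hardware Time × IoBandWidth) and Compute Efficiency = TheoryOps / (Hardware Time × ComputeForce), with GB taken as 10^9 bytes. These formulas are inferred from the numbers above (they reproduce all six cases to the printed precision), not taken from the MLU-OPS source, and the helper names are hypothetical.

```python
import math

# Peak figures as printed in the MLU590 log.
COMPUTE_FORCE_OPS = 2.304e12  # [MLU ComputeForce]: op/s
IO_BANDWIDTH_GBPS = 2048      # [MLU IoBandWidth]: GB/s, treated as 1e9 bytes/s


def io_efficiency(theory_ios_bytes, hw_time_us, bandwidth_gbps=IO_BANDWIDTH_GBPS):
    """Achieved bytes/s divided by the theoretical IO bandwidth."""
    achieved = theory_ios_bytes / (hw_time_us * 1e-6)
    return achieved / (bandwidth_gbps * 1e9)


def compute_efficiency(theory_ops, hw_time_us, compute_force=COMPUTE_FORCE_OPS):
    """Achieved op/s divided by the theoretical compute throughput."""
    achieved = theory_ops / (hw_time_us * 1e-6)
    return achieved / compute_force


if __name__ == "__main__":
    # (case, Hardware Time in us, TheoryOps, TheoryIOs in bytes), copied from the log.
    cases = [
        ("case_0", 2, 10000, 80000),
        ("case_1", 2, 10000, 80000),
        ("case_2", 295, 64, 80000),
        ("case_3", 17, 10000, 80000),
        ("case_4", 4, 10000, 120000),
        ("case_5", 3, 10000, 120000),
    ]
    for name, hw, ops, ios in cases:
        print(f"{name}: IO eff = {io_efficiency(ios, hw):.6g}, "
              f"compute eff = {compute_efficiency(ops, hw):.6g}")
```

Since Hardware Time is printed as a whole number of microseconds, the efficiencies of kernels that finish in a few microseconds (cases 0, 1, 4 and 5) are sensitive to that rounding.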

3.4 Summary Analysis

Please give a brief overview here if you want to note or summarize anything about the results.