Cambricon / mlu-ops

Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU).
MIT License

[Feature](mluOpLgamma) add new operator lgamma #1012

Closed. Frankd35 closed this pull request 1 month ago.

Frankd35 commented 6 months ago

Thanks for your contribution and we appreciate it a lot. :rocket::rocket:

1. Motivation

Add the new operator `lgamma`.

2. Modification

Add the implementation of `lgamma`.
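
For context, `lgamma(x)` computes the natural logarithm of the absolute value of the gamma function, ln|Γ(x)|. A minimal sketch of the reference semantics using `torch.lgamma` (illustration only, not part of this PR's code):

```python
import torch

# lgamma(x) = ln|Gamma(x)|, defined for negative non-integers as well.
x = torch.tensor([0.5, 1.0, 2.0, 3.5, -0.5])
print(torch.lgamma(x))
# lgamma(1.0) = lgamma(2.0) = 0, since Gamma(1) = Gamma(2) = 1;
# lgamma(0.5) = ln(sqrt(pi)) ~= 0.5724.
```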

3. Test Report

Not yet.

3.1 Modification Details

3.1.1 Accuracy Acceptance Standard

For static threshold standard details, see: MLU-OPS™ Accuracy Acceptance Standard.
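
For readers without that document at hand: the thresholds reported below are of the diff1/diff2 kind. A minimal sketch, assuming the usual definitions (diff1 as mean relative error, diff2 as root-mean-square relative error; the authoritative formulas are in the linked standard):

```python
import numpy as np

def diff1(result, baseline):
    # Assumed definition: mean relative error.
    return np.sum(np.abs(result - baseline)) / np.sum(np.abs(baseline))

def diff2(result, baseline):
    # Assumed definition: root-mean-square relative error.
    return np.sqrt(np.sum((result - baseline) ** 2) / np.sum(baseline ** 2))
```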

3.1.2 Operator Scheme checklist

3.2 Accuracy Test

3.2.1 Accuracy Test

If you have checked the following items, please tick the relevant box.

```
[       OK ] lgamma/TestSuite.mluOp/70 (62 ms)
[----------] 71 tests from lgamma/TestSuite (3641 ms total)

[----------] Global test environment tear-down
[ SUMMARY  ] Total 71 cases of 1 op(s).
ALL PASSED.
[==========] 71 test cases from 1 test suite ran. (8300 ms total)
[  PASSED  ] 71 test cases.
```

3.2.2 Parameter Check

Test Point-1: When a new operator is submitted, the test points are given and the test results are stated. Acceptance Standard: Normal error.

Please fill in your test results (error message) here, ...

Test Point-2: Whether illegal parameters are passed. Acceptance Standard: Normal error.

Test results...

3.3 Performance Test

See MLU-OPS™ Performance Acceptance Standard for details.

Platform: MLU370

3.4 Summary Analysis

The v1.0 lgamma implementation is a SIMD operator and lacks stride support.

Please give a brief overview here, if you want to note and summarize the content.
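
To make the stride limitation concrete: a kernel without stride support treats its input as one dense, contiguous buffer, so strided (non-contiguous) views have to be densified before the operator is called. A hedged Python sketch of the concept (the actual kernel runs on the MLU, not in Python):

```python
import torch

x = torch.randn(4, 8)
view = x[:, ::2]              # strided, non-contiguous view
print(view.is_contiguous())   # False

# Without stride support the kernel only understands flat dense memory,
# so the caller must materialize the view first:
dense = view.contiguous()
print(dense.is_contiguous())  # True
```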

Frankd35 commented 1 month ago

compute.py

```python
import torch
import numpy as np
from nonmlu_ops.base import *

@registerTensorList("lgamma")
class LgammaTensorList(TensorList):
    pass

@registerOp("lgamma")
class LgammaOp(OpTest):
    def __init__(self, tensorlist, params):
        super().__init__(tensorlist, params)

    def compute(self):
        # Fetch the input/output tensors and the input data type.
        input_tensor = self.tensor_list_.getInputTensor(0)
        output_tensor = self.tensor_list_.getOutputTensor(0)
        datatype = input_tensor.getDataType().getNumpyStr()

        if datatype == 'float16':
            torch_input = torch.tensor(input_tensor.getDataNode().getData()).half().cuda()
        else:
            torch_input = torch.tensor(input_tensor.getDataNode().getData()).float().cuda()

        # Compute the baseline with torch.lgamma on the GPU.
        lgamma_result = torch.lgamma(torch_input)

        input_has_inf = torch.isinf(torch_input).any().item()
        input_has_nan = torch.isnan(torch_input).any().item()
        result_has_inf = torch.isinf(lgamma_result).any().item()
        result_has_nan = torch.isnan(lgamma_result).any().item()

        # Move the result to the CPU and convert it to a NumPy array.
        lgamma_result = lgamma_result.cpu().numpy()

        # Set the output tensor's shape and data.
        output_tensor.setShape(lgamma_result.shape)
        output_tensor.setData(lgamma_result)

        # Dynamic threshold: recompute the baseline in fp64 and derive
        # per-case diff1/diff2 thresholds from the fp32-vs-fp64 gap.
        if self.params_.get("if_dynamic_threshold", False):
            base_node = DataNode("double")
            torch_input_fp64 = torch_input.double()
            lgamma_result_fp64 = torch.lgamma(torch_input_fp64)
            base_node.setData(lgamma_result_fp64.cpu().numpy())
            eva = diff_utils.Evaluator(base_node, output_tensor.getDataNode())
            output_tensor.setData(lgamma_result_fp64.cpu().numpy())
            if input_has_inf or input_has_nan or result_has_inf or result_has_nan:
                # inf/nan would poison the relative diffs; fall back to a
                # fixed static threshold instead.
                output_tensor.setDiff(0.003, 0.003)
            else:
                output_tensor.setDiff(eva.computeDiff1(), eva.computeDiff2())

@registerProtoWriter("lgamma")
class LgammaProtoWriter(MluOpProtoWriter):
    pass
```
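
Design note on the generator above: with `if_dynamic_threshold` set, the baseline is recomputed in fp64 and the fp32-vs-fp64 gap drives the per-case diff1/diff2 thresholds; when the input or result contains inf/nan, a fixed 0.003 threshold is used instead, since relative diffs are not meaningful there.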
Frankd35 commented 1 month ago

lgamma_float.json lgamma_float_infnan.json lgamma_float_stride.json lgamma_half.json lgamma_half_infnan.json lgamma_half_stride.json

Frankd35 commented 1 month ago

Test points: the JSON cases above cover different data types (float, half), tensors of different dimensionality, and different input ranges; they also include in-place support tests, inf/nan input tests, and accuracy tests in extreme cases (input ranges close to 0). All passed.

Fool-proofing tests:
- empty input tensor -- passed
- input and output tensor shapes differ -- passed
- input and output tensor data types differ -- passed
- invalid input/output tensor -- passed
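
For illustration, the fool-proofing checks above roughly correspond to validation of the following shape (a hypothetical Python sketch; the real checks live in the operator's parameter checking, and `check_lgamma_params` is an invented name):

```python
import numpy as np

def check_lgamma_params(x: np.ndarray, y: np.ndarray):
    """Hypothetical mirror of the fool-proofing test points above."""
    if x.shape != y.shape:
        raise ValueError("input and output tensor shapes differ")
    if x.dtype != y.dtype:
        raise ValueError("input and output tensor data types differ")
    if x.dtype not in (np.float16, np.float32):
        raise ValueError("unsupported data type, expected float or half")
    # How an empty tensor is handled (error vs. no-op) is up to the
    # operator's spec; the test point only asserts it is handled sanely.
    if x.size == 0:
        return
```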

DanieeelLiu commented 6 days ago

lgammacase.zip: cases that need debugging.