Cambricon / mlu-ops

Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU).
MIT License
103 stars, 102 forks

[Feature](mluOpLogspace) add new operator logspace. #1015

Closed: leg190 closed this 1 week ago

leg190 commented 6 months ago

Code submission for the logspace operator.

leg190 commented 1 month ago

Thanks for your contribution, and we appreciate it a lot. 🚀🚀

1. Motivation

Add the new operator logspace.

2. Modification

Add an implementation of logspace.

3. Test Report

Not yet.

3.1 Modification Details

3.1.1 Accuracy Acceptance Standard

For static threshold standard details, see: MLU-OPS™ Accuracy Acceptance Standard.

3.1.2 Operator Scheme checklist

3.2 Accuracy Test

3.2.1 Accuracy Test

If you have checked the following items, please tick the relevant box.

[----------] 124 tests from logspace/TestSuite (6582 ms total)

[----------] Global test environment tear-down
[ SUMMARY  ] Total 124 cases of 1 op(s).
ALL PASSED.
[==========] 124 test cases from 1 test suite ran. (10923 ms total)
[  PASSED  ] 124 test cases.

3.2.2 Parameter Check

Test Point-1: When a new operator is submitted, the test points are given and the test results are stated. Acceptance Standard: Normal error.

Please fill in your test results (error message) here, ...

Test Point-2: Whether illegal parameters are passed. Acceptance Standard: Normal error.

Test results...

3.3 Performance Test

See MLU-OPS™ Performance Acceptance Standard for details.

Platform: MLU370

3.4 Summary Analysis

The v1.0 logspace implementation is a SIMD operator and lacks stride support.

Please give a brief overview here, if you want to note and summarize the content.

leg190 commented 1 month ago

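1. Logspace Operator Test Report

This report covers the tests of the logspace operator. The operator returns an array of length steps containing steps powers of base whose exponents are evenly spaced from start to end, so the values span $[base^{start}, base^{end}]$ on a logarithmic scale.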
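For reference, the definition above can be written out as a short pure-Python sketch. It assumes `base > 0` (negative bases need the integer-exponent handling mentioned in the performance section) and follows the `torch.logspace` convention that `steps == 1` returns only `base**start`. The name `logspace_reference` is hypothetical and not part of MLU-OPS.

```python
def logspace_reference(start, end, steps, base=10.0):
    """Hypothetical reference: out[i] = base ** (start + i * (end - start) / (steps - 1)).

    Assumes base > 0; negative bases are not handled here.
    """
    if steps < 0:
        raise ValueError("steps must be >= 0")
    if steps == 0:
        return []  # zero-element output
    if steps == 1:
        return [base ** start]  # a single point: only the start exponent
    step = (end - start) / (steps - 1)  # spacing between consecutive exponents
    return [base ** (start + i * step) for i in range(steps)]


print(logspace_reference(0.0, 3.0, 4))
```

Working in exponent space and exponentiating afterwards is what makes the outputs evenly spaced on a log scale rather than a linear one.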

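1.1 Accuracy Acceptance Standard

Testing used the dynamic threshold together with the static threshold diff1 <= 3e-3 && diff2 <= 3e-3.

For details, see the MLU-OPS™ Accuracy Acceptance Standard.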
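To make the static criterion concrete, here is a hedged sketch of the check. It assumes the usual MLU-OPS definitions: diff1 is the mean relative error $\sum|x_{mlu}-x_{ref}| / \sum|x_{ref}|$, and diff2 is the relative root-mean-square error $\sqrt{\sum(x_{mlu}-x_{ref})^2 / \sum x_{ref}^2}$. The helper names and the zero-denominator handling are illustrative assumptions; the accuracy standard remains the authoritative source for the formulas.

```python
import math


def diff1(mlu, ref):
    """Mean relative error: sum|mlu - ref| / sum|ref| (assumed definition)."""
    if len(mlu) != len(ref):
        raise ValueError("outputs must have the same length")
    num = sum(abs(m - r) for m, r in zip(mlu, ref))
    den = sum(abs(r) for r in ref)
    if den == 0:
        return 0.0 if num == 0 else math.inf  # assumed handling of an all-zero reference
    return num / den


def diff2(mlu, ref):
    """Relative RMS error: sqrt(sum (mlu - ref)^2 / sum ref^2) (assumed definition)."""
    if len(mlu) != len(ref):
        raise ValueError("outputs must have the same length")
    num = sum((m - r) ** 2 for m, r in zip(mlu, ref))
    den = sum(r * r for r in ref)
    if den == 0:
        return 0.0 if num == 0 else math.inf
    return math.sqrt(num / den)


def passes_static_threshold(mlu, ref, threshold=3e-3):
    """Static criterion from the report: diff1 <= 3e-3 and diff2 <= 3e-3."""
    return diff1(mlu, ref) <= threshold and diff2(mlu, ref) <= threshold
```

Both metrics are normalized by the reference magnitude, so a fixed threshold such as 3e-3 stays meaningful across outputs that range from tiny to very large powers.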

1.2 Operator Scheme Checklist

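No. | Requirement | Details
1 | Supported hardware | MLU370
2 | Job type | block
3 | Layout | ARRAY
4 | Multi-dimensional |
5 | Zero elements | Supported
6 | Data types | half / float32 / int32
7 | Size limits |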

1.3 New Feature Tests

1.4 Parameter Check

When submitting a new operator, list the test points and state the test results.

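Test point | Acceptance standard | Test result (error message)
steps must be >= 0 | Normal error | [MLUOP] [Error]:[mluOpLogspace] Check failed: steps >= 0.
Output length must be >= steps, to prevent out-of-bounds memory access | Normal error | [MLUOP] [Error]:[mluOpLogspace] Check failed: steps <= element_num.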
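The two checks in the table above amount to the following validation logic. This is a Python illustration only: the real checks presumably live in the host-side `mluOpLogspace` implementation, and `check_logspace_params` and its `element_num` argument (the number of elements in the output tensor) are hypothetical names chosen here.

```python
def check_logspace_params(steps, element_num):
    """Hypothetical mirror of mluOpLogspace's parameter checks.

    element_num is the number of elements in the output tensor.
    """
    if steps < 0:
        raise ValueError("[mluOpLogspace] Check failed: steps >= 0.")
    if steps > element_num:
        # Writing more than element_num values would overrun the output buffer.
        raise ValueError("[mluOpLogspace] Check failed: steps <= element_num.")


check_logspace_params(steps=0, element_num=0)  # the zero-element case is accepted
```

Note that `steps == 0` passes both checks, which is consistent with the zero-element support listed in the functional tests.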

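2. Functional Tests

Test point | Description | Count or result | Notes
Data type test | half / float32 / int32
Layout test | ARRAY supported
Zero-element test | steps=0 supported
Multi-platform test | MLU370
nan / inf test | nan / inf supported
Memory leak test | Passed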

3. Performance Tests

For details, see the MLU-OPS™ Performance Acceptance Standard.

Platforms:

MLU370-S4:

Hardware version v1.1.6, driver version v5.10.22

V100:

PyTorch 1.13.0, CUDA 11.7

Good: within 5x the V100 processing time

Pass: within 10x the V100 processing time

float32:

Input size | V100 time (us) | MLU time (us) | Rating
128 | 6.04 | 6 | Good
65536 | 7.64 | 9 | Good
98304 | 7.71 | 11 | Good
131072 | 7.75 | 14 | Good
262144 | 14.23 | 22 | Good
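The rating rule stated above (Good: within 5x of V100; Pass: within 10x) can be applied mechanically to the float32 rows. This is a minimal sketch: the `rate` helper is hypothetical, and it labels anything slower than 10x as "Fail", a tier the report does not name explicitly.

```python
def rate(v100_us, mlu_us):
    """Hypothetical helper: rate MLU latency relative to V100 using the report's rule."""
    ratio = mlu_us / v100_us
    if ratio <= 5:
        return "Good"  # within 5x of V100
    if ratio <= 10:
        return "Pass"  # within 10x of V100
    return "Fail"  # assumed label; the report defines no tier beyond 10x


# (input size, V100 time in us, MLU time in us), copied from the float32 table above
FLOAT32_ROWS = [
    (128, 6.04, 6),
    (65536, 7.64, 9),
    (98304, 7.71, 11),
    (131072, 7.75, 14),
    (262144, 14.23, 22),
]

for size, v100_us, mlu_us in FLOAT32_ROWS:
    print(f"{size}: {mlu_us / v100_us:.2f}x of V100 -> {rate(v100_us, mlu_us)}")
```

The largest float32 ratio here is under 2x, well inside the 5x "Good" band.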

float32 (base < 0; requires extra checks on whether the exponent is an integer and, if so, whether it is odd or even):

Input size | V100 time (us) | MLU time (us) | Rating
128 | 6.01 | 7 | Good
65536 | 7.71 | 13 | Good
98304 | 7.83 | 16 | Good
131072 | 7.72 | 20 | Good
262144 | 14.29 | 33 | Good
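The extra work for a negative base comes from the fact that $base^{x}$ with $base < 0$ is only real when $x$ is an integer, and its sign then depends on whether $x$ is odd or even. The following scalar sketch shows one way to express that branch; it is not the MLU kernel. It assumes finite inputs and a nonzero base, and follows the C `pow` convention of returning NaN for a negative base with a non-integer exponent. The name `real_pow` is hypothetical.

```python
import math


def real_pow(base, exponent):
    """Real-valued base ** exponent with explicit negative-base handling (sketch).

    Assumes finite inputs and base != 0. Unlike float32 hardware, math.pow
    raises OverflowError rather than returning inf when the result is too large.
    """
    if base > 0:
        return math.pow(base, exponent)
    if not float(exponent).is_integer():
        return math.nan  # e.g. (-2) ** 0.5 has no real value
    magnitude = math.pow(-base, exponent)  # |base| ** exponent
    # Odd integer exponents keep the negative sign; even ones cancel it.
    return -magnitude if int(exponent) % 2 == 1 else magnitude
```

The integer test and parity test are exactly the two extra checks the table heading mentions, which is why the negative-base timings are higher than the plain float32 ones.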

half:

Input size | V100 time (us) | MLU time (us) | Rating
128 | 6.04 | 7 | Good
65536 | 7.64 | 7 | Good
98304 | 7.71 | 7 | Good
131072 | 7.75 | 8 | Good
262144 | 14.23 | 9 | Good

int32:

Input size | V100 time (us) | MLU time (us) | Rating
128 | 6.04 | 7 | Good
65536 | 7.64 | 10 | Good
98304 | 7.71 | 12 | Good
131072 | 7.75 | 14 | Good
262144 | 14.23 | 23 | Good

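4. Summary Analysis

Alignment with CUDA:

For float32, int32 and half, behavior matches CUDA, including but not limited to the normal, negative-base, and inf/nan branches.

Accuracy analysis:

Meets both the static and dynamic threshold requirements.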
leg190 commented 1 month ago

compute.py

import torch
import numpy as np
from nonmlu_ops.base import *

@registerTensorList("logspace")
class LogspaceTensorList(TensorList):
    pass

@registerOp("logspace")
class LogspaceOp(OpTest):
    def __init__(self, tensorlist, params):
        super().__init__(tensorlist, params)
        # Read operator parameters, falling back to defaults
        self.start = float(self.params_.get("start", 0.0))
        self.end = float(self.params_.get("end", 1.0))
        self.steps = int(self.params_.get("steps", 100))
        self.base = float(self.params_.get("base", 10.0))

    def compute(self):
        # Get the output tensor
        output_tensor = self.tensor_list_.getOutputTensor(0)

        # Get the output tensor's data type as a string
        output_dtype_str = output_tensor.getDataType().getStr()

        # Map it to the corresponding torch dtype
        dtype_mapping = {
            "float32": torch.float32,
            "int32": torch.int32,
            "float16": torch.float16,
            # Add more dtype mappings here if needed
        }

        output_dtype = dtype_mapping.get(output_dtype_str)

        device = torch.device("cuda:0")
        logspace_result = torch.logspace(start=self.start, end=self.end, steps=self.steps, base=self.base, device=device, dtype=output_dtype)

        input_has_inf = np.isinf([self.start, self.end, self.steps, self.base]).any()
        input_has_nan = np.isnan([self.start, self.end, self.steps, self.base]).any()
        result_has_inf = np.isinf(logspace_result.cpu().float().numpy()).any()
        result_has_nan = np.isnan(logspace_result.cpu().float().numpy()).any()
        self.params_["if_dynamic_threshold"] = True

        # Move the result to the CPU and convert it to a NumPy array
        logspace_result = logspace_result.cpu().numpy()

        # Set the output tensor's shape and data
        output_tensor.setShape(logspace_result.shape)
        output_tensor.setData(logspace_result)

        if input_has_inf or input_has_nan or result_has_inf or result_has_nan or output_dtype==torch.int32:
            print("input or result contains inf/nan, or dtype is int32: setting if_dynamic_threshold to False")
            self.params_["if_dynamic_threshold"] = False
            evaluation_criterion = []
            evaluation_threshold = []
            evaluation_criterion.append("diff1")
            evaluation_criterion.append("diff2")
            evaluation_threshold.append(3e-3)
            evaluation_threshold.append(3e-3)
            self.params_["evaluation_criterion"] = evaluation_criterion
            self.params_["evaluation_threshold"] = evaluation_threshold

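        # Dynamic threshold: measure this result's diff1/diff2 against an fp64 baseline, then store the fp64 baseline as the reference output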
        if self.params_.get("if_dynamic_threshold", False):
            base_node = DataNode("double")
            logspace_result_fp64 = torch.logspace(start=self.start, end=self.end, steps=self.steps, base=self.base, device=device, dtype=torch.double)
            base_node.setData(logspace_result_fp64.cpu().numpy())
            eva = diff_utils.Evaluator(base_node, output_tensor.getDataNode(), check_rate = False)
            output_tensor.setData(logspace_result_fp64.cpu().numpy())
            output_tensor.setDiff(eva.computeDiff1(), eva.computeDiff2(), -1, -1, -1)  

@registerProtoWriter("logspace")
class LogspaceProtoWriter(MluOpProtoWriter):
    def dumpOpParam2Node(self):
        param_node = self.proto_node_.logspace_param
        # Defaults mirror LogspaceOp.__init__ so the recorded params match the computed baseline
        param_node.start = self.op_params_.get("start", 0.0)
        param_node.end = self.op_params_.get("end", 1.0)
        param_node.steps = self.op_params_.get("steps", 100)
        param_node.base = self.op_params_.get("base", 10.0)
leg190 commented 1 month ago

logspace_generated_all.json logspace_generated_except_halfNanInf.json logspace_manual_float32_infnan0.json logspace_manual_float32_normal.json logspace_manual_half_normal.json logspace_manual_int32_normal.json

mahxn0 commented 1 month ago
import json
import random

def generate_random_test_case(dtype):
    if dtype == "int32":
        start = random.randint(-1000, 1000)
        end = random.randint(-1000, 1000)
        base = random.randint(-100, 100)
    elif dtype == "float32":
        start = round(random.uniform(-1000, 1000),3)
        end = round(random.uniform(-1000, 1000),3)
        base = round(random.uniform(-100, 100),3)
    else:
        raise ValueError("Unsupported dtype")

    steps = random.randint(1, 262144)
    shape = [steps]

    return {
        "inputs": [],
        "outputs": [{"shape": shape, "dtype": dtype, "layout": "ARRAY"}],
        "op_params": {"start": start, "end": end, "steps": steps, "base": base},
    }

def generate_test_suite(dtype="int32"):
    manual_data = [
        generate_random_test_case(dtype)
        for _ in range(100)  # Generate 100 random test cases
    ]
    test_suite = {
        "op_name": "logspace",
        "device": "gpu",
        "require_value": True,
        "supported_mlu_platform": ["370"],
        "evaluation_criterion": ["diff1", "diff2"],
        "evaluation_threshold": [3e-3, 3e-3],
        "manual_data": manual_data
    }
    return test_suite

# Generate the test suite (int32 by default)
test_suite = generate_test_suite()

# Serialize the test suite to a formatted JSON string
test_suite_json = json.dumps(test_suite, indent=4)

# Write it to a file
file = "logspace_random_int.json"
with open(file, 'w') as f:
    f.write(test_suite_json)

print(f"Test suite has been written to {file}")
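Since every generated case is meant to pass the operator's parameter checks, a quick sanity check over the suite can catch malformed cases before they reach the test harness. This is a hypothetical helper (`validate_suite` is not part of the MLU-OPS tooling), and it assumes the JSON layout produced by the script above, where each case has `outputs[0]["shape"]` and `op_params["steps"]`.

```python
import math


def validate_suite(test_suite):
    """Return (case_index, reason) pairs for cases that would fail logspace's checks."""
    problems = []
    for idx, case in enumerate(test_suite["manual_data"]):
        steps = case["op_params"]["steps"]
        element_num = math.prod(case["outputs"][0]["shape"])  # output length
        if steps < 0:
            problems.append((idx, "steps < 0"))
        elif steps > element_num:
            problems.append((idx, "steps > element_num"))
    return problems
```

Calling `validate_suite(test_suite)` right after `generate_test_suite()` should return an empty list, because every generated case uses `shape = [steps]` with `steps >= 1`.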
mahxn0 commented 2 weeks ago

logspace.tar.gz

mahxn0 commented 1 week ago

logspace.tar.gz

leg190 commented 3 days ago

新方案.zip ("new scheme")