leg190 commented 6 months ago

Code submission for the logspace operator.

leg190 commented 1 month ago

Thanks for your contribution and we appreciate it a lot. :rocket::rocket:

1. Motivation

add new operator logspace

2. Modification

add implementation of logspace

3. Test Report

not yet

3.1 Modification Details

3.1.1 Accuracy Acceptance Standard

For static threshold standard details, see: MLU-OPS™ Accuracy Acceptance Standard.

static threshold
- diff1
- [ ] float32 mlu diff1 <= 1e-5
- [ ] float32 mlu diff1 <= 3e-3
- [ ] float16 mlu diff1 <= 3e-3
- diff2
- [ ] float32 mlu diff2 <= 1e-5
- [ ] float32 mlu diff2 <= 3e-3
- [ ] float16 mlu diff2 <= 3e-3
- diff3
- [ ] mlu diff3 == 0
- [ ] mlu diff3_1 == 0
- [ ] mlu diff3_2 == 0
dynamic threshold
- [X] diff1: mlu diff1 <= max(baseline diff1 * 10, static threshold)
- [X] diff2: mlu diff2 <= max(baseline diff2 * 10, static threshold)
- [ ] diff3: mlu diff3 <= max(baseline diff3 * 10, static threshold)
- float32, threshold = 3e-3
- float16, threshold = 3e-3
- int32, threshold = 3e-3
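The dynamic threshold rule in the checklist above can be sketched as a single comparison. The function name and the default static threshold of 3e-3 are illustrative assumptions, not part of the MLU-OPS test framework.

```python
def passes_dynamic_threshold(mlu_diff, baseline_diff, static_threshold=3e-3):
    """Return True when mlu_diff <= max(baseline_diff * 10, static_threshold).

    Hypothetical helper: it only restates the rule from the checklist.
    """
    return mlu_diff <= max(baseline_diff * 10, static_threshold)
```

With a very accurate baseline the static threshold dominates; with a noisier baseline the limit grows to ten times the baseline error.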

3.1.2 Operator Scheme checklist

Supported hardware
- [X] MLU370
- [ ] MLU590
Job types
- [x] BLOCK
- [ ] UNION1
- [ ] UNION2
- [ ] UNION4
- [ ] The operator will dynamically select the most suitable task type, for example, UNION8

3.2 Accuracy Test

3.2.1 Accuracy Test

If you have checked the following items, please tick the relevant box.

[X] Data type test (e.g. float32/int8)
[ ] Multi-dimensional tensor test
[ ] Layout test
[X] Different size/integer remainder end segment/alignment misalignment test
[ ] Zero dimensional tensor test/zero element test
[ ] stability test
[ ] Multiple platform test
[X] Gen_case module test, see: Gencase-User-Guide-zh
[X] Nan/INF tests
[ ] Bug fix tests
[X] For memory leak check details, see: GTest-User-Guide-zh
[ ] For code coverage check details, see: GTest-User-Guide-zh
[ ] For I/O calculation efficiency check details, see: MLU-OPS™-Performance-Acceptance-Standard

[----------] 124 tests from logspace/TestSuite (6582 ms total)

[----------] Global test environment tear-down
[ SUMMARY  ] Total 124 cases of 1 op(s).
ALL PASSED.
[==========] 124 test cases from 1 test suite ran. (10923 ms total)
[  PASSED  ] 124 test cases.

3.2.2 Parameter Check

Test Point-1: When a new operator is submitted, the test points are given and the test results are stated. Acceptance Standard: Normal error.

Please fill your test results(Error Message) in here, ...

Test Point-2: Whether illegal parameters are passed. Acceptance Standard: Normal error.

Test results...

3.3 Performance Test

See MLU-OPS™ Performance Acceptance Standard for details.

Platform: MLU370

3.4 Summary Analysis

The v1.0 logspace implementation is a SIMD operator and does not yet support strides.

Please give a brief overview here, if you want to note and summarize the content.

leg190 commented 1 month ago

1. Logspace Operator Test Report

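This report covers tests of the logspace operator. The operator returns an array of steps values spaced evenly on a logarithmic scale over the interval $[base^{start}, base^{end}]$, that is, powers of base whose exponents are evenly spaced between start and end. The output array has length steps.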
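A minimal pure-Python sketch of this definition follows. It assumes a positive base and finite inputs, and the name `logspace_reference` is hypothetical; `torch.logspace` and the MLU kernel may order the arithmetic differently.

```python
def logspace_reference(start, end, steps, base=10.0):
    """Return `steps` powers of `base` with exponents evenly spaced in [start, end].

    Illustrative only: assumes base > 0 so every power is a real number.
    """
    if steps < 0:
        raise ValueError("steps must be non-negative")
    if steps == 0:
        return []
    if steps == 1:
        # With a single step, only the start exponent is used.
        return [float(base) ** start]
    step = (end - start) / (steps - 1)
    return [float(base) ** (start + i * step) for i in range(steps)]
```

For example, `logspace_reference(0, 2, 3)` uses the exponents 0, 1, and 2 with base 10.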

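1.1 Accuracy Acceptance Standard

Tests use the dynamic threshold together with the static threshold diff1 <= 3e-3 && diff2 <= 3e-3.

For details, see the MLU-OPS™ Accuracy Acceptance Standard.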
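As a rough illustration of the two metrics, the sketch below assumes that diff1 is a relative L1 error and diff2 a relative L2 error against the baseline. The acceptance standard document is authoritative; the function names and the zero-denominator handling here are assumptions.

```python
import math

def diff1(result, baseline):
    """Relative L1 error: sum|result - baseline| / sum|baseline| (assumed definition)."""
    num = sum(abs(r - b) for r, b in zip(result, baseline))
    den = sum(abs(b) for b in baseline)
    return num / den if den else num

def diff2(result, baseline):
    """Relative L2 error: sqrt(sum(result - baseline)^2 / sum(baseline^2)) (assumed definition)."""
    num = sum((r - b) ** 2 for r, b in zip(result, baseline))
    den = sum(b * b for b in baseline)
    return math.sqrt(num / den) if den else math.sqrt(num)
```

Under these definitions, a case passes the static threshold when both values are at most 3e-3.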

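1.2 Operator Scheme Checklist

| No. | Requirement | Details |
| --- | --- | --- |
| 1 | Supported hardware | MLU370 |
| 2 | Job type | block |
| 3 | Layout | ARRAY |
| 4 | Multi-dimensional | No |
| 5 | Zero elements | Supported |
| 6 | Data types | half / float32 / int32 |
| 7 | Size limit | None |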

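1.3 New Feature Tests

[X] Data type test
[ ] Multi-dimensional tensor test
[X] Layout test
[X] Different size / integer remainder end segment / alignment misalignment test
[X] Zero-dimensional tensor test / zero element test
[ ] Stability test
[ ] Multiple platform test
[X] gen_case module test
[X] nan / inf test
[ ] Bug fix test
[X] Memory leak check, see: GTest-User-Guide-zh
[ ] Code coverage check, see: GTest-User-Guide-zh
[ ] I/O calculation efficiency check, see: MLU-OPS™ Performance Acceptance Standard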

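1.4 Parameter Check

When a new operator is submitted, the test points are given and the test results are stated.

| Test point | Acceptance standard | Test result (error message) |
| --- | --- | --- |
| steps >= 0 is required | Normal error | [MLUOP] [Error]:[mluOpLogspace] Check failed: steps >= 0. |
| The output vector length must be >= steps, to prevent illegal memory access | Normal error | [MLUOP] [Error]:[mluOpLogspace] Check failed: steps <= element_num. |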
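The two checks in the table can be mirrored on the host side as follows. This is only a sketch: `check_logspace_params` is a hypothetical Python helper, not the mluOpLogspace C API, and it reuses the error text from the table for readability.

```python
def check_logspace_params(steps, element_num):
    """Raise ValueError when the parameters would fail the checks listed above.

    `element_num` is the number of elements in the output tensor.
    """
    if steps < 0:
        raise ValueError("Check failed: steps >= 0.")
    if steps > element_num:
        # Writing `steps` values into a smaller buffer would go out of bounds.
        raise ValueError("Check failed: steps <= element_num.")
```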

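2. Functional Test

| Test point | Description | Count or result | Notes |
| --- | --- | --- | --- |
| Data type test | half / float32 / int32 | | |
| Layout test | ARRAY supported | | |
| Zero element test | steps=0 supported | | |
| Multiple platform test | MLU370 | | |
| nan / inf test | nan / inf supported | | |
| Memory leak test | Passed | | |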

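3. Performance Test

For details, see the MLU-OPS™ Performance Acceptance Standard.

Platforms:

MLU370-S4:

Hardware version v1.1.6, driver version v5.10.22

V100:

PyTorch version 1.13.0, CUDA version 11.7

Good: within 5x the V100 processing time

Pass: within 10x the V100 processing time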
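The rating rule can be written as a small function. The name `rate_performance` and the "fail" label for anything slower than 10x are assumptions added for illustration.

```python
def rate_performance(mlu_us, v100_us):
    """Classify MLU latency relative to V100 latency (both in microseconds)."""
    ratio = mlu_us / v100_us
    if ratio <= 5:
        return "good"
    if ratio <= 10:
        return "pass"
    return "fail"  # assumed label; the report only defines "good" and "pass"
```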

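float32:

| Input size | V100 time (us) | MLU time (us) | Rating |
| --- | --- | --- | --- |
| 128 | 6.04 | 6 | Good |
| 65536 | 7.64 | 9 | Good |
| 98304 | 7.71 | 11 | Good |
| 131072 | 7.75 | 14 | Good |
| 262144 | 14.23 | 22 | Good |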

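float32 (base < 0, which additionally requires checking whether each exponent is an integer and, if so, whether it is odd or even):

| Input size | V100 time (us) | MLU time (us) | Rating |
| --- | --- | --- | --- |
| 128 | 6.01 | 7 | Good |
| 65536 | 7.71 | 13 | Good |
| 98304 | 7.83 | 16 | Good |
| 131072 | 7.72 | 20 | Good |
| 262144 | 14.29 | 33 | Good |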
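The extra work in the negative-base case can be sketched as below: a non-integer exponent has no real result, and an integer exponent determines the sign by its parity. This is a scalar illustration under the assumption of finite inputs that do not overflow; it is not the MLU kernel.

```python
import math

def pow_negative_base(base, exponent):
    """Compute base ** exponent for base < 0 using only real arithmetic."""
    if not float(exponent).is_integer():
        # A negative base raised to a fractional power is not a real number.
        return math.nan
    magnitude = abs(base) ** exponent
    # Even exponents give a positive result, odd exponents a negative one.
    return magnitude if exponent % 2 == 0 else -magnitude
```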

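half:

| Input size | V100 time (us) | MLU time (us) | Rating |
| --- | --- | --- | --- |
| 128 | 6.04 | 7 | Good |
| 65536 | 7.64 | 7 | Good |
| 98304 | 7.71 | 7 | Good |
| 131072 | 7.75 | 8 | Good |
| 262144 | 14.23 | 9 | Good |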

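int32:

| Input size | V100 time (us) | MLU time (us) | Rating |
| --- | --- | --- | --- |
| 128 | 6.04 | 7 | Good |
| 65536 | 7.64 | 10 | Good |
| 98304 | 7.71 | 12 | Good |
| 131072 | 7.75 | 14 | Good |
| 262144 | 14.23 | 23 | Good |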

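4. Summary Analysis

Alignment with CUDA:

The float32, int32, and half types match CUDA behavior, including but not limited to the normal, negative-base, and inf/nan branches.

Accuracy analysis:

The static and dynamic threshold requirements are met.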

leg190 commented 1 month ago

compute.py

import torch
import numpy as np
from nonmlu_ops.base import *

@registerTensorList("logspace")
class LogspaceTensorList(TensorList):
    pass

@registerOp("logspace")
class LogspaceOp(OpTest):
    def __init__(self, tensorlist, params):
        super().__init__(tensorlist, params)
        # Read the operator parameters, falling back to defaults
        self.start = float(self.params_.get("start", 0.0))
        self.end = float(self.params_.get("end", 1.0))
        self.steps = int(self.params_.get("steps", 100))
        self.base = float(self.params_.get("base", 10.0))

    def compute(self):
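        # Output tensor that will hold the reference result
        output_tensor = self.tensor_list_.getOutputTensor(0)

        # Data type of the output tensor, as a string
        output_dtype_str = output_tensor.getDataType().getStr()

        # Map the dtype string to the corresponding torch dtype
        dtype_mapping = {
            "float32": torch.float32,
            "int32": torch.int32,
            "float16": torch.float16,
            # Add more dtype mappings here if needed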
        }

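        output_dtype = dtype_mapping.get(output_dtype_str)
        if output_dtype is None:
            # Fail loudly instead of silently using torch's default dtype
            raise ValueError(f"Unsupported output dtype: {output_dtype_str}")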

        device = torch.device("cuda:0")
        logspace_result = torch.logspace(start=self.start, end=self.end, steps=self.steps, base=self.base, device=device, dtype=output_dtype)

        input_has_inf = np.isinf([self.start, self.end, self.steps, self.base]).any()
        input_has_nan = np.isnan([self.start, self.end, self.steps, self.base]).any()
        result_has_inf = np.isinf(logspace_result.cpu().float().numpy()).any()
        result_has_nan = np.isnan(logspace_result.cpu().float().numpy()).any()
        self.params_["if_dynamic_threshold"] = True

        # Move the result to the CPU and convert it to a NumPy array
        logspace_result = logspace_result.cpu().numpy()

        # Set the shape and data of the output tensor
        output_tensor.setShape(logspace_result.shape)
        output_tensor.setData(logspace_result)

        if input_has_inf or input_has_nan or result_has_inf or result_has_nan or output_dtype == torch.int32:
            # inf/nan values and int32 outputs fall back to the static diff1/diff2 thresholds
            print("inputs or result contain inf/nan, or dtype is int32; setting if_dynamic_threshold to False")
            self.params_["if_dynamic_threshold"] = False
            self.params_["evaluation_criterion"] = ["diff1", "diff2"]
            self.params_["evaluation_threshold"] = [3e-3, 3e-3]

        # if dynamic threshold
        if self.params_.get("if_dynamic_threshold", False):
            base_node = DataNode("double")
            logspace_result_fp64 = torch.logspace(start=self.start, end=self.end, steps=self.steps, base=self.base, device=device, dtype=torch.double)
            base_node.setData(logspace_result_fp64.cpu().numpy())
            eva = diff_utils.Evaluator(base_node, output_tensor.getDataNode(), check_rate = False)
            output_tensor.setData(logspace_result_fp64.cpu().numpy())
            output_tensor.setDiff(eva.computeDiff1(), eva.computeDiff2(), -1, -1, -1)

@registerProtoWriter("logspace")
class LogspaceProtoWriter(MluOpProtoWriter):
    def dumpOpParam2Node(self):
        param_node = self.proto_node_.logspace_param
        # Fallbacks match LogspaceOp.__init__ (0.0 / 1.0) so the dumped case agrees with the reference
        param_node.start = self.op_params_.get("start", 0.0)
        param_node.end = self.op_params_.get("end", 1.0)
        param_node.steps = self.op_params_.get("steps", 100)
        param_node.base = self.op_params_.get("base", 10.0)
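The threshold selection in compute.py can be summarized in isolation: the dynamic threshold is used only when every input and output value is finite and the output dtype is not int32. The sketch below uses a hypothetical standalone function and plain Python lists in place of the framework objects.

```python
import math

def uses_dynamic_threshold(inputs, outputs, dtype):
    """Return True when the case should be judged with the dynamic threshold.

    `inputs` holds start, end, steps, and base; `outputs` holds the reference
    result; `dtype` is the output dtype string such as "float32".
    """
    values = list(inputs) + list(outputs)
    has_non_finite = any(math.isinf(v) or math.isnan(v) for v in values)
    return not has_non_finite and dtype != "int32"
```

Cases rejected here are the ones that compute.py evaluates with the static diff1 and diff2 thresholds of 3e-3.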

leg190 commented 1 month ago

logspace_generated_all.json logspace_generated_except_halfNanInf.json logspace_manual_float32_infnan0.json logspace_manual_float32_normal.json logspace_manual_half_normal.json logspace_manual_int32_normal.json

mahxn0 commented 1 month ago

import json
import random

def generate_random_test_case(dtype):
    if dtype == "int32":
        start = random.randint(-1000, 1000)
        end = random.randint(-1000, 1000)
        base = random.randint(-100, 100)
    elif dtype == "float32":
        start = round(random.uniform(-1000, 1000),3)
        end = round(random.uniform(-1000, 1000),3)
        base = round(random.uniform(-100, 100),3)
    else:
        raise ValueError("Unsupported dtype")

    steps = random.randint(1, 262144)
    shape = [steps]

    return {
        "inputs": [],
        "outputs": [{"shape": shape, "dtype": dtype, "layout": "ARRAY"}],
        "op_params": {"start": start, "end": end, "steps": steps, "base": base},
    }

def generate_test_suite(dtype="int32"):
    manual_data = [
        generate_random_test_case(dtype)
        for _ in range(100)  # generate 100 random test cases
    ]
    test_suite = {
        "op_name": "logspace",
        "device": "gpu",
        "require_value": True,
        "supported_mlu_platform": ["370"],
        "evaluation_criterion": ["diff1", "diff2"],
        "evaluation_threshold": [3e-3, 3e-3],
        "manual_data": manual_data
    }
    return test_suite

# Generate the test suite
test_suite = generate_test_suite()

# Convert the test suite to a formatted JSON string
test_suite_json = json.dumps(test_suite, indent=4)

# Write it to a file
file = "logspace_random_int.json"
with open(file, 'w') as f:
    f.write(test_suite_json)

print(f"Test suite has been written to {file}")
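The script above draws from the global random state, so each run produces different cases. One possible variant, sketched here with a hypothetical `generate_seeded_case` helper, passes an explicit `random.Random` instance so that a failing case can be regenerated from its seed. The value ranges are copied from the script.

```python
import random

def generate_seeded_case(rng, dtype="float32"):
    """Draw one logspace test case from `rng`, a random.Random instance."""
    if dtype == "int32":
        start = rng.randint(-1000, 1000)
        end = rng.randint(-1000, 1000)
        base = rng.randint(-100, 100)
    elif dtype == "float32":
        start = round(rng.uniform(-1000, 1000), 3)
        end = round(rng.uniform(-1000, 1000), 3)
        base = round(rng.uniform(-100, 100), 3)
    else:
        raise ValueError("Unsupported dtype")
    steps = rng.randint(1, 262144)
    return {
        "inputs": [],
        "outputs": [{"shape": [steps], "dtype": dtype, "layout": "ARRAY"}],
        "op_params": {"start": start, "end": end, "steps": steps, "base": base},
    }
```

Calling `generate_seeded_case(random.Random(0))` twice yields the same case, which makes a generated JSON file reproducible from the seed alone.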

mahxn0 commented 2 weeks ago

logspace.tar.gz

mahxn0 commented 1 week ago

logspace.tar.gz

leg190 commented 3 days ago

新方案.zip

Cambricon / mlu-ops

[Feature](mluOpLogspace) add new operator logspace. #1015
