Closed leg190 closed 1 week ago
Thanks for your contribution and we appreciate it a lot. :rocket::rocket:
add new operator logspace
add implementation of logspace
not yet
For static threshold standard details, see: MLU-OPS™ Accuracy Acceptance Standard.
If you have checked the following items, please tick the relevant box.
[----------] 124 tests from logspace/TestSuite (6582 ms total)
[----------] Global test environment tear-down
[ SUMMARY ] Total 124 cases of 1 op(s).
ALL PASSED.
[==========] 124 test cases from 1 test suite ran. (10923 ms total)
[ PASSED ] 124 test cases.
Test Point-1: When a new operator is submitted, the test points are given and the test results are stated. Acceptance Standard: Normal error.
Please fill your test results(Error Message) in here, ...
Test Point-2: Whether illegal parameters are passed. Acceptance Standard: Normal error.
Test results...
See MLU-OPS™ Performance Acceptance Standard for details.
Platform: MLU370
The v1.0 logspace implementation is a SIMD operator and lacks stride support.
Please give a brief overview here, if you want to note and summarize the content.
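This report covers testing of the logspace operator. The operator returns an array of `steps` powers of `base` whose exponents are evenly spaced, so the values span the interval $[base^{start}, base^{end}]$; the output array has length `steps`.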
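As a point of reference for the behavior described above, here is a minimal NumPy sketch. The function name `logspace_reference` is hypothetical and is not part of MLU-OPS; it assumes a positive `base` and is meant only to illustrate the semantics, not the kernel implementation.

```python
import numpy as np


def logspace_reference(start, end, steps, base=10.0, dtype=np.float32):
    """Return base**e for `steps` exponents evenly spaced over [start, end].

    Illustrative only: assumes base > 0 and steps >= 0.
    """
    if steps < 0:
        raise ValueError("steps must be non-negative")
    # Evenly spaced exponents, computed in float64 before the final cast.
    exponents = np.linspace(start, end, steps, dtype=np.float64)
    return np.power(float(base), exponents).astype(dtype)


values = logspace_reference(0.0, 3.0, 4)   # exponents 0, 1, 2, 3
empty = logspace_reference(0.0, 1.0, 0)    # steps = 0 gives an empty array
```

With `steps=0` the sketch returns an empty array, which corresponds to the zero-element case listed in the requirements.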
Testing uses the dynamic threshold together with the static threshold diff1 <= 3e-3 && diff2 <= 3e-3.
For details, see the MLU-OPS™ Accuracy Acceptance Standard.
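The static check can be sketched as follows. The formulas here are an assumption: they take diff1 to be the summed absolute error relative to the summed absolute baseline, and diff2 to be the relative root-sum-square error. The Accuracy Acceptance Standard is the authoritative definition; the helper names are hypothetical.

```python
import numpy as np


def diff1(result, baseline):
    """Assumed definition: sum(|result - baseline|) / sum(|baseline|)."""
    result = np.asarray(result, dtype=np.float64)
    baseline = np.asarray(baseline, dtype=np.float64)
    num = np.sum(np.abs(result - baseline))
    den = np.sum(np.abs(baseline))
    return float(num / den) if den != 0 else float(num)


def diff2(result, baseline):
    """Assumed definition: sqrt(sum((result - baseline)^2) / sum(baseline^2))."""
    result = np.asarray(result, dtype=np.float64)
    baseline = np.asarray(baseline, dtype=np.float64)
    num = np.sum((result - baseline) ** 2)
    den = np.sum(baseline ** 2)
    return float(np.sqrt(num / den)) if den != 0 else float(np.sqrt(num))


def passes_static_threshold(result, baseline, threshold=3e-3):
    """Static check used in this report: diff1 <= 3e-3 and diff2 <= 3e-3."""
    return diff1(result, baseline) <= threshold and diff2(result, baseline) <= threshold


ok = passes_static_threshold([1.0, 2.0, 3.003], [1.0, 2.0, 3.0])
bad = passes_static_threshold([1.0, 2.0, 4.0], [1.0, 2.0, 3.0])
```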
No. | Requirement | Details |
---|---|---|
1 | Supported hardware | MLU370 |
2 | Job type | block |
3 | Layout | ARRAY |
4 | Multi-dimensional | No |
5 | Zero elements | Supported |
6 | Data types | half / float32 / int32 |
7 | Size limit | None |
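When submitting a new operator, list the test points and state the test results.
Test point | Acceptance standard | Test result (error message) |
---|---|---|
Requires steps ≥ 0 | Normal error | [MLUOP] [Error]:[mluOpLogspace] Check failed: steps >= 0. |
Requires output vector length ≥ steps, to prevent illegal memory access | Normal error | [MLUOP] [Error]:[mluOpLogspace] Check failed: steps <= element_num. |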
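
The two parameter checks in the table above can be illustrated with a small Python sketch. The real checks live in the operator's host code; `check_logspace_params` is a hypothetical stand-in that only reuses the conditions and message text shown in the table.

```python
def check_logspace_params(steps, element_num):
    """Mirror the two checks from the table; raises on illegal parameters.

    `element_num` is the number of elements in the output tensor.
    """
    if not steps >= 0:
        raise ValueError("[mluOpLogspace] Check failed: steps >= 0.")
    # The output buffer must hold at least `steps` values, otherwise the
    # kernel would write out of bounds.
    if not steps <= element_num:
        raise ValueError("[mluOpLogspace] Check failed: steps <= element_num.")


def error_message(steps, element_num):
    """Return the error text for a parameter pair, or None if it is legal."""
    try:
        check_logspace_params(steps, element_num)
    except ValueError as exc:
        return str(exc)
    return None
```
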
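Test point | Description | Count or result | Notes |
---|---|---|---|
Data type test | half / float32 / int32 | ||
Layout test | ARRAY supported | ||
Zero-element test | steps=0 supported | ||
Multi-platform test | MLU370 | ||
nan / inf test | nan / inf supported | ||
Memory leak test | Passed | ||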
Platforms:
MLU370-S4:
Hardware version v1.1.6, driver version v5.10.22
v100:
PyTorch version 1.13.0, CUDA version 11.7
Good: within 5x the v100 processing time
Pass: within 10x the v100 processing time
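The rating rule can be written as a short helper. This is a sketch: the function name and the "fail" label for ratios above 10x are assumptions, and "within" is read as an inclusive bound.

```python
def rate_performance(mlu_us, v100_us):
    """Classify MLU latency relative to the v100 latency (both in microseconds)."""
    ratio = mlu_us / v100_us
    if ratio <= 5:
        return "good"
    if ratio <= 10:
        return "pass"
    return "fail"  # label assumed; the report only defines "good" and "pass"


# Largest float32 case reported here: 14.23 us on v100 vs 22 us on MLU.
rating = rate_performance(22, 14.23)
```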
float32:
Input size | v100 time (us) | MLU time (us) | Rating |
---|---|---|---|
128 | 6.04 | 6 | Good |
65536 | 7.64 | 9 | Good |
98304 | 7.71 | 11 | Good |
131072 | 7.75 | 14 | Good |
262144 | 14.23 | 22 | Good |
float32 (base < 0, which additionally requires checking whether each exponent is an integer and whether that integer is odd or even):
Input size | v100 time (us) | MLU time (us) | Rating |
---|---|---|---|
128 | 6.01 | 7 | Good |
65536 | 7.71 | 13 | Good |
98304 | 7.83 | 16 | Good |
131072 | 7.72 | 20 | Good |
262144 | 14.29 | 33 | Good |
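
The extra work in the negative-base case can be illustrated with a scalar sketch. It follows the usual floating-point `pow` convention (non-integer exponent gives NaN, odd integer exponent gives a negative result); whether the MLU kernel is structured this way is not stated in this report, and the function name is hypothetical.

```python
import math


def pow_negative_base(base, exponent):
    """Compute base**exponent for base < 0 and a finite exponent."""
    if not base < 0:
        raise ValueError("this sketch only handles base < 0")
    # A negative base raised to a non-integer power has no real result.
    if exponent != math.floor(exponent):
        return math.nan
    try:
        magnitude = math.pow(-base, exponent)
    except OverflowError:
        magnitude = math.inf
    # The sign depends on the parity of the integer exponent.
    return magnitude if int(exponent) % 2 == 0 else -magnitude
```

These per-element integer and parity checks are consistent with the somewhat higher MLU times in the negative-base table.
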
half:
Input size | v100 time (us) | MLU time (us) | Rating |
---|---|---|---|
128 | 6.04 | 7 | Good |
65536 | 7.64 | 7 | Good |
98304 | 7.71 | 7 | Good |
131072 | 7.75 | 8 | Good |
262144 | 14.23 | 9 | Good |
int32:
Input size | v100 time (us) | MLU time (us) | Rating |
---|---|---|---|
128 | 6.04 | 7 | Good |
65536 | 7.64 | 10 | Good |
98304 | 7.71 | 12 | Good |
131072 | 7.75 | 14 | Good |
262144 | 14.23 | 23 | Good |
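Alignment with CUDA:
For float32/int32/half, behavior matches CUDA, including but not limited to the normal, negative-base, and inf/nan branches.
Accuracy analysis:
Both the static and dynamic threshold requirements are met.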
compute.py
import torch
import numpy as np
from nonmlu_ops.base import *


@registerTensorList("logspace")
class LogspaceTensorList(TensorList):
    pass


@registerOp("logspace")
class LogspaceOp(OpTest):
    def __init__(self, tensorlist, params):
        super().__init__(tensorlist, params)
        # Read the operator parameters, falling back to defaults.
        self.start = float(self.params_.get("start", 0.0))
        self.end = float(self.params_.get("end", 1.0))
        self.steps = int(self.params_.get("steps", 100))
        self.base = float(self.params_.get("base", 10.0))

    def compute(self):
        # Get the output tensor.
        output_tensor = self.tensor_list_.getOutputTensor(0)
        # Get the output tensor's data type.
        output_dtype_str = output_tensor.getDataType().getStr()
        # Map it to a torch data type.
        dtype_mapping = {
            "float32": torch.float32,
            "int32": torch.int32,
            "float16": torch.float16,
            # Add more data type mappings here if needed.
        }
        output_dtype = dtype_mapping.get(output_dtype_str)
        device = torch.device("cuda:0")
        logspace_result = torch.logspace(start=self.start, end=self.end, steps=self.steps, base=self.base, device=device, dtype=output_dtype)
        input_has_inf = np.isinf([self.start, self.end, self.steps, self.base]).any()
        input_has_nan = np.isnan([self.start, self.end, self.steps, self.base]).any()
        result_has_inf = np.isinf(logspace_result.cpu().float().numpy()).any()
        result_has_nan = np.isnan(logspace_result.cpu().float().numpy()).any()
        self.params_["if_dynamic_threshold"] = True
        # Move the data to the CPU and convert it to a NumPy array.
        logspace_result = logspace_result.cpu().numpy()
        # Set the output tensor's shape and data.
        output_tensor.setShape(logspace_result.shape)
        output_tensor.setData(logspace_result)
        # Fall back to the static threshold for inf/nan values or int32 output.
        if input_has_inf or input_has_nan or result_has_inf or result_has_nan or output_dtype == torch.int32:
            print("result has inf or nan, or dtype is int32; setting if_dynamic_threshold to False")
            self.params_["if_dynamic_threshold"] = False
            evaluation_criterion = []
            evaluation_threshold = []
            evaluation_criterion.append("diff1")
            evaluation_criterion.append("diff2")
            evaluation_threshold.append(3e-3)
            evaluation_threshold.append(3e-3)
            self.params_["evaluation_criterion"] = evaluation_criterion
            self.params_["evaluation_threshold"] = evaluation_threshold
        # Dynamic threshold: compare the GPU result against an fp64 baseline.
        if self.params_.get("if_dynamic_threshold", False):
            base_node = DataNode("double")
            logspace_result_fp64 = torch.logspace(start=self.start, end=self.end, steps=self.steps, base=self.base, device=device, dtype=torch.double)
            base_node.setData(logspace_result_fp64.cpu().numpy())
            eva = diff_utils.Evaluator(base_node, output_tensor.getDataNode(), check_rate=False)
            output_tensor.setData(logspace_result_fp64.cpu().numpy())
            output_tensor.setDiff(eva.computeDiff1(), eva.computeDiff2(), -1, -1, -1)


@registerProtoWriter("logspace")
class LogspaceProtoWriter(MluOpProtoWriter):
    def dumpOpParam2Node(self):
        param_node = self.proto_node_.logspace_param
        param_node.start = self.op_params_.get("start", 1.0)
        param_node.end = self.op_params_.get("end", 3.0)
        param_node.steps = self.op_params_.get("steps", 100)
        param_node.base = self.op_params_.get("base", 10.0)
import json
import random


def generate_random_test_case(dtype):
    if dtype == "int32":
        start = random.randint(-1000, 1000)
        end = random.randint(-1000, 1000)
        base = random.randint(-100, 100)
    elif dtype == "float32":
        start = round(random.uniform(-1000, 1000), 3)
        end = round(random.uniform(-1000, 1000), 3)
        base = round(random.uniform(-100, 100), 3)
    else:
        raise ValueError("Unsupported dtype")
    steps = random.randint(1, 262144)
    shape = [steps]
    return {
        "inputs": [],
        "outputs": [{"shape": shape, "dtype": dtype, "layout": "ARRAY"}],
        "op_params": {"start": start, "end": end, "steps": steps, "base": base},
    }


def generate_test_suite(dtype="int32"):
    manual_data = [
        generate_random_test_case(dtype)
        for _ in range(100)  # Generate 100 random test cases.
    ]
    test_suite = {
        "op_name": "logspace",
        "device": "gpu",
        "require_value": True,
        "supported_mlu_platform": ["370"],
        "evaluation_criterion": ["diff1", "diff2"],
        "evaluation_threshold": [3e-3, 3e-3],
        "manual_data": manual_data
    }
    return test_suite


# Generate the test suite.
test_suite = generate_test_suite()
# Convert the test suite to a formatted JSON string.
test_suite_json = json.dumps(test_suite, indent=4)
# Write it to a file.
file = "logspace_random_int.json"
with open(file, 'w') as f:
    f.write(test_suite_json)
print(f"Test suite has been written to {file}")
Code submission for the logspace operator