NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

[QST] Gemm results are different with tile_description? #1769

Open hxdtest opened 2 months ago

hxdtest commented 2 months ago

What is your question? It seems that adding a tile_description changes the GEMM result: the assertion assert (tensor_D_numpy - tensor_D).max() == 0.0 passes if I add the tile_description.

import numpy as np
import random
import torch
import cutlass

# This controls whether the C++ GEMM declaration will be printed at each step. 
# Set to `False` to omit this information.
print_module = True

m = 8192
n = 8192
k = 8192
dtype=torch.float16
tensor_A = torch.rand(m, k, device=0, dtype=torch.float16)   
tensor_B = torch.rand(k, n, device=0, dtype=torch.float16)   
tensor_C = torch.zeros(m, n, device=0, dtype=torch.float16)   
tensor_D = torch.zeros(m, n, device=0, dtype=torch.float16)   

alpha = 1
beta = 0.0

# We specify `element_accumulator` here so as to match the reference computation below. However,
# specifying `element_accumulator` is not required if it is the same as `element`.
plan = cutlass.Gemm(element=dtype, layout=cutlass.LayoutType.RowMajor, element_accumulator=torch.float32)
"""
tile_description = {
    "threadblock_shape":  [128, 256, 32],   # Threadblock shape
    "stages": 3,                # Number of stages
    "wrap_count" : [2, 4, 1],        # Number of warps within each dimension of the threadblock shape
    "instruction_shape":  [16, 8 , 16] ,
    "cluster_shape":  [1, 1 , 1]
}
plan.tile_description = tile_description
"""

plan.run(tensor_A, tensor_B, tensor_C, tensor_D, print_module=print_module)

tensor_D_numpy = (alpha * (tensor_A @ tensor_B)) + (beta * tensor_C)

assert (tensor_D_numpy - tensor_D).max() == 0.0
print(tensor_D)
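
Note that fp16 GEMMs run with different tile shapes can accumulate partial products in a different order, so bit-exact equality against a reference is not generally expected; a tolerance-based comparison is the more common check. A minimal sketch, assuming the tile_description dict above is uncommented and the tensors/plan from the listing are still in scope (the tolerances are illustrative, not CUTLASS defaults):

# Run once with the default tile description and once with the explicit one,
# then compare both results against the fp16 torch reference with a tolerance.
tensor_D_default = torch.zeros(m, n, device=0, dtype=torch.float16)
plan.run(tensor_A, tensor_B, tensor_C, tensor_D_default, print_module=False)

plan.tile_description = tile_description  # the dict defined above
tensor_D_tiled = torch.zeros(m, n, device=0, dtype=torch.float16)
plan.run(tensor_A, tensor_B, tensor_C, tensor_D_tiled, print_module=False)

reference = alpha * (tensor_A @ tensor_B) + beta * tensor_C

# Illustrative tolerances for fp16 outputs with fp32 accumulation; tune as needed.
print(torch.allclose(tensor_D_default, reference, rtol=1e-2, atol=1e-2))
print(torch.allclose(tensor_D_tiled, reference, rtol=1e-2, atol=1e-2))
print((tensor_D_default - tensor_D_tiled).abs().max())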
jackkosaian commented 2 months ago

Can you please list the C++ output you see after the call to plan.run() with print_module=True?

github-actions[bot] commented 1 month ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.