Suggestion: Use Torch primitives for Gaussian blur to vastly speed it up

The torchvision Gaussian blur seems to run on the CPU even if your tensor is on the GPU. Here's a blur function built only from Torch primitives, so it runs on whatever device the input tensor is on and is very fast on the GPU. It may need some adaptation for other tensor shapes; this version works for the latent tensor format `[B, C, H, W]`:

```python
import torch
import torch.nn.functional as F


def gaussian_blur(tensor, kernel_size=5, sigma=1.0):
    if tensor.dim() != 4:
        raise ValueError("Expected a 4D tensor [B, C, H, W]")
    channels = tensor.shape[1]

    # Create 1D Gaussian kernel (kernel_size should be odd so the output keeps the input size)
    x = torch.arange(-kernel_size // 2 + 1, kernel_size // 2 + 1,
                     device=tensor.device, dtype=torch.float32)
    x = torch.exp(-x**2 / (2 * sigma**2))
    x = x / x.sum()

    # Create 2D Gaussian kernel by outer product, cast to the input dtype (e.g. fp16)
    kernel = (x[:, None] * x[None, :]).to(tensor.dtype)

    # Expand to depthwise weight shape [out_channels, in_channels / groups, kernel_height, kernel_width]
    kernel = kernel.expand(channels, 1, kernel_size, kernel_size)

    # Apply Gaussian blur, one kernel per channel (borders are zero-padded)
    blurred = F.conv2d(tensor, kernel, groups=channels, padding=kernel_size // 2)

    return blurred
```
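Because the 2D kernel is the outer product of a 1D kernel with itself, the same blur can also be applied as two 1D passes (one horizontal, one vertical), which costs 2K multiplies per pixel instead of K² for a kernel of size K. The following is only a sketch of that idea, not something taken from the node's code: the names `gaussian_kernel_1d` and `gaussian_blur_separable` are hypothetical, and it assumes a `[B, C, H, W]` tensor, an odd `kernel_size`, and zero padding at the borders.

```python
import torch
import torch.nn.functional as F


def gaussian_kernel_1d(kernel_size, sigma, device=None, dtype=torch.float32):
    # Hypothetical helper: normalized 1D Gaussian weights centered on the middle tap.
    half = (kernel_size - 1) / 2
    x = torch.arange(kernel_size, device=device, dtype=torch.float32) - half
    k = torch.exp(-x**2 / (2 * sigma**2))
    return (k / k.sum()).to(dtype)


def gaussian_blur_separable(tensor, kernel_size=5, sigma=1.0):
    # Hypothetical name; assumes a [B, C, H, W] tensor and an odd kernel_size.
    if tensor.dim() != 4:
        raise ValueError("Expected a 4D tensor [B, C, H, W]")
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    channels = tensor.shape[1]
    pad = kernel_size // 2
    k = gaussian_kernel_1d(kernel_size, sigma, tensor.device, tensor.dtype)

    # Depthwise weights: one [1, K] row kernel and one [K, 1] column kernel per channel.
    k_row = k.view(1, 1, 1, kernel_size).repeat(channels, 1, 1, 1)
    k_col = k.view(1, 1, kernel_size, 1).repeat(channels, 1, 1, 1)

    # Horizontal pass, then vertical pass; both zero-pad only along their own axis.
    out = F.conv2d(tensor, k_row, groups=channels, padding=(0, pad))
    out = F.conv2d(out, k_col, groups=channels, padding=(pad, 0))
    return out


# Usage: a latent-shaped batch (the shape is chosen for illustration only).
latent = torch.randn(2, 4, 64, 64)
blurred = gaussian_blur_separable(latent, kernel_size=5, sigma=1.0)
```

With zero padding in both passes, this should match the single 2D depthwise convolution up to floating-point rounding. Whether it is actually faster for small kernels such as 5x5 depends on the GPU and tensor size, so it is worth timing both on your own setup.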

cubiq / ComfyUI_essentials, issue #41