-
Hi,
when I run the Holistic profiling test on Jetson Orin, I found that this function executes very slowly (OpenGL GPU memory sync to CPU memory). Like the TensorsToFloatsCalculator, TensorsToClassificati…
-
Is it possible to predict the required memory for the conv2d? The code below (a convolution on a (1, 1, 204, 204) tensor) throws an error `Statically allocated circular buffers on core range [(x=0,y…
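As a first-order sanity check, here is a rough back-of-the-envelope sketch of the tensor memory a conv2d touches (input, weights, output). It is only a lower bound and does not model tt-metal's statically allocated circular buffers or any intermediate buffers; the `c_out` and `kernel` values are illustrative placeholders, since the actual conv parameters are truncated above.
```python
def conv2d_memory_estimate_bytes(n, c_in, h, w, c_out, kernel, stride=1, padding=0, dtype_bytes=2):
    """Lower-bound estimate: input + weight + output tensor footprints only.
    Does not model tt-metal's circular-buffer allocation or intermediate buffers."""
    h_out = (h + 2 * padding - kernel) // stride + 1
    w_out = (w + 2 * padding - kernel) // stride + 1
    input_bytes = n * c_in * h * w * dtype_bytes
    weight_bytes = c_out * c_in * kernel * kernel * dtype_bytes
    output_bytes = n * c_out * h_out * w_out * dtype_bytes
    return input_bytes + weight_bytes + output_bytes

# The (1, 1, 204, 204) input from the report; c_out=32 and kernel=3 are
# hypothetical values chosen for illustration only.
print(conv2d_memory_estimate_bytes(1, 1, 204, 204, c_out=32, kernel=3))
```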
-
During the computation of cos/sin in [llama_rope#L119](https://github.com/tenstorrent/tt-metal/blob/skhorasgani/vllm_llama32_mm/models/demos/t3000/llama2_70b/tt/llama_rope.py#L119), when batch size is…
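For context, a minimal sketch of how RoPE cos/sin tables are typically precomputed (the standard rotary-embedding formulation; the linked llama_rope.py may organize the batch dimension and layout differently):
```python
import torch

def precompute_rope_cos_sin(head_dim, max_seq_len, base=10000.0):
    # Standard rotary embedding: theta_i = base^(-2i/d), angle = position * theta_i
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(max_seq_len).float()
    angles = torch.outer(positions, inv_freq)  # [max_seq_len, head_dim // 2]
    return torch.cos(angles), torch.sin(angles)

cos, sin = precompute_rope_cos_sin(head_dim=128, max_seq_len=4096)
print(cos.shape, sin.shape)  # torch.Size([4096, 64]) torch.Size([4096, 64])
```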
-
## Description
Develop the backward function for the batch_norm operator.
## Requirements
__Interface__
`batch_norm_backward(Tensor grad_out, Tensor input, Tensor weight, Ten…
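For reference, a minimal PyTorch sketch of the standard training-mode batch-norm backward math, with per-channel statistics recomputed from `input`. The signature above is truncated, so only the visible arguments are used here, and `eps` is an assumed default for illustration; results can be sanity-checked against autograd on `torch.nn.functional.batch_norm`.
```python
import torch

def batch_norm_backward_reference(grad_out, input, weight, eps=1e-5):
    # Training-mode backward; batch statistics are recomputed from `input`.
    # Shapes: grad_out and input are (N, C, H, W); weight is (C,).
    dims = (0, 2, 3)
    m = input.numel() // input.size(1)              # elements per channel
    mean = input.mean(dim=dims, keepdim=True)
    var = input.var(dim=dims, unbiased=False, keepdim=True)
    inv_std = (var + eps).rsqrt()
    x_hat = (input - mean) * inv_std

    grad_weight = (grad_out * x_hat).sum(dim=dims)  # dL/d(gamma)
    grad_bias = grad_out.sum(dim=dims)              # dL/d(beta)

    grad_xhat = grad_out * weight.view(1, -1, 1, 1)
    grad_input = (inv_std / m) * (
        m * grad_xhat
        - grad_xhat.sum(dim=dims, keepdim=True)
        - x_hat * (grad_xhat * x_hat).sum(dim=dims, keepdim=True)
    )
    return grad_input, grad_weight, grad_bias
```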
-
```python
import torch
import thunder
from contextvars import ContextVar
_compile_data = ContextVar("compile_data", default=1)
def fn(x):
    v = _compile_data.get()
    return x + v
jfn…
-
**Describe the bug**
The concat operation evaluates the shapes of all input tensors to determine whether the operation can be performed. However, the LegacyShape includes padding as part of the shape …
-
**Describe the bug**
When creating a ttnn tensor with the `bfloat8_b` type from a Torch tensor containing random numbers between 0 and 1, the ttnn tensor does not match the Torch tensor values exactl…
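For context, `bfloat8_b` is a block floating-point format (elements within a block share an exponent), so some per-element precision loss in the round trip is inherent to the format. A minimal round-trip sketch, assuming the usual `ttnn.from_torch` / `ttnn.to_torch` conversion (exact arguments may differ across ttnn versions):
```python
import torch
import ttnn

torch_input = torch.rand(32, 32)  # random values in [0, 1)

# bfloat8_b requires TILE_LAYOUT; the conversion happens on host here.
tt_tensor = ttnn.from_torch(torch_input, dtype=ttnn.bfloat8_b, layout=ttnn.TILE_LAYOUT)
round_trip = ttnn.to_torch(tt_tensor)

# A small per-element difference versus the original float32 values is expected.
print((torch_input - round_trip).abs().max())
```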
-
This failure happens due to folding of the `ttir.broadcast` op. If an op consumes a single `broadcast` op, then folding doesn't cause any issue. However, if multiple operands are broadcasted, then `broadcast…
-
I am trying to understand one of the optimizations that seems to be running when using `--EmitONNXIR` compared to `--EmitONNXBasic`.
If we take the following examples:
```
<
ir_versi…
-
**Reproduce**:
`gc-opt --gc-gpu-pipeline test.mlir`
**test.mlir**
```mlir
module @fragment_name attributes {"#dlti.sys_spec" = #dlti.target_system_spec} {
func.func @corner_shape_matmul_f16(…