-
Hi Fbiego,
Great work - you have taken ESP32 development to the next level. Thank you for making your work open source.
I was trying to make the project work with an ESP32 WROOM display module. The comp…
-
Right now, loading up Llama 3B uses more than 32 GB of DRAM (closer to 64 GB).
Related issue:
- https://github.com/tenstorrent/tt-forge-fe/issues/189
-
# Summary
In the Llama 3B MLP block, the rank-changing ops (squeeze/unsqueeze) use non-tile-aligned shapes. The error occurs during Metal runtime execution.
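For context, here is a minimal sketch of the tile-alignment constraint (assuming the standard 32x32 tile size; `tile_align` is a hypothetical helper for illustration, not a tt-metal API):

```python
# Tenstorrent tensors are laid out in 32x32 tiles, so the last two dims of a
# tensor must be padded up to multiples of TILE_DIM before tilization.
TILE_DIM = 32

def tile_align(shape):
    """Pad the last two dimensions of `shape` up to the 32x32 tile grid."""
    def round_up(x, m):
        return -(-x // m) * m  # ceiling division, scaled back up
    if len(shape) < 2:
        return shape
    return shape[:-2] + [round_up(d, TILE_DIM) for d in shape[-2:]]

# A rank change such as [1, 1, 32, 100] -> [1, 32, 100] keeps the
# non-aligned inner dim 100, which would need padding to 128:
print(tile_align([1, 32, 100]))  # -> [1, 32, 128]
```

A shape whose inner dims are not handled this way is what trips the runtime assert below.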
Sample of the assert we're hitting:
```
E RuntimeError…
-
Here are some additional details to start:
This is a two-stage change:
- On the tt-metal side, add a method to the `Program` class that returns the `kernels_buffer` member (a `Buffer` object with a public `address()`…
-
## Summary
The input and weight `global_id`s taken from `tt::target::ttnn::EmbeddingOp` are swapped when passed to the `ttnn::embedding` op at runtime.
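To make the failure mode concrete, here is an illustrative Python sketch of the bug pattern (the class and function names are hypothetical stand-ins, not the actual tt-mlir runtime code):

```python
from dataclasses import dataclass

@dataclass
class EmbeddingOp:
    input_global_id: int   # id of the index tensor
    weight_global_id: int  # id of the embedding table

def run_embedding(tensor_pool, op):
    # Correct mapping: look up indices from the input id, table from the
    # weight id. Swapping these two lookups reproduces the reported bug:
    # the op then tries to index into the wrong tensor.
    indices = tensor_pool[op.input_global_id]
    weights = tensor_pool[op.weight_global_id]
    return [weights[i] for i in indices]  # stand-in for ttnn::embedding

pool = {0: [2, 0], 1: [[1.0], [2.0], [3.0]]}
op = EmbeddingOp(input_global_id=0, weight_global_id=1)
print(run_embedding(pool, op))  # -> [[3.0], [1.0]]
```

With the ids swapped, the runtime would gather rows from the index tensor instead of the embedding table, which either crashes or silently produces garbage.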
## TTIR
```
module @Embedding attributes {tt.sy…
-
Given `test.mlir`:
```
#l1_block_sharded = #tt.operand_constraint
func.func @relu(%arg0: tensor) -> tensor {
// CHECK: %[[C:.*]] = "ttnn.empty"[[C:.*]]
%0 = tensor.empty() : tensor
// …
-
### Note: Depending on feedback from [this issue](https://github.com/tenstorrent/tt-metal/issues/12644), we might just close this one and use the one from the tt-metal repo for tracking. However, I'm keepin…
-
The bounds are wrong, and somehow `stage_mem` is also not throwing an error.
Scheduling call:
```
neon = auto_stage_mem(neon, neon.find_loop("i"), "B", "B_reg")
```
Code before:
```
def rank_…
-
From this device perf CSV we can see that FF1 and FF3 are running at 120 GB/s of DRAM bandwidth with DRAM-sharded matmuls using BFP4 weights and FP16 activations:
[TG Llama device perf](https://docs.google.com…
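As a back-of-the-envelope sanity check on that number: for a DRAM-bound matmul, kernel time is dominated by streaming the weights, so time ≈ weight bytes / DRAM bandwidth. The matrix dimensions below are hypothetical placeholders, and BFP4 is taken as ~4.5 bits per element (4-bit mantissas plus shared exponents):

```python
def dram_bound_time_us(rows, cols, bits_per_element, bandwidth_gbps):
    """Time to stream one weight matrix from DRAM, in microseconds."""
    weight_bytes = rows * cols * bits_per_element / 8
    return weight_bytes / (bandwidth_gbps * 1e9) * 1e6

# Hypothetical FF-sized weight matrix in BFP4 streamed at the reported
# 120 GB/s of DRAM bandwidth:
t = dram_bound_time_us(8192, 28672, 4.5, 120)
print(f"{t:.1f} us")  # ~1101.0 us
```

Comparing this lower bound against the measured kernel time in the perf CSV shows how close the dram-sharded matmuls are to the bandwidth roofline.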
-
Proposal for Front-End (FE) Interaction with the tt-mlir Runtime for On-Device Tensors
In this document, I will refer to the tt-mlir runtime as the “runtime” and the forge and pjrt runtimes as “thi…