-
Hi Team,
Noticed that for different attention setups, i.e. different input batch_size, sequence_length, num_heads, heads_dim, from the demo in https://triton-lang.org/main/getting-started/tutorials/06-f…
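To reproduce across setups, a minimal shape sweep could look like the sketch below. It uses `torch.nn.functional.scaled_dot_product_attention` as a reference; the commented-out `attention` call and its `(q, k, v, causal, sm_scale)` signature are assumptions about the callable defined in the linked tutorial and may differ between Triton versions.
```python
# Sketch: sweep a few attention shapes and compare against PyTorch's SDPA reference.
import torch

def reference_sdpa(q, k, v, causal, sm_scale):
    # Built-in attention as a numerical/performance baseline
    return torch.nn.functional.scaled_dot_product_attention(
        q, k, v, is_causal=causal, scale=sm_scale)

shapes = [  # (batch_size, num_heads, sequence_length, head_dim)
    (1, 8, 1024, 64),
    (4, 16, 2048, 64),
    (2, 32, 4096, 128),
]
for bs, nh, sl, hd in shapes:
    q, k, v = (torch.randn(bs, nh, sl, hd, device="cuda", dtype=torch.float16)
               for _ in range(3))
    sm_scale = hd ** -0.5
    ref = reference_sdpa(q, k, v, causal=True, sm_scale=sm_scale)
    # out = attention(q, k, v, True, sm_scale)  # tutorial kernel, assumed signature
    # print((out - ref).abs().max().item())
    print(bs, nh, sl, hd, tuple(ref.shape))
```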
-
I was sifting through the cuDNN documentation and came across these snippets:
"cuDNN BF16 and FP16 Fused Flash Attention now supports embedding dim = 256 use cases in forward propagation.
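For anyone who wants to exercise the dim-256 forward case themselves, a minimal sketch is below. It assumes a recent PyTorch build (roughly 2.5+) that exposes the cuDNN SDPA backend; that routing is not part of the quoted cuDNN docs.
```python
# Sketch: BF16 attention forward with head/embedding dim = 256 via the cuDNN backend.
import torch
from torch.nn.attention import sdpa_kernel, SDPBackend

q, k, v = (torch.randn(2, 8, 1024, 256, device="cuda", dtype=torch.bfloat16)
           for _ in range(3))
with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
print(out.shape, out.dtype)
```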
-
I don't think this script has multiple-artboard support.
I have 4 artboards (6000x4000 pixels), each with 22 layers (88 layers total), and there is transparency.
When I run the script it outputs 88 …
-
When training with graphbolt, it does not check whether the necessary files exist under the graph directory. We should check that the fused embedding is under the graph directory before going to the training stage.
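A minimal sketch of such a pre-flight check is below; the required file names (e.g. `fused_embedding`) and the graph directory path are placeholders, since the exact artifacts graphbolt expects aren't spelled out here.
```python
# Sketch: fail fast if expected artifacts are missing, instead of erroring mid-training.
import os
import sys

def check_graph_dir(graph_dir, required=("fused_embedding",)):
    missing = [name for name in required
               if not os.path.exists(os.path.join(graph_dir, name))]
    if missing:
        sys.exit(f"Missing files under {graph_dir}: {missing}; "
                 "run preprocessing before training.")

check_graph_dir("data/graph")  # hypothetical graph directory
# ...only then enter the training stage
```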
-
When I use the trl library I get this error:
is not a valid OptimizerNames, please select one of ['adamw_hf', 'adamw_torch', 'adamw_torch_fused', 'adamw_torch_xla', 'adamw_torch_npu_fused', 'adam…
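For reference, this error means the `optim` field doesn't match one of transformers' OptimizerNames values. A minimal sketch of setting a valid one is below; trl's trainer configs build on `transformers.TrainingArguments`, so the field is the same, but the exact config class depends on your setup.
```python
# Sketch: pick one of the optimizer names listed in the error message.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",      # placeholder
    optim="adamw_torch",   # must be a valid OptimizerNames value
)
print(args.optim)
```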
-
The operators abstract these files away from the hardware differences, so I see little reason to keep both of them around. They should be unified so they can no longer go out of sync.
-
**Is your feature request related to a problem? Please describe.**
We currently use `moreh_layer_norm_backward`, but its performance is quite poor (see the reference sketch below).
**Describe the solution you'd like**
Fused o…
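To make the target concrete, here is a minimal unfused reference of the layer-norm backward (last-dimension normalization, biased variance, matching `torch.nn.functional.layer_norm`) that a fused kernel would compute in one pass. This is an illustrative sketch checked against autograd, not the moreh implementation.
```python
import torch

def layer_norm_backward(dy, x, gamma, eps=1e-5):
    # Normalization over the last dimension with biased variance,
    # matching torch.nn.functional.layer_norm.
    mu = x.mean(-1, keepdim=True)
    var = x.var(-1, unbiased=False, keepdim=True)
    rstd = (var + eps).rsqrt()
    xhat = (x - mu) * rstd
    dxhat = dy * gamma
    dx = rstd * (dxhat
                 - dxhat.mean(-1, keepdim=True)
                 - xhat * (dxhat * xhat).mean(-1, keepdim=True))
    dgamma = (dy * xhat).sum(0)
    dbeta = dy.sum(0)
    return dx, dgamma, dbeta

# Check against autograd in float64.
x = torch.randn(32, 512, dtype=torch.float64, requires_grad=True)
g = torch.randn(512, dtype=torch.float64, requires_grad=True)
b = torch.randn(512, dtype=torch.float64, requires_grad=True)
y = torch.nn.functional.layer_norm(x, (512,), g, b)
dy = torch.randn_like(y)
y.backward(dy)
dx, dg, db = layer_norm_backward(dy, x.detach(), g.detach())
print(torch.allclose(dx, x.grad), torch.allclose(dg, g.grad), torch.allclose(db, b.grad))
```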
-
@amirkavyan Everything that is currently implemented for the fused part in 4.2 is also implemented here, and it has the same problem as 4.2: most of the benchmarks don't work. Some of them run into a segfault and…
-
Hey team, AO provides awesome FP8 support with torch.compile to get speed and memory improvements. However, since torch.compile is not always easy to apply to some models, such as [MoE HF implement…
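For context, the flow being referred to is roughly the sketch below; the `convert_to_float8_training` name is taken from torchao's float8 recipes and assumes a recent torchao install plus an SM89+ GPU.
```python
# Sketch of the torchao float8 + torch.compile training-time flow.
import torch
from torchao.float8 import convert_to_float8_training

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096, bias=False),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 4096, bias=False),
).cuda().to(torch.bfloat16)

convert_to_float8_training(model)   # swaps nn.Linear for Float8Linear in place
model = torch.compile(model)        # the step that is hard for some MoE models
out = model(torch.randn(64, 4096, device="cuda", dtype=torch.bfloat16))
print(out.shape, out.dtype)
```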
-
### Description of the bug:
Running this script:
https://github.com/johndpope/IMF/blob/main/tf-export-edge.py
```shell
python tf-export-edge.py
2024-10-19 07:20:44.455948: I tensorflo…