-
Why is there no AM-softmax loss?
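For context, here is a minimal, hypothetical sketch of what an AM-Softmax (additive margin softmax) loss commonly looks like in PyTorch; the function name, signature, and the default `s`/`m` values are illustrative and not taken from this repository:

```python
import torch
import torch.nn.functional as F

def am_softmax_loss(features, labels, weight, s=30.0, m=0.35):
    """Illustrative AM-Softmax loss: cosine logits with an additive margin."""
    # Cosine similarity between L2-normalized embeddings (N, D) and class weights (C, D).
    cos = F.linear(F.normalize(features, dim=1), F.normalize(weight, dim=1))  # (N, C)
    # Subtract the additive margin m from the target-class cosine only.
    one_hot = F.one_hot(labels, num_classes=weight.size(0)).to(cos.dtype)
    logits = s * (cos - m * one_hot)
    # Standard cross-entropy on the rescaled logits.
    return F.cross_entropy(logits, labels)
```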
-
### Description
I noticed that this code snippet outperforms the built-in SoftMax function by 40%-45% while giving the exact same output.
```csharp
TensorPrimitives.Exp(values, destination);
var sum …
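
// --- Hypothetical, standalone sketch; the original snippet above is truncated. ---
// One way a softmax built on TensorPrimitives.Exp might look (assuming the
// System.Numerics.Tensors package): exponentiate, then normalize by the sum.
// A more robust version would also subtract the maximum of `values` first.
TensorPrimitives.Exp(values, destination);                 // destination[i] = exp(values[i])
float total = TensorPrimitives.Sum(destination);           // sum of the exponentials
TensorPrimitives.Divide(destination, total, destination);  // destination[i] /= total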
-
A while back I implemented softmax [here](https://github.com/PlasmaControl/DESC/blob/d138a4990d7a1c4d50e0f9e781c43b5d24c16b02/desc/objectives/utils.py#L285)
but I realize we also have [since grabbe…
-
**Describe the bug**
`ttnn.softmax` is numerically unstable when the input values are large
**To Reproduce**
Example 1
```
with ttnn.manage_device(device_id=0) as device:
x = torch.tensor([[[[1000, 1001]]]…
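
# --- Hypothetical sketch, separate from the truncated repro above. ---
# Softmax is shift-invariant: softmax(x) == softmax(x - max(x)), so subtracting the
# row max before exponentiating avoids float overflow for large inputs like 1000/1001.
import torch
x = torch.tensor([[1000.0, 1001.0]])
shifted = x - x.max(dim=-1, keepdim=True).values
probs = torch.exp(shifted) / torch.exp(shifted).sum(dim=-1, keepdim=True)
# probs -> tensor([[0.2689, 0.7311]]), matching torch.softmax(x, dim=-1)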
-
### Description
While testing the Softmax operator, test results are inconsistent.
Runs against the same Pytest parameter list produce different results depending on the order of the list.
I…
-
**Version**
Name: flash-attn
Version: 2.6.3
Name: transformer-engine
Version: 1.11.0+c27ee60
Name: flashattn-hopper
Version: 3.0.0b1
**Bug report**
The bug occurs in this function:
@jit_fuser
def …
-
In your paper I saw that you used Softmax, but I can't find it in your code; I only see ReLU. Did I misunderstand, or do you not use Softmax at all?
-
It would be really nice if System.Numerics.Tensors.TensorPrimitives had derivative functions for SoftMax and Sigmoid, as they are commonly needed in machine learning.
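For reference, the derivatives being asked for are the standard ones (written here only as context for the request, not as a proposed API shape):

$$\sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr), \qquad \frac{\partial\,\mathrm{softmax}(x)_i}{\partial x_j} = \mathrm{softmax}(x)_i\bigl(\delta_{ij} - \mathrm{softmax}(x)_j\bigr).$$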
-
## Prerequisites
Please make sure to check off these prerequisites before submitting a bug report.
- [X] Test that the bug appears on the current version of the master branch. Make sure to include t…