-
Hi, nice work on this. Now that your MHA/RoPE implementations are in equinox, do you plan to make a kira version that uses those directly? For one thing, I notice the API is a little different here …
-
Hi,
Do you know why I receive the error message below with 01_cubeQuant_01_prep_fc.mha? There is no problem with 01_DESS_01_prep_fc.mha.
With 01_cubeQuant_01_prep_fc.mha (but not 01_DESS_…
-
Hello, great work. I would like to understand why MLA performs better than MHA. As I understand it, MLA is an approximate low-rank decomposition of MHA.
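
To make the low-rank reading concrete, here is a minimal sketch that compares what standard MHA and a simplified MLA keep in the KV cache per token, and the rank bound each places on the hidden-state-to-keys map. It assumes a simplified MLA with a single shared latent and no decoupled RoPE key. The function names and the dimensions used below are hypothetical, chosen for illustration, and not taken from any model discussed here.

```python
from typing import Optional


def mha_cache_per_token(n_heads: int, d_head: int) -> int:
    """Values cached per token per layer by standard MHA: full K and V for every head."""
    return 2 * n_heads * d_head


def mla_cache_per_token(d_latent: int) -> int:
    """Values cached per token per layer by a simplified MLA (assumption: no RoPE key).

    Only the shared latent is stored; K and V are re-expanded from it by up-projections.
    """
    return d_latent


def max_key_rank(d_model: int, n_heads: int, d_head: int,
                 d_latent: Optional[int] = None) -> int:
    """Upper bound on the rank of the linear map from hidden state to concatenated keys.

    With a latent bottleneck the map factors as W_up @ W_down, so its rank
    cannot exceed d_latent.
    """
    full = min(d_model, n_heads * d_head)
    return full if d_latent is None else min(full, d_latent)


# Illustrative (hypothetical) dimensions.
d_model, n_heads, d_head, d_latent = 1024, 16, 64, 256
mha_cache = mha_cache_per_token(n_heads, d_head)
mla_cache = mla_cache_per_token(d_latent)
mha_rank = max_key_rank(d_model, n_heads, d_head)
mla_rank = max_key_rank(d_model, n_heads, d_head, d_latent)
```

Under these assumptions the latent bottleneck shrinks the cache and caps the rank of the key projection. That shows where the constraint sits, but it does not by itself explain a quality difference in either direction.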
-
#### History
- Task2
```
res = {
"error": "",
"means": {
"dice": 0.6574674234095118,
"jaccard": 0.5470246182762484,
"accuracy": 0.7255573866247164
},
"score": 0.602246020…
-
Currently these are inferred from a combination of other configuration options, such as device and dtype. It would be more flexible for downstream users if they could be selected explicitly.
-
# Description
Use the exact ONNX file `attention_ln_opset13.onnx` from https://github.com/NVIDIA/TensorRT/issues/3575#issuecomment-1874776406
Attention is like ![Image](https://github.com/user-attachment…
-
Handled the case where the filename is not passed into mha_read_volume, which caused an error when calling mha_read_header with an undeclared variable (filename).
fopen doesn't seem to need the filename…
-
I tried out MLA and it was a good amount worse than MHA, and I wanted to find out why. Firstly, I am using a hybrid model, so I am not using any RoPE in either MLA or MHA, and therefore use …
-
```
Hello,
We are planning to run the MHA at our company, however we are wondering with
one MHA Manager how many nodes/apps we can monitor? 100s, 1000s etc. We have
quite a few nodes and wanted to …