-
I have successfully run the demo "ipu_modelsx4_demo", but whether the IPU device is enabled or disabled in Device Manager, the demo runs flawlessly, and I can't tell if there is any difference.
I s…
-
-
As we can see, it is hardcoded to float here:
https://github.com/Dao-AILab/flash-attention/blob/d0032700d1c7a1353a3a8f2fadfbc73b2dc6b5dc/csrc/flash_attn/src/kernel_traits.h#L26
However, it is know…
-
### GitHub Tags:
#A11yMAS; #A11yTCS; #A11ySev3; #DesktopApp; #GH-MySQLextensionforAzureDataStudio-Win32-Mar23; #MySQL extension for Azure Data Studio; #Win32; #WCAG1.3.1; #Info and Relationships; #NV…
-
- [ ] [self-speculative-decoding/README.md at main · dilab-zju/self-speculative-decoding](https://github.com/dilab-zju/self-speculative-decoding/blob/main/README.md?plain=1)
# Self-Speculative Decod…
-
## Motivation
At present (b7dfe5cc), `pybind11` proper only benchmarks compile time and artifact size for one given test setup (which tests arguments, simple inheritance, but that's about it, I…
-
Have you tried changing the vocoder from WaveGlow to HiFi-GAN? HiFi-GAN is faster and requires less VRAM. Alternatively, you could try adding a different vocoder.
-
-
I'm new to exllama. Are there any tutorials on how to use it? I'm trying it with the Llama-2 70B model.
-