-
Add benchmarks for quantized models.
This might be implemented as a new 'flavor' of test_eval, where most models raise NotImplemented and it is strictly opt-in to add quantization for particular mo…
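A minimal sketch of the opt-in pattern described above, assuming a hypothetical `BenchmarkModel` base class with an `eval_quantized` method (neither name comes from the project). The base class raises `NotImplementedError` by default, so only models that override the method take part in the quantized flavor:

```python
class BenchmarkModel:
    """Hypothetical base class; stands in for the project's real model wrapper."""

    name = "base"

    def eval_quantized(self):
        # Default: the model has not opted in to quantized evaluation.
        raise NotImplementedError(f"{self.name} does not support quantized eval")


class TinyModel(BenchmarkModel):
    """Hypothetical model that opts in by overriding eval_quantized."""

    name = "tiny"

    def __init__(self):
        self.weights = [0.12, -0.5, 0.9]

    def eval_quantized(self):
        # Toy stand-in for real quantization: map weights onto a symmetric
        # integer grid whose largest magnitude is 127.
        scale = max(abs(w) for w in self.weights) / 127
        return [round(w / scale) for w in self.weights]


def run_quantized_benchmarks(models):
    """Run the quantized flavor, skipping models that have not opted in."""
    results = {}
    for model in models:
        try:
            results[model.name] = model.eval_quantized()
        except NotImplementedError:
            results[model.name] = "skipped"
    return results
```

With this shape, adding quantization to one more model means overriding a single method, and every other model keeps being skipped rather than failing the run.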
-
To speed up the compilation process for large models or large layers, it would make sense to have a caching mechanism for long-running transformations. The cached outputs would be persistent and get r…
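One possible shape for such a persistent cache, as a sketch only: the helper name `cached_transform`, the on-disk layout, and the use of `pickle` are all assumptions for illustration, not part of any existing API. Results are keyed by the transformation name plus a hash of the serialized input:

```python
import hashlib
import pickle
from pathlib import Path


def cached_transform(name, fn, payload, cache_dir):
    """Return fn(payload), reusing a pickled result from cache_dir if present."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Key on the transformation name plus the serialized input.
    # Assumption: pickle.dumps(payload) is stable for equal inputs.
    digest = hashlib.sha256(name.encode() + b"\0" + pickle.dumps(payload))
    path = cache_dir / f"{digest.hexdigest()}.pkl"

    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)

    result = fn(payload)

    # Write to a temporary file first, then rename, so an interrupted run
    # does not leave a partially written cache entry behind.
    tmp = path.with_suffix(".tmp")
    with tmp.open("wb") as f:
        pickle.dump(result, f)
    tmp.replace(path)
    return result
```

A real implementation would also need an invalidation rule, for example folding the compiler version into the key, since a cached output is only valid for the code that produced it.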
-
What are the minimum and recommended hardware requirements for running the model and for training it?
1. How much GPU Memory (VRAM) is required?
2. How much RAM is required?
3. What GPUs are recommended?
…
-
Hello folks,
I want to build llama7b with int4 weights and serve it via Triton. I tried building it and checking whether the int4 output is correct.
However, when I built it with ```u…
-
**Which OS are you using?**
- OS: macOS Sonoma 14.3.1
---
I am trying to translate Korean audio files. Generation works, but I often find that the generated subtitles are too long. For …
-
### Model description
X-AI recently released [grok-1](https://huggingface.co/xai-org/grok-1), a massive MoE model, with a total parameter count of 314B across 8 experts, 2 active at a time. Would be …
-
I recently tested the precision of the lsp parametric equaliser. I enabled a couple of filters and set their gain to 0 dB. The test signal was white noise. Then I noticed that the output after th…
-
Is it possible to override the gradient for a TILE function?
-
Hello, I'm a contributor to the project [ISAT](). Your project sam-hq has helped me a lot; it's great work.
pytorch-labs recently released a new project, [segment-anything-fast](https://github…
-
### Description
I came across the compelling-sounding [JVector project](https://foojay.io/today/jvector-1-0/), which appears to have excellent QPS performance.
It uses [DiskANN](https://www.microsoft.…