-
### The quantization format
Hi all,
We have recently designed and open-sourced a new method for Vector Quantization called Vector Post-Training Quantization (VPTQ). Our work is available at [VPTQ…
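For readers unfamiliar with this family of methods, here is a minimal sketch of plain vector quantization (illustrative only, not the actual VPTQ algorithm): weights are grouped into short vectors, and each vector is replaced by the index of its nearest centroid in a learned codebook.

```python
import numpy as np

def build_codebook(vectors: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Plain k-means over the weight vectors; returns a (k, dim) codebook."""
    rng = np.random.default_rng(0)
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each vector to its nearest centroid (squared Euclidean distance).
        dist = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(1)
        for j in range(k):
            members = vectors[assign == j]
            if len(members):
                centroids[j] = members.mean(0)
    return centroids

def vq_quantize(weights: np.ndarray, dim: int = 4, k: int = 256):
    """Split a weight matrix into dim-sized vectors; store one uint8 index each."""
    vecs = weights.reshape(-1, dim)
    codebook = build_codebook(vecs, k)
    dist = ((vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    indices = dist.argmin(1).astype(np.uint8)  # 8 bits per 4 weights ~= 2 bits/weight
    return codebook, indices

W = np.random.randn(128, 128).astype(np.float32)
codebook, indices = vq_quantize(W)
W_hat = codebook[indices].reshape(W.shape)  # dequantize by codebook lookup
print("reconstruction MSE:", float(((W - W_hat) ** 2).mean()))
```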
-
Dear Authors,
Firstly, thank you for your great work, "Making Text Embedders Few-Shot Learners". It was very interesting to see how you improved the performance of text embedding by leveraging the …
-
Hi, thanks for your great work.
May I ask what the training time and memory usage are when using the 7B-parameter LLM?
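For reference, here is the rough back-of-envelope estimate I have in mind for full fine-tuning (assuming bf16 weights and gradients with fp32 Adam moments; your actual setup with LoRA, activation checkpointing, or ZeRO sharding would look very different, which is why I am asking):

```python
# Back-of-envelope memory estimate for full fine-tuning of a 7B-parameter model.
# Assumptions (not from the paper): bf16 weights/gradients, fp32 Adam moments.
params = 7e9
weights   = params * 2      # bf16 weights: 2 bytes/param
gradients = params * 2      # bf16 gradients: 2 bytes/param
adam      = params * 4 * 2  # fp32 first + second moments: 8 bytes/param
total_gib = (weights + gradients + adam) / 1024**3
print(f"~{total_gib:.0f} GiB before activations")  # ~78 GiB
```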
Looking forward to your reply.
-
Hi,
I am trying to use the LoRA weights I obtained from PEFT using the NeMo Framework container (with tp=pp=2, 4 GPUs) with the TensorRT-LLM ModelRunner (the [TensorRT-LLM/examples/run.py](https://github.com/NVIDIA…
-
# Key Observation
- LLMs exhibit a significant decline in reasoning abilities when subjected to strict format restrictions (illustrated in the sketch after this list).
- The stricter the format, the greater the performance degradation in rea…
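As a concrete illustration of what "strict format restriction" means here (hypothetical prompts, not the exact templates used in the study), compare a free-form prompt with one that forces a bare JSON answer:

```python
# Hypothetical prompts illustrating free-form vs. strictly formatted answers.
question = "If a train travels 60 km in 45 minutes, what is its speed in km/h?"

free_form = (
    f"{question}\n"
    "Think step by step, then state the answer."
)

strict_json = (
    f"{question}\n"
    'Respond ONLY with a JSON object of the form {"answer": <number>}, '
    "nothing else."
)

# The observation above: models answering under `strict_json` tend to lose
# accuracy on reasoning tasks, since the format leaves no room for the
# intermediate steps that `free_form` allows.
print(free_form, strict_json, sep="\n\n")
```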
-
Related: https://github.com/kubeflow/training-operator/issues/2170
Once we implement storage initializers, trainers, and controllers, we should add the LLM training runtimes.
We can start with run…
-
Would you consider releasing the training code?
-
### Describe the feature
https://github.com/linkedin/Liger-Kernel
Liger Kernel is a collection of Triton kernels designed specifically for LLM training. It can effectively increase multi-GPU train…
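As a sketch of what the integration could look like on the user side (usage taken from the Liger-Kernel README at the time of writing; names may change between versions, and the model below is only an example):

```python
# Drop-in usage per the Liger-Kernel README: AutoLigerKernelForCausalLM patches
# supported architectures (RMSNorm, RoPE, SwiGLU, fused cross-entropy, ...)
# with Liger's Triton kernels before loading the weights.
from liger_kernel.transformers import AutoLigerKernelForCausalLM

# Hypothetical example model; any supported HF causal LM checkpoint works.
model = AutoLigerKernelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
```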
-
Hi, @haesleinhuepf recommended I post this here.
Because you guys are making this evaluation benchmark public, couldn't LLMs pick it up as training data and therefore overfit to it?
So, it will be h…
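One standard way to at least detect this (a hypothetical sketch, not something the benchmark does today) is an n-gram overlap check between benchmark items and a model's training corpus, similar to the 13-gram deduplication used for GPT-3:

```python
# Hypothetical contamination check: the fraction of a benchmark item's 13-grams
# that also appear in a training document (13 follows the GPT-3 dedup setup).
def ngrams(text: str, n: int = 13) -> set:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def overlap_fraction(benchmark_item: str, corpus_doc: str, n: int = 13) -> float:
    b = ngrams(benchmark_item, n)
    return len(b & ngrams(corpus_doc, n)) / max(len(b), 1)
```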
-
Is it possible to add the code copy widget that you already have on https://nvidia.github.io/TensorRT-Model-Optimizer/ to https://nvidia.github.io/TensorRT-LLM/?
For example, if you go to https://nvidi…