-
I'm new to speculative decoding. While reading the `speculative_decode` code (https://github.com/pytorch-labs/gpt-fast/blob/main/generate.py#L88), I came up with a few questions. Could you please help an…
-
### System Info
TensorRT-LLM v0.8.0 branch https://github.com/NVIDIA/TensorRT-LLM/blob/v0.8.0/tensorrt_llm/commands/build.py
versus main branch https://github.com/NVIDIA/TensorRT-LLM/blob/main/tenso…
-
**What problem or use case are you trying to solve?**
File editing is not perfect with our current method using SWE-Agent style actions.
**Do you have thoughts on the technical implementation?**…
-
## Function Calling
- Frontend
- Add `tools` argument in `sgl.gen`. See also guidance [tools](https://github.com/guidance-ai/guidance/blob/d1bbe1c698cbb201f89556d71193993e78c0686b/README.md?plai…
-
Given that we have only Llama 3 70B and 8B, it would be useful to have a Tiny Llama based on the Llama 3 tokenizer so that we can use it as a drafting model for speculative decoding.
Are there pla…
-
### Motivation.
I am one of the authors of the paper Stay On Topic with Classifier-Free Guidance ( https://openreview.net/forum?id=RiM3cl9MdK&noteId=s1BXLL1YZD ) who has been nominated as ICML'24 Spo…
-
### Proposal to improve performance
@LiuXiaoxuanPKU Good to see you again. Thank you for your work.
I guess your working group is releasing SD little by little.
I'm wondering about current SD ver…
-
## 0. Paper
https://arxiv.org/abs/2310.12072
https://www.arxiv-vanity.com/papers/2310.12072/
[Coleman Hooper](https://arxiv.org/search/cs?searchtype=author&query=Hooper,+C), [Sehoon Kim](https://arx…
-
What are some of the intended use cases for the 0.5B model?
There are not many other similarly sized models, nor is there much hype around them, though the general audience seems to love th…
-
Per the [recent paper from Meta](https://arxiv.org/abs/2404.19737), it appears that models that predict multiple future tokens can exhibit significantly greater sample efficiency than models trained o…