-
[Outline] I would like to add a section about optimizing the speed and response times of LLMs under General Concepts. I plan to cover the following topics:
- Quantization
- Flash attention
- Arch…
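To ground the quantization bullet, here is a minimal sketch of symmetric per-tensor int8 weight quantization. This is illustrative only; production schemes typically use per-channel scales, calibration data, or quantization-aware training.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization of a weight matrix."""
    scale = np.abs(w).max() / 127.0  # map the largest |weight| to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 weight matrix."""
    return q.astype(np.float32) * scale

# int8 weights use 4x less memory than float32, at the cost of a
# small, bounded rounding error (at most scale/2 per weight).
np.random.seed(0)
w = np.random.randn(64, 64).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
```

The memory savings (and faster int8 matmul kernels, where available) are where the speedup comes from; the rounding error is what quantization research tries to keep from hurting model quality.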
-
Language tag handling
* https://github.com/WICG/translation-api/blob/main/README.md#for-a-known-source-language
* https://github.com/WICG/translation-api/blob/main/README.md#language-tag-handling
…
-
### 🚀 The feature, motivation and pitch
As we can see, Google Gemini can support up to a million tokens. To serve longer context lengths, we have to do context parallelism, which means splitting the i…
-
The official code does not track any validation-set metric during training, which makes it hard to monitor overfitting. After some experimentation, adding `compute_metrics` does not work either: the `Trainer`'s `evaluate` logic is somewhat convoluted and never reaches it, so in the end `evaluate` has to be refactored. Below is a very simple refactor for reference that returns the validation-set loss during training; you only need to add the usual `do_eval`, `eval_steps`, `evaluation_strategy`, and similar arguments. You can, depending on…
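Framework details aside, the `eval_steps` behavior being requested is simply: every N optimizer steps, run the model over the validation set and record the average loss. A framework-free sketch of that loop (the helper names `step_fn` and `eval_fn` are hypothetical stand-ins, not `Trainer` API):

```python
def train_with_eval(train_batches, eval_batches, step_fn, eval_fn, eval_steps):
    """Train, averaging the validation loss every `eval_steps` steps.

    step_fn(batch) performs one optimizer step; eval_fn(batch) returns
    the loss on one validation batch. Both are hypothetical stand-ins.
    """
    history = []  # (step, eval_loss) pairs, useful for spotting overfitting
    for step, batch in enumerate(train_batches, start=1):
        step_fn(batch)
        if step % eval_steps == 0:
            eval_loss = sum(eval_fn(b) for b in eval_batches) / len(eval_batches)
            history.append((step, eval_loss))
    return history
```

A rising `eval_loss` while the training loss keeps falling is the overfitting signal the original code could not surface.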
-
Hope you can help with this. I'm trying to implement ring attention using the Llama 3 architecture, starting with the blockwise parallel transformer piece. My question is: when do I start to break t…
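For the blockwise question above, the usual pattern (as in blockwise parallel transformers and FlashAttention-style kernels) is to split queries and keys/values into fixed-size blocks and accumulate the softmax online, so no block ever needs to see the full sequence at once. A single-head NumPy sketch, ignoring masking and multi-head details:

```python
import numpy as np

def full_attention(q, k, v):
    """Reference: standard softmax attention over the whole sequence."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    return (p / p.sum(axis=-1, keepdims=True)) @ v

def blockwise_attention(q, k, v, block=4):
    """Same result, computed one (query block, kv block) pair at a time."""
    n, d = q.shape
    out = np.zeros_like(q)
    for qs in range(0, n, block):
        qb = q[qs:qs + block]
        m = np.full((qb.shape[0], 1), -np.inf)  # running row max
        l = np.zeros((qb.shape[0], 1))          # running softmax denominator
        acc = np.zeros((qb.shape[0], d))        # running weighted-value sum
        for ks in range(0, n, block):
            s = qb @ k[ks:ks + block].T / np.sqrt(d)
            m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
            p = np.exp(s - m_new)
            scale = np.exp(m - m_new)  # rescale old stats to the new max
            l = l * scale + p.sum(axis=-1, keepdims=True)
            acc = acc * scale + p @ v[ks:ks + block]
            m = m_new
        out[qs:qs + block] = acc / l
    return out
```

In ring attention, the inner loop over kv blocks becomes a loop over devices: each device keeps its own query block and rotates its kv block around the ring, updating `m`, `l`, and `acc` exactly as above.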
-
**What needs attention**
A special state when entering a lifter that allows for sideways momentum, with a dive animation. Would allow angled lifters to work properly.
-
Can we have [PAG](https://ku-cvlab.github.io/Perturbed-Attention-Guidance/) integrated into the BrushNet pipeline, since it seems to give extremely good results? It is already there in some [standard pip…
-
It seems the Attention Couple only has `model` and `base_mask` inputs, but in your shared sample workflow there are more inputs.
-
**Attention** is the new currency in our information-saturated world. It shapes our worldview, drives our decisions, and ultimately affects the quality of our lives. In an era of information overload,…
-
Hello,
I have a question regarding fine-tuning the quantized internlm/internlm-xcomposer2-4khd-7b model. I quantized the 4khd model with lmdeploy, and am not trying to fine-tune thi…