-
### 🚀 The feature, motivation and pitch
I am working on adjusting radix attention now. Thank you for your support of radix attention. Currently, caching for A that allows for more efficien…
-
Is there any way to add Flash Attention 2 support for this model? If there is a way to do it, I would love to get involved and help out!
I've tried implementing it by looking at [MusicGen's one](https://git…
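For context, my rough understanding is that once a model class in transformers supports FlashAttention 2, it is usually switched on via the `attn_implementation` flag. A minimal sketch (the checkpoint name is a placeholder, not this model):
```python
# Illustrative only: enabling FA2 in transformers for a model that supports it.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-model",                    # placeholder checkpoint, not this model
    torch_dtype=torch.float16,                # FA2 requires fp16 or bf16
    attn_implementation="flash_attention_2",  # raises if the model class lacks FA2 support
)
```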
-
**Describe the bug**
When I use flash-attn 2.0.4, running NeMo results in the error `NameError: name 'flash_attn_with_kvcache' is not defined`.
After checking the [code](https://github.com/NVIDIA…
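For reference, `flash_attn_with_kvcache` only exists in newer flash-attn releases, so on 2.0.4 the guarded import fails and the name is never defined. A quick check I used to confirm this (just a diagnostic sketch, nothing NeMo-specific):
```python
# Confirm whether the installed flash-attn exposes flash_attn_with_kvcache
# (added in later 2.x releases; 2.0.4 does not have it).
import flash_attn

print("flash-attn version:", flash_attn.__version__)
try:
    from flash_attn import flash_attn_with_kvcache
    print("flash_attn_with_kvcache is available")
except ImportError:
    print("flash_attn_with_kvcache is missing; upgrading flash-attn should resolve the NameError")
```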
-
The 2 AMD GPU cards should be at the NERC; attention @hakasapl.
Please arrange for them to be installed (techsquare?) and made available under ESI.
The price to charge will be addressed in https://github.com…
-
I want this trainer class to be implemented with unsloth. How can I do that?
```python
class CustomTrainier(Trainer):
    def __init__(self, model, args, train_dataset, eval_dataset, tokenizer, **kwargs)…
-
We should redesign the navbar_alerts banners (`web/templates/navbar_alerts`).
Designs [in Figma](https://www.figma.com/design/msWyAJ8cnMHgOMPxi7BUvA/Zulip-Web-UI-kit?node-id=563-2713&t=ZDGbub…
-
When I read the code in your nice_stand.py file, I didn't see you using self-attention or graph attention mechanisms, but you describe this part in your paper.
![Image 1](https://github.com/eeyhsong/NICE-…
-
Hi,
Is there a specific reason why FA V2 is being used during the prefill phase but not during the generation phase? Is it due to the fact that Flash Attention does not give any significant performance y…
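My rough understanding, sketched below (not this project's code; function names assume flash-attn >= 2.2), is that prefill attends over the whole prompt at once, where FA2 pays off, while decode has a query length of 1 per step and mostly streams the KV cache, so a KV-cache-aware kernel is often used instead:
```python
# Illustrative sketch of why FA2 helps most in prefill.
import torch
from flash_attn import flash_attn_func, flash_attn_with_kvcache  # flash-attn >= 2.2

def prefill(q, k, v):
    # q, k, v: [batch, prompt_len, nheads, headdim], fp16/bf16 on CUDA.
    # The whole prompt attends at once, so the kernel has lots of work per query
    # and FA2's tiling / fused softmax gives a clear speedup.
    return flash_attn_func(q, k, v, causal=True)

def decode_step(q_new, k_cache, v_cache, cache_seqlens):
    # q_new: [batch, 1, nheads, headdim] -- one new token per step.
    # Attention here is memory-bound (reading the KV cache), so engines often
    # use a dedicated KV-cache kernel rather than the generic FA2 path.
    return flash_attn_with_kvcache(
        q_new, k_cache, v_cache, cache_seqlens=cache_seqlens, causal=True
    )
```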
-
Will my training yield better results over time? So far, training has taken about 9 hours.
I have 1500 wav samples, with a total audio length of approximately 2 hours.
![Screenshot 2024-11-08 at…
-
### Model description
An updated OLMo model will be released in November. The new model has a few small architecture changes compared to the existing model in transformers:
- RMSNorm is used inste…
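
For context on the RMSNorm change above, a minimal RMSNorm sketch (illustrative only, not necessarily the exact OLMo 2 module):
```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    # Minimal RMSNorm for illustration: scale by the root-mean-square of the
    # features with a learned gain; no mean subtraction and no bias.
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```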