-
I don't know if you all have other things in mind, but these are some things from my wish list that one could do once `occam-ra` is ready for general use. There …
-
I'm running benchmarks comparing TransformerEngine MHA in FP8 against FlashAttention MHA in FP16. However, I consistently find that TE FP8 is not only 50-60% slower than FlashAttention; it also uses mu…
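As a point of reference for this kind of comparison, the measurement methodology matters as much as the kernels. Below is a minimal, hypothetical micro-benchmark harness (the `bench` helper and `naive_attention` stand-in are assumptions, not part of either library); a real GPU run would additionally need `torch.cuda.synchronize()` around the timers and CUDA events for accuracy, but the warmup/median structure is the same.

```python
import time
import numpy as np

def bench(fn, *args, warmup=3, iters=10):
    """Hypothetical helper: median wall-clock seconds per call.

    Warmup iterations are discarded so one-time costs (JIT, caches)
    do not pollute the measurement; the median resists outliers.
    """
    for _ in range(warmup):
        fn(*args)
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    return float(np.median(times))

def naive_attention(q, k, v):
    # CPU stand-in for the kernel under test (TE FP8 / FlashAttention
    # in the real benchmark): plain scaled dot-product attention.
    s = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    s = np.exp(s - s.max(axis=-1, keepdims=True))
    return (s / s.sum(axis=-1, keepdims=True)) @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 128, 64), dtype=np.float32) for _ in range(3))
t = bench(naive_attention, q, k, v)
print(f"median latency: {t * 1e3:.2f} ms")
```

Running each candidate kernel through the same harness, with the same shapes and dtypes, is what makes a 50-60% gap a claim you can defend.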
-
In the latest commit, https://huggingface.co/mosaicml/mpt-7b/commit/67cf22a4e6809edb7308dd0a2ae2c1ffb86f4984, BigDL throws the error below when generating text.
INFO 2024-02-20 06:41:05,962 proxy 172.17.0.2 …
-
I've been finetuning unsloth/Phi-3-mini-4k-instruct-bnb-4bit with a T4, which doesn't support flash attention, so I don't have it installed.
During evaluation, I've been running into the following …
-
### 🐛 Describe the bug
''' checkpoint_path = './llama_relevance_results'
training_args = transformers.TrainingArguments(
#remove_unused_columns=False, # Whether or not to automatically r…
-
## Context
We have an MPT MoD prefix-lm trained with llm-foundry and then exported to HuggingFace (via your scripts).
For some fine-tuning experiments with the HF model, I tried to set up dropout.
…
-
I found that the datasets need informative_sg.json, but I don't know how to obtain it. Could you help me? Also, what is the purpose of this JSON file?
-
### 🐛 Describe the bug
When I set `dropout_p=0.0`, the results differ across runs, but with `dropout_p=-1` the results are the same. Maybe the op scaled_dot_product_attention has a bug. Please fix it, thank…
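For context on why this looks like a bug: with `dropout_p=0.0` no dropout mask should be sampled, so repeated calls on the same inputs should be bit-for-bit identical. Here is a NumPy sketch of the SDPA math (an illustration of the expected semantics, not PyTorch's actual kernel) showing that a zero dropout probability must be deterministic:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, dropout_p=0.0, rng=None):
    """NumPy sketch of SDPA with inverted dropout on the attention weights.

    When dropout_p == 0.0 the dropout branch is skipped entirely, so the
    output depends only on (q, k, v) and repeated calls agree exactly.
    """
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    if dropout_p > 0.0:
        rng = rng or np.random.default_rng()
        mask = rng.random(weights.shape) >= dropout_p
        weights = weights * mask / (1.0 - dropout_p)  # inverted dropout
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((2, 4, 8)) for _ in range(3))
a = scaled_dot_product_attention(q, k, v, dropout_p=0.0)
b = scaled_dot_product_attention(q, k, v, dropout_p=0.0)
assert np.allclose(a, b)  # dropout_p=0.0 is deterministic
```

If the real op gives differing results at `dropout_p=0.0`, the dropout branch is presumably being taken when it should not be.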
-
We are trying to reproduce the dns64 Demucs model results from scratch. We have two 2080 Ti GPUs, but the largest batch size we can fit is 14, and the model takes 5 hours per epoch.
Is it su…
-
I am having a go at running inference and evaluation for this model, and I'm running into a TypeError in `GPTLMHeadModel`:
```
In [1]: import torch
...: from transformers import AutoTokenizer
…