-
Hello,
First of all, thank you for your work on this library. I am using it to integrate a local LLM and I have encountered some strange behavior.
I would like to know if it is necessary to manu…
-
To repro:
Install the latest versions of unsloth and transformers
```
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unslot…
-
I expected a training configuration with per_device_train_batch_size=1 and gradient_accumulation_steps=32 to yield the same (or a similar) result as per_device_train_batch_size=32 and gradient_accumulat…
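To make the comparison concrete, here is a minimal sketch of the two configurations (assuming plain `transformers` `TrainingArguments` for brevity, even though the actual repro installs unsloth; the output directories are placeholders). Both settings give the same effective batch size of 32, which is why I expected the loss curves to match closely, modulo how the loss is averaged across accumulation steps.
```python
# Sketch only: plain transformers TrainingArguments, placeholder output dirs.
from transformers import TrainingArguments

# Config A: micro-batch of 1, accumulate gradients over 32 steps.
config_accum = TrainingArguments(
    output_dir="out-accum",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,
)

# Config B: micro-batch of 32, no accumulation.
config_batch = TrainingArguments(
    output_dir="out-batch",
    per_device_train_batch_size=32,
    gradient_accumulation_steps=1,
)

def effective_batch_size(args: TrainingArguments) -> int:
    return args.per_device_train_batch_size * args.gradient_accumulation_steps

# Both configs process 32 examples per optimizer step.
assert effective_batch_size(config_accum) == effective_batch_size(config_batch) == 32
```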
-
Type: Bug
As we integrate our extension, vscode-jest, with the newly available Testing Coverage API, we've identified some issues that notably impact usability. Below, I provide some video demonstr…
-
Hello, I’ve been following your work recently. Based on the configurations in your repo, it seems that REBEL issues twice as many reward queries as DDPO, since REBEL uses two sampling traces per batc…
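To spell out the arithmetic I have in mind (the batch size here is a placeholder, and I am assuming each sampled trace is scored by the reward model exactly once per update):
```python
# Toy sketch: hypothetical batch size; assumes one reward-model call per sampled trace.
batch_size = 64

ddpo_queries_per_update = 1 * batch_size   # one sampling trace per batch
rebel_queries_per_update = 2 * batch_size  # two sampling traces per batch

# Under these assumptions, REBEL makes twice the reward queries per update.
assert rebel_queries_per_update == 2 * ddpo_queries_per_update
```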
-
How does MultipleNegativesRankingLoss behave when used with gradient accumulation steps?
According to the [docs](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#mult…
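To make my question concrete, here is a minimal sketch of the two settings I am comparing, assuming the sentence-transformers v3 Trainer API (the model name, toy dataset, and output paths are placeholders). My understanding is that the in-batch negatives in each forward pass come only from the per-device micro-batch, which is why I am unsure whether run A can ever match run B.
```python
# Sketch only (assumes sentence-transformers >= 3.0; toy data, placeholder paths).
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

# Toy (anchor, positive) pairs; MultipleNegativesRankingLoss treats the other
# positives in the same micro-batch as negatives for each anchor.
train_dataset = Dataset.from_dict({
    "anchor": [f"question {i}" for i in range(256)],
    "positive": [f"answer {i}" for i in range(256)],
})

# Run A: micro-batch 1, accumulate 32 -> 0 in-batch negatives per anchor.
# Run B: micro-batch 32, no accumulation -> 31 in-batch negatives per anchor.
for run_name, batch_size, accum_steps in [("accum", 1, 32), ("batch", 32, 1)]:
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    args = SentenceTransformerTrainingArguments(
        output_dir=f"out-{run_name}",
        per_device_train_batch_size=batch_size,
        gradient_accumulation_steps=accum_steps,
        num_train_epochs=1,
    )
    trainer = SentenceTransformerTrainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        loss=losses.MultipleNegativesRankingLoss(model),
    )
    trainer.train()
```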
-
### System Info
- `transformers` version: 4.43.0.dev0
- Platform: Linux-5.4.0-167-generic-x86_64-with-glibc2.35
- Python version: 3.10.14
- Huggingface_hub version: 0.23.4
- Safetensors version: …
-
This issue comes from https://github.com/grafana/loki/pull/13881#pullrequestreview-2237990726:
> I see now what I did wrong. The stats, warnings etc are joined in Downstream [here](https://github.com…
-
This can lead to false negatives because the threshold is overly relaxed.
```diff
diff --git a/tests/cpp/test_gpu_fused_reduction.cpp b/tests/cpp/test_gpu_fused_reduction.cpp
index e67875f4..b3923d6…
-