-
**Describe the bug**
The behavior is somewhat random: **when the text generation input size < batch size from the previous step** and replica > 1, the final output can be missing some samples. This does …
-
Hello,
I have been training a model with distributed PyTorch using the Hugging Face Trainer API. I am now training on Slurm with multiple nodes and multiple GPUs, and every GPU logs to the MLflow UI. Is th…
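
For the per-GPU logging described above, one commonly used pattern is to gate logging on the global rank so that only one process writes metrics. The sketch below is a minimal illustration, not the Trainer's or MLflow's own mechanism. It assumes the launcher exports `RANK` (as `torchrun` does) or `SLURM_PROCID` (as `srun` does); the helper names `is_main_process` and `log_metric_once` are hypothetical.

```python
import os


def is_main_process(env=None):
    """Return True only for global rank 0.

    Assumption: the launcher sets RANK (torchrun) or SLURM_PROCID (srun).
    If neither is set, the process is treated as a single-process run.
    """
    env = os.environ if env is None else env
    rank = env.get("RANK", env.get("SLURM_PROCID", "0"))
    return int(rank) == 0


def log_metric_once(log_fn, key, value, env=None):
    """Call log_fn(key, value) on the main process only.

    Returns True if the metric was logged, False if it was skipped.
    """
    if is_main_process(env):
        log_fn(key, value)
        return True
    return False
```

With MLflow installed, `log_metric_once(mlflow.log_metric, "loss", 0.5)` would then record the metric from rank 0 only, while the other ranks skip the call.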
-
### Your current environment
My environment setup involving two 8xH100 nodes is detailed in https://github.com/vllm-project/vllm/issues/6775; therefore, I will omit it here for brevity.
### 🐛 De…
-
# Implementation tasks
- [x] Update CRDB schemas #1095
- [x] Store past OVNs #1096
- [x] Create fork of astm-utm/Protocol https://github.com/interuss/astm-utm-protocol
- [x] Update OpenAPI definit…
-
### System Info
```Shell
- `Accelerate` version: 1.0.1
- Platform: Linux-5.15.0-124-generic-x86_64-with-glibc2.35
- `accelerate` bash location: /home/ubuntu/doc/code/venv/bin/accelerate
- Python v…
-
### System Info
```python
- `transformers` version: 4.44.2
- Platform: Linux-5.15.0-105-generic-x86_64-with-glibc2.31
- Python version: 3.10.0
- Huggingface_hub version: 0.23.4
- Safetensors v…
-
The documentation states that, e.g., if used for leader election, the lock can be relied upon to ensure there is only one leader.
As per https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-l…
-
When using a distributed build system, where -j is often much higher than #cores, it seems best to have a "local" pool for all non-distributed tasks to avoid overloading the local system. Unfortunate…
-
- [ ] [Zero-latency SQLite storage in every Durable Object](https://simonwillison.net/2024/Oct/13/zero-latency-sqlite-storage-in-every-durable-object/)
# Zero-latency SQLite storage in every Durable …
-
I kind of built a service on your Vercel scraper API. Would you mind sharing the v2?