-
### System Info
- `transformers` version: 4.37.0.dev0
- Platform: Linux-5.15.0-89-generic-x86_64-with-glibc2.31
- Python version: 3.10.11
- Huggingface_hub version: 0.19.4
- Safetensors version: …
-
Does this method implement data parallelism for both single-node and multi-node setups?
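As context for the single-node vs. multi-node question: data parallelism means each replica processes its own shard of the batch and the per-shard results are then reduced (averaged) across replicas. A minimal pure-Python sketch of that idea — all names here are illustrative, not any library's API, and a real implementation would use framework primitives such as all-reduce over GPUs:

```python
from concurrent.futures import ThreadPoolExecutor

def shard(batch, n):
    """Split a batch into n near-equal shards (one per data-parallel replica)."""
    k, r = divmod(len(batch), n)
    out, i = [], 0
    for j in range(n):
        size = k + (1 if j < r else 0)
        out.append(batch[i:i + size])
        i += size
    return out

def local_gradient(shard):
    # Stand-in for a per-replica forward/backward pass; here the
    # "gradient" is just the mean of the shard's values.
    return sum(shard) / len(shard)

def data_parallel_step(batch, n_workers=4):
    """Run one data-parallel step: shard, compute locally, then reduce."""
    shards = shard(batch, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        grads = list(pool.map(local_gradient, shards))
    # "All-reduce": average the local results, weighted by shard size,
    # so the outcome matches a single-worker pass over the whole batch.
    total = sum(g * len(s) for g, s in zip(grads, shards))
    return total / len(batch)

print(data_parallel_step([1.0, 2.0, 3.0, 4.0]))  # 2.5, same as the single-worker mean
```

Multi-node data parallelism follows the same shape, except the reduce step crosses machine boundaries (e.g. over NCCL or Gloo) instead of a thread pool.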
-
## 🚀 Feature
## The need
Models are getting bigger and there are times when loading all the params from external storage into CPU memory at once is either not possible or calls for some extra c…
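One common answer to this need is to keep weights on disk and materialize tensors one at a time on demand, rather than deserializing the whole checkpoint into CPU memory. A toy sketch of that pattern using only the standard library — the flat file layout below is invented for illustration; real formats such as safetensors differ in detail but rely on the same offset-table idea:

```python
import json
import mmap
import struct

# Hypothetical checkpoint layout (illustration only): an 8-byte header length,
# a JSON header mapping tensor name -> [byte offset, element count],
# then the raw float64 data for all tensors back to back.

def write_checkpoint(path, tensors):
    """Write a dict of name -> list[float] in the toy flat format."""
    header, blobs, offset = {}, [], 0
    for name, values in tensors.items():
        blob = struct.pack(f"{len(values)}d", *values)
        header[name] = [offset, len(values)]
        blobs.append(blob)
        offset += len(blob)
    header_bytes = json.dumps(header).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("Q", len(header_bytes)))
        f.write(header_bytes)
        f.writelines(blobs)

def load_tensor(path, name):
    """Load one tensor lazily: only the header and the requested bytes are touched."""
    with open(path, "rb") as f:
        (hlen,) = struct.unpack("Q", f.read(8))
        header = json.loads(f.read(hlen))
        offset, count = header[name]
        data_start = 8 + hlen
        # mmap lets the OS page in just the slice we read.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            start = data_start + offset
            return list(struct.unpack(f"{count}d", mm[start:start + 8 * count]))
        finally:
            mm.close()
```

For example, `write_checkpoint("ckpt.bin", {"w": [1.0, 2.0], "b": [3.0]})` followed by `load_tensor("ckpt.bin", "b")` returns `[3.0]` without ever unpacking `"w"`.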
-
I have just installed EKG and receive the following error when saving a new note:
⛔ Warning (llm): Open AI API is not free software, and your freedom to use it is restricted.
See https://openai.co…
-
### Your current environment
```text
$ python collect_env.py
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used …
-
**Describe the bug**
I tried to use pipeline parallelism with transformers, but found that it gets stuck in the middle of execution.
**To Reproduce**
Steps to reproduce the behavior:
```
#!/usr/bin/env py…
-
**Issue Description:**
When I tried to deploy the llama-hf-65B model on an 8-GPU machine, I followed the example in Distributed Inference and Serving ([link](https://docs.vllm.ai/en/latest/serving/…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.2.1+cu118
Is debug build: False
CUDA used to build PyTorc…
```
-
**Describe the bug**
When running ee_inference_server.sh, I constantly received error messages like:
```
Traceback (most recent call last):
File "/data/EE-LLM/tools/run_early_exit_text_generatio…
-
Sufficient techniques for 2.4.1: Bypass Blocks are:
https://www.w3.org/WAI/WCAG22/Understanding/bypass-blocks#techniques
H69: Providing heading elements at the beginning of each section of conte…
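Technique H69 can be machine-checked: if every `<section>` of content opens with a heading element, assistive technology can jump between sections by heading. A small sketch of such a checker using the standard-library HTML parser — the sample page and checker class are illustrative, not part of any WCAG tooling:

```python
from html.parser import HTMLParser

# Illustrative page following technique H69: each content section
# begins with a heading element.
PAGE = """
<nav><a href="#main">Skip to main content</a></nav>
<section><h2>News</h2><p>...</p></section>
<section id="main"><h2>Main content</h2><p>...</p></section>
"""

class SectionHeadingChecker(HTMLParser):
    """Flag pages where the first element inside a <section> is not a heading."""
    def __init__(self):
        super().__init__()
        self.expect_heading = False
        self.ok = True

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            # The very next element opened should be a heading.
            self.expect_heading = True
        elif self.expect_heading:
            if tag not in ("h1", "h2", "h3", "h4", "h5", "h6"):
                self.ok = False
            self.expect_heading = False

checker = SectionHeadingChecker()
checker.feed(PAGE)
print(checker.ok)  # True: every section opens with a heading
```

Feeding it a page such as `<section><p>no heading</p></section>` sets `ok` to `False`, signalling that H69 is not met for that section.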