-
### Prerequisites
- [X] I have read the [ServerlessLLM documentation](https://serverlessllm.github.io/).
- [X] I have searched the [Issue Tracker](https://github.com/ServerlessLLM/ServerlessLLM/issue…
-
### System Info
- `transformers` version: 4.45.0
- Platform: Linux-6.8.0-48-generic-x86_64-with-glibc2.39
- Python version: 3.10.15
- Huggingface_hub version: 0.26.2
- Safetensors version: 0.4.5
…
-
Especially w.r.t. data loading, the data should automatically be transferred to the correct device.
Right now the batch is moved to the first GPU, and users are moving the inputs to the correct devic…
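The manual step users currently perform can be sketched as follows. This is a hedged illustration, not the library's API: `move_batch_to_device` is a hypothetical helper, and it assumes a Hugging Face-style batch (a dict of tensors with `.to(device)`) and a model exposing a `.device` attribute, as in PyTorch.

```python
# Hypothetical sketch of the manual device placement users do today:
# move every tensor in the batch dict onto the model's device.
def move_batch_to_device(batch, device):
    """Return a new batch dict with each tensor moved to `device`."""
    return {k: v.to(device) for k, v in batch.items()}

# Typical call site (assuming a PyTorch model):
#   batch = move_batch_to_device(batch, model.device)
```

The feature request is for the data loader to perform this transfer automatically, so the helper above becomes unnecessary.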
-
**What would you like to be added**:
LeaderWorkerSet should support heterogeneous resource requirements across Workers.
**Why is this needed**:
In the use case of disaggregated serving there m…
-
-
Input: `aem start`
Display:
```
[INFO] Determining whether host GPU is available...
[INFO] USE_GPU_HOST: 1
[INFO] Starting to pull Docker image registry.baidubce.com/apollo/apollo-env-gpu:lat…
-
I am trying to run all the pytest tests on a GPU instance.
To set up the environment, I installed the `[dev]` and `[gpu]` dependencies, but encountered the following issue:
```
pip install -e .[dev]
tor…
-
## System Info
Intel(R) Xeon(R) Platinum 8468
NVIDIA H800-80G
TensorRT-LLM version 0.12.0
## Who can help?
@Tracin @byshiue
## Reproduction
I followed the official procedure for Llama2 7b quantiza…
-
I start training with this command:
`python main.py --base configs/autoencoder/vqmodel1.yaml -t --gpus 4,5`
but I got this:
everything works fine, steps in one epoch are halved, but only one GPU is…
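A workaround that is often suggested for GPU selection issues like this is to restrict visibility at the process level rather than via the trainer flag. This is a hedged sketch under the assumption that the script uses CUDA (e.g. via PyTorch Lightning); the specific flag semantics are not confirmed by the report.

```python
import os

# Hypothetical workaround: expose only physical GPUs 4 and 5 to this
# process. This must be set before any CUDA library is imported; inside
# the process the two cards then appear as devices 0 and 1.
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5"

# With only two devices visible, a trainer option selecting "the first
# two GPUs" (rather than specific indices) would use both cards.
```

Whether this resolves the single-GPU behaviour depends on how the training script parses its `--gpus` argument, which the truncated report does not show.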
-