-
I am working on a use case of loading a model across parallel GPUs, unloading it, and then loading a new model in the same process.
```python
@classmethod
async def unload_models(cls, exiting=…
```
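For context, this is the shape of the cycle I'm after; a minimal sketch assuming plain PyTorch on CUDA (`load_model` is a hypothetical stand-in, not this project's API):
```python
import gc

import torch
import torch.nn as nn

def load_model() -> nn.Module:
    # Stand-in for the real (parallel) loader; an assumption for illustration.
    return nn.Linear(1024, 1024).cuda()

model = load_model()
# ... run inference ...

# Unload: drop every live reference, then release cached CUDA memory.
del model
gc.collect()
torch.cuda.empty_cache()
# If a distributed process group was created for the parallel GPUs,
# torch.distributed.destroy_process_group() would also belong here.

# Load a different model in the same process.
model = load_model()
```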
-
### Do you need to file an issue?
- [x] I have searched the existing issues and this bug is not already filed.
- [ ] My model is hosted on OpenAI or Azure. If not, please look at the "model providers…
-
### Is your feature request related to a problem? Please describe.
Models larger than the GPU memory capacity cannot currently be run for inference, while parallel implementations exist for training.…
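For reference, a minimal sketch of how this is exposed elsewhere (Hugging Face Transformers with `accelerate` installed; not this project's API, and `gpt2` is just a small stand-in checkpoint):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" (via the accelerate integration) shards the weights
# across every visible GPU, so a checkpoint larger than any single GPU
# can still serve inference.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", device_map="auto")

inputs = tok("Hello", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=8)[0]))
```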
-
Trying the example of `layer_map` [here](https://lux.csail.mit.edu/stable/api/Lux/contrib#Map-over-Layer), I wonder how to get back a specific layer given a `KeyPath`.
In the example doing on the par…
-
The MoE workload generated by AICB using the following command cannot be parsed:
```bash
sh scripts/megatron_gpt.sh \
  --nnodes 1 --node_rank 0 --nproc_per_node 8 --master_addr localhost --master_port 2…
```
-
Starting with 0.2, Ollama supports running models in parallel. That makes memory more valuable than before. Moreover, on an operating system like Windows, if you use the GPU, the GPU's memory is fixed un…
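For example, a minimal sketch (assuming a local Ollama recent enough to expose the `/api/ps` endpoint) of checking which models are currently loaded and how much GPU memory each one holds:
```python
import json
import urllib.request

# Query the local Ollama API for models resident in memory; size_vram
# is the VRAM footprint reported in bytes.
with urllib.request.urlopen("http://localhost:11434/api/ps") as resp:
    for m in json.load(resp).get("models", []):
        print(m["name"], m.get("size_vram"))
```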
-
Hello authors, I ran into the following problem while running your training. Is there a way to resolve it?
/home/amax/anaconda3/bin/conda run -n WalMaFa --no-capture-output python /data1/WalMaFa/train.py
load training yaml file: ./configs/LOL/train/training_LOL.yaml
==…
-
Hi, thank you for your work on this task!
I was trying to run inference with the model on a custom dataset. One device was slow, so I tried using 2 GPUs by setting the --parra…
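For context, a minimal sketch of one way to use 2 GPUs for inference without that flag (toy model and dataset as stand-ins; one process per GPU, dataset sharded round-robin):
```python
import torch
import torch.multiprocessing as mp
import torch.nn as nn

def worker(rank: int, world_size: int) -> None:
    device = torch.device(f"cuda:{rank}")
    model = nn.Linear(16, 4).to(device).eval()  # stand-in for the real model
    data = torch.randn(100, 16)                 # stand-in for the custom dataset
    with torch.no_grad():
        # Round-robin shard: rank r handles samples r, r + world_size, ...
        for i in range(rank, len(data), world_size):
            _ = model(data[i : i + 1].to(device))
    print(f"rank {rank} done")

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```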
-
### Summary
As we work on running more than one Ersilia model in parallel, @JHlozek highlighted the scenario where we want to run **the same model** in multiple processes/terminals. This would be a v…
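To make the scenario concrete, a minimal sketch (not Ersilia's actual mechanism; the lock path is a hypothetical placeholder) of serializing two terminals onto the same model with an OS-level file lock:
```python
import fcntl
import os

LOCK_PATH = "/tmp/ersilia_model.lock"  # hypothetical per-model lock file

def run_locked() -> None:
    # Exclusive lock: a second terminal blocks here until the first one
    # releases the lock (Unix-only, via fcntl).
    with open(LOCK_PATH, "w") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            print(f"pid {os.getpid()} holds the model lock")
            # ... serve or run the shared model here ...
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

if __name__ == "__main__":
    run_locked()
```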
-
### Your current environment
The output of `python collect_env.py`
```text
vLLM version: 0.5.5
NCCL: 2.20.5
GPU: Tesla V100-SXM2-32GB
CUDA version: 12.6
Driver version: 560.28.03
`…