-
### Describe the issue
I always run into bugs when building the Docker images, specifically with this code (Line 121 in swebench/harness/docker_build.py):
response = client.api.build(
path=str(build_di…
-
I want to switch between the llama2-7b-chat and llama3-8b models, but loading both at once costs a lot of memory.
How can I clear the first model before loading the second one?
#model_name = 'meta-llama/L…
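One common approach (a generic sketch, not tied to any particular loader): move the first model off the GPU, drop every Python reference to it, then run the garbage collector and empty the CUDA cache before loading the second model. The helper name below is illustrative.

```python
import gc

import torch
import torch.nn as nn

def free_model(model: nn.Module) -> None:
    # Move the weights off the GPU and release cached CUDA blocks so the
    # next model can allocate them. The caller must still delete its own
    # reference (del model) -- this function alone cannot remove it.
    model.to("cpu")
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

# Demo with a small stand-in module; a real Llama checkpoint would be
# loaded with transformers' AutoModelForCausalLM instead.
model = nn.Linear(16, 16)
free_model(model)
del model  # drop the last reference so the GC can reclaim the tensors
# ... now it is safe to load the second model ...
```

The `del` is important: `torch.cuda.empty_cache()` can only return memory that is no longer referenced by any live tensor.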
-
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_bmm)
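This error means one operand of `bmm` lives on the CPU while the other is on `cuda:0`; moving both operands to the same device before the call resolves it. A minimal standalone sketch (not the reporter's code, which follows below):

```python
import torch

# Pick one device and put every operand on it.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.randn(2, 3, 4)                   # created on the CPU
b = torch.randn(2, 4, 5, device=device)    # possibly on the GPU

# torch.bmm(a, b) raises the "same device" error when CUDA is available;
# moving both operands to one device fixes it.
out = torch.bmm(a.to(device), b.to(device))
```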
```python
from aw…
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
pass
### Reproduction
```
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" accelerate launch \
--config_fil…
-
Stas Bekman had the idea of supporting a HuggingFace model as input so that all the model architecture settings don't need to be manually dug up. We'd like something like:
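Reading those settings out of the model's `config.json` (as published on the Hub) might look like the sketch below; the function name is hypothetical, and the field names follow the standard transformers config attributes.

```python
import json

def arch_from_hf_config(config_json: str) -> dict:
    # config_json is the raw contents of a model's config.json, e.g. as
    # fetched with huggingface_hub.hf_hub_download(model, "config.json").
    cfg = json.loads(config_json)
    return {
        "hidden_size": cfg["hidden_size"],
        "num_layers": cfg["num_hidden_layers"],
        "num_attention_heads": cfg["num_attention_heads"],
        "vocab_size": cfg["vocab_size"],
    }

# Minimal example with llama-7b-like values (illustrative, not fetched):
sample = json.dumps({
    "hidden_size": 4096,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "vocab_size": 32000,
})
print(arch_from_hf_config(sample))
```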
```
python transformer_mem.py --hf…
-
Just as the title says, I am trying to extract the LoRA of a Llama 3.1 70B model and it OOMs on a single 24GB GPU. Is there a way to make it run on multiple GPUs, or do all the tensors need to be on one single…
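One way to sidestep the single-GPU limit: LoRA extraction is essentially a low-rank factorization of the per-layer weight delta (fine-tuned minus base), and each layer can be factorized independently, on CPU if needed, so the whole model never has to fit on one device. A sketch of the per-layer step, assuming the deltas are already computed:

```python
import torch

def extract_lora(delta: torch.Tensor, rank: int):
    # Truncated SVD of one layer's weight delta (W_ft - W_base) into
    # LoRA factors B @ A ~= delta. Processing one layer at a time keeps
    # peak memory at a single layer's size, and runs fine on CPU.
    U, S, Vh = torch.linalg.svd(delta.float(), full_matrices=False)
    B = U[:, :rank] * S[:rank]   # (out_features, rank)
    A = Vh[:rank, :]             # (rank, in_features)
    return A, B

# Toy check: a rank-2 delta is recovered exactly at rank >= 2.
delta = torch.randn(64, 2) @ torch.randn(2, 32)
A, B = extract_lora(delta, rank=2)
```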
-
### Your current environment
```text
The output of `python collect_env.py`
```
### How would you like to use vllm
Hi
I want to attach a LoRA adapter using a docker command:
docker run --runtime nv…
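For reference, serving a base model with a LoRA adapter through the vLLM OpenAI-compatible Docker image can be sketched as follows; all paths, model names, and the adapter name are placeholders, and `--enable-lora` / `--lora-modules` are standard vLLM server flags.

```shell
# Mount the adapter into the container and register it at startup.
docker run --runtime nvidia --gpus all \
  -v /path/to/lora-adapter:/lora \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-2-7b-hf \
  --enable-lora \
  --lora-modules my-lora=/lora
```

Requests can then select the adapter by passing `"model": "my-lora"` to the OpenAI-compatible endpoint.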
-
Hello author, could you tell me how the `.index` (item-encoding) files in your Google Drive were generated? I tried to reproduce the item encoding following the experimental settings section of the paper and the structure of this repository, through the following steps:
1. Set `plm_checkpoint` on line 115 of `amazon_text_emb.py` to huggingface's `huggyllama/llama-7B` and ran it, generating `dataset.emb-llama-td.n…
-
# Problem
When a user deletes a model and then goes back to a specific thread that uses the deleted model:
E.g. Thread A uses the Llama 3 model; the user deletes Llama 3, then goes back to Thread A and continues the …
-
Did anyone manage to install the MedS-Ins model successfully? I couldn't find a requirements.txt file, and on Hugging Face I only found the MedS-Ins dataset, not the model.