mechigonft closed this issue 2 months ago
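Please provide a minimal script that reproduces the GPU out-of-memory (OOM) error, along with your environment.
The code is private, so I can't share it. The cross-file code input totals 830 lines. The script is the official one you provide; I only changed unrelated parameters such as the model path.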
The script is below. To keep the code short, I saved the model output to a txt file and read it from there.

    from transformers import AutoTokenizer, AutoModelForCausalLM


    def get_content(path):
        with open(path, 'r', encoding='utf-8') as file:
            content = file.read()
        return content


    model_path = ""
    device = "cuda"  # the device to load the model onto

    TOKENIZER = AutoTokenizer.from_pretrained(model_path)
    MODEL = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto").eval()

    input_text = get_content("")
    model_inputs = TOKENIZER([input_text], return_tensors="pt").to(device)

    # Use max_new_tokens to control the maximum output length.
    generated_ids = MODEL.generate(model_inputs.input_ids, max_new_tokens=1024, do_sample=False)[0]
    output_text = TOKENIZER.decode(generated_ids[len(model_inputs.input_ids[0]):], skip_special_tokens=True)

    print(f"Prompt: \n{input_text}\n\nGenerated text: \n{output_text}")
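What we are asking for is an example that runs out of GPU memory when executed directly. At the moment we cannot reproduce the OOM problem you describe.
I am running inference on two A10 GPUs, but GPU utilization shows that only one A10 takes part in inference. How should I modify the inference script to get multi-GPU parallel inference?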
If you need to deploy the model, we recommend vLLM. The script is as above. To run on multiple GPUs, set tensor_parallel_size to the number of GPUs:

    llm = LLM(model="Qwen/CodeQwen1.5-7B", tensor_parallel_size=4)
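A minimal sketch of the suggestion above for a two-GPU machine, assuming vLLM is installed. vLLM requires the model's attention-head count to be divisible by `tensor_parallel_size`, so the helper picks the largest valid size that fits the available GPUs. The helper names and the head count of 32 are illustrative assumptions, not values from this thread; check `num_attention_heads` in the model's `config.json`.

```python
def choose_tensor_parallel_size(num_gpus: int, num_attention_heads: int) -> int:
    """Return the largest size <= num_gpus that evenly divides the head count."""
    for size in range(num_gpus, 0, -1):
        if num_attention_heads % size == 0:
            return size
    return 1


def build_llm(model_path: str, num_gpus: int, num_attention_heads: int):
    """Create a vLLM engine sharded across GPUs (hypothetical helper)."""
    # Imported lazily so the helper above can be used without vLLM installed.
    from vllm import LLM

    tp_size = choose_tensor_parallel_size(num_gpus, num_attention_heads)
    return LLM(model=model_path, tensor_parallel_size=tp_size)


if __name__ == "__main__":
    # Two A10 cards; 32 heads is an assumed value for illustration only.
    print(choose_tensor_parallel_size(num_gpus=2, num_attention_heads=32))
```

With two GPUs, `build_llm("Qwen/CodeQwen1.5-7B", 2, 32)` would pass `tensor_parallel_size=2` instead of the `4` shown in the example line.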
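A question: my current environment has transformers 4.37.0 and torch 2.0.0. Which vLLM version is compatible with it? I installed the latest vLLM, but the inference script fails at startup with an error, which I assume is a version incompatibility.
Please share the error message and a script that reproduces it. We have not seen any incompatibility.
Is Python 3.9 required now? I recall there was no such requirement a couple of days ago.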
Error:
2024-04-25 10:02:54,787 WARNING utils.py:587 -- Ray currently does not support initializing Ray with fractional cpus. Your num_cpus will be truncated from 27.84 to 27.
2024-04-25 10:02:55,198 INFO worker.py:1724 -- Started a local Ray instance.
INFO 04-25 10:02:55 llm_engine.py:74] Initializing an LLM engine (v0.4.0.post1) with config: model='/ossfs/node_42363669/workspace/CodeQwen1.5-7B-Chat', tokenizer='/ossfs/node_42363669/workspace/CodeQwen1.5-7B-Chat', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=65536, download_dir=None, load_format=auto, tensor_parallel_size=2, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)
/opt/conda/lib/python3.8/site-packages/vllm/executor/ray_gpu_executor.py:87: UserWarning: Failed to get the IP address, using 0.0.0.0 by default.The value can be set by the environment variable HOST_IP.
  driver_ip = get_ip()
(RayWorkerVllm pid=4102) /opt/conda/lib/python3.8/site-packages/vllm/engine/ray_utils.py:48: UserWarning: Failed to get the IP address, using 0.0.0.0 by default.The value can be set by the environment variable HOST_IP.
(RayWorkerVllm pid=4102)   return get_ip()
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
bin /opt/conda/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda121.so
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /opt/conda/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...
[2024-04-25 10:03:02,274] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/opt/conda/lib/python3.8/site-packages/pydantic/_internal/_config.py:334: UserWarning: Valid config keys have changed in V2:
You may be able to resolve this warning by setting model_config['protected_namespaces'] = ().
  warnings.warn(
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1472, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/opt/conda/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1472, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/opt/conda/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/ossfs/node_42363669/workspace/model_vllm.py", line 944, in
absl-py 2.0.0 accelerate 0.21.0 adabench 1.2.64 aii-pypai 0.1.40.45 aiofiles 23.2.1 aiohttp 3.9.1 aiosignal 1.3.1 aistudio-analyzer 0.0.4.102 aistudio-common 0.0.28.48 aistudio-notebook 2.0.125 aistudio-serving 0.0.0.62 alipay-pcache 0.1.6 aliyun-python-sdk-core 2.14.0 aliyun-python-sdk-kms 2.16.2 altair 5.2.0 annotated-types 0.6.0 ant-couler 0.0.1rc17 anyio 4.2.0 apex 0.1 archspec 0.2.1 argo-workflows 3.5.1 argon2-cffi 23.1.0 argon2-cffi-bindings 21.2.0 arrow 1.3.0 astroid 3.0.2 asttokens 2.4.1 async-timeout 4.0.3 atorch 1.1.0rc8 attrs 23.1.0 autopep8 2.0.4 backcall 0.2.0 beautifulsoup4 4.12.2 bigmodelvis 0.0.1 bitarray 2.8.5 bitsandbytes 0.39.0 bleach 6.1.0 blinker 1.7.0 boltons 23.0.0 boto3 1.34.2 botocore 1.34.2 Brotli 1.0.9 cachetools 3.1.1 cattrs 23.2.3 certifi 2023.11.17 cffi 1.16.0 charset-normalizer 2.0.4 cheroot 10.0.0 click 8.1.7 click-config-file 0.6.0 cloudpickle 3.0.0 cmake 3.29.2 colorama 0.4.6 comm 0.2.1 conda 23.11.0 conda-content-trust 0.2.0 conda-libmamba-solver 23.12.0 conda-package-handling 2.2.0 conda_package_streaming 0.9.0 configobj 5.0.8 configparser 6.0.0 contourpy 1.1.1 couler-core 0.1.1rc11 crcmod 1.7 cryptography 41.0.7 cycler 0.12.1 Cython 3.0.6 datasets 2.15.0 debugpy 1.8.0 decorator 5.1.1 deepspeed 0.10.3 defusedxml 0.7.1 delta-center-client 0.0.4 Deprecated 1.2.14 deprecation 2.1.0 dill 0.3.7 diskcache 5.6.3 distlib 0.3.8 distro 1.8.0 dlrover 0.3.6 docker 4.1.0 docstring-to-markdown 0.13 easydl-sdk 0.0.6 einops 0.7.0 entrypoints 0.4 evaluate 0.4.0 exceptiongroup 1.2.0 executing 2.0.1 fairscale 0.4.1 fastapi 0.108.0 fastjsonschema 2.19.1 fastmoe 1.0.0 fasttext 0.9.2 fe 0.3.33 ffmpy 0.3.1 filelock 3.13.1 flake8 6.1.0 flash-attn 2.0.4 flash-attn-1 0.2.6.post2 Flask 3.0.0 fonttools 4.46.0 fqdn 1.5.1 frozenlist 1.4.1 fsspec 2023.10.0 ftfy 6.1.3 gitdb 4.0.11 GitPython 3.1.40 google-auth 2.25.2 google-auth-oauthlib 0.4.6 gradio 4.13.0 gradio_client 0.8.0 grpcio 1.34.1 grpcio-tools 1.34.1 h11 0.14.0 hjson 3.1.0 httpcore 1.0.2 httptools 
0.6.1 httpx 0.26.0 huggingface-hub 0.19.4 icetk 0.0.7 idna 3.4 importlib-metadata 7.0.0 importlib-resources 6.1.1 iniconfig 2.0.0 interegular 0.3.3 ipykernel 6.28.0 ipython 8.12.3 ipython-genutils 0.2.0 isodate 0.6.1 isoduration 20.11.0 isort 5.13.2 itsdangerous 2.1.2 jaraco.functools 4.0.0 jedi 0.19.1 jedi-language-server 0.41.2 Jinja2 2.11.3 jinjasql 0.1.8 jmespath 0.10.0 joblib 1.3.2 jsonpatch 1.32 jsonpath-ng 1.6.0 jsonpointer 2.1 jsonschema 4.20.0 jsonschema-specifications 2023.11.2 jupyter_client 8.6.0 jupyter_core 5.7.1 jupyter-events 0.9.0 jupyter-lsp 2.2.1 jupyter_server 2.10.1 jupyter_server_terminals 0.5.1 jupyterlab_pygments 0.3.0 kiwisolver 1.4.5 kmitool 0.0.9 kubemaker 0.2.17 kubernetes 9.0.0 langdetect 1.0.9 lark 1.1.9 libmambapy 1.5.3 llvmlite 0.41.1 loralib 0.1.1 lsh 0.1.2 lsprotocol 2023.0.0 lxml 4.9.3 M2Crypto 0.38.0 Markdown 3.5.1 markdown-it-py 3.0.0 MarkupSafe 2.0.1 marshmallow 3.20.1 matplotlib 3.7.4 matplotlib-inline 0.1.6 mccabe 0.7.0 mdurl 0.1.2 megatron.core 0.1 menuinst 2.0.1 mistune 0.8.4 mock 5.1.0 more-itertools 10.1.0 mpi4py 3.1.5 mpmath 1.3.0 msgpack 1.0.7 multidict 6.0.4 multiprocess 0.70.15 nbclient 0.5.13 nbconvert 6.4.4 nbformat 5.9.2 nest-asyncio 1.5.8 networkx 3.0 ninja 1.11.1.1 nltk 3.8.1 notebook 6.4.6 numba 0.58.1 numpy 1.23.5 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu12 8.9.2.26 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu12 12.1.0.106 nvidia-nccl-cu12 2.18.1 nvidia-nvjitlink-cu12 12.4.127 nvidia-nvtx-cu12 12.1.105 oauthlib 3.2.2 odps 3.5.1 opendelta 0.3.2 orjson 3.9.10 oss2 2.6.0 osscmd 0.4.5 outlines 0.0.34 overrides 3.1.0 packaging 23.1 pandas 1.0.0 pandocfilters 1.5.0 parameterized 0.9.0 parso 0.8.3 pathos 0.3.0 peft 0.3.0 peppercorn 0.6 pexpect 4.9.0 pickleshare 0.7.5 Pillow 9.3.0 pip 23.3.1 pkgutil_resolve_name 1.3.10 platformdirs 3.10.0 pluggy 1.0.0 ply 
3.11 pox 0.3.3 ppft 1.7.6.7 prettytable 3.9.0 prometheus-client 0.19.0 prompt-toolkit 3.0.43 protobuf 3.20.0 psutil 5.9.6 PTable 0.9.2 ptyprocess 0.7.0 pure-eval 0.2.2 py 1.11.0 py-cpuinfo 9.0.0 py-spy 0.3.14 pyaml 21.10.1 pyarrow 12.0.0 pyarrow-hotfix 0.6 pyasn1 0.5.1 pyasn1-modules 0.3.0 pybind11 2.11.1 pycodestyle 2.11.1 pycosat 0.6.6 pycparser 2.21 pycryptodome 3.19.0 pydantic 2.5.3 pydantic_core 2.18.2 pyDes 2.0.1 pydocstyle 6.3.0 pydub 0.25.1 pyflakes 3.1.0 pygls 1.2.1 Pygments 2.17.2 pyhocon 0.3.60 pyinotify 0.9.6 pylint 3.0.3 pynvml 11.5.0 pyodps 0.11.4.1 Pyomo 6.7.0 pyOpenSSL 23.2.0 pyparsing 3.1.1 PySocks 1.7.1 pytest 7.4.3 python-dateutil 2.8.2 python-dotenv 1.0.1 python-json-logger 2.0.7 python-lsp-jsonrpc 1.1.2 python-lsp-server 1.9.0 python-multipart 0.0.6 pytoolconfig 1.2.6 pytz 2023.3.post1 PyWavelets 1.4.1 PyYAML 6.0.1 pyzmq 25.1.2 ray 2.9.0 referencing 0.32.0 regex 2023.10.3 requests 2.31.0 requests-file 1.5.1 requests-oauthlib 1.3.1 requests-toolbelt 1.0.0 responses 0.18.0 retry 0.9.2 rfc3339-validator 0.1.4 rfc3986-validator 0.1.1 rich 13.7.0 rope 1.11.0 rouge-chinese 1.0.3 rouge-score 0.1.2 rpds-py 0.14.1 rsa 4.9 ruamel.yaml 0.16.10 ruamel.yaml.clib 0.2.6 ruff 0.1.11 ruff-lsp 0.0.49 s3transfer 0.9.0 safetensors 0.4.1 scikit-learn 1.3.2 scipy 1.10.1 semantic-version 2.10.0 Send2Trash 1.8.2 sentencepiece 0.1.97 setuptools 68.2.2 shellingham 1.5.4 six 1.16.0 smmap 5.0.1 sniffio 1.3.0 snowballstemmer 2.2.0 soupsieve 2.5 sqlparse 0.4.4 stack-data 0.6.3 starlette 0.32.0.post1 stringcase 1.2.0 StringGenerator 0.4.4 sympy 1.12 tabulate 0.8.2 tensorboard 2.11.0 tensorboard-data-server 0.6.1 tensorboard-plugin-wit 1.8.1 tensorboardX 2.6 termcolor 2.4.0 terminado 0.18.0 testpath 0.6.0 threadpoolctl 3.2.0 tiktoken 0.6.0 tinycss2 1.2.1 titans 0.0.7 tldextract 5.1.1 tokenizers 0.15.2 tomli 2.0.1 tomlkit 0.12.0 toolz 0.12.0 torch 2.1.2 torchaudio 2.1.0+cu121 torchpippy 0.1.1+cecc4fc torchvision 0.16.0+cu121 tornado 6.4 tqdm 4.65.0 traitlets 5.14.1 
transformers 4.39.1 triton 2.1.0 typer 0.9.0 types-python-dateutil 2.8.19.20240106 typing_extensions 4.9.0 tzdata 2023.3 ujson 5.9.0 uncertainty-calibration 0.1.4 Unidecode 1.3.7 unifile-sdk 0.1.14 uri-template 1.3.0 urllib3 1.26.18 uvicorn 0.25.0 uvloop 0.19.0 virtualenv 20.25.0 vllm 0.4.0.post1 watchdog 2.3.1 watchfiles 0.21.0 wcwidth 0.2.12 web.py 0.62 webcolors 1.13 webencodings 0.5.1 websocket-client 1.7.0 websockets 11.0.3 Werkzeug 3.0.1 wfbuilder 1.0.56.43 wget 3.2 whatthepatch 1.0.5 wheel 0.41.2 wrapt 1.16.0 xattr 1.0.0 xformers 0.0.23.post1 xxhash 3.4.1 yacs 0.1.8 yapf 0.40.2 yarl 1.9.4 zdfs-dfs 2.3.2 zeep 4.2.1 zipp 3.17.0 zstandard 0.19.0
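I am using vLLM to accelerate inference and get two errors. The messages are: (1) the IP address should not be 0.0.0.0; (2) AttributeError: 'FieldInfo' object has no attribute 'required'.
Two more details about the environment: Python 3.8.18, CUDA 12.1.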
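On the first message: the log reports it as a UserWarning and says the value can be set through the `HOST_IP` environment variable. A minimal sketch of doing that from Python before the engine is created, assuming the vLLM version in the log (0.4.0.post1) reads `HOST_IP` as its warning states. `detect_local_ip` is a hypothetical helper; if detection fails on this machine as it did inside vLLM, export `HOST_IP` with the node's real address instead of relying on the fallback.

```python
import os
import socket


def detect_local_ip(fallback: str = "127.0.0.1") -> str:
    """Best-effort lookup of the local IPv4 address.

    connect() on a UDP socket only selects a route; no packet is sent.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect(("8.8.8.8", 80))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example, a sandboxed container).
        return fallback
    finally:
        sock.close()


# Keep any value the user already exported; otherwise fill in a detected one.
os.environ.setdefault("HOST_IP", detect_local_ip())
print("HOST_IP =", os.environ["HOST_IP"])
```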
https://github.com/microsoft/DeepSpeed/issues/3963
Please refer to this issue and upgrade your DeepSpeed version.
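Before upgrading, it can help to confirm which versions are actually installed. The sketch below assumes the `'FieldInfo' object has no attribute 'required'` error comes from pydantic 2.x being installed next to a DeepSpeed release written against the pydantic 1.x API, which is consistent with the package list above (pydantic 2.5.3, deepspeed 0.10.3). The function names are hypothetical, and which DeepSpeed releases are compatible is not stated in this thread; see the linked issue.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major(version: str) -> int:
    """Extract the major component of a version such as '2.5.3'."""
    return int(version.split(".")[0])


def pydantic_v2_with_deepspeed(
    pydantic_version: Optional[str], deepspeed_version: Optional[str]
) -> bool:
    """Flag the combination assumed here to trigger the FieldInfo error."""
    if pydantic_version is None or deepspeed_version is None:
        return False
    return major(pydantic_version) >= 2


if __name__ == "__main__":
    for name in ("pydantic", "deepspeed", "transformers", "torch", "vllm"):
        print(f"{name}: {installed_version(name)}")
    if pydantic_v2_with_deepspeed(
        installed_version("pydantic"), installed_version("deepspeed")
    ):
        print("pydantic 2.x found alongside deepspeed; check the linked issue.")
```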
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.68 GiB (GPU 0; 79.35 GiB total capacity; 47.98 GiB already allocated; 13.28 GiB free; 64.89 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
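The message itself suggests `max_split_size_mb`. A minimal sketch of applying that hint, with the arithmetic from the message spelled out: 64.89 GiB reserved minus 47.98 GiB allocated leaves about 16.91 GiB held by the allocator but unused, while the failed request was 16.68 GiB. The value 128 is an illustrative assumption, not a recommendation from this thread, and whether fragmentation is the actual cause here is not established.

```python
import os


def alloc_conf(max_split_size_mb: int) -> str:
    """Build a PYTORCH_CUDA_ALLOC_CONF value for the caching allocator."""
    if max_split_size_mb <= 0:
        raise ValueError("max_split_size_mb must be positive")
    return f"max_split_size_mb:{max_split_size_mb}"


def reserved_but_unallocated_gib(reserved: float, allocated: float) -> float:
    """Memory the allocator holds but has not handed out, in GiB."""
    return round(reserved - allocated, 2)


# The allocator reads this setting when it initializes, so set it before
# the first CUDA allocation; the simplest place is before importing torch.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = alloc_conf(128)

# Figures copied from the error message above.
gap = reserved_but_unallocated_gib(reserved=64.89, allocated=47.98)
print(f"reserved but unallocated: {gap} GiB (request was 16.68 GiB)")
```

The same setting can be exported in the shell before launching the script instead of being set from Python.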