mechigonft commented 2 months ago

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.68 GiB (GPU 0; 79.35 GiB total capacity; 47.98 GiB already allocated; 13.28 GiB free; 64.89 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
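The message's own hint compares reserved and allocated memory. Below is a small plain-Python sketch that pulls those figures out of the message quoted above. The names `parse_oom` and `PATTERNS` are hypothetical, and the regexes assume exactly this wording.

In this message, PyTorch holds about 16.91 GiB that is reserved but not allocated. That is more than the 16.68 GiB requested, yet only 13.28 GiB is free on the device. This pattern is consistent with the fragmentation the hint describes, but the numbers alone do not prove it.

```python
import re

# The exact OOM message quoted in this issue.
OOM_MSG = (
    "torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.68 GiB "
    "(GPU 0; 79.35 GiB total capacity; 47.98 GiB already allocated; 13.28 GiB free; "
    "64.89 GiB reserved in total by PyTorch)"
)

# One regex per figure, keyed by the phrase that follows it in the message.
PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) GiB",
    "total": r"([\d.]+) GiB total capacity",
    "allocated": r"([\d.]+) GiB already allocated",
    "free": r"([\d.]+) GiB free",
    "reserved": r"([\d.]+) GiB reserved in total",
}


def parse_oom(msg):
    """Extract the GiB figures from a CUDA OOM message worded like the one above."""
    return {key: float(re.search(pat, msg).group(1)) for key, pat in PATTERNS.items()}


stats = parse_oom(OOM_MSG)
# Memory held by PyTorch's caching allocator that is not backing live tensors.
reserved_unallocated = round(stats["reserved"] - stats["allocated"], 2)

print(f"requested: {stats['requested']} GiB")
print(f"free on device: {stats['free']} GiB")
print(f"reserved but unallocated: {reserved_unallocated} GiB")
```

If fragmentation is the cause, the message points to `max_split_size_mb`. It is set through the `PYTORCH_CUDA_ALLOC_CONF` environment variable, for example `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512`, with the value tuned per workload. The variable must be set before the process first uses CUDA.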

cyente commented 2 months ago

Please provide a minimal script that reproduces the out-of-memory error, along with details of your environment.

mechigonft commented 2 months ago

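The code contains private information, so I can't share it. The cross-file code input is 830 lines in total. The script is the official one you provide; I only changed unrelated parameters such as the model path.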
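Why can a long cross-file prompt exhaust memory? Here is a rough, hedged estimate. The symbols are generic, since the thread gives neither the model configuration nor the attention implementation in use.

If attention is computed without a fused kernel, each layer materializes a score tensor of roughly

$$
M_{\text{scores}} \approx b \, h \, L^{2} \, s
$$

bytes, where:

- $b$ is the batch size,
- $h$ is the number of attention heads,
- $L$ is the sequence length in tokens,
- $s$ is the number of bytes per element.

The key/value cache kept across all $n_{\text{layers}}$ layers grows only linearly:

$$
M_{\text{KV}} = 2 \, b \, n_{\text{layers}} \, h_{\text{kv}} \, d_{\text{head}} \, L \, s
$$

Doubling the prompt length therefore roughly quadruples the first term and doubles the second.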

mechigonft commented 2 months ago

Here is the script. For brevity, I saved the model input (the prompt) to a txt file and read it from there.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM


def get_content(path):
    with open(path, 'r', encoding='utf-8') as file:
        content = file.read()
    return content


model_path = ""  # left empty in the original post
device = "cuda"  # the device to load the model onto

# Now you do not need to add "trust_remote_code=True"
TOKENIZER = AutoTokenizer.from_pretrained(model_path)
MODEL = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto").eval()

# tokenize the input into tokens
input_text = get_content("")  # prompt file path left empty in the original post
model_inputs = TOKENIZER([input_text], return_tensors="pt").to(device)

# Use `max_new_tokens` to control the maximum output length.
generated_ids = MODEL.generate(model_inputs.input_ids, max_new_tokens=1024, do_sample=False)[0]

# The generated_ids include prompt_ids, so we only need to decode the tokens after prompt_ids.
output_text = TOKENIZER.decode(generated_ids[len(model_inputs.input_ids[0]):], skip_special_tokens=True)

print(f"Prompt: \n{input_text}\n\nGenerated text: \n{output_text}")
```

cyente commented 2 months ago

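We are asking for an example that triggers the out-of-memory error when run as-is. So far we have not been able to reproduce the OOM you describe.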

mechigonft commented 2 months ago

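I'm running inference on two A10 GPUs, but GPU utilization shows that only one A10 is doing any work. How should I modify the inference script to run inference in parallel across multiple GPUs?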

cyente commented 2 months ago

https://github.com/QwenLM/CodeQwen1.5/blob/main/examples/CodeQwen1.5-base.md#offline-batched-inference

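If you need to deploy the model, we recommend vLLM; see the script at the link above.

To run on multiple GPUs, just set `tensor_parallel_size` to the number of GPUs:

```python
from vllm import LLM

llm = LLM(model="Qwen/CodeQwen1.5-7B", tensor_parallel_size=4)
```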
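Here is a hedged sketch of choosing `tensor_parallel_size` from the GPUs that are actually visible. Both helper names are hypothetical.

It relies on one vLLM constraint: the model's total number of attention heads must be divisible by `tensor_parallel_size`. The head count in the usage below is only a placeholder; read `num_attention_heads` from the model's `config.json`.

```python
import os


def visible_gpu_count(env=None):
    """Hypothetical helper: count the GPUs listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset, meaning every GPU is visible and a
    real query (e.g. torch.cuda.device_count()) is needed instead.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return len([dev for dev in value.split(",") if dev.strip()])


def pick_tensor_parallel_size(num_gpus, num_attention_heads):
    """Hypothetical helper: largest tp <= num_gpus that evenly divides the head count."""
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    return max(tp for tp in range(1, num_gpus + 1) if num_attention_heads % tp == 0)


# Example for a two-GPU machine; 32 is an assumed head count, not read from a real config.
print(pick_tensor_parallel_size(visible_gpu_count({"CUDA_VISIBLE_DEVICES": "0,1"}), 32))
```

For the two-A10 setup above, any even head count yields 2. The result is then passed as `LLM(model=..., tensor_parallel_size=...)`.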

mechigonft commented 2 months ago

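My current environment is transformers 4.37.0 and torch 2.0.0. Which vLLM version is compatible with it? I installed the latest vLLM, but the inference script fails to start with an error, which I suspect is a version incompatibility.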

cyente commented 2 months ago

Please share the error message and a script that reproduces it. We have not observed any incompatibility.
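When environment details are requested, a shorter alternative to a full `pip list` is to report only the packages involved in the failure. This sketch uses the standard-library `importlib.metadata`, which is available from Python 3.8 (the version used in this thread). The selection of packages is an assumption.

```python
import sys
from importlib import metadata  # standard library since Python 3.8

# Packages that appear in the vLLM / DeepSpeed traceback in this thread (assumed selection).
PACKAGES = ["torch", "transformers", "vllm", "deepspeed", "accelerate", "pydantic", "ray"]


def collect_versions(packages=PACKAGES):
    """Return {name: version} for the given distributions, plus the Python version."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for name, version in collect_versions().items():
    print(f"{name:<14}{version}")
```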

mechigonft commented 2 months ago

Does it now require Python 3.9? I remember there was no such requirement a couple of days ago.

mechigonft commented 2 months ago

Error:

```
2024-04-25 10:02:54,787 WARNING utils.py:587 -- Ray currently does not support initializing Ray with fractional cpus. Your num_cpus will be truncated from 27.84 to 27.
2024-04-25 10:02:55,198 INFO worker.py:1724 -- Started a local Ray instance.
INFO 04-25 10:02:55 llm_engine.py:74] Initializing an LLM engine (v0.4.0.post1) with config: model='/ossfs/node_42363669/workspace/CodeQwen1.5-7B-Chat', tokenizer='/ossfs/node_42363669/workspace/CodeQwen1.5-7B-Chat', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=65536, download_dir=None, load_format=auto, tensor_parallel_size=2, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)
/opt/conda/lib/python3.8/site-packages/vllm/executor/ray_gpu_executor.py:87: UserWarning: Failed to get the IP address, using 0.0.0.0 by default.The value can be set by the environment variable HOST_IP.
  driver_ip = get_ip()
(RayWorkerVllm pid=4102) /opt/conda/lib/python3.8/site-packages/vllm/engine/ray_utils.py:48: UserWarning: Failed to get the IP address, using 0.0.0.0 by default.The value can be set by the environment variable HOST_IP.
(RayWorkerVllm pid=4102)   return get_ip()

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /opt/conda/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda121.so
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /opt/conda/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...
[2024-04-25 10:03:02,274] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/opt/conda/lib/python3.8/site-packages/pydantic/_internal/_config.py:334: UserWarning: Valid config keys have changed in V2:
* 'allow_population_by_field_name' has been renamed to 'populate_by_name'
* 'validate_all' has been renamed to 'validate_default'
  warnings.warn(message, UserWarning)
/opt/conda/lib/python3.8/site-packages/pydantic/_internal/_fields.py:160: UserWarning: Field "model_persistence_threshold" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1472, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/opt/conda/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/conda/lib/python3.8/site-packages/transformers/generation/utils.py", line 28, in <module>
    from ..integrations.deepspeed import is_deepspeed_zero3_enabled
  File "/opt/conda/lib/python3.8/site-packages/transformers/integrations/deepspeed.py", line 48, in <module>
    from accelerate.utils.deepspeed import HfDeepSpeedConfig as DeepSpeedConfig
  File "/opt/conda/lib/python3.8/site-packages/accelerate/__init__.py", line 3, in <module>
    from .accelerator import Accelerator
  File "/opt/conda/lib/python3.8/site-packages/accelerate/accelerator.py", line 35, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "/opt/conda/lib/python3.8/site-packages/accelerate/checkpointing.py", line 24, in <module>
    from .utils import (
  File "/opt/conda/lib/python3.8/site-packages/accelerate/utils/__init__.py", line 133, in <module>
    from .launch import (
  File "/opt/conda/lib/python3.8/site-packages/accelerate/utils/launch.py", line 33, in <module>
    from ..utils.other import is_port_in_use, merge_dicts
  File "/opt/conda/lib/python3.8/site-packages/accelerate/utils/other.py", line 30, in <module>
    from deepspeed import DeepSpeedEngine
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/__init__.py", line 22, in <module>
    from . import module_inject
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/__init__.py", line 6, in <module>
    from .replace_module import replace_transformer_layer, revert_transformer_layer, ReplaceWithTensorSlicing, GroupQuantizer, generic_injection
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 567, in <module>
    from ..pipe import PipelineModule
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/pipe/__init__.py", line 6, in <module>
    from ..runtime.pipe import PipelineModule, LayerSpec, TiedLayerSpec
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/pipe/__init__.py", line 6, in <module>
    from .module import PipelineModule, LayerSpec, TiedLayerSpec
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/pipe/module.py", line 19, in <module>
    from ..activation_checkpointing import checkpointing
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 26, in <module>
    from deepspeed.runtime.config import DeepSpeedConfig
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/config.py", line 29, in <module>
    from .zero.config import get_zero_config, ZeroStageEnum
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/zero/__init__.py", line 6, in <module>
    from .partition_parameters import ZeroParamType
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 723, in <module>
    class Init(InsertPostInitMethodToModuleSubClasses):
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 725, in Init
    param_persistence_threshold = get_config_default(DeepSpeedZeroConfig, "param_persistence_threshold")
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/config_utils.py", line 115, in get_config_default
    assert not config.__fields__.get(
AttributeError: 'FieldInfo' object has no attribute 'required'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1472, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/opt/conda/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 29, in <module>
    from ...modeling_utils import PreTrainedModel
  File "/opt/conda/lib/python3.8/site-packages/transformers/modeling_utils.py", line 44, in <module>
    from .generation import GenerationConfig, GenerationMixin
  File "<frozen importlib._bootstrap>", line 1039, in _handle_fromlist
  File "/opt/conda/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1462, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/opt/conda/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1474, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
'FieldInfo' object has no attribute 'required'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/ossfs/node_42363669/workspace/model_vllm.py", line 944, in <module>
    user_handler = UserHandler('')
  File "/ossfs/node_42363669/workspace/model_vllm.py", line 39, in __init__
    self.llm = LLM(model=model_path, tensor_parallel_size=self.get_gpu_count())
  File "/opt/conda/lib/python3.8/site-packages/vllm/entrypoints/llm.py", line 112, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/opt/conda/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 196, in from_engine_args
    engine = cls(
  File "/opt/conda/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 110, in __init__
    self.model_executor = executor_class(model_config, cache_config,
  File "/opt/conda/lib/python3.8/site-packages/vllm/executor/ray_gpu_executor.py", line 62, in __init__
    self._init_workers_ray(placement_group)
  File "/opt/conda/lib/python3.8/site-packages/vllm/executor/ray_gpu_executor.py", line 146, in _init_workers_ray
    from vllm.worker.worker import Worker
  File "/opt/conda/lib/python3.8/site-packages/vllm/worker/worker.py", line 21, in <module>
    from vllm.worker.model_runner import ModelRunner
  File "/opt/conda/lib/python3.8/site-packages/vllm/worker/model_runner.py", line 17, in <module>
    from vllm.model_executor.model_loader import get_model
  File "/opt/conda/lib/python3.8/site-packages/vllm/model_executor/model_loader.py", line 10, in <module>
    from vllm.model_executor.models.llava import LlavaForConditionalGeneration
  File "/opt/conda/lib/python3.8/site-packages/vllm/model_executor/models/llava.py", line 7, in <module>
    from transformers import CLIPVisionModel, LlavaConfig
  File "<frozen importlib._bootstrap>", line 1039, in _handle_fromlist
  File "/opt/conda/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1463, in __getattr__
    value = getattr(module, name)
  File "/opt/conda/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1462, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/opt/conda/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1474, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.clip.modeling_clip because of the following error (look up to see its traceback):
Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
'FieldInfo' object has no attribute 'required'
(RayWorkerVllm pid=4190) /opt/conda/lib/python3.8/site-packages/vllm/engine/ray_utils.py:48: UserWarning: Failed to get the IP address, using 0.0.0.0 by default.The value can be set by the environment variable HOST_IP.
(RayWorkerVllm pid=4190)   return get_ip()
```

Environment:

```
$ pip list
Package Version
absl-py 2.0.0
accelerate 0.21.0
adabench 1.2.64
aii-pypai 0.1.40.45
aiofiles 23.2.1
aiohttp 3.9.1
aiosignal 1.3.1
aistudio-analyzer 0.0.4.102
aistudio-common 0.0.28.48
aistudio-notebook 2.0.125
aistudio-serving 0.0.0.62
alipay-pcache 0.1.6
aliyun-python-sdk-core 2.14.0
aliyun-python-sdk-kms 2.16.2
altair 5.2.0
annotated-types 0.6.0
ant-couler 0.0.1rc17
anyio 4.2.0
apex 0.1
archspec 0.2.1
argo-workflows 3.5.1
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
arrow 1.3.0
astroid 3.0.2
asttokens 2.4.1
async-timeout 4.0.3
atorch 1.1.0rc8
attrs 23.1.0
autopep8 2.0.4
backcall 0.2.0
beautifulsoup4 4.12.2
bigmodelvis 0.0.1
bitarray 2.8.5
bitsandbytes 0.39.0
bleach 6.1.0
blinker 1.7.0
boltons 23.0.0
boto3 1.34.2
botocore 1.34.2
Brotli 1.0.9
cachetools 3.1.1
cattrs 23.2.3
certifi 2023.11.17
cffi 1.16.0
charset-normalizer 2.0.4
cheroot 10.0.0
click 8.1.7
click-config-file 0.6.0
cloudpickle 3.0.0
cmake 3.29.2
colorama 0.4.6
comm 0.2.1
conda 23.11.0
conda-content-trust 0.2.0
conda-libmamba-solver 23.12.0
conda-package-handling 2.2.0
conda_package_streaming 0.9.0
configobj 5.0.8
configparser 6.0.0
contourpy 1.1.1
couler-core 0.1.1rc11
crcmod 1.7
cryptography 41.0.7
cycler 0.12.1
Cython 3.0.6
datasets 2.15.0
debugpy 1.8.0
decorator 5.1.1
deepspeed 0.10.3
defusedxml 0.7.1
delta-center-client 0.0.4
Deprecated 1.2.14
deprecation 2.1.0
dill 0.3.7
diskcache 5.6.3
distlib 0.3.8
distro 1.8.0
dlrover 0.3.6
docker 4.1.0
docstring-to-markdown 0.13
easydl-sdk 0.0.6
einops 0.7.0
entrypoints 0.4
evaluate 0.4.0
exceptiongroup 1.2.0
executing 2.0.1
fairscale 0.4.1
fastapi 0.108.0
fastjsonschema 2.19.1
fastmoe 1.0.0
fasttext 0.9.2
fe 0.3.33
ffmpy 0.3.1
filelock 3.13.1
flake8 6.1.0
flash-attn 2.0.4
flash-attn-1 0.2.6.post2
Flask 3.0.0
fonttools 4.46.0
fqdn 1.5.1
frozenlist 1.4.1
fsspec 2023.10.0
ftfy 6.1.3
gitdb 4.0.11
GitPython 3.1.40
google-auth 2.25.2
google-auth-oauthlib 0.4.6
gradio 4.13.0
gradio_client 0.8.0
grpcio 1.34.1
grpcio-tools 1.34.1
h11 0.14.0
hjson 3.1.0
httpcore 1.0.2
httptools 0.6.1
httpx 0.26.0
huggingface-hub 0.19.4
icetk 0.0.7
idna 3.4
importlib-metadata 7.0.0
importlib-resources 6.1.1
iniconfig 2.0.0
interegular 0.3.3
ipykernel 6.28.0
ipython 8.12.3
ipython-genutils 0.2.0
isodate 0.6.1
isoduration 20.11.0
isort 5.13.2
itsdangerous 2.1.2
jaraco.functools 4.0.0
jedi 0.19.1
jedi-language-server 0.41.2
Jinja2 2.11.3
jinjasql 0.1.8
jmespath 0.10.0
joblib 1.3.2
jsonpatch 1.32
jsonpath-ng 1.6.0
jsonpointer 2.1
jsonschema 4.20.0
jsonschema-specifications 2023.11.2
jupyter_client 8.6.0
jupyter_core 5.7.1
jupyter-events 0.9.0
jupyter-lsp 2.2.1
jupyter_server 2.10.1
jupyter_server_terminals 0.5.1
jupyterlab_pygments 0.3.0
kiwisolver 1.4.5
kmitool 0.0.9
kubemaker 0.2.17
kubernetes 9.0.0
langdetect 1.0.9
lark 1.1.9
libmambapy 1.5.3
llvmlite 0.41.1
loralib 0.1.1
lsh 0.1.2
lsprotocol 2023.0.0
lxml 4.9.3
M2Crypto 0.38.0
Markdown 3.5.1
markdown-it-py 3.0.0
MarkupSafe 2.0.1
marshmallow 3.20.1
matplotlib 3.7.4
matplotlib-inline 0.1.6
mccabe 0.7.0
mdurl 0.1.2
megatron.core 0.1
menuinst 2.0.1
mistune 0.8.4
mock 5.1.0
more-itertools 10.1.0
mpi4py 3.1.5
mpmath 1.3.0
msgpack 1.0.7
multidict 6.0.4
multiprocess 0.70.15
nbclient 0.5.13
nbconvert 6.4.4
nbformat 5.9.2
nest-asyncio 1.5.8
networkx 3.0
ninja 1.11.1.1
nltk 3.8.1
notebook 6.4.6
numba 0.58.1
numpy 1.23.5
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.1.105
oauthlib 3.2.2
odps 3.5.1
opendelta 0.3.2
orjson 3.9.10
oss2 2.6.0
osscmd 0.4.5
outlines 0.0.34
overrides 3.1.0
packaging 23.1
pandas 1.0.0
pandocfilters 1.5.0
parameterized 0.9.0
parso 0.8.3
pathos 0.3.0
peft 0.3.0
peppercorn 0.6
pexpect 4.9.0
pickleshare 0.7.5
Pillow 9.3.0
pip 23.3.1
pkgutil_resolve_name 1.3.10
platformdirs 3.10.0
pluggy 1.0.0
ply 3.11
pox 0.3.3
ppft 1.7.6.7
prettytable 3.9.0
prometheus-client 0.19.0
prompt-toolkit 3.0.43
protobuf 3.20.0
psutil 5.9.6
PTable 0.9.2
ptyprocess 0.7.0
pure-eval 0.2.2
py 1.11.0
py-cpuinfo 9.0.0
py-spy 0.3.14
pyaml 21.10.1
pyarrow 12.0.0
pyarrow-hotfix 0.6
pyasn1 0.5.1
pyasn1-modules 0.3.0
pybind11 2.11.1
pycodestyle 2.11.1
pycosat 0.6.6
pycparser 2.21
pycryptodome 3.19.0
pydantic 2.5.3
pydantic_core 2.18.2
pyDes 2.0.1
pydocstyle 6.3.0
pydub 0.25.1
pyflakes 3.1.0
pygls 1.2.1
Pygments 2.17.2
pyhocon 0.3.60
pyinotify 0.9.6
pylint 3.0.3
pynvml 11.5.0
pyodps 0.11.4.1
Pyomo 6.7.0
pyOpenSSL 23.2.0
pyparsing 3.1.1
PySocks 1.7.1
pytest 7.4.3
python-dateutil 2.8.2
python-dotenv 1.0.1
python-json-logger 2.0.7
python-lsp-jsonrpc 1.1.2
python-lsp-server 1.9.0
python-multipart 0.0.6
pytoolconfig 1.2.6
pytz 2023.3.post1
PyWavelets 1.4.1
PyYAML 6.0.1
pyzmq 25.1.2
ray 2.9.0
referencing 0.32.0
regex 2023.10.3
requests 2.31.0
requests-file 1.5.1
requests-oauthlib 1.3.1
requests-toolbelt 1.0.0
responses 0.18.0
retry 0.9.2
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.7.0
rope 1.11.0
rouge-chinese 1.0.3
rouge-score 0.1.2
rpds-py 0.14.1
rsa 4.9
ruamel.yaml 0.16.10
ruamel.yaml.clib 0.2.6
ruff 0.1.11
ruff-lsp 0.0.49
s3transfer 0.9.0
safetensors 0.4.1
scikit-learn 1.3.2
scipy 1.10.1
semantic-version 2.10.0
Send2Trash 1.8.2
sentencepiece 0.1.97
setuptools 68.2.2
shellingham 1.5.4
six 1.16.0
smmap 5.0.1
sniffio 1.3.0
snowballstemmer 2.2.0
soupsieve 2.5
sqlparse 0.4.4
stack-data 0.6.3
starlette 0.32.0.post1
stringcase 1.2.0
StringGenerator 0.4.4
sympy 1.12
tabulate 0.8.2
tensorboard 2.11.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorboardX 2.6
termcolor 2.4.0
terminado 0.18.0
testpath 0.6.0
threadpoolctl 3.2.0
tiktoken 0.6.0
tinycss2 1.2.1
titans 0.0.7
tldextract 5.1.1
tokenizers 0.15.2
tomli 2.0.1
tomlkit 0.12.0
toolz 0.12.0
torch 2.1.2
torchaudio 2.1.0+cu121
torchpippy 0.1.1+cecc4fc
torchvision 0.16.0+cu121
tornado 6.4
tqdm 4.65.0
traitlets 5.14.1
transformers 4.39.1
triton 2.1.0
typer 0.9.0
types-python-dateutil 2.8.19.20240106
typing_extensions 4.9.0
tzdata 2023.3
ujson 5.9.0
uncertainty-calibration 0.1.4
Unidecode 1.3.7
unifile-sdk 0.1.14
uri-template 1.3.0
urllib3 1.26.18
uvicorn 0.25.0
uvloop 0.19.0
virtualenv 20.25.0
vllm 0.4.0.post1
watchdog 2.3.1
watchfiles 0.21.0
wcwidth 0.2.12
web.py 0.62
webcolors 1.13
webencodings 0.5.1
websocket-client 1.7.0
websockets 11.0.3
Werkzeug 3.0.1
wfbuilder 1.0.56.43
wget 3.2
whatthepatch 1.0.5
wheel 0.41.2
wrapt 1.16.0
xattr 1.0.0
xformers 0.0.23.post1
xxhash 3.4.1
yacs 0.1.8
yapf 0.40.2
yarl 1.9.4
zdfs-dfs 2.3.2
zeep 4.2.1
zipp 3.17.0
zstandard 0.19.0
```

mechigonft commented 2 months ago

I'm using vLLM to accelerate inference and get two errors:

- a message that the IP address should not be 0.0.0.0
- `AttributeError: 'FieldInfo' object has no attribute 'required'`
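In the log above, the `0.0.0.0` messages are `UserWarning`s. The traceback that actually stops startup ends in the `FieldInfo` error. The warning text itself says the address "can be set by the environment variable HOST_IP".

Below is a minimal sketch of setting that variable before constructing `LLM`. It assumes that whatever the local hostname resolves to is an acceptable address. The helper name and the loopback fallback are hypothetical choices, reasonable only for a single machine.

```python
import os
import socket


def set_host_ip_if_missing():
    """Hypothetical helper: export HOST_IP before vLLM starts its Ray workers.

    Keeps an existing HOST_IP; otherwise uses the address the local hostname
    resolves to, falling back to loopback (an assumption suited to one node).
    """
    if os.environ.get("HOST_IP"):
        return os.environ["HOST_IP"]
    try:
        ip = socket.gethostbyname(socket.gethostname())
    except OSError:  # includes socket.gaierror when the hostname does not resolve
        ip = "127.0.0.1"
    os.environ["HOST_IP"] = ip
    return ip


print("HOST_IP =", set_host_ip_if_missing())
# ...then construct vllm.LLM(...) in the same process.
```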

mechigonft commented 2 months ago

Two more environment details: Python 3.8.18, CUDA 12.1.

cyente commented 2 months ago

https://github.com/microsoft/DeepSpeed/issues/3963

Please refer to this issue and upgrade your DeepSpeed version.
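The traceback above ends in DeepSpeed's `get_config_default`, which reads `.required` on a config field. The environment pairs `deepspeed 0.10.3` with `pydantic 2.5.3`.

In pydantic 2, field objects are `FieldInfo` instances that expose an `is_required()` method and have no `required` attribute. In pydantic 1, `ModelField` had `.required`. This difference matches the error.

The sketch below only illustrates that mismatch with stand-in objects. It is not DeepSpeed's code or its fix; upgrading DeepSpeed, as suggested above, is the route recommended here.

```python
from types import SimpleNamespace


def field_is_required(field):
    """Illustrative shim: is this config field required, under either pydantic major version?

    pydantic 2 FieldInfo objects expose is_required() and have no `required`
    attribute; pydantic 1 ModelField objects expose `required`. Reading
    `.required` on a pydantic 2 field raises
    AttributeError: 'FieldInfo' object has no attribute 'required'.
    """
    is_required = getattr(field, "is_required", None)
    if callable(is_required):
        return is_required()  # pydantic 2 style
    return field.required  # pydantic 1 style


# Stand-ins shaped like the two kinds of field objects (not real pydantic classes).
v1_style = SimpleNamespace(required=False)
v2_style = SimpleNamespace(is_required=lambda: True)

print(field_is_required(v1_style), field_is_required(v2_style))
```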

QwenLM / CodeQwen1.5

Cross-file completion: OOM error during inference on a single A100 #26
