OpenBMB / MiniCPM

MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
Apache License 2.0

MiniCPM inference environment #8

Closed. soulteary closed this issue 2 months ago

soulteary commented 7 months ago

Is there an existing issue?

Describe the bug

The model repo currently does not have a requirements.txt, so many users will probably have problems running it.

I ran it in the latest NVIDIA Docker container environment with a new version of xformers. The environment that currently works is listed below for other users' reference (run results: preview):

Package                   Version                  Editable project location
------------------------- ------------------------ -------------------------
absl-py                   2.0.0
accelerate                0.26.1
aiofiles                  23.2.1
aiohttp                   3.9.1
aiosignal                 1.3.1
altair                    5.2.0
annotated-types           0.6.0
anyio                     4.2.0
argon2-cffi               23.1.0
argon2-cffi-bindings      21.2.0
asttokens                 2.4.1
astunparse                1.6.3
async-timeout             4.0.3
attrs                     23.1.0
audioread                 3.0.1
beautifulsoup4            4.12.2
bleach                    6.1.0
blis                      0.7.11
cachetools                5.3.2
catalogue                 2.0.10
certifi                   2023.11.17
cffi                      1.16.0
charset-normalizer        3.3.2
click                     8.1.7
cloudpathlib              0.16.0
cloudpickle               3.0.0
cmake                     3.27.9
colorama                  0.4.6
comm                      0.2.0
confection                0.1.4
contourpy                 1.2.0
cubinlinker               0.3.0+2.gbde7348
cuda-python               12.3.0rc4+8.gcb4e395
cudf                      23.10.0
cugraph                   23.10.0
cugraph-dgl               23.10.0
cugraph-service-client    23.10.0
cugraph-service-server    23.10.0
cuml                      23.10.0
cupy-cuda12x              12.2.0
cycler                    0.12.1
cymem                     2.0.8
Cython                    3.0.6
dask                      2023.9.2
dask-cuda                 23.10.0
dask-cudf                 23.10.0
debugpy                   1.8.0
decorator                 5.1.1
defusedxml                0.7.1
distributed               2023.9.2
dm-tree                   0.1.8
einops                    0.7.0
exceptiongroup            1.2.0
execnet                   2.0.2
executing                 2.0.1
expecttest                0.1.3
fastapi                   0.109.0
fastjsonschema            2.19.0
fastrlock                 0.8.2
ffmpy                     0.3.1
filelock                  3.13.1
flash-attn                2.0.4
fonttools                 4.46.0
frozenlist                1.4.0
fsspec                    2023.12.0
gast                      0.5.4
google-auth               2.25.0
google-auth-oauthlib      0.4.6
gradio                    3.48.0
gradio_client             0.6.1
graphsurgeon              0.4.6
grpcio                    1.59.3
h11                       0.14.0
httpcore                  1.0.2
httptools                 0.6.1
httpx                     0.26.0
huggingface-hub           0.20.3
hypothesis                5.35.1
idna                      3.6
importlib-metadata        7.0.0
importlib-resources       6.1.1
iniconfig                 2.0.0
intel-openmp              2021.4.0
ipykernel                 6.27.1
ipython                   8.18.1
ipython-genutils          0.2.0
jedi                      0.19.1
Jinja2                    3.1.2
joblib                    1.3.2
json5                     0.9.14
jsonschema                4.20.0
jsonschema-specifications 2023.11.2
jupyter_client            8.6.0
jupyter_core              5.5.0
jupyter-tensorboard       0.2.0
jupyterlab                2.3.2
jupyterlab_pygments       0.3.0
jupyterlab-server         1.2.0
jupytext                  1.16.0
kiwisolver                1.4.5
langcodes                 3.3.0
lazy_loader               0.3
librosa                   0.10.1
llvmlite                  0.40.1
locket                    1.0.0
Markdown                  3.5.1
markdown-it-py            3.0.0
MarkupSafe                2.1.3
matplotlib                3.8.2
matplotlib-inline         0.1.6
mdit-py-plugins           0.4.0
mdurl                     0.1.2
mistune                   3.0.2
mkl                       2021.1.1
mkl-devel                 2021.1.1
mkl-include               2021.1.1
mock                      5.1.0
mpmath                    1.3.0
msgpack                   1.0.7
multidict                 6.0.4
murmurhash                1.0.10
nbclient                  0.9.0
nbconvert                 7.12.0
nbformat                  5.9.2
nest-asyncio              1.5.8
networkx                  2.6.3
ninja                     1.11.1.1
notebook                  6.4.10
numba                     0.57.1+1.g4157f3379
numpy                     1.24.4
nvfuser                   0.1.1+gitunknown
nvidia-dali-cuda120       1.32.0
nvidia-pyindex            1.0.9
nvtx                      0.2.5
oauthlib                  3.2.2
onnx                      1.15.0rc2
opencv-python-headless    4.9.0.80
optree                    0.10.0
orjson                    3.9.12
packaging                 23.2
pandas                    1.5.3
pandocfilters             1.5.0
parso                     0.8.3
partd                     1.4.1
pexpect                   4.9.0
Pillow                    9.5.0
pip                       23.3.1
platformdirs              4.1.0
pluggy                    1.3.0
ply                       3.11
polygraphy                0.49.1
pooch                     1.8.0
preshed                   3.0.9
prettytable               3.9.0
prometheus-client         0.19.0
prompt-toolkit            3.0.41
protobuf                  4.24.4
psutil                    5.9.4
ptxcompiler               0.8.1+2.g5ad1474
ptyprocess                0.7.0
pure-eval                 0.2.2
pyarrow                   12.0.1
pyarrow-hotfix            0.6
pyasn1                    0.5.1
pyasn1-modules            0.3.0
pybind11                  2.11.1
pybind11-global           2.11.1
pycocotools               2.0+nv0.8.0
pycparser                 2.21
pydantic                  1.10.13
pydantic_core             2.16.1
pydub                     0.25.1
Pygments                  2.17.2
pylibcugraph              23.10.0
pylibcugraphops           23.10.0
pylibraft                 23.10.0
pynvml                    11.4.1
pyparsing                 3.1.1
pytest                    7.4.3
pytest-flakefinder        1.1.0
pytest-rerunfailures      13.0
pytest-shard              0.1.2
pytest-xdist              3.5.0
python-dateutil           2.8.2
python-dotenv             1.0.1
python-hostlist           1.23.0
python-multipart          0.0.6
pytorch-quantization      2.1.2
pytz                      2023.3.post1
PyYAML                    6.0.1
pyzmq                     25.1.2
raft-dask                 23.10.0
ray                       2.9.1
referencing               0.31.1
regex                     2023.10.3
requests                  2.31.0
requests-oauthlib         1.3.1
rich                      13.7.0
rmm                       23.10.0
rpds-py                   0.13.2
rsa                       4.9
ruff                      0.1.15
safetensors               0.4.2
scikit-learn              1.2.0
scipy                     1.11.4
semantic-version          2.10.0
Send2Trash                1.8.2
sentencepiece             0.1.99
setuptools                68.2.2
shellingham               1.5.4
six                       1.16.0
smart-open                6.4.0
sniffio                   1.3.0
sortedcontainers          2.4.0
soundfile                 0.12.1
soupsieve                 2.5
soxr                      0.3.7
spacy                     3.7.2
spacy-legacy              3.0.12
spacy-loggers             1.0.5
sphinx-glpi-theme         0.4.1
srsly                     2.4.8
stack-data                0.6.3
starlette                 0.35.1
sympy                     1.12
tabulate                  0.9.0
tbb                       2021.11.0
tblib                     3.0.0
tensorboard               2.9.0
tensorboard-data-server   0.6.1
tensorboard-plugin-wit    1.8.1
tensorrt                  8.6.1
terminado                 0.18.0
thinc                     8.2.1
threadpoolctl             3.2.0
thriftpy2                 0.4.17
tinycss2                  1.2.1
tokenizers                0.15.1
toml                      0.10.2
tomli                     2.0.1
tomlkit                   0.12.0
toolz                     0.12.0
torch                     2.2.0a0+81ea7a4
torch-tensorrt            2.2.0a0
torchdata                 0.7.0a0
torchtext                 0.17.0a0
torchvision               0.17.0a0
tornado                   6.4
tqdm                      4.66.1
traitlets                 5.9.0
transformer-engine        1.1.0+cf6fc89
transformers              4.38.0.dev0
treelite                  3.9.1
treelite-runtime          3.9.1
triton                    2.1.0+6e4932c
typer                     0.9.0
types-dataclasses         0.6.6
typing_extensions         4.8.0
ucx-py                    0.34.0
uff                       0.6.9
urllib3                   1.26.18
uvicorn                   0.27.0.post1
uvloop                    0.19.0
vllm                      0.2.2+cu123
wasabi                    1.1.2
watchfiles                0.21.0
wcwidth                   0.2.12
weasel                    0.3.4
webencodings              0.5.1
websockets                11.0.3
Werkzeug                  3.0.1
wheel                     0.42.0
xdoctest                  1.0.2
xformers                  0.0.24+6600003.d20240116 /custom-build/xformers
xgboost                   1.7.6
yarl                      1.9.3
zict                      3.0.0
zipp                      3.17.0
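Since the repo has no requirements file, one way to reuse the table above is to turn it into `name==version` pins. The following is a minimal sketch, not the maintainers' method: the function name `pip_list_to_pins` and the `skip_local` flag are hypothetical, and it assumes the table is the plain column output of `pip list`. Versions with a `+` local suffix (for example the torch and xformers builds) were presumably built inside the container and are unlikely to resolve from a public index, so the sketch can leave them out. Inside a working environment, `pip freeze` produces pins directly; this is only for the case where the pasted table is all you have.

```python
from __future__ import annotations


def pip_list_to_pins(table: str, skip_local: bool = False) -> list[str]:
    """Convert `pip list` column output into `name==version` lines.

    skip_local is a hypothetical convenience flag: when True, versions
    carrying a `+local` suffix are dropped from the result.
    """
    pins = []
    for line in table.splitlines():
        parts = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if len(parts) < 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        name, version = parts[0], parts[1]
        if skip_local and "+" in version:
            continue
        pins.append(f"{name}=={version}")
    return pins


# A few rows copied from the table above.
sample = """\
Package                   Version                  Editable project location
------------------------- ------------------------ -------------------------
accelerate                0.26.1
flash-attn                2.0.4
xformers                  0.0.24+6600003.d20240116 /custom-build/xformers
"""

for pin in pip_list_to_pins(sample, skip_local=True):
    print(pin)
```

The third column (the editable project location) is ignored, because only the name and version are needed for a pin.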

To Reproduce

Refer to readme.md, lol

Expected behavior

No response

Screenshots

No response

Environment

The base container environment is:

PyTorch version: 2.2.0a0+81ea7a4
Is debug build: False
CUDA used to build PyTorch: 12.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.27.9
Libc version: glibc-2.35

Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.5.0-14-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.3.107
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4090
Nvidia driver version: 525.147.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.7
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.24.4
[pip3] onnx==1.15.0rc2
[pip3] optree==0.10.0
[pip3] pytorch-quantization==2.1.2
[pip3] torch==2.2.0a0+81ea7a4
[pip3] torch-tensorrt==2.2.0a0
[pip3] torchdata==0.7.0a0
[pip3] torchtext==0.17.0a0
[pip3] torchvision==0.17.0a0
[pip3] triton==2.1.0+6e4932c
[conda] Could not collect

Additional context

No response

SUDA-HLT-ywfang commented 7 months ago

Hi, there are currently conflicts between the different inference environments. We are working on clearer usage instructions.

jsonwull commented 7 months ago

ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: you need flash_attn package version to be greater or equal than 2.1.0. Detected version 2.0.4. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.

LDLINGLINGLING commented 2 months ago

Hi, you may need to install a newer version of flash_attn. Run pip install "flash_attn>=2.1.0" on the command line (the quotes keep the shell from treating > as an output redirect).
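To see in advance whether an environment would hit the ImportError quoted above, the installed flash-attn version can be compared with the 2.1.0 minimum named in that message. This is a rough sketch using only the standard library; `release_tuple`, `meets_minimum`, and `installed_version` are hypothetical helper names, and the comparison looks only at the leading numeric part of a version, so pre-release and local suffixes are ignored.

```python
import re
from importlib import metadata


def release_tuple(version: str) -> tuple:
    """Leading numeric components, e.g. '0.2.2+cu123' -> (0, 2, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(piece) for piece in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` is at least `minimum`, comparing numeric parts only."""
    a, b = release_tuple(installed), release_tuple(minimum)
    # Pad with zeros so that '2.1' and '2.1.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(dist_name: str):
    """Installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "2.1.0" is the minimum stated in the error message above.
found = installed_version("flash-attn")
if found is None:
    print("flash-attn is not installed")
else:
    print(found, "ok" if meets_minimum(found, "2.1.0") else "too old")
```

For stricter comparisons that follow the full Python version rules, the third-party `packaging` library (present in the environment listed in this issue) is the usual choice; the sketch avoids it only to stay dependency-free.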

soulteary commented 2 months ago

@LDLINGLINGLING @jsonwull The goal of this issue is to provide a reference environment for other users. It is not a report about the flash attn version: when the versions listed above are fully pinned, the program runs normally and quickly :D
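To check whether a given environment actually matches pinned versions like the ones in this issue, installed distributions can be compared against a pin list. The sketch below is illustrative only: `normalize`, `installed_versions`, `find_mismatches`, and the `PINS` selection are hypothetical, and only three rows from the package table are used as an example.

```python
import re
from importlib import metadata


def normalize(name: str) -> str:
    """Normalize a distribution name so 'Flash_Attn' and 'flash-attn' match."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_versions() -> dict:
    """Map normalized distribution names to their installed versions."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            found[normalize(name)] = dist.version
    return found


def find_mismatches(pins: dict, installed: dict) -> dict:
    """Return {name: (expected, actual_or_None)} for each pin not met exactly."""
    problems = {}
    for name, expected in pins.items():
        actual = installed.get(normalize(name))
        if actual != expected:
            problems[name] = (expected, actual)
    return problems


# Example pins copied from the package table in this issue.
PINS = {
    "flash-attn": "2.0.4",
    "accelerate": "0.26.1",
    "sentencepiece": "0.1.99",
}

for name, (expected, actual) in find_mismatches(PINS, installed_versions()).items():
    print(f"{name}: expected {expected}, found {actual}")
```

An empty result means every pinned package is installed at exactly the listed version; a `None` in the second position means the package is not installed at all.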