THUDM / ChatGLM3

ChatGLM3 series: Open Bilingual Chat LLMs
Apache License 2.0

Inserting adapter modules into the SelfAttention layer produces nan and inf values in the forward pass, driving the loss to 0 #718

Closed ZionDoki closed 8 months ago

ZionDoki commented 8 months ago

System Info

_libgcc_mutex 0.1 main https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main _openmp_mutex 5.1 1_gnu https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main abseil-cpp 20211102.0 hd4dd3e8_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main absl-py 2.0.0 pypi_0 pypi accelerate 0.25.0 pypi_0 pypi aiohttp 3.9.0 py310h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main aiosignal 1.2.0 pyhd3eb1b0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main anyio 3.7.1 pyhd8ed1ab_0 conda-forge arrow 1.3.0 pyhd8ed1ab_0 conda-forge arrow-cpp 11.0.0 h374c478_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main asttokens 2.4.1 pyhd8ed1ab_0 conda-forge async-timeout 4.0.2 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main attrs 23.1.0 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main aws-c-common 0.6.8 h5eee18b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main aws-c-event-stream 0.1.6 h6a678d5_6 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main aws-checksums 0.1.11 h5eee18b_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main aws-sdk-cpp 1.8.185 h721c034_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main backoff 2.2.1 pyhd8ed1ab_0 conda-forge beautifulsoup4 4.12.2 pyha770c72_0 conda-forge blas 1.0 mkl https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main blessed 1.19.1 pyhe4f9e05_2 conda-forge boost-cpp 1.82.0 hdb19cb5_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main boto3 1.33.8 pyhd8ed1ab_0 conda-forge botocore 1.33.8 pyhd8ed1ab_0 conda-forge bottleneck 1.3.5 py310ha9d4c09_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main brotli-python 1.0.9 py310h6a678d5_7 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main bzip2 1.0.8 h7b6447c_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main c-ares 1.19.1 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main ca-certificates 2023.12.12 h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main 
cachecontrol 0.13.1 pyhd8ed1ab_0 conda-forge cachecontrol-with-filecache 0.13.1 pyhd8ed1ab_0 conda-forge cachetools 5.3.2 pypi_0 pypi certifi 2023.11.17 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main cffi 1.16.0 py310h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main charset-normalizer 2.0.4 pyhd3eb1b0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main cleo 2.1.0 pyhd8ed1ab_0 conda-forge click 8.1.7 unix_pyh707e725_0 conda-forge colorama 0.4.6 pyhd8ed1ab_0 conda-forge comm 0.1.4 pyhd8ed1ab_0 conda-forge crashtest 0.4.1 pyhd8ed1ab_0 conda-forge croniter 1.4.1 pyhd8ed1ab_0 conda-forge cryptography 41.0.3 py310hdda0065_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main cuda-cudart 11.8.89 0 nvidia cuda-cupti 11.8.87 0 nvidia cuda-libraries 11.8.0 0 nvidia cuda-nvcc 12.3.107 0 nvidia cuda-nvrtc 11.8.89 0 nvidia cuda-nvtx 11.8.86 0 nvidia cuda-runtime 11.8.0 0 nvidia dataclasses 0.8 pyh6d0b6a4_7 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main datasets 2.14.6 py_0 huggingface dateutils 0.6.12 py_0 conda-forge dbus 1.13.6 he372182_0 conda-forge debugpy 1.6.7 py310h6a678d5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main decorator 5.1.1 pyhd8ed1ab_0 conda-forge deepdiff 5.8.1 pyhd8ed1ab_0 conda-forge dill 0.3.7 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main distlib 0.3.7 pyhd8ed1ab_0 conda-forge dulwich 0.21.3 py310h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main entrypoints 0.4 pyhd8ed1ab_0 conda-forge evaluate 0.4.1 pypi_0 pypi exceptiongroup 1.2.0 pyhd8ed1ab_0 conda-forge executing 2.0.1 pyhd8ed1ab_0 conda-forge expat 2.2.10 h9c3ff4c_0 conda-forge fastapi 0.103.2 pyhd8ed1ab_0 conda-forge ffmpeg 4.3 hf484d3e_0 pytorch filelock 3.13.1 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main freetype 2.12.1 h4a9f257_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main frozenlist 1.4.0 py310h5eee18b_0 
https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main fsspec 2023.10.0 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main gflags 2.2.2 he6710b0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main giflib 5.2.1 h5eee18b_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main glib 2.69.1 he621ea3_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main glog 0.5.0 h2531618_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main gmp 6.2.1 h295c915_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main gmpy2 2.1.2 py310heeb90bb_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main gnutls 3.6.15 he1e5248_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main google-auth 2.25.2 pypi_0 pypi google-auth-oauthlib 1.2.0 pypi_0 pypi grpc-cpp 1.48.2 he1ff14a_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main grpcio 1.60.0 pypi_0 pypi h11 0.14.0 pyhd8ed1ab_0 conda-forge huggingface-hub 0.19.4 pypi_0 pypi huggingface_hub 0.19.4 py_0 huggingface icu 73.1 h6a678d5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main idna 3.4 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main importlib-metadata 6.0.0 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main importlib_metadata 6.0.0 hd3eb1b0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main inquirer 3.1.4 pyhd8ed1ab_0 conda-forge intel-openmp 2023.1.0 hdb19cb5_46306 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main ipykernel 6.26.0 pyhf8b6a83_0 conda-forge ipython 8.18.1 pyh707e725_3 conda-forge itsdangerous 2.1.2 pyhd8ed1ab_0 conda-forge jaraco.classes 3.3.0 pyhd8ed1ab_0 conda-forge jedi 0.19.1 pyhd8ed1ab_0 conda-forge jeepney 0.8.0 pyhd8ed1ab_0 conda-forge jieba 0.42.1 pypi_0 pypi jinja2 3.1.2 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main jmespath 1.0.1 pyhd8ed1ab_0 conda-forge joblib 1.3.2 pypi_0 pypi jpeg 9e h5eee18b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main jupyter_client 7.3.4 
pyhd8ed1ab_0 conda-forge jupyter_core 5.5.0 py310hff52083_0 conda-forge keyring 24.3.0 py310hff52083_0 conda-forge krb5 1.20.1 h143b758_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main lame 3.100 h7b6447c_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main lcms2 2.12 h3be6417_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main ld_impl_linux-64 2.38 h1181459_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main lerc 3.0 h295c915_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libboost 1.82.0 h109eef0_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libbrotlicommon 1.0.9 h5eee18b_7 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libbrotlidec 1.0.9 h5eee18b_7 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libbrotlienc 1.0.9 h5eee18b_7 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libcublas 11.11.3.6 0 nvidia libcufft 10.9.0.58 0 nvidia libcufile 1.8.1.2 0 nvidia libcurand 10.3.4.101 0 nvidia libcurl 8.4.0 h251f7ec_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libcusolver 11.4.1.48 0 nvidia libcusparse 11.7.5.86 0 nvidia libdeflate 1.17 h5eee18b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libedit 3.1.20221030 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libev 4.33 h7f8727e_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libevent 2.1.12 hdbd6064_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libffi 3.4.4 h6a678d5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libgcc-ng 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libgfortran-ng 11.2.0 h00389a5_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libgfortran5 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libgomp 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libiconv 1.16 h7f8727e_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libidn2 2.3.4 h5eee18b_0 
https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libjpeg-turbo 2.0.0 h9bf148f_0 pytorch libnghttp2 1.57.0 h2d74bed_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libnpp 11.8.0.86 0 nvidia libnvjpeg 11.9.0.86 0 nvidia libpng 1.6.39 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libprotobuf 3.20.3 he621ea3_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libsodium 1.0.18 h36c2ea0_1 conda-forge libssh2 1.10.0 hdbd6064_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libstdcxx-ng 11.2.0 h1234567_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libtasn1 4.19.0 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libthrift 0.15.0 h1795dd8_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libtiff 4.5.1 h6a678d5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libunistring 0.9.10 h27cfd23_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libuuid 1.41.5 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libwebp 1.3.2 h11a3e52_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main libwebp-base 1.3.2 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main lightning 2.1.2 pyhd8ed1ab_0 conda-forge lightning-cloud 0.5.57 pyhd8ed1ab_0 conda-forge lightning-utilities 0.10.0 pyhd8ed1ab_0 conda-forge llvm-openmp 14.0.6 h9e868ea_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main loguru 0.7.2 pypi_0 pypi lz4-c 1.9.4 h6a678d5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main markdown 3.5.1 pypi_0 pypi markdown-it-py 3.0.0 pyhd8ed1ab_0 conda-forge markupsafe 2.1.1 py310h7f8727e_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main matplotlib-inline 0.1.6 pyhd8ed1ab_0 conda-forge mdurl 0.1.0 pyhd8ed1ab_0 conda-forge mkl 2023.1.0 h213fc3f_46344 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main mkl-service 2.4.0 py310h5eee18b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main mkl_fft 1.3.8 py310h5eee18b_0 
https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main mkl_random 1.2.4 py310hdb19cb5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main more-itertools 10.1.0 pyhd8ed1ab_0 conda-forge mpc 1.1.0 h10f8cd9_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main mpfr 4.0.2 hb69a4c5_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main mpmath 1.3.0 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main msgpack-python 1.0.3 py310hd09550d_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main multidict 6.0.4 py310h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main multiprocess 0.70.15 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main ncurses 6.4 h6a678d5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main nest-asyncio 1.5.8 pyhd8ed1ab_0 conda-forge nettle 3.7.3 hbbd107a_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main networkx 3.1 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main nltk 3.8.1 pypi_0 pypi numexpr 2.8.7 py310h85018f9_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main numpy 1.26.2 py310h5f9d8c6_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main numpy-base 1.26.2 py310hb5e798b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main oauthlib 3.2.2 pypi_0 pypi openh264 2.1.1 h4ff587b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main openjpeg 2.4.0 h3ad879b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main openssl 3.0.12 h7f8727e_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main orc 1.7.4 hb3bc3d3_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main ordered-set 4.1.0 pyhd8ed1ab_0 conda-forge packaging 23.1 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main pandas 2.1.1 py310h1128e8f_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main parso 0.8.3 pyhd8ed1ab_0 conda-forge pcre 8.45 h9c3ff4c_0 conda-forge peft 0.6.2 pypi_0 pypi pexpect 4.8.0 pyh1a96a4e_2 conda-forge pickleshare 0.7.5 
py_1003 conda-forge pillow 10.0.1 py310ha6cbd5a_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main pip 23.3.1 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main pkginfo 1.9.6 pyhd8ed1ab_0 conda-forge platformdirs 3.11.0 pyhd8ed1ab_0 conda-forge poetry 1.7.1 linux_pyha804496_0 conda-forge poetry-core 1.8.1 pyhd8ed1ab_0 conda-forge poetry-plugin-export 1.6.0 pyhd8ed1ab_0 conda-forge prompt-toolkit 3.0.42 pyha770c72_0 conda-forge protobuf 4.23.4 pypi_0 pypi psutil 5.9.6 pypi_0 pypi ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge pure_eval 0.2.2 pyhd8ed1ab_0 conda-forge pyarrow 11.0.0 py310h468efa6_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main pyasn1 0.5.1 pypi_0 pypi pyasn1-modules 0.3.0 pypi_0 pypi pycparser 2.21 pyhd3eb1b0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main pydantic 1.10.12 py310h5eee18b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main pyg 2.4.0 py310_torch_2.1.0_cu118 pyg pygments 2.17.2 pyhd8ed1ab_0 conda-forge pyjwt 2.8.0 pyhd8ed1ab_0 conda-forge pyopenssl 23.2.0 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main pyparsing 3.0.9 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main pyproject_hooks 1.0.0 pyhd8ed1ab_0 conda-forge pysocks 1.7.1 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main python 3.10.13 h955ad1f_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main python-build 1.0.3 pyhd8ed1ab_0 conda-forge python-dateutil 2.8.2 pyhd3eb1b0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main python-editor 1.0.4 py_0 conda-forge python-fastjsonschema 2.19.0 pyhd8ed1ab_0 conda-forge python-installer 0.7.0 pyhd8ed1ab_0 conda-forge python-multipart 0.0.6 pyhd8ed1ab_0 conda-forge python-tzdata 2023.3 pyhd3eb1b0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main python-xxhash 2.0.2 py310h5eee18b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main python_abi 3.10 2_cp310 conda-forge pytorch 2.1.1 
py3.10_cuda11.8_cudnn8.7.0_0 pytorch pytorch-cuda 11.8 h7e8668a_5 pytorch pytorch-lightning 2.1.1 pyhd8ed1ab_0 conda-forge pytorch-mutex 1.0 cuda pytorch pytz 2023.3.post1 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main pyyaml 6.0.1 py310h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main pyzmq 25.1.0 py310h6a678d5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main rapidfuzz 3.5.2 py310h6a678d5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main re2 2022.04.01 h295c915_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main readchar 4.0.5 pyhd8ed1ab_0 conda-forge readline 8.2 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main regex 2023.10.3 py310h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main requests 2.31.0 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main requests-oauthlib 1.3.1 pypi_0 pypi requests-toolbelt 1.0.0 pyhd8ed1ab_0 conda-forge responses 0.18.0 pypi_0 pypi rich 13.7.0 pyhd8ed1ab_0 conda-forge rouge-chinese 1.0.3 pypi_0 pypi rouge-score 0.1.2 pypi_0 pypi rsa 4.9 pypi_0 pypi s3transfer 0.8.2 pyhd8ed1ab_0 conda-forge safetensors 0.4.0 py310ha89cbab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main scikit-learn 1.3.2 pypi_0 pypi scipy 1.11.4 py310h5f9d8c6_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main secretstorage 3.3.3 py310hff52083_2 conda-forge sentencepiece 0.1.99 pypi_0 pypi setuptools 68.0.0 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main shellingham 1.5.4 pyhd8ed1ab_0 conda-forge six 1.16.0 pyhd3eb1b0_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main snappy 1.1.9 h295c915_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main sniffio 1.3.0 pyhd8ed1ab_0 conda-forge soupsieve 2.5 pyhd8ed1ab_1 conda-forge sqlite 3.41.2 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main stack_data 0.6.2 pyhd8ed1ab_0 conda-forge starlette 0.27.0 pyhd8ed1ab_0 conda-forge starsessions 1.3.0 
pyhd8ed1ab_0 conda-forge sympy 1.12 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main tbb 2021.8.0 hdb19cb5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main tensorboard 2.15.1 pypi_0 pypi tensorboard-data-server 0.7.2 pypi_0 pypi threadpoolctl 3.2.0 pypi_0 pypi tk 8.6.12 h1ccaba5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main tokenizers 0.13.3 py310h22610ee_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main tomli 2.0.1 pyhd8ed1ab_0 conda-forge tomlkit 0.12.3 pyha770c72_0 conda-forge torch-scatter 2.1.2+pt21cu118 pypi_0 pypi torch-sparse 0.6.18+pt21cu118 pypi_0 pypi torchaudio 2.1.1 py310_cu118 pytorch torchmetrics 1.2.1 pyhd8ed1ab_0 conda-forge torchtriton 2.1.0 py310 pytorch torchvision 0.16.1 py310_cu118 pytorch tornado 6.1 py310h5764c6d_3 conda-forge tqdm 4.65.0 py310h2f386ee_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main traitlets 5.14.0 pyhd8ed1ab_0 conda-forge transformers 4.32.1 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main trove-classifiers 2023.11.29 pyhd8ed1ab_0 conda-forge types-python-dateutil 2.8.19.14 pyhd8ed1ab_0 conda-forge typing-extensions 4.7.1 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main typing_extensions 4.7.1 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main tzdata 2023c h04d1e81_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main urllib3 1.26.18 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main utf8proc 2.6.1 h27cfd23_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main uvicorn 0.24.0.post1 py310hff52083_0 conda-forge virtualenv 20.25.0 pyhd8ed1ab_0 conda-forge wcwidth 0.2.12 pyhd8ed1ab_0 conda-forge websocket-client 1.7.0 pyhd8ed1ab_0 conda-forge websockets 10.4 py310h5eee18b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main werkzeug 3.0.1 pypi_0 pypi wheel 0.41.2 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main xxhash 0.8.0 h7f8727e_3 
https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main xz 5.4.5 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main yaml 0.2.5 h7b6447c_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main yarl 1.9.3 py310h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main zeromq 4.3.4 h2531618_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main zipp 3.11.0 py310h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main zlib 1.2.13 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main zstd 1.5.5 hc292b87_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main

Who can help?

@Btlmd

Information

Reproduction

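I am using an adapter-style approach, and abnormal values appear during the forward pass. My initial investigation suggests an overflow caused by mixed precision; however, I have already passed fp16=True to the HF Trainer. Data preprocessing follows this repository. The most frustrating part is that ChatGLM has no such problem with an almost identical implementation: when I fine-tuned it with practically the same method, no abnormal values appeared and backpropagation was stable. I have searched extensively without finding a solution. I also tried ChatGLM2 earlier and hit a similar error.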
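To check the mixed-precision guess against the numbers in the trace that follows: float16 has no finite value above 65504, so any entry larger than that becomes inf once cast to half precision, which is what autocast does to the input, weight and bias of nn.Linear when fp16=True. `cast_to_fp16` is a hypothetical helper that emulates such a cast with Python's `struct` half-precision format (`'e'`), so it runs without PyTorch:

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def cast_to_fp16(x: float) -> float:
    """Round x to the nearest float16 value, as a float32 -> float16 cast would.

    struct's 'e' format raises OverflowError for finite values that are too
    large, whereas a tensor cast yields inf, so that case is mapped to +/-inf.
    """
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


# abs max of query_key_value output, fp16 limit, abs max of the adapter bias
for value in (9.55, FP16_MAX, 3.48e14):
    print(f"{value:.3e} -> {cast_to_fp16(value)}")
```

9.55 survives with a small rounding error, while 3.48e+14 becomes inf. This matches the overflow at `adapterk.wi`, but it also suggests that fp16 is only the point where the problem surfaces: the bias is already 3.48e+14 at batch 0, before any parameter update.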

2024-01-09 08:17 | model stats:
>    {'Total': 6375790592, 'Trainable': 132206592, 'Percent': '2.0736%'}
2024-01-09 08:17 | init data collator √
2024-01-09 08:17 | Training  new model...
output has infs

Detected inf/nan during batch_number=0
Last 6 forward frames:
abs min  abs max  metadata

                  *** Starting batch number=0 ***
abs min  abs max  metadata
                  transformer.embedding.word_embeddings Embedding
0.00e+00 1.86e-01 weight
0.00e+00 6.48e+04 input[0]
0.00e+00 1.55e-01 output
                  transformer.embedding Embedding
0.00e+00 6.48e+04 input[0]
0.00e+00 1.55e-01 output
                  transformer.rotary_pos_emb RotaryEmbedding
     not a tensor input[0]
0.00e+00 1.00e+00 output
                  transformer.encoder.layers.0.input_layernorm RMSNorm
7.15e-07 8.48e-01 weight
0.00e+00 1.55e-01 input[0]
0.00e+00 2.91e+00 output
                  transformer.encoder.layers.0.self_attention.query_key_value Linear
0.00e+00 6.99e-01 weight
1.01e-06 7.47e+00 bias
0.00e+00 2.91e+00 input[0]
1.55e-06 9.55e+00 output
                  transformer.encoder.layers.0.self_attention.adapterk.wi Linear
0.00e+00 0.00e+00 weight
0.00e+00 3.48e+14 bias
6.14e-06 8.84e+00 input[0]
0.00e+00      inf output
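The trace printed by `debug="underflow_overflow"` lists, for each module frame, the abs min and abs max of its weights, inputs and outputs. With long traces it helps to scan for the first tensors that exceed float16's largest finite value (65504) or are already non-finite. `find_fp16_offenders` is a hypothetical helper (not part of transformers) that assumes the three-column layout shown above:

```python
import math

FP16_MAX = 65504.0  # largest finite float16 value


def _to_float(token: str):
    """Parse a number such as '1.86e-01' or 'inf'; return None for anything else."""
    try:
        return float(token)
    except ValueError:
        return None


def find_fp16_offenders(trace: str, limit: float = FP16_MAX):
    """Return (module, tensor, abs_max) for every tensor in an underflow/overflow
    trace whose abs max is non-finite or above `limit`, in order of appearance."""
    offenders = []
    module = None
    for line in trace.splitlines():
        parts = line.split()
        if len(parts) == 2 and _to_float(parts[0]) is None:
            # Frame header such as "transformer.embedding Embedding"
            module = parts[0]
        elif len(parts) == 3:
            abs_min, abs_max = _to_float(parts[0]), _to_float(parts[1])
            if abs_min is None or abs_max is None:
                continue  # a text line that happens to have three words
            if not math.isfinite(abs_max) or abs_max > limit:
                offenders.append((module, parts[2], abs_max))
    return offenders
```

On the frames above, it returns only the two `adapterk.wi` entries: the bias (3.48e+14) and the output (inf). The embedding's `input[0]` (6.48e+04) stays below the limit; it holds token ids, not activations.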

Code that inserts the adapter:

import torch
from torch import nn


class Adapter(nn.Module):
    """Residual adapter: returns x + wi(x), or x unchanged when disabled."""

    def __init__(self, config, in_feats: int = None) -> None:
        super().__init__()
        # Must be assigned before it is read here and in forward().
        self.use_adapter = config.use_adapter
        if self.use_adapter:
            feats = in_feats or config.hidden_size
            self.wi = nn.Linear(feats, feats)

    def forward(self, x):
        if not self.use_adapter:
            return x
        shortcut = x
        x = self.wi(x)
        return x + shortcut

class SelfAttentionWA(SelfAttention):
    # SelfAttention and ChatGLMConfig come from the model's modeling_chatglm.py and
    # configuration_chatglm.py. All relevant classes were overridden in a similar way.
    def __init__(self, config: ChatGLMConfig, layer_number, device=None):
        super().__init__(config, layer_number, device)

        if config.only_adapter_trainable:
            # Freeze the original attention weights; the adapters created below stay trainable.
            for param in self.parameters():
                param.requires_grad = False

        self.adapterk, self.adapterv = None, None
        if config.use_adapter:
            self.adapterk = Adapter(config, in_feats=self.qkv_hidden_size // 3)
            self.adapterv = Adapter(config, in_feats=self.qkv_hidden_size // 3)

    def forward(self, hidden_states, attention_mask, rotary_pos_emb, kv_cache=None, use_cache=True):
        # hidden_states: [sq, b, h]

        # =================================================
        # Pre-allocate memory for key-values for inference.
        # =================================================
        # =====================
        # Query, Key, and Value
        # =====================

        # Attention heads [sq, b, h] --> [sq, b, (np * 3 * hn)]
        mixed_x_layer = self.query_key_value(hidden_states)
        hidden_size = mixed_x_layer.shape[-1] // 3

        # Modified part: apply the adapters to the K and V slices.
        # NOTE: an equal three-way split assumes Q, K and V have the same width,
        # which does not hold when config.multi_query_attention is enabled.
        q, k, v = mixed_x_layer.permute(1, 0, 2).split(hidden_size, dim=-1)
        if self.adapterk is not None:
            k = self.adapterk(k)
        if self.adapterv is not None:
            v = self.adapterv(v)
        mixed_x_layer = torch.cat((q, k, v), dim=-1).permute(1, 0, 2)

        # ...
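The "Trainable" count in the log can be reproduced by hand, which helps confirm what the equal three-way split above actually does. A minimal sketch, assuming the values published in chatglm3-6b's config.json (32 attention heads, `kv_channels` 128, `multi_query_attention` with `multi_query_group_num` 2, 28 layers); `GLMAttnConfig` and the helper functions are hypothetical names for illustration:

```python
from dataclasses import dataclass


@dataclass
class GLMAttnConfig:
    """Hypothetical subset of ChatGLMConfig; defaults assume chatglm3-6b's config.json."""
    num_attention_heads: int = 32
    kv_channels: int = 128  # per-head width
    multi_query_attention: bool = True
    multi_query_group_num: int = 2
    num_layers: int = 28


def qkv_split_sizes(cfg: GLMAttnConfig) -> list[int]:
    """Widths of the Q, K and V slices in the fused query_key_value output."""
    q_width = cfg.num_attention_heads * cfg.kv_channels
    if cfg.multi_query_attention:
        kv_width = cfg.multi_query_group_num * cfg.kv_channels
    else:
        kv_width = q_width
    return [q_width, kv_width, kv_width]


def linear_params(n_in: int, n_out: int) -> int:
    """Parameter count of nn.Linear(n_in, n_out) with bias."""
    return n_in * n_out + n_out


def equal_split_adapter_params(cfg: GLMAttnConfig) -> int:
    """Trainable parameters when each layer gets two adapters of width
    qkv_hidden_size // 3, as SelfAttentionWA does."""
    width = sum(qkv_split_sizes(cfg)) // 3
    return cfg.num_layers * 2 * linear_params(width, width)


cfg = GLMAttnConfig()
print(qkv_split_sizes(cfg))             # [4096, 256, 256]
print(sum(qkv_split_sizes(cfg)) // 3)   # 1536
print(equal_split_adapter_params(cfg))  # 132206592
```

Under these assumptions, the equal-split adapters total exactly 132,206,592 parameters, the "Trainable" figure in the log, which supports the reading that each adapter is 1536 wide. It also means the "k" chunk falls entirely within the query columns, and the "v" chunk mixes the last 1024 query columns with all of K and V, so `adapterk` and `adapterv` do not act on K and V as intended. Whether this contributes to the nan/inf is not established in this thread.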

Fine-tuning uses the Hugging Face Trainer:


from transformers import (
    AutoConfig,
    EarlyStoppingCallback,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# ... other code logic

training_args = Seq2SeqTrainingArguments(
    output_dir=f"output_{args.device}",
    evaluation_strategy="steps",
    logging_strategy="steps",
    logging_steps=LOGGING_STEPS,
    disable_tqdm=True,
    push_to_hub=False,
    per_device_train_batch_size=args.train_batch_size,
    per_device_eval_batch_size=args.eval_batch_size,
    eval_steps=args.save_steps,
    save_total_limit=5,
    fp16=getattr(args, "fp16", False),  # fp16 is True here
    save_steps=args.save_steps,
    load_best_model_at_end=True,
    overwrite_output_dir=True,
    metric_for_best_model="eval_rouge-2",
    learning_rate=args.lr,
    num_train_epochs=args.epoch,
    report_to=["tensorboard"],
    warmup_steps=args.warmup_steps,
    debug="underflow_overflow" if args.debug else "",
)

# ... other code logic

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=the_datasets.dataset["train"],
    eval_dataset=the_datasets.dataset["val"],
    data_collator=data_collator,
    compute_metrics=get_metric(token_handler.tokenizer),
    preprocess_logits_for_metrics=preprocess_logits_for_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=4)],
)
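Because the run sets `fp16=True`, autocast-eligible ops such as nn.Linear compute in float16, whose range is far narrower than that of bfloat16 or float32. For an IEEE-754-style binary format with e exponent bits and m stored mantissa bits, the largest finite value is (2 - 2^-m) * 2^(2^(e-1) - 1). A short sketch (`max_finite` is a hypothetical helper) that evaluates this for the three formats:

```python
def max_finite(exp_bits: int, mantissa_bits: int) -> float:
    """Largest finite value of an IEEE-754-style binary format.

    The all-ones exponent is reserved for inf/nan, so the largest usable
    unbiased exponent equals the bias, 2**(exp_bits - 1) - 1.
    """
    bias = 2 ** (exp_bits - 1) - 1
    return (2 - 2.0 ** -mantissa_bits) * 2.0 ** bias


# (exponent bits, stored mantissa bits) for each format
FORMATS = {"float16": (5, 10), "bfloat16": (8, 7), "float32": (8, 23)}

for name, (e, m) in FORMATS.items():
    print(f"{name:>8}: {max_finite(e, m):.4e}")
```

`Seq2SeqTrainingArguments` also accepts `bf16=True`, which keeps float32's exponent range at lower precision on GPUs that support bfloat16. Under that setting, a value like 3.48e+14 would stay finite. However, an adapter bias of that size would still be abnormal, so switching formats would at best hide the symptom.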

Expected behavior

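  1. Understand what difference between ChatGLM and the two later versions allows the former to train normally while the latter two produce abnormal values.
  2. Be able to train normally.
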
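If the adapter parameters were in fact never initialized, initializing them explicitly after the model is loaded is one way to get a clean start. The trace's all-zero adapter weight next to a 3.48e+14 bias is consistent with uninitialized memory, though it does not prove it. One version difference worth checking, stated as an assumption rather than a confirmed cause: ChatGLM2/3's modeling_chatglm.py appears to construct the encoder through `torch.nn.utils.skip_init` when `empty_init` is true. `skip_init` instantiates a module without initializing its parameters, so adapters created inside the encoder would not receive nn.Linear's default initialization. A common choice is to zero-initialize the adapter (for example `torch.nn.init.zeros_` on `adapterk.wi.weight` and `adapterk.wi.bias`, and likewise for `adapterv`), which makes x + wi(x) exactly x at step 0. The dependency-free sketch below uses plain lists and hypothetical helper names to illustrate that identity property. It is not PyTorch code:

```python
def linear(weight, bias, x):
    """y = W x + b, with W given as a list of rows (a stand-in for nn.Linear)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def residual_adapter(weight, bias, x):
    """Same computation as Adapter.forward in the issue: x + wi(x)."""
    return [xi + yi for xi, yi in zip(x, linear(weight, bias, x))]


def zero_init(n):
    """Zero weight and bias for an n -> n layer, mirroring nn.init.zeros_ on both."""
    return [[0.0] * n for _ in range(n)], [0.0] * n


x = [1.0, -2.0, 0.5]
weight, bias = zero_init(len(x))
print(residual_adapter(weight, bias, x))  # identical to x

# A bias of the magnitude seen in the trace swamps the input entirely.
print(residual_adapter(weight, [3.48e14] * len(x), x))
```

Zero-initializing both the weight and the bias does not stall learning here because wi is a single linear layer. The gradient of its weight is the outer product of the upstream gradient and x, which is generally non-zero.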
Paitesanshi commented 7 months ago

@ZionDoki Hi, how did you solve this problem? I've run into the same issue.

fuzhao123232 commented 6 months ago
