PaddlePaddle / Paddle

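PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle 『飞桨』 core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)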
http://www.paddlepaddle.org/
Apache License 2.0

RMSProp raises an exception; switching to AdamW works fine #54125

Closed. zouhan6806504 closed this issue 1 year ago.

zouhan6806504 commented 1 year ago

Please ask your question

Error message:

Traceback (most recent call last):
  File "/home/aistudio/train_attlstm.py", line 168, in <module>
    optimizer.step()
  File "<decorator-gen-298>", line 2, in step
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/base.py", line 319, in __impl__
    return func(*args, **kwargs)
  File "<decorator-gen-296>", line 2, in step
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/wrapped_decorator.py", line 26, in __impl__
    return wrapped_func(*args, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py", line 534, in __impl__
    return func(*args, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/optimizer/optimizer.py", line 1440, in step
    param_group_idx=0,
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/optimizer/optimizer.py", line 1167, in _apply_optimize
    params_grads, param_group_idx=param_group_idx
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/optimizer/optimizer.py", line 948, in _create_optimization_pass
    target_block, param_and_grad
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/optimizer/rmsprop.py", line 243, in _append_optimize_op
    stop_gradient=True,
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py", line 4005, in append_op
    inplace_map,
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/tracer.py", line 314, in trace_op
    stop_gradient, inplace_map)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/tracer.py", line 176, in eager_legacy_trace_op
    returns = function_ptr(*arg_list, *attrs_list)
ValueError: (InvalidArgument) The current CompatMetaTensor is not initialized.
  [Hint: Expected meta_tensor.initialized() == true, but received meta_tensor.initialized():0 != true:1.] (at /paddle/paddle/fluid/framework/infershape_utils.cc:144)
  [operator < rmsprop > error]

Code:

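import paddle

# Step decay: 2.5e-4 before epoch 5, 2.5e-5 for epochs 5-7, 2.5e-6 from epoch 8 on
lr_scheduler = paddle.optimizer.lr.PiecewiseDecay(boundaries=[5, 8], values=[0.00025, 0.000025, 0.0000025], verbose=True)
# Centered RMSProp with L2 regularization: this is the configuration that fails in optimizer.step()
optimizer = paddle.optimizer.RMSProp(learning_rate=lr_scheduler, centered=True,
        parameters=model.parameters(),  # `model` is the user's network (not shown)
        weight_decay=paddle.regularizer.L2Decay(0.0004))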
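For reference, here is a minimal pure-Python sketch of what one step of this configuration computes for a single scalar weight. It assumes the centered RMSProp update rule and defaults (`rho=0.95`, `epsilon=1e-6`, `momentum=0.0`) described in the Paddle `RMSProp` documentation, and assumes `L2Decay(coeff)` is folded into the gradient as `grad + coeff * param`. The helper names `piecewise_lr` and `centered_rmsprop_step` are hypothetical; this is an illustration, not Paddle's kernel. The point it shows is that `centered=True` keeps an extra running mean of the gradients on top of the running mean of squared gradients.

```python
import math
from bisect import bisect_right


def piecewise_lr(epoch, boundaries=(5, 8), values=(0.00025, 0.000025, 0.0000025)):
    """Hypothetical helper mirroring PiecewiseDecay semantics:
    values[0] if epoch < 5, values[1] if 5 <= epoch < 8, else values[2]."""
    return values[bisect_right(boundaries, epoch)]


def centered_rmsprop_step(param, grad, state, lr, rho=0.95, epsilon=1e-6,
                          momentum=0.0, l2_coeff=0.0004):
    """One centered-RMSProp update for a scalar weight (illustrative sketch)."""
    # Assumed L2Decay behaviour: add coeff * param to the raw gradient
    g = grad + l2_coeff * param
    # Running mean of squared gradients
    state["mean_square"] = rho * state["mean_square"] + (1 - rho) * g * g
    # centered=True: also keep a running mean of the gradients themselves
    state["mean_grad"] = rho * state["mean_grad"] + (1 - rho) * g
    # Normalize by an estimate of the gradient variance (mean_square - mean_grad^2)
    denom = math.sqrt(state["mean_square"] - state["mean_grad"] ** 2 + epsilon)
    state["moment"] = momentum * state["moment"] + lr * g / denom
    return param - state["moment"]


# Usage: a few epochs with a constant, made-up gradient of 0.5
state = {"mean_square": 0.0, "mean_grad": 0.0, "moment": 0.0}
w = 1.0
for epoch in range(10):
    w = centered_rmsprop_step(w, grad=0.5, state=state, lr=piecewise_lr(epoch))
```

Because both running averages start at zero and use the same weights, `mean_square - mean_grad**2` stays non-negative, so the square root is always defined.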

If I switch to AdamW, it works:

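import paddle
from paddle.optimizer import AdamW

def get_AdamW_optimizer(
    parameters,
    decay_params,
    lr=3e-5,  # a float or an LRScheduler such as lr_scheduler above
    eps: float = 1e-6,
    weight_decay: float = 1e-3):
    # Clip gradients by their global L2 norm before each update
    clip = paddle.nn.ClipGradByGlobalNorm(clip_norm=5.0)
    optimizer = AdamW(learning_rate=lr,
                      parameters=parameters,
                      epsilon=eps,
                      weight_decay=weight_decay,
                      # Apply weight decay only to parameters whose name is in decay_params
                      apply_decay_param_fun=lambda x: x in decay_params,
                      grad_clip=clip
                      )
    return optimizer

# Decay every parameter except biases and normalization-layer weights
decay_params = [
    p.name for n, p in model.named_parameters()
    if not any(nd in n for nd in ["bias", "norm"])
]
eps = 1e-6
weight_decay = 1e-3

# lr_scheduler is the PiecewiseDecay scheduler defined above; `model` is the user's network
optimizer = get_AdamW_optimizer(model.parameters(), decay_params, lr=lr_scheduler, eps=eps, weight_decay=weight_decay)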
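The `decay_params` filter matches substrings of the structured parameter names (`n`) but stores the internal names (`p.name`), which is what `apply_decay_param_fun` receives. A minimal sketch of the same logic on hypothetical `(structured_name, internal_name)` pairs (all names below are made up for illustration, standing in for `model.named_parameters()`). Note also that AdamW applies its `weight_decay` decoupled from the gradient, whereas `L2Decay` in the RMSProp setup is added to the gradient, so the two configurations are not a drop-in numerical match.

```python
def select_decay_params(named_params, skip=("bias", "norm")):
    """Return internal names of parameters that should receive weight decay.

    named_params: iterable of (structured_name, internal_name) pairs, standing in
    for (n, p.name) taken from model.named_parameters().
    """
    return [pname for sname, pname in named_params
            if not any(nd in sname for nd in skip)]


# Hypothetical names for an LSTM + LayerNorm + Linear model
named = [
    ("lstm.weight_ih_l0", "lstm_0.w_0"),
    ("lstm.bias_ih_l0", "lstm_0.b_0"),
    ("norm.weight", "layer_norm_0.w_0"),
    ("fc.weight", "linear_0.w_0"),
    ("fc.bias", "linear_0.b_0"),
]
decay_params = select_decay_params(named)
apply_decay_param_fun = lambda x: x in decay_params
print(decay_params)  # ['lstm_0.w_0', 'linear_0.w_0']
```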

What could be causing this? The environment is AI Studio.

DesmonDay commented 1 year ago

Try updating your Paddle version?

zouhan6806504 commented 1 year ago

I upgraded Paddle from 2.4.0 to 2.4.1 and Python from 3.7 to 3.9, and the same bug still occurs.

liudongxue01 commented 1 year ago

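Could you share the complete code? Can the problem be reproduced with a small amount of training data? If so, please also provide some sample data that reproduces the issue. @zouhan6806504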

zouhan6806504 commented 1 year ago

https://aistudio.baidu.com/studio/project/partial/verify/6258891/9ad63eeb0ede41dfa7646eb86587f8e1 Just unzip the data and run the trainxx file to reproduce it.

liudongxue01 commented 1 year ago

Sorry, this link seems to have expired, probably because some time has passed. Could you generate a new one? Thanks! @zouhan6806504

zouhan6806504 commented 1 year ago

https://aistudio.baidu.com/studio/project/partial/verify/6303671/75f9f727edec49a680c75a4d9dd9c3e1

liudongxue01 commented 1 year ago

I could not reproduce it with the latest code from the Paddle develop branch. Can you confirm which versions of paddle and paddlenlp you are using? @zouhan6806504

zouhan6806504 commented 1 year ago

absl-py 0.8.1 aiofiles 23.1.0 aiohttp 3.8.3 aiosignal 1.2.0 alembic 1.8.1 altair 4.2.0 anyio 3.6.1 argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0 aspy.yaml 1.3.0 astor 0.8.1 astroid 2.4.1 async-generator 1.10 async-timeout 4.0.2 asynctest 0.13.0 attrs 22.1.0 audioread 2.1.8 autopep8 1.6.0 Babel 2.8.0 backcall 0.1.0 backports.zoneinfo 0.2.1 bce-python-sdk 0.8.53 beautifulsoup4 4.11.1 bleach 5.0.1 blinker 1.5 cachetools 4.0.0 certifi 2019.9.11 certipy 0.1.3 cffi 1.15.1 cfgv 2.0.1 chardet 3.0.4 charset-normalizer 2.1.1 click 8.0.4 cloudpickle 1.6.0 cma 2.7.0 colorama 0.4.4 colorlog 4.1.0 commonmark 0.9.1 cryptography 38.0.1 cycler 0.10.0 Cython 0.29 datasets 2.7.0 debugpy 1.6.0 decorator 4.4.2 defusedxml 0.7.1 dill 0.3.3 easydict 1.9 entrypoints 0.4 et-xmlfile 1.0.1 fastapi 0.95.0 fastjsonschema 2.16.1 ffmpy 0.3.0 filelock 3.0.12 fire 0.5.0 flake8 4.0.1 Flask 1.1.1 Flask-Babel 1.0.0 Flask-Cors 3.0.8 forbiddenfruit 0.1.3 frozenlist 1.3.0 fsspec 2022.11.0 funcsigs 1.0.2 future 0.18.0 gast 0.3.3 gitdb 4.0.5 GitPython 3.1.14 google-auth 1.10.0 google-auth-oauthlib 0.4.1 gradio 3.19.1 graphviz 0.13 greenlet 1.1.3 grpcio 1.35.0 gunicorn 20.0.4 gym 0.12.1 h11 0.14.0 h5py 2.9.0 httpcore 0.16.3 httpx 0.23.3 huggingface-hub 0.11.0 identify 1.4.10 idna 2.8 imageio 2.6.1 imageio-ffmpeg 0.3.0 importlib-metadata 4.2.0 importlib-resources 5.9.0 ipykernel 6.9.1 ipython 7.34.0 ipython-genutils 0.2.0 ipywidgets 7.6.5 isort 4.3.21 itsdangerous 1.1.0 jdcal 1.4.1 jedi 0.17.2 jieba 0.42.1 Jinja2 3.0.0 joblib 0.14.1 JPype1 0.7.2 json5 0.9.5 jsonschema 4.16.0 jupyter-archive 3.2.1 jupyter_client 7.3.5 jupyter-core 4.11.1 jupyter-lsp 1.5.1 jupyter-server 1.16.0 jupyter-telemetry 0.1.0 jupyterhub 1.3.0 jupyterlab 3.4.5 jupyterlab-language-pack-zh-CN 3.4.post1 jupyterlab-pygments 0.2.2 jupyterlab-server 2.10.3 jupyterlab-widgets 3.0.3 kiwisolver 1.1.0 lazy-object-proxy 1.4.3 librosa 0.7.2 lightgbm 3.1.1 linkify-it-py 2.0.0 llvmlite 0.31.0 lxml 4.9.1 Mako 1.2.2 Markdown 3.1.1 markdown-it-py 2.2.0 MarkupSafe 2.0.1 matplotlib 2.2.3 matplotlib-inline 0.1.6 mccabe 0.6.1 mdit-py-plugins 0.3.3 mdurl 0.1.1 mistune 0.8.4 more-itertools 7.2.0 moviepy 1.0.1 multidict 6.0.2 multiprocess 0.70.11.1 nbclassic 0.3.1 nbclient 0.5.13 nbconvert 6.4.4 nbformat 5.5.0 nest-asyncio 1.5.5 netifaces 0.10.9 networkx 2.4 nltk 3.4.5 nodeenv 1.3.4 notebook 5.7.0 numba 0.48.0 numpy 1.19.5 oauthlib 3.1.0 objgraph 3.4.1 opencv-python 4.6.0.66 openpyxl 3.0.5 opt-einsum 3.3.0 orjson 3.8.7 packaging 21.3 paddle-bfloat 0.1.7 paddle2onnx 1.0.0 paddlefsl 1.1.0 paddlehub 2.3.0 paddlenlp 2.4.2 paddlepaddle-gpu 2.4.0.post112 pamela 1.0.0 pandas 1.1.5 pandocfilters 1.5.0 parl 1.4.1 parso 0.7.1 pathlib 1.0.1 pexpect 4.7.0 pickleshare 0.7.5 Pillow 8.2.0 pip 22.1.2 pkgutil_resolve_name 1.3.10 plotly 5.8.0 pluggy 1.0.0 pre-commit 1.21.0 prettytable 0.7.2 proglog 0.1.9 prometheus-client 0.14.1 prompt-toolkit 2.0.10 protobuf 3.20.0 psutil 5.7.2 ptyprocess 0.7.0 py4j 0.10.9.2 pyarrow 10.0.0 pyasn1 0.4.8 pyasn1-modules 0.2.7 pybboxes 0.1.1 pycodestyle 2.8.0 pycparser 2.21 pycryptodome 3.9.9 pydantic 1.10.6 pydeck 0.8.0 pydocstyle 5.0.2 pydub 0.25.1 pyflakes 2.4.0 pyglet 1.4.5 Pygments 2.13.0 pyhumps 3.8.0 pylint 2.5.2 Pympler 1.0.1 pynvml 8.0.4 pyOpenSSL 22.0.0 pyparsing 3.0.9 pypmml 0.9.11 pyrsistent 0.18.1 python-dateutil 2.8.2 python-json-logger 2.0.4 python-jsonrpc-server 0.3.4 python-language-server 0.33.0 python-lsp-jsonrpc 1.0.0 python-lsp-server 1.5.0 python-multipart 0.0.6 pytz 2019.3 pytz-deprecation-shim 0.1.0.post0 PyYAML 5.1.2 pyzmq 23.2.1 rarfile 3.1 recordio 0.1.7 requests 2.24.0 requests-oauthlib 1.3.0 resampy 0.2.2 responses 0.18.0 rfc3986 1.5.0 rich 12.6.0 rope 0.17.0 rsa 4.0 ruamel.yaml 0.17.21 ruamel.yaml.clib 0.2.6 sahi 0.10.1 scikit-learn 0.24.2 scipy 1.6.3 seaborn 0.10.0 semver 2.13.0 Send2Trash 1.8.0 sentencepiece 0.1.96 seqeval 1.2.2 setuptools 56.2.0 shapely 2.0.0 shellcheck-py 0.7.1.1 simplegeneric 0.8.1 six 1.16.0 sklearn 0.0 smmap 3.0.5 sniffio 1.3.0 snowballstemmer 2.0.0 SoundFile 0.10.3.post1 soupsieve 2.3.2.post1 SQLAlchemy 1.4.41 starlette 0.26.1 streamlit 1.13.0 streamlit-image-comparison 0.0.3 tabulate 0.8.3 tb-nightly 1.15.0a20190801 tb-paddle 0.3.6 tenacity 8.0.1 tensorboard 2.1.0 tensorboardX 1.8 termcolor 1.1.0 terminado 0.15.0 terminaltables 3.1.10 testpath 0.4.2 threadpoolctl 2.1.0 tinycss2 1.1.1 toml 0.10.0 toolz 0.12.0 tornado 6.2 tqdm 4.64.1 traitlets 5.4.0 typed-ast 1.4.1 typing_extensions 4.3.0 tzdata 2022.7 tzlocal 4.2 uc-micro-py 1.0.1 ujson 1.35 urllib3 1.25.11 uvicorn 0.21.1 validators 0.20.0 virtualenv 16.7.9 visualdl 2.4.0 watchdog 2.2.0 wcwidth 0.1.7 webencodings 0.5.1 websocket-client 1.4.1 websockets 10.4 Werkzeug 0.16.0 whatthepatch 1.0.2 wheel 0.36.2 widgetsnbextension 3.5.2 wrapt 1.12.1 xarray 0.16.2 xgboost 1.3.3 xlrd 1.2.0 xxhash 3.1.0 yapf 0.26.0 yarl 1.7.2 zipp 3.8.1
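When only the framework versions matter, a short script can report them instead of the full package list. A minimal sketch, assuming the distribution names `paddlepaddle-gpu`, `paddlepaddle`, and `paddlenlp` as they appear in pip listings; `report_versions` is a hypothetical helper. It uses the standard-library `importlib.metadata` (Python 3.8+); on Python 3.7, the `importlib_metadata` backport (present in the list above) provides the same functions.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages=("paddlepaddle-gpu", "paddlepaddle", "paddlenlp")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed under this distribution name (e.g. CPU vs GPU build)
            found[name] = None
    return found


for name, ver in report_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```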

zouhan6806504 commented 1 year ago

If I switch to the develop version, I get a different error:

Traceback (most recent call last):
  File "/home/aistudio/train_attlstm_cnn_reconstruct.py", line 5, in <module>
    import paddlenlp
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/__init__.py", line 36, in <module>
    from . import trainer
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/trainer/__init__.py", line 21, in <module>
    from .trainer_compress import *
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/trainer/trainer_compress.py", line 27, in <module>
    from paddle.fluid.contrib.slim.quantization import PostTrainingQuantization
ModuleNotFoundError: No module named 'paddle.fluid.contrib.slim'

liudongxue01 commented 1 year ago

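So you updated paddle to the develop version but did not update paddlenlp? Could you reinstall paddlenlp and try again? You can also add me on Ruliu (如流) as liudongxue_135 so we can discuss it one-on-one; once we have confirmed the problem, we will post the solution here. @zouhan6806504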
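This kind of paddle/paddlenlp mismatch can be detected before starting training by checking whether the module path paddlenlp 2.4.2 imports (`paddle.fluid.contrib.slim.quantization`, per the traceback above) can still be located in the installed paddle build. A minimal sketch using the standard-library `importlib.util.find_spec`; `module_available` is a hypothetical helper. `find_spec` imports the parent packages of a dotted name (so it will import `paddle` itself) and raises `ModuleNotFoundError` when a parent is missing, which the helper treats as "not available". If it reports False, upgrading paddlenlp to a release that matches the installed paddle (e.g. `pip install --upgrade paddlenlp`) is the fix reported later in this thread.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if `dotted_name` can be located, without importing it.

    Parent packages are imported as a side effect; a missing parent raises
    ModuleNotFoundError (a subclass of ImportError), treated here as False.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        return False


# Path imported by paddlenlp 2.4.2's trainer_compress module (see traceback above)
print(module_available("paddle.fluid.contrib.slim.quantization"))
```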

zouhan6806504 commented 1 year ago

After switching to develop and upgrading paddlenlp, it does run now.

liudongxue01 commented 1 year ago

OK. Contact me if you run into any further problems.