CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
MIT License

Setting the tracker to tensorboard raises the following error #437

Closed: cdxzyc closed this issue 1 year ago

cdxzyc commented 1 year ago

🐛 Describe the bug

Traceback (most recent call last):
  File "/ossfs/node_29637239/workspace/ppo_nraq_t5_retrieval_augmention.py", line 209, in <module>
    main()
  File "/ossfs/node_29637239/workspace/ppo_nraq_t5_retrieval_augmention.py", line 197, in main
    trlx.train(
  File "/ossfs/node_29637239/workspace/trlx-main/trlx/trlx.py", line 89, in train
    trainer = get_trainer(config.train.trainer)(
  File "/ossfs/node_29637239/workspace/trlx-main/trlx/trainer/accelerate_ppo_trainer_with_knowledge.py", line 50, in __init__
    super().__init__(config, **kwargs)
  File "/ossfs/node_29637239/workspace/trlx-main/trlx/trainer/accelerate_ppo_trainer.py", line 46, in __init__
    super().__init__(config, **kwargs)
  File "/ossfs/node_29637239/workspace/trlx-main/trlx/trainer/accelerate_base_trainer.py", line 120, in __init__
    self.accelerator.init_trackers(
  File "/root/miniconda3/lib/python3.9/site-packages/accelerate/accelerator.py", line 548, in _inner
    return PartialState().on_main_process(function)(*args, **kwargs)
  File "/root/miniconda3/lib/python3.9/site-packages/accelerate/accelerator.py", line 2037, in init_trackers
    tracker.store_init_configuration(config)
  File "/root/miniconda3/lib/python3.9/site-packages/accelerate/tracking.py", line 83, in execute_on_main_process
    return PartialState().on_main_process(function)(self, *args, **kwargs)
  File "/root/miniconda3/lib/python3.9/site-packages/accelerate/tracking.py", line 211, in store_init_configuration
    self.writer.add_hparams(values, metric_dict={})
  File "/root/miniconda3/lib/python3.9/site-packages/torch/utils/tensorboard/writer.py", line 336, in add_hparams
    exp, ssi, sei = hparams(hparam_dict, metric_dict, hparam_domain_discrete)
  File "/root/miniconda3/lib/python3.9/site-packages/torch/utils/tensorboard/summary.py", line 228, in hparams
    raise ValueError(
ValueError: value should be one of int, float, str, bool, or torch.Tensor

Which trlX version are you using?

trlx==0.6.0

Additional system and package information

absl-py==1.4.0 accelerate==0.18.0 adabench==1.2.36 aii-pypai==0.1.40.33 aiofiles==22.1.0 aiohttp==3.8.4 aiosignal==1.3.1 aiosqlite==0.18.0 aistudio-analyzer==0.0.4.87 aistudio-common==0.0.28.31 aistudio-notebook==2.0.101 alabaster==0.7.13 albumentations==1.3.0 aliyun-log-python-sdk==0.8.6 aliyun-python-sdk-core==2.13.36 aliyun-python-sdk-kms==2.16.0 alps==2.3.0.6 ant-couler==0.0.1rc8 antflake8==0.1.4 anyio==3.6.2 apex @ http://cmps-model.cn-hangzhou.alipay.aliyun-inc.com/264991/apex/torch2.0.0-cuda11.7/2303/apex-0.1-cp39-cp39-linux_x86_64.whl#sha256=e1f31ff06ecc8ca38b8c6dac5d810951e8c44166fcfd69f5aad9fe8a35d76482 appdirs==1.4.4 argo-workflows==3.5.1 argon2-cffi==21.3.0 argon2-cffi-bindings==21.2.0 arrow==1.2.3 astor==0.8.1 astroid==2.15.2 asttokens==2.2.1 astunparse==1.6.3 async-timeout==4.0.2 atorch==0.1.5 attrs==23.1.0 audioread==3.0.0 Automat==22.10.0 autopep8==2.0.2 Babel==2.12.1 backcall==0.2.0 bce-python-sdk==0.8.83 bcrypt==4.0.1 beautifulsoup4==4.12.2 beautifultable==1.1.0 bertopic==0.14.1 bitsandbytes==0.38.1 bleach==6.0.0 blessed==1.20.0 blis==0.7.9 blosc2==2.0.0 bokeh==3.1.0 boltons @ file:///home/conda/feedstock_root/build_artifacts/boltons_1677499911949/work brotlipy @ file:///home/conda/feedstock_root/build_artifacts/brotlipy_1666764672617/work cachetools==5.3.0 captum==0.6.0 catalogue==2.0.8 catboost==1.1.1 cattrs==22.2.0 certifi==2022.12.7 cffi @ file:///home/conda/feedstock_root/build_artifacts/cffi_1671179360775/work cfgv==3.3.1 charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1678108872112/work click==8.1.3 click-config-file==0.6.0 cloudpickle==2.2.1 cmake==3.26.3 colorama @ file:///home/conda/feedstock_root/build_artifacts/colorama_1666700638685/work coloredlogs==15.0.1 colorlog==6.7.0 colossalai @ http://cmps-model.cn-hangzhou.alipay.aliyun-inc.com/264991/colossalai/torch2.0.0-cuda11.7/colossalai-0.2.8-cp39-cp39-linux_x86_64.whl#sha256=3e8c951df8d768aac86f6997418326c2153159db75b05fe77a7470b2e40a3e75 
comm==0.1.3 conda==23.3.1 conda-package-handling @ file:///home/conda/feedstock_root/build_artifacts/conda-package-handling_1669907009957/work conda_package_streaming @ file:///home/conda/feedstock_root/build_artifacts/conda-package-streaming_1669733752472/work confection==0.0.4 configobj==5.0.8 configparser==5.3.0 constantly==15.1.0 contexttimer==0.3.3 contourpy==1.0.7 couler-core==0.1.1rc8 cpca==0.5.5 cramjam==2.6.2 crc32c==2.3.post0 crcmod==1.7 croniter==1.3.14 cryptography @ file:///home/conda/feedstock_root/build_artifacts/cryptography-split_1681508587436/work cssselect==1.2.0 cubinlinker-cu11==0.3.0.post1 cuda-python==11.8.1 cudf-cu11==23.4.0.1681363056 cugraph-cu11==23.4.0.1681369743 cuml-cu11==23.4.0.1681368248 cupy-cuda11x==11.6.0 cycler==0.11.0 cymem==2.0.7 Cython @ file:///home/conda/feedstock_root/build_artifacts/cython_1680712331760/work dask==2023.4.0 dask-cuda==23.4.0 dask-cudf-cu11==23.4.0.1681364034 dataclasses==0.6 datasets==2.11.0 dateparser==1.1.8 dateutils==0.6.12 debugpy==1.6.7 decorator==4.4.2 deepdiff==6.3.0 deepspeed @ http://cmps-model.cn-hangzhou.alipay.aliyun-inc.com/264991/deepspeed/torch2.0.0-cuda11.7/deepspeed-0.9.0-cp39-cp39-linux_x86_64.whl#sha256=9fc6ab28acd7a108c294d979036a6a9a793f13eb9094af74d7897607c0c2731b defusedxml==0.7.1 Deprecated==1.2.13 deprecation==2.1.0 dgl @ http://cmps-model.cn-hangzhou.alipay.aliyun-inc.com/264991/dgl/torch2.0.0-cuda11.7/dgl-1.0.2-cp39-cp39-linux_x86_64.whl#sha256=5bb61bb01fd5c50e3ee3f644a79fbfce91527b9ef1fcda7a49a626c7a215172c dglgo==0.0.2 diffusers==0.15.0 dill==0.3.4 distlib==0.3.6 distributed==2023.4.0 docker==4.1.0 docker-pycreds==0.4.0 docstring-parser==0.15 docstring-to-markdown==0.12 docutils==0.19 dowhy==0.7 easydl-sdk==0.0.3 einops==0.6.0 elastic-transport==8.4.0 elasticai-api==1.6.0 elasticsearch==8.7.0 energonai @ 
http://cmps-model.cn-hangzhou.alipay.aliyun-inc.com/264991/energonai/torch2.0.0-cuda11.7/energonai-0.0.1%2Btorch2.0cu11.7-cp39-cp39-linux_x86_64.whl#sha256=581aa7a9ce59ce3b300de0907cfccb4d7c80981f891bb83d261d3ac575be0036 entrypoints==0.3 et-xmlfile==1.1.0 evaluate==0.4.0 exceptiongroup==1.1.1 executing==1.2.0 fabric==3.0.0 fairscale==0.4.1 faiss-cpu==1.7.3 fastai==2.7.12 fastapi==0.88.0 fastBPE==0.1.0 fastcore==1.5.29 fastdownload==0.0.7 fastjsonschema==2.16.3 fastparquet==2023.2.0 fastprogress==1.0.3 fastrlock==0.8.1 fe==0.3.31 ffmpeg-python==0.2.0 filelock==3.11.0 flake8==6.0.0 flash-attn @ http://cmps-model.cn-hangzhou.alipay.aliyun-inc.com/264991/flash-attn/torch2.0.0-cuda11.7/flash_attn-0.2.8-cp39-cp39-linux_x86_64.whl#sha256=671299b9fcfde9e49aea1bafe007c62b00a0ce515b1e214302bc6dee9a10955d Flask==2.2.3 Flask-Babel==2.0.0 Flask-Cors==3.0.10 flatbuffers==23.3.3 fonttools==4.39.3 fqdn==1.5.1 frozenlist==1.3.3 fsspec==2023.4.0 func-timeout==4.3.5 future==0.18.3 gast==0.4.0 gensim==4.3.1 gitdb==4.0.10 GitPython==3.1.31 google-auth==2.17.3 google-auth-oauthlib==1.0.0 google-pasta==0.2.0 gpustat==1.1 graphviz==0.20.1 greenlet==2.0.2 grpcio==1.53.0 grpcio-tools==1.34.1 gym==0.10.9 h11==0.14.0 h5py==3.8.0 hdbscan==0.8.29 HeapDict==1.0.1 hjson==3.1.0 huggingface-hub==0.13.4 humanfriendly==10.0 hyperlink==21.0.0 identify==2.5.22 idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1663625384323/work imageio==2.27.0 imageio-ffmpeg==0.4.8 imagesize==1.4.1 importlib-metadata==6.4.1 importlib-resources==5.12.0 incremental==22.10.0 inquirer==3.1.3 invoke==2.0.0 IProgress==0.4 ipykernel==6.22.0 ipython==8.12.0 ipython-genutils==0.2.0 ipywidgets==8.0.6 isodate==0.6.1 isoduration==20.11.0 isort==5.12.0 itemadapter==0.8.0 itemloaders==1.0.6 iterative-stratification==0.1.7 itsdangerous==2.1.2 jaraco.classes==3.2.3 jax==0.4.8 jaxlib==0.4.7+cuda11.cudnn86 jedi==0.18.2 jedi-language-server==0.40.0 jeepney==0.8.0 jieba==0.42.1 Jinja2==3.0.3 jinjasql==0.1.8 jmespath==0.10.0 
joblib==1.2.0 json5==0.9.11 jsonargparse==4.9.0 jsonpatch @ file:///home/conda/feedstock_root/build_artifacts/jsonpatch_1632759296524/work jsonpath-ng==1.5.3 jsonpointer==2.0 jsonschema==4.17.3 jupyter-contrib-core==0.4.2 jupyter-contrib-nbextensions==0.7.0 jupyter-events==0.6.3 jupyter-highlight-selected-word==0.2.0 jupyter-lsp==2.0.1 jupyter-nbextensions-configurator==0.6.1 jupyter-server==1.24.0 jupyter-ydoc==0.2.4 jupyter_client==8.2.0 jupyter_core==5.3.0 jupyter_server_fileid==0.9.0 jupyter_server_terminals==0.4.4 jupyter_server_ydoc==0.8.0 jupyterlab==3.6.3 jupyterlab-lsp==4.0.1 jupyterlab-pygments==0.2.2 jupyterlab-widgets==3.0.7 jupyterlab_server==2.22.1 keras==2.12.0 Keras-Applications==1.0.8 Keras-Preprocessing==1.1.2 keyring==23.13.1 kiwisolver==1.4.4 kubernetes==26.1.0 langcodes==3.3.0 lazy-object-proxy==1.9.0 lazy_loader==0.2 libclang==16.0.0 libmambapy @ file:///home/conda/feedstock_root/build_artifacts/mamba-split_1680791035685/work/libmambapy librosa==0.10.0.post2 lightgbm==3.3.5 lightning==2.0.1.post0 lightning-cloud==0.5.33 lightning-flash==0.8.1.post0 lightning-utilities==0.8.0 lit==16.0.1 littleutils==0.2.2 llvmlite==0.39.1 locket==1.0.0 loguru==0.7.0 lsprotocol==2023.0.0a1 lxml==4.9.2 lz4==4.3.2 mamba @ file:///home/conda/feedstock_root/build_artifacts/mamba-split_1680791035685/work/mamba Markdown==3.4.3 markdown-it-py==2.2.0 MarkupSafe==2.1.2 marshmallow==3.19.0 matplotlib==3.7.1 matplotlib-inline==0.1.6 mccabe==0.7.0 mdurl==0.1.2 miditoolkit==0.1.16 mido==1.2.10 mistune==2.0.5 ml-dtypes==0.1.0 mock==5.0.1 more-itertools==9.1.0 moviepy==1.0.3 mpmath==1.3.0 msgpack==1.0.5 multidict==6.0.4 multiprocess==0.70.12.2 murmurhash==1.0.9 murmurhash2==0.2.10 mypy-extensions==1.0.0 nbclassic==0.5.5 nbclient==0.7.3 nbconvert==7.3.1 nbformat==5.8.0 nest-asyncio==1.5.6 networkx==3.1 ninja==1.11.1 nltk==3.8.1 nn-pruning==0.1.2 nodeenv==1.7.0 nose==1.3.7 notebook==6.5.4 notebook_shim==0.2.2 numba==0.56.4 numexpr==2.8.4 numpy==1.23.5 numpydoc==1.5.0 
nvidia-cublas-cu11==11.10.3.66 nvidia-cuda-cupti-cu11==11.7.101 nvidia-cuda-nvrtc-cu11==11.7.99 nvidia-cuda-runtime-cu11==11.7.99 nvidia-cudnn-cu11==8.5.0.96 nvidia-cufft-cu11==10.9.0.58 nvidia-curand-cu11==10.2.10.91 nvidia-cusolver-cu11==11.4.0.1 nvidia-cusparse-cu11==11.7.4.91 nvidia-ml-py==11.525.112 nvidia-nccl-cu11==2.14.3 nvidia-nvtx-cu11==11.7.91 nvitop==1.1.2 nvtx==0.2.5 oauthlib==3.2.2 objgraph==3.5.0 odps==3.5.1 ogb==1.3.6 onnx==1.13.1 onnx-simplifier==0.4.19 onnxconverter-common==1.13.0 onnxoptimizer==0.3.10 onnxruntime-gpu==1.14.1 openai==0.27.4 OpenCC==1.1.6 opencv-contrib-python-headless==4.7.0.72 opencv-python-headless==4.7.0.72 openpyxl==3.1.2 opt-einsum==3.3.0 optimum==1.7.3 ordered-set==4.1.0 oss2==2.17.0 outdated==0.2.2 overrides==3.1.0 packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1681337016113/work paddle-bfloat==0.1.7 paddle2onnx==1.0.6 paddlefsl==1.1.0 paddlenlp==2.5.2 paddlepaddle-gpu==2.4.2.post112 pandas==2.0.0 pandocfilters==1.5.0 paramiko==3.1.0 parsel==1.7.0 parso==0.8.3 partd==1.4.0 pathtools==0.1.2 pathy==0.10.1 patsy==0.5.3 peft==0.2.0 peppercorn==0.6 pexpect==4.8.0 pickleshare==0.7.5 Pillow==9.5.0 pkginfo==1.9.6 platformdirs==3.2.0 plotly==5.14.1 pluggy @ file:///home/conda/feedstock_root/build_artifacts/pluggy_1667232663820/work ply==3.11 pooch==1.6.0 portalocker==2.7.0 pre-commit==3.2.2 preshed==3.0.8 pretty-midi==0.2.10 prettytable==3.7.0 proglog==0.1.10 prometheus-client==0.16.0 prompt-toolkit==3.0.38 Protego==0.2.1 protobuf==3.20.3 psutil==5.9.4 PTable==0.9.2 ptxcompiler-cu11==0.7.0.post1 ptyprocess==0.7.0 pure-eval==0.2.2 py==1.11.0 py-cpuinfo==9.0.0 py-midi==2.0.1 pyahocorasick==2.0.0 pyaml==21.10.1 pyarrow==11.0.0 pyasn1==0.4.8 pyasn1-modules==0.2.8 pycodestyle==2.10.0 pycosat @ file:///home/conda/feedstock_root/build_artifacts/pycosat_1666836642684/work pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1636257122734/work pycryptodome==3.17 pydantic==1.10.7 pyDeprecate==0.3.2 
pyDes==2.0.1 PyDispatcher==2.0.7 pydot==1.4.2 pyflakes==3.0.1 pyglet==2.0.5 pygls==1.0.1 Pygments==2.15.0 pyhocon==0.3.60 pyinotify==0.9.6 PyJWT==2.6.0 pylibcugraph-cu11==23.4.0.1681368249 pylibraft-cu11==23.4.0.1681363053 pylint==2.17.2 PyMySQL==1.0.3 PyNaCl==1.5.0 pynndescent==0.5.8 pynvml==11.5.0 pyodps==0.11.3.1 Pyomo==6.5.0 pyOpenSSL @ file:///home/conda/feedstock_root/build_artifacts/pyopenssl_1680037383858/work pyparsing==3.0.9 pypianoroll==1.0.4 pypinyin==0.48.0 pyre-extensions==0.0.23 pyrsistent==0.19.3 pyserial==3.5 PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1661604839144/work python-dateutil==2.8.2 python-editor==1.0.4 python-gitlab==1.4.0 python-json-logger==2.0.7 python-multipart==0.0.6 python-rapidjson==1.10 pytorch-lightning==2.0.1.post0 pytz==2023.3 pytz-deprecation-shim==0.1.0.post0 PyWavelets==1.4.1 pyworld==0.3.2 PyYAML==6.0 pyzmq==25.0.2 qudida==0.0.4 queuelib==1.6.2 raft-dask-cu11==23.4.0.1681367712 ray @ http://antyimiaobucket.oss-cn-hangzhou-zmf.aliyuncs.com/libraries/ray-3.0.0.dev0-cp39-cp39-manylinux2014_x86_64.whl#sha256=5863b54e4e342d2f7f98d7e507063f2af1d2f38fd69b69c2527d37077a6d074d rdkit-pypi==2022.9.5 readchar==4.0.5 readme-renderer==37.3 redis==3.5.3 regex==2023.3.23 requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1680286922386/work requests-file==1.5.1 requests-oauthlib==1.3.1 requests-toolbelt==0.10.1 responses==0.18.0 retry==0.9.2 retrying==1.3.4 rfc3339-validator==0.1.4 rfc3986==2.0.0 rfc3986-validator==0.1.1 rich==13.3.4 rjieba==0.1.11 rmm-cu11==23.4.0.1681362075 roformer==0.4.3 rsa==4.9 ruamel.yaml @ file:///home/conda/feedstock_root/build_artifacts/ruamel.yaml_1678272973380/work ruamel.yaml.clib @ file:///home/conda/feedstock_root/build_artifacts/ruamel.yaml.clib_1670412733608/work safetensors==0.3.0 scikit-image==0.20.0 scikit-learn==1.2.2 scipy==1.9.1 Scrapy==2.8.0 seaborn==0.12.2 SecretStorage==3.3.3 Send2Trash==1.8.0 sentence-transformers==2.2.2 sentencepiece==0.1.98 
sentry-sdk==1.19.1 seqeval==1.2.2 service-identity==21.1.0 setfit==0.7.0 setproctitle==1.3.2 simcse==0.4 six==1.16.0 skorch==0.12.1 smart-open==6.3.0 smmap==5.0.0 sniffio==1.3.0 snowballstemmer==2.2.0 sortedcontainers==2.4.0 soundfile==0.12.1 soupsieve==2.4 soxr==0.3.5 spacy==3.5.2 spacy-legacy==3.0.12 spacy-loggers==1.0.4 Sphinx==6.1.3 sphinxcontrib-applehelp==1.0.4 sphinxcontrib-devhelp==1.0.2 sphinxcontrib-htmlhelp==2.0.1 sphinxcontrib-jsmath==1.0.1 sphinxcontrib-qthelp==1.0.3 sphinxcontrib-serializinghtml==1.1.5 SQLAlchemy==2.0.9 sqlparse==0.4.3 srsly==2.4.6 stack-data==0.6.2 starlette==0.22.0 starsessions==1.3.0 statsmodels==0.13.5 stringcase==1.2.0 sympy==1.11.1 tables==3.8.0 tabulate==0.9.0 tbase==0.2.15 tblib==1.7.0 tenacity==8.2.2 tensorboard==2.12.2 tensorboard-data-server==0.7.0 tensorboard-plugin-wit==1.8.1 tensorflow==2.12.0 tensorflow-estimator==2.12.0 tensorflow-io==0.32.0 tensorflow-io-gcs-filesystem==0.32.0 termcolor==2.2.0 terminado==0.17.1 tfplus-pangu==0.2.12 tfrecord==1.14.1 thinc==8.1.9 threadpoolctl==3.1.0 tifffile==2023.4.12 timm==0.6.13 tinycss2==1.2.1 tldextract==3.4.0 tokenizers==0.13.3 tomli==2.0.1 tomlkit==0.11.7 toolz @ file:///home/conda/feedstock_root/build_artifacts/toolz_1657485559105/work torch==2.0.0 torch-cluster==1.6.1+pt20cu117 torch-geometric==2.3.0 torch-scatter==2.1.1+pt20cu117 torch-sparse==0.6.17+pt20cu117 torch-spline-conv==1.2.2+pt20cu117 torch-tb-profiler==0.4.1 torchaudio==2.0.1+cu117 torchdata==0.6.0 torchmetrics==0.10.3 torchtext==0.15.1+cpu torchtyping==0.1.4 torchvision==0.15.1+cu117 tornado==6.2 tqdm @ file:///home/conda/feedstock_root/build_artifacts/tqdm_1677948868469/work traitlets==5.9.0 transformers==4.28.1 treelite==3.2.0 treelite-runtime==3.2.0 triton==2.0.0 tritonclient==2.32.0 trl==0.4.1 # Editable Git install with no remote (trlx==0.6.0) -e /ossfs/node_29637239/workspace/trlx-main twine==4.0.2 Twisted==22.10.0 typeguard==2.13.3 typer==0.7.0 typing-inspect==0.8.0 typing_extensions==4.5.0 tzdata==2023.3 
tzlocal==4.3 ucx-py-cu11==0.31.0.1681362077 umap-learn==0.5.3 Unidecode==1.3.6 uri-template==1.2.0 urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1678635778344/work uvicorn==0.21.1 virtualenv==20.21.0 visualdl==2.4.2 w3lib==2.1.1 wandb==0.14.2 wasabi==1.1.1 watchdog==2.3.1 wcwidth==0.2.6 webcolors==1.13 webencodings==0.5.1 websocket-client==1.5.1 websockets==11.0.1 Werkzeug==2.2.3 wfbuilder==1.0.56.30 wget==3.2 widgetsnbextension==4.0.7 wrapt==1.14.1 xattr==0.10.1 xformers==0.0.18 xgboost==1.7.5 xlrd==2.0.1 XlsxWriter==3.1.0 xxhash==3.2.0 xyzservices==2023.2.0 y-py==0.5.9 yapf==0.32.0 yarl==1.8.2 ypy-websocket==0.8.2 zeep==4.2.1 zhenjin-utils==0.0.1.32 zict==2.2.0 zipp==3.15.0 zope.interface==6.0 zstandard==0.19.0

ps602 commented 1 year ago

I faced the same issue. Have you fixed it?

maxreciprocate commented 1 year ago

Should be fixed now

maxreciprocate commented 1 year ago

Resolved with https://github.com/CarperAI/trlx/pull/444
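
For anyone stuck on a version without the fix: the traceback above ends in TensorBoard's `add_hparams`, which rejects any value that is not an int, float, str, bool, or torch.Tensor. Presumably the config passed to `init_trackers` contains nested dicts, lists, or `None` values. The following is a minimal workaround sketch, not necessarily what the linked PR does: it flattens a nested config and stringifies unsupported values before they are handed to a tracker. The helper name `flatten_config`, the `/` separator, and the example config are hypothetical; torch.Tensor is left out of the allowed types only so that the sketch runs without torch installed.

```python
from typing import Any, Dict

# Scalar types named in the error message. torch.Tensor is also accepted by
# TensorBoard, but it is omitted here (assumption) to keep the sketch torch-free.
ALLOWED_TYPES = (int, float, str, bool)


def flatten_config(config: Dict[str, Any], prefix: str = "", sep: str = "/") -> Dict[str, Any]:
    """Flatten a nested dict and stringify values TensorBoard hparams would reject."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested sections such as config["train"].
            flat.update(flatten_config(value, name, sep))
        elif isinstance(value, ALLOWED_TYPES):
            flat[name] = value
        else:
            # Lists, None, and other objects become their string representation.
            flat[name] = str(value)
    return flat


# Hypothetical config, only loosely shaped like a trlx config.
example = {
    "train": {"batch_size": 32, "tracker": "tensorboard"},
    "method": {"gen_kwargs": {"max_new_tokens": 40, "top_k": None}},
    "tags": ["ppo", "t5"],
}
flat_example = flatten_config(example)
```

The flattened dict could then be passed as the `config` argument of `accelerator.init_trackers`, assuming the installed version does not already sanitize it.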