Closed: anshoomehra closed this issue 1 year ago.
Please share the full conversion logs.
You can try disabling the converter's multi-processing to get more information.
@byshiue I appreciate your attention to this issue. No logs or errors are being produced; the script is run with --verbose. Please see the screenshots below. By "disabling multi-processing" I assume you mean setting -p to 1; if not, please clarify and I will rerun it.
@byshiue
How do I disable multi-processing? Is there a flag for it? I do not see one. I already tried -p 1; I am not sure whether that disables multi-processing. The screenshots above show the same result.
Which part of the code do you want me to add debug messages to? Are you looking for something specific that I should make sure is covered in the message log?
There is no flag to disable multi-processing. You need to remove the multi-process call and invoke the convert function directly. Then you can print messages showing which keys/params were converted successfully and which failed.
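The suggestion above can be sketched as follows. This is a minimal, illustrative example only: `convert_key` stands in for the converter's real per-parameter routine (the actual function names in huggingface_t5_ckpt_convert.py differ), and the point is simply to run the conversion serially instead of through a multiprocessing pool so that each key is logged as it succeeds or fails.

```python
# Sketch: replace the converter's multiprocessing pool with a serial loop
# so every key is converted one at a time and logged, instead of failing
# silently inside a worker process. `convert_key` is a placeholder for the
# converter's real per-parameter routine.
def convert_all(state_dict, convert_key):
    converted, failed = [], []
    for name, tensor in state_dict.items():
        try:
            convert_key(name, tensor)   # the real converter writes a .bin file here
            converted.append(name)
            print(f"[OK]   {name}")
        except Exception as exc:        # log the failure instead of swallowing it
            failed.append(name)
            print(f"[FAIL] {name}: {exc}")
    return converted, failed
```

Running this in place of the pool makes it obvious whether the encoder keys are reached at all, or whether they raise before their files are written.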
@byshiue
While debugging, I noticed that this issue also happens with https://huggingface.co/google/ul2, so it is not specific to our fine-tuned model. Since this is an open-source model, hopefully you can debug it much faster on your end. My results from a vanilla run of the Nvidia script I followed:
I cannot reproduce the issue with the public checkpoint. So please share the end-to-end scripts that reproduce your issue on the public checkpoint.
@byshiue
I do not have any custom code; I simply ran the example script and followed steps 1 and 2:
Clone FasterTransformer:
https://github.com/NVIDIA/FasterTransformer.git
Step 1:
sudo apt-get install git-lfs
git lfs install
git lfs clone https://huggingface.co/google/ul2
Step 2:
python3 FasterTransformer/examples/pytorch/t5/utils/huggingface_t5_ckpt_convert.py \
    -saved_dir ul2/c-models \
    -in_file ul2/ \
    -inference_tensor_para_size 1 \
    -weight_data_type fp32 \
    -p 1 \
    --verbose
I tried uploading the version cloned from Git; however, it is 93 MB, and the system does not allow uploads larger than 25 MB or split zip archives. Let me know if you need the cloned files and whether there is a cloud folder where I can upload them.
This makes me wonder: if you are not able to reproduce the issue following the steps above, could there be dependencies that cause the encoder files not to be generated?
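A quick way to confirm the symptom described here is to inspect the converter's output directory for encoder versus decoder weight files. The sketch below is illustrative: the exact file names and subdirectory layout produced by the FasterTransformer converter (e.g. a 1-gpu subfolder) may differ, so treat the prefixes as assumptions.

```python
# Sketch: count encoder.* vs decoder.* weight files in the converter's
# output directory. File-name prefixes are assumptions based on the issue
# description; adjust to match the actual converter output.
from pathlib import Path

def count_converted(saved_dir):
    files = [p.name for p in Path(saved_dir).glob("*.bin")]
    encoder = [f for f in files if f.startswith("encoder")]
    decoder = [f for f in files if f.startswith("decoder")]
    return len(encoder), len(decoder)
```

If the encoder count comes back as zero while the decoder count is non-zero, that matches the missing-encoder-files behavior reported in this thread.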
As I have said, I cannot reproduce your issue with those scripts. So please share your end-to-end setup, including how you launch Docker, how you install transformers, and so on.
@byshiue
Step 1: Create a GCP Vertex image: Python 2, CUDA 11
Step 2: Clone FT https://github.com/NVIDIA/FasterTransformer.git
Step 3: Install the dependencies recommended by FT:
pip install -r FasterTransformer/examples/pytorch/t5/requirement.txt
Step 4: Download the UL2 model:
sudo apt-get install git-lfs
git lfs install
git lfs clone https://huggingface.co/google/ul2
Step 5: Convert the checkpoint to FT:
python3 ../examples/pytorch/t5/utils/huggingface_t5_ckpt_convert.py \
    -saved_dir ul2/c-models \
    -in_file ul2/ \
    -inference_tensor_para_size 2 \
    -weight_data_type fp32
Step 6: The step above fails with the error: "symbol cublasLtHSHMatmulAlgoInit version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference".
Fix the error by running: pip uninstall nvidia_cublas_cu1
Step 7: Rerun Step 5, which now completes, producing the decoding files but still missing the encoder files.
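The cublasLt symbol error in Step 6 typically means two copies of libcublasLt.so are visible: one from a pip-installed NVIDIA wheel and one from the system CUDA toolkit. The sketch below is a hypothetical diagnostic, not part of the FT tooling; the search paths are assumptions about a typical Linux CUDA install and may need adjusting.

```python
# Sketch: list every libcublasLt.so visible from the pip site-packages
# (NVIDIA wheels install under site-packages/nvidia/) and from the system
# CUDA toolkit. More than one hit suggests the shadowing conflict that the
# pip uninstall in Step 6 works around. Paths are assumptions.
import glob
import sysconfig

def find_cublaslt():
    hits = []
    site = sysconfig.get_paths()["purelib"]
    hits += glob.glob(site + "/nvidia/**/libcublasLt.so*", recursive=True)
    hits += glob.glob("/usr/local/cuda*/lib64/libcublasLt.so*")
    return hits
```

Comparing the hits before and after the uninstall shows which copy torch ends up loading.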
All Libs & Versions
(base) jupyter@ul2-transform:~/ul2-transform$ pip freeze absl-py==1.4.0 aiohttp==3.8.4 aiohttp-cors==0.7.0 aiorwlock==1.3.0 aiosignal==1.3.1 ansiwrap==0.8.4 antlr4-python3-runtime==4.8 anyio @ file:///home/conda/feedstock_root/build_artifacts/anyio_1666191106763/work/dist argon2-cffi @ file:///home/conda/feedstock_root/build_artifacts/argon2-cffi_1640817743617/work argon2-cffi-bindings @ file:///home/conda/feedstock_root/build_artifacts/argon2-cffi-bindings_1649500320262/work async-timeout==4.0.2 asynctest==0.13.0 attrs @ file:///home/conda/feedstock_root/build_artifacts/attrs_1671632566681/work Babel==2.12.1 backcall @ file:///home/conda/feedstock_root/build_artifacts/backcall_1592338393461/work backoff==2.2.1 backports.functools-lru-cache @ file:///home/conda/feedstock_root/build_artifacts/backports.functools_lru_cache_1618230623929/work beatrix-jupyterlab @ file:///home/kbuilder/miniconda3/conda-bld/dlenv-base_1681181343956/work/packages/beatrix_jupyterlab-2023.46.184821.tar.gz beautifulsoup4 @ file:///home/conda/feedstock_root/build_artifacts/beautifulsoup4_1680888073205/work bleach @ file:///home/conda/feedstock_root/build_artifacts/bleach_1674535352125/work blessed==1.20.0 brotlipy @ file:///home/conda/feedstock_root/build_artifacts/brotlipy_1648854164153/work cachetools==5.3.0 certifi==2022.12.7 cffi @ file:///home/conda/feedstock_root/build_artifacts/cffi_1666183775483/work charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1678108872112/work click==8.1.3 cloud-tpu-client==0.10 cloudpickle==2.2.1 colorama==0.4.6 colorful==0.5.5 conda==22.9.0 conda-package-handling @ file:///home/conda/feedstock_root/build_artifacts/conda-package-handling_1669907009957/work conda_package_streaming @ file:///home/conda/feedstock_root/build_artifacts/conda-package-streaming_1669733752472/work cryptography @ file:///home/conda/feedstock_root/build_artifacts/cryptography_1666563371538/work cycler==0.11.0 Cython==0.29.34 datasets==2.3.2 
db-dtypes==1.1.1 debugpy==1.6.7 decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work defusedxml @ file:///home/conda/feedstock_root/build_artifacts/defusedxml_1615232257335/work Deprecated==1.2.13 dill==0.3.5.1 distlib==0.3.6 dm-tree==0.1.8 docker==6.0.1 entrypoints @ file:///home/conda/feedstock_root/build_artifacts/entrypoints_1643888246732/work fastapi==0.95.0 fastjsonschema @ file:///home/conda/feedstock_root/build_artifacts/python-fastjsonschema_1677336799617/work/dist filelock==3.11.0 flit_core @ file:///home/conda/feedstock_root/build_artifacts/flit-core_1667734568827/work/source/flit_core fonttools==4.38.0 frozenlist==1.3.3 fsspec==2023.1.0 gcsfs==2023.1.0 gitdb==4.0.10 GitPython==3.1.31 google-api-core==1.34.0 google-api-python-client==1.8.0 google-auth==2.17.2 google-auth-httplib2==0.1.0 google-auth-oauthlib==1.0.0 google-cloud-aiplatform==1.23.0 google-cloud-artifact-registry==1.8.1 google-cloud-bigquery==3.9.0 google-cloud-bigquery-storage==2.19.1 google-cloud-core==2.3.2 google-cloud-datastore==1.15.5 google-cloud-language==2.9.1 google-cloud-monitoring==2.14.2 google-cloud-resource-manager==1.9.1 google-cloud-storage==2.8.0 google-crc32c==1.5.0 google-resumable-media==2.4.1 googleapis-common-protos==1.59.0 gpustat==1.0.0 greenlet==2.0.2 grpc-google-iam-v1==0.12.6 grpcio==1.53.0 grpcio-status==1.48.2 Gymnasium==0.26.3 gymnasium-notices==0.0.1 h11==0.14.0 htmlmin==0.1.12 httplib2==0.22.0 huggingface-hub==0.14.1 idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1663625384323/work ImageHash==4.3.1 imageio==2.27.0 importlib-metadata==6.0.1 importlib-resources @ file:///home/conda/feedstock_root/build_artifacts/importlib_resources_1676919000169/work ipykernel @ file:///home/conda/feedstock_root/build_artifacts/ipykernel_1666723258080/work ipython==7.34.0 ipython-genutils==0.2.0 ipython-sql==0.5.0 ipywidgets==8.0.6 jaraco.classes==3.2.3 jedi @ 
file:///home/conda/feedstock_root/build_artifacts/jedi_1669134318875/work jeepney==0.8.0 Jinja2 @ file:///home/conda/feedstock_root/build_artifacts/jinja2_1654302431367/work joblib==1.2.0 json5==0.9.11 jsonschema @ file:///home/conda/feedstock_root/build_artifacts/jsonschema-meta_1669810440410/work jupyter-http-over-ws==0.0.8 jupyter-server==1.23.6 jupyter-server-mathjax==0.2.6 jupyter-server-proxy==3.2.2 jupyter_client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1673615989977/work jupyter_core==4.12.0 jupyterlab==3.4.8 jupyterlab-git==0.41.0 jupyterlab-pygments @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_pygments_1649936611996/work jupyterlab-widgets==3.0.7 jupyterlab_server==2.22.0 jupytext==1.14.5 keyring==23.13.1 keyrings.google-artifactregistry-auth==1.1.2 kiwisolver==1.4.4 kubernetes==26.1.0 llvmlite==0.39.1 lz4==4.3.2 markdown-it-py==2.2.0 MarkupSafe==2.1.2 matplotlib==3.5.3 matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1660814786464/work mdit-py-plugins==0.3.5 mdurl==0.1.2 mistune @ file:///home/conda/feedstock_root/build_artifacts/mistune_1675771498296/work more-itertools==9.1.0 msgpack==1.0.5 multidict==6.0.4 multimethod==1.9.1 multiprocess==0.70.13 nb-conda @ file:///home/conda/feedstock_root/build_artifacts/nb_conda_1654442778977/work nb-conda-kernels @ file:///home/conda/feedstock_root/build_artifacts/nb_conda_kernels_1636999991206/work nbclassic @ file:///home/conda/feedstock_root/build_artifacts/nbclassic_1680699279518/work nbclient==0.7.3 nbconvert @ file:///home/conda/feedstock_root/build_artifacts/nbconvert-meta_1681137024412/work nbdime==3.1.1 nbformat @ file:///home/conda/feedstock_root/build_artifacts/nbformat_1679336765223/work nest-asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1664684991461/work networkx==2.6.3 nltk==3.8.1 notebook @ file:///home/conda/feedstock_root/build_artifacts/notebook_1680870634737/work notebook-executor @ 
file:///home/kbuilder/miniconda3/conda-bld/dlenv-base_1681181343956/work/packages/notebook_executor notebook_shim @ file:///home/conda/feedstock_root/build_artifacts/notebook-shim_1667478401171/work numba==0.56.4 numpy==1.21.6 nvidia-cuda-nvrtc-cu11==11.7.99 nvidia-cuda-runtime-cu11==11.7.99 nvidia-cudnn-cu11==8.5.0.96 nvidia-ml-py==11.495.46 oauth2client==4.1.3 oauthlib==3.2.2 omegaconf==2.1.2 opencensus==0.11.2 opencensus-context==0.1.3 opentelemetry-api==1.17.0 opentelemetry-exporter-otlp==1.17.0 opentelemetry-exporter-otlp-proto-grpc==1.17.0 opentelemetry-exporter-otlp-proto-http==1.17.0 opentelemetry-proto==1.17.0 opentelemetry-sdk==1.17.0 opentelemetry-semantic-conventions==0.38b0 packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1673482170163/work pandas==1.3.5 pandas-profiling==3.6.6 pandocfilters @ file:///home/conda/feedstock_root/build_artifacts/pandocfilters_1631603243851/work papermill==2.4.0 parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1638334955874/work patsy==0.5.3 pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1667297516076/work phik==0.12.3 pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work Pillow==9.5.0 pkgutil_resolve_name @ file:///home/conda/feedstock_root/build_artifacts/pkgutil-resolve-name_1633981968097/work platformdirs==3.2.0 plotly==5.14.1 pluggy==1.0.0 portalocker==2.7.0 prettytable==3.7.0 prometheus-client @ file:///home/conda/feedstock_root/build_artifacts/prometheus_client_1674535637125/work prompt-toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1677600924538/work proto-plus==1.22.2 protobuf==3.20.3 psutil @ file:///home/conda/feedstock_root/build_artifacts/psutil_1666155398032/work ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl py-spy==0.3.14 pyarrow==11.0.0 pyasn1==0.4.8 pyasn1-modules==0.2.8 pycosat @ 
file:///home/conda/feedstock_root/build_artifacts/pycosat_1666656960991/work pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1636257122734/work pydantic==1.10.7 Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1681142969746/work PyJWT==2.6.0 pyOpenSSL @ file:///home/conda/feedstock_root/build_artifacts/pyopenssl_1680037383858/work pyparsing==3.0.9 pyrsistent==0.19.3 PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1648857264451/work python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1626286286081/work pytz==2023.3 PyWavelets==1.3.0 PyYAML==6.0 pyzmq==25.0.2 ray==2.3.1 ray-cpp==2.3.1 regex==2022.10.31 requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1680286922386/work requests-oauthlib==1.3.1 responses==0.18.0 retrying==1.3.4 rich==13.3.3 rouge-score==0.1.2 rsa==4.9 ruamel-yaml-conda @ file:///home/conda/feedstock_root/build_artifacts/ruamel_yaml_1653464404698/work sacrebleu==2.1.0 scikit-image==0.19.3 scikit-learn==1.0.2 scipy==1.7.3 seaborn==0.12.2 SecretStorage==3.3.3 Send2Trash @ file:///home/conda/feedstock_root/build_artifacts/send2trash_1628511208346/work sentencepiece==0.1.98 Shapely==1.8.5.post1 simpervisor==0.4 six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work smart-open==6.3.0 smmap==5.0.0 sniffio @ file:///home/conda/feedstock_root/build_artifacts/sniffio_1662051266223/work soupsieve==2.4 SQLAlchemy==2.0.9 sqlparse==0.4.3 starlette==0.26.1 statsmodels==0.13.5 tabulate==0.9.0 tangled-up-in-unicode==0.2.0 tenacity==8.2.2 tensorboardX==2.6 terminado @ file:///home/conda/feedstock_root/build_artifacts/terminado_1670253674810/work textwrap3==0.9.2 threadpoolctl==3.1.0 tifffile==2021.11.2 tinycss2 @ file:///home/conda/feedstock_root/build_artifacts/tinycss2_1666100256010/work tokenizers==0.12.1 toml==0.10.2 tomli==2.0.1 toolz @ file:///home/conda/feedstock_root/build_artifacts/toolz_1657485559105/work torch==1.13.1 
tornado @ file:///home/conda/feedstock_root/build_artifacts/tornado_1656937818679/work tqdm==4.64.1 traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1675110562325/work transformers==4.20.1 typeguard==2.13.3 typer==0.7.0 typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1678559861143/work uritemplate==3.0.1 urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1678635778344/work uvicorn==0.21.1 virtualenv==20.21.0 visions==0.7.5 wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1673864653149/work webencodings==0.5.1 websocket-client @ file:///home/conda/feedstock_root/build_artifacts/websocket-client_1675567828044/work widgetsnbextension==4.0.7 wrapt==1.15.0 xxhash==3.2.0 yarl==1.8.2 ydata-profiling==4.1.2 zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1677313463193/work zstandard @ file:///home/conda/feedstock_root/build_artifacts/zstandard_1655887611100/work
If you want any specific log messages, please update the script FasterTransformer/examples/pytorch/t5/utils/huggingface_t5_ckpt_convert.py (and/or any other files) to add whatever log messages you want to see. Send it across and I can run it and share the logs.
So you are not using the Docker image, yet you say you followed the scripts in the guide.
Can you try following the document step by step, including the Docker image?
@byshiue I did not claim anything, as you put it, and using Docker is not mandatory.
At this stage, I have provided Nvidia everything needed to debug this issue, and unfortunately we are going nowhere.
We were excited about FasterTransformer, but it does not look like we can benefit from it at the moment. I hope your team makes this capability easier to consume and bug-free in the future, boosting adoption.
Best
@anshoomehra An interesting point: two other issues, https://github.com/NVIDIA/FasterTransformer/issues/307 and https://github.com/NVIDIA/FasterTransformer/issues/554, tested Flan-UL2 without any converter problems. Why are you so confident the issue is not caused by the Docker image and environment?
Using Docker is of course not mandatory. But we are trying to find the cause, so why not try the Docker image first? If it succeeds, we will have more clues about the reason.
Branch/Tag/Commit
latest
Docker Image Version
NA
GPU name
A100
CUDA Driver
cu116
Reproduced Steps