NVIDIA / FasterTransformer

Transformer related optimization, including BERT, GPT
Apache License 2.0

Converting Flan-UL2 does not produce encoder files #565

Closed. anshoomehra closed this issue 1 year ago.

anshoomehra commented 1 year ago

Branch/Tag/Commit

latest

Docker Image Version

NA

GPU name

A100

CUDA Driver

cu116

Reproduced Steps

I ran the example script to convert the flan-ul2 model. The script finishes with no errors; however, it only produces the files listed below and the encoder* files are missing. Inference then fails looking for the encoder files:

Script Ran:
python3 FasterTransformer/examples/pytorch/t5/utils/huggingface_t5_ckpt_convert.py \
        -saved_dir nqg/c-models \
        -in_file nqg/ \
        -inference_tensor_para_size 1 \
        -weight_data_type bf16 \
        -p 4 \
        --verbose

File types produced:

1. decoder-block*
2. decoder-final*
3. lm_head*
4. config.ini

Error Message at the time of inference:

FileNotFoundError: [Errno 2] No such file or directory: 
'../../../../nqg/c-models/1-gpu/encoder.block.0.layer.0.layer_norm.weight.bin'
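A quick directory check (an illustrative sketch, not part of the original report; the path is taken from the error above) confirms that no encoder weights were written:

from pathlib import Path

# Count encoder weight files in the converted output directory.
out_dir = Path("nqg/c-models/1-gpu")
encoder_files = sorted(out_dir.glob("encoder.*"))
print(f"found {len(encoder_files)} encoder.* files")  # prints 0 here, hence the failure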
byshiue commented 1 year ago

Please share the full logs of the conversion.

byshiue commented 1 year ago

You can try disabling the converter's multi-processing to get more information.

anshoomehra commented 1 year ago

@byshiue Appreciate your attention to this issue. There are no logs or errors being produced; the script was run with --verbose. Please see the screenshots below. By disabling multi-processing, I assume you meant setting -p to 1; if not, please clarify and I will rerun.

[screenshot]

[screenshot]

byshiue commented 1 year ago

You can try to:

  1. Add some debug messages.
  2. Disable the multi-processing first, because it may hide some errors.

anshoomehra commented 1 year ago

@byshiue

  1. How do I disable multi-processing? Is there a flag for it? I do not see any. I already tried -p 1, but I am not sure whether that disables multi-processing; the screenshots above show the same result.

  2. Which part of the code do you want me to add debug messages to? Are you looking for something specific that I can make sure is covered in the log?

byshiue commented 1 year ago

> @byshiue
>
>   1. How do I disable multi-processing? Is there a flag for it? I do not see any. I already tried -p 1, but I am not sure whether that disables multi-processing; the screenshots above show the same result.
>   2. Which part of the code do you want me to add debug messages to? Are you looking for something specific that I can make sure is covered in the log?

There is no flag to disable the multi-processing. You need to remove the multiprocessing call and invoke the convert function directly. Then you can print messages such as which keys/params were converted successfully and which failed.
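A minimal sketch of that approach (hypothetical names: convert_weight stands in for the script's real per-weight worker, and the paths mirror the commands above):

from transformers import T5ForConditionalGeneration

# Load the checkpoint the same way the converter does, then convert serially
# instead of via multiprocessing.Pool, so worker exceptions are not swallowed.
model = T5ForConditionalGeneration.from_pretrained("nqg/")
for name, param in model.state_dict().items():
    try:
        convert_weight(name, param, "nqg/c-models")  # hypothetical worker
        print(f"[OK]   {name} shape={tuple(param.shape)}")
    except Exception as exc:
        print(f"[FAIL] {name}: {exc}")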

anshoomehra commented 1 year ago

@byshiue

While debugging, I noticed that this issue also happens with https://huggingface.co/google/ul2, so it is not specific to our fine-tuned model. Since this is an open-source model, hopefully you can debug it much faster on your end. My results from a vanilla run, and the NVIDIA guide I followed:

https://github.com/NVIDIA/FasterTransformer/blob/main/docs/t5_guide.md#running-ul2-on-fastertransformer-pytorch-op

[screenshot]

byshiue commented 1 year ago

> @byshiue
>
> While debugging, I noticed that this issue also happens with https://huggingface.co/google/ul2, so it is not specific to our fine-tuned model. Since this is an open-source model, hopefully you can debug it much faster on your end. My results from a vanilla run, and the NVIDIA guide I followed:
>
> https://github.com/NVIDIA/FasterTransformer/blob/main/docs/t5_guide.md#running-ul2-on-fastertransformer-pytorch-op
>
> [screenshot]

I cannot reproduce the issue with the public checkpoint. So please share the end-to-end scripts to reproduce your issue on the public checkpoint.

anshoomehra commented 1 year ago

@byshiue

I do not have any custom code; I simply ran the example script and followed steps 1 and 2:

Clone FasterTransformer: https://github.com/NVIDIA/FasterTransformer.git

Step 1:
sudo apt-get install git-lfs
git lfs install
git lfs clone https://huggingface.co/google/ul2

Step 2:
python3 FasterTransformer/examples/pytorch/t5/utils/huggingface_t5_ckpt_convert.py \
        -saved_dir ul2/c-models \
        -in_file ul2/ \
        -inference_tensor_para_size 1 \
        -weight_data_type fp32 \
        -p 1 \
        --verbose

I tried uploading the repository clone, but it is 93 MB and the system does not let me upload more than 25 MB or split zip archives. Let me know if you need the cloned files and whether there is a cloud folder I can upload them to.

If you are not able to reproduce the issue following the above steps, this makes me wonder whether there are dependencies that cause the encoder files not to be generated.

byshiue commented 1 year ago

> @byshiue
>
> I do not have any custom code; I simply ran the example script and followed steps 1 and 2:
>
> Clone FasterTransformer: https://github.com/NVIDIA/FasterTransformer.git
>
> Step 1:
> sudo apt-get install git-lfs
> git lfs install
> git lfs clone https://huggingface.co/google/ul2
>
> Step 2:
> python3 FasterTransformer/examples/pytorch/t5/utils/huggingface_t5_ckpt_convert.py \
>         -saved_dir ul2/c-models \
>         -in_file ul2/ \
>         -inference_tensor_para_size 1 \
>         -weight_data_type fp32 \
>         -p 1 \
>         --verbose
>
> I tried uploading the repository clone, but it is 93 MB and the system does not let me upload more than 25 MB or split zip archives. Let me know if you need the cloned files and whether there is a cloud folder I can upload them to.
>
> If you are not able to reproduce the issue following the above steps, this makes me wonder whether there are dependencies that cause the encoder files not to be generated.

As I said, I cannot reproduce your issue with those scripts. So please share your end-to-end setup, including how you launch Docker, how you install transformers, and so on.

anshoomehra commented 1 year ago

@byshiue

Step 1: Create a GCP Vertex image (Python 2, CUDA 11 image).

Step 2: Clone FT https://github.com/NVIDIA/FasterTransformer.git

Step 3: Install the dependencies recommended by FT:
pip install -r FasterTransformer/examples/pytorch/t5/requirement.txt

Step 4: Download the UL2 model:
sudo apt-get install git-lfs
git lfs install
git lfs clone https://huggingface.co/google/ul2

Step 5: Convert the checkpoint to FT:
python3 ../examples/pytorch/t5/utils/huggingface_t5_ckpt_convert.py \
        -saved_dir ul2/c-models \
        -in_file ul2/ \
        -inference_tensor_para_size 2 \
        -weight_data_type fp32

Step 6: The above step fails with the error below:

symbol cublasLtHSHMatmulAlgoInit version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

Fixed the above error by running pip uninstall nvidia_cublas_cu1
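This symbol error typically means two different libcublasLt.so.11 builds are visible to the process, for example a pip-installed copy shadowing the CUDA toolkit's copy. A small diagnostic sketch (illustrative, not from the original report) that lists the candidate libraries:

import glob
import subprocess
import sysconfig

# Copies of libcublasLt known to the system's dynamic loader.
out = subprocess.run(["ldconfig", "-p"], capture_output=True, text=True).stdout
for line in out.splitlines():
    if "libcublasLt" in line:
        print(line.strip())

# Copies shipped inside pip wheels (e.g. nvidia-cublas-cu11), which PyTorch
# may preload ahead of the system copy.
site = sysconfig.get_paths()["purelib"]
for hit in glob.glob(f"{site}/nvidia/**/libcublasLt*", recursive=True):
    print(hit)

Seeing two mismatched copies would explain why uninstalling the pip wheel resolves the error.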

Step 7: Rerun Step 5. It completes, producing the decoder files but still missing the encoder files.

[screenshot]

anshoomehra commented 1 year ago

All libraries and versions:

(base) jupyter@ul2-transform:~/ul2-transform$ pip freeze absl-py==1.4.0 aiohttp==3.8.4 aiohttp-cors==0.7.0 aiorwlock==1.3.0 aiosignal==1.3.1 ansiwrap==0.8.4 antlr4-python3-runtime==4.8 anyio @ file:///home/conda/feedstock_root/build_artifacts/anyio_1666191106763/work/dist argon2-cffi @ file:///home/conda/feedstock_root/build_artifacts/argon2-cffi_1640817743617/work argon2-cffi-bindings @ file:///home/conda/feedstock_root/build_artifacts/argon2-cffi-bindings_1649500320262/work async-timeout==4.0.2 asynctest==0.13.0 attrs @ file:///home/conda/feedstock_root/build_artifacts/attrs_1671632566681/work Babel==2.12.1 backcall @ file:///home/conda/feedstock_root/build_artifacts/backcall_1592338393461/work backoff==2.2.1 backports.functools-lru-cache @ file:///home/conda/feedstock_root/build_artifacts/backports.functools_lru_cache_1618230623929/work beatrix-jupyterlab @ file:///home/kbuilder/miniconda3/conda-bld/dlenv-base_1681181343956/work/packages/beatrix_jupyterlab-2023.46.184821.tar.gz beautifulsoup4 @ file:///home/conda/feedstock_root/build_artifacts/beautifulsoup4_1680888073205/work bleach @ file:///home/conda/feedstock_root/build_artifacts/bleach_1674535352125/work blessed==1.20.0 brotlipy @ file:///home/conda/feedstock_root/build_artifacts/brotlipy_1648854164153/work cachetools==5.3.0 certifi==2022.12.7 cffi @ file:///home/conda/feedstock_root/build_artifacts/cffi_1666183775483/work charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1678108872112/work click==8.1.3 cloud-tpu-client==0.10 cloudpickle==2.2.1 colorama==0.4.6 colorful==0.5.5 conda==22.9.0 conda-package-handling @ file:///home/conda/feedstock_root/build_artifacts/conda-package-handling_1669907009957/work conda_package_streaming @ file:///home/conda/feedstock_root/build_artifacts/conda-package-streaming_1669733752472/work cryptography @ file:///home/conda/feedstock_root/build_artifacts/cryptography_1666563371538/work cycler==0.11.0 Cython==0.29.34 datasets==2.3.2 db-dtypes==1.1.1 debugpy==1.6.7 decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work defusedxml @ file:///home/conda/feedstock_root/build_artifacts/defusedxml_1615232257335/work Deprecated==1.2.13 dill==0.3.5.1 distlib==0.3.6 dm-tree==0.1.8 docker==6.0.1 entrypoints @ file:///home/conda/feedstock_root/build_artifacts/entrypoints_1643888246732/work fastapi==0.95.0 fastjsonschema @ file:///home/conda/feedstock_root/build_artifacts/python-fastjsonschema_1677336799617/work/dist filelock==3.11.0 flit_core @ file:///home/conda/feedstock_root/build_artifacts/flit-core_1667734568827/work/source/flit_core fonttools==4.38.0 frozenlist==1.3.3 fsspec==2023.1.0 gcsfs==2023.1.0 gitdb==4.0.10 GitPython==3.1.31 google-api-core==1.34.0 google-api-python-client==1.8.0 google-auth==2.17.2 google-auth-httplib2==0.1.0 google-auth-oauthlib==1.0.0 google-cloud-aiplatform==1.23.0 google-cloud-artifact-registry==1.8.1 google-cloud-bigquery==3.9.0 google-cloud-bigquery-storage==2.19.1 google-cloud-core==2.3.2 google-cloud-datastore==1.15.5 google-cloud-language==2.9.1 google-cloud-monitoring==2.14.2 google-cloud-resource-manager==1.9.1 google-cloud-storage==2.8.0 google-crc32c==1.5.0 google-resumable-media==2.4.1 googleapis-common-protos==1.59.0 gpustat==1.0.0 greenlet==2.0.2 grpc-google-iam-v1==0.12.6 grpcio==1.53.0 grpcio-status==1.48.2 Gymnasium==0.26.3 gymnasium-notices==0.0.1 h11==0.14.0 htmlmin==0.1.12 httplib2==0.22.0 huggingface-hub==0.14.1 idna @ 
file:///home/conda/feedstock_root/build_artifacts/idna_1663625384323/work ImageHash==4.3.1 imageio==2.27.0 importlib-metadata==6.0.1 importlib-resources @ file:///home/conda/feedstock_root/build_artifacts/importlib_resources_1676919000169/work ipykernel @ file:///home/conda/feedstock_root/build_artifacts/ipykernel_1666723258080/work ipython==7.34.0 ipython-genutils==0.2.0 ipython-sql==0.5.0 ipywidgets==8.0.6 jaraco.classes==3.2.3 jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1669134318875/work jeepney==0.8.0 Jinja2 @ file:///home/conda/feedstock_root/build_artifacts/jinja2_1654302431367/work joblib==1.2.0 json5==0.9.11 jsonschema @ file:///home/conda/feedstock_root/build_artifacts/jsonschema-meta_1669810440410/work jupyter-http-over-ws==0.0.8 jupyter-server==1.23.6 jupyter-server-mathjax==0.2.6 jupyter-server-proxy==3.2.2 jupyter_client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1673615989977/work jupyter_core==4.12.0 jupyterlab==3.4.8 jupyterlab-git==0.41.0 jupyterlab-pygments @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_pygments_1649936611996/work jupyterlab-widgets==3.0.7 jupyterlab_server==2.22.0 jupytext==1.14.5 keyring==23.13.1 keyrings.google-artifactregistry-auth==1.1.2 kiwisolver==1.4.4 kubernetes==26.1.0 llvmlite==0.39.1 lz4==4.3.2 markdown-it-py==2.2.0 MarkupSafe==2.1.2 matplotlib==3.5.3 matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1660814786464/work mdit-py-plugins==0.3.5 mdurl==0.1.2 mistune @ file:///home/conda/feedstock_root/build_artifacts/mistune_1675771498296/work more-itertools==9.1.0 msgpack==1.0.5 multidict==6.0.4 multimethod==1.9.1 multiprocess==0.70.13 nb-conda @ file:///home/conda/feedstock_root/build_artifacts/nb_conda_1654442778977/work nb-conda-kernels @ file:///home/conda/feedstock_root/build_artifacts/nb_conda_kernels_1636999991206/work nbclassic @ file:///home/conda/feedstock_root/build_artifacts/nbclassic_1680699279518/work nbclient==0.7.3 nbconvert @ file:///home/conda/feedstock_root/build_artifacts/nbconvert-meta_1681137024412/work nbdime==3.1.1 nbformat @ file:///home/conda/feedstock_root/build_artifacts/nbformat_1679336765223/work nest-asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1664684991461/work networkx==2.6.3 nltk==3.8.1 notebook @ file:///home/conda/feedstock_root/build_artifacts/notebook_1680870634737/work notebook-executor @ file:///home/kbuilder/miniconda3/conda-bld/dlenv-base_1681181343956/work/packages/notebook_executor notebook_shim @ file:///home/conda/feedstock_root/build_artifacts/notebook-shim_1667478401171/work numba==0.56.4 numpy==1.21.6 nvidia-cuda-nvrtc-cu11==11.7.99 nvidia-cuda-runtime-cu11==11.7.99 nvidia-cudnn-cu11==8.5.0.96 nvidia-ml-py==11.495.46 oauth2client==4.1.3 oauthlib==3.2.2 omegaconf==2.1.2 opencensus==0.11.2 opencensus-context==0.1.3 opentelemetry-api==1.17.0 opentelemetry-exporter-otlp==1.17.0 opentelemetry-exporter-otlp-proto-grpc==1.17.0 opentelemetry-exporter-otlp-proto-http==1.17.0 opentelemetry-proto==1.17.0 opentelemetry-sdk==1.17.0 opentelemetry-semantic-conventions==0.38b0 packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1673482170163/work pandas==1.3.5 pandas-profiling==3.6.6 pandocfilters @ file:///home/conda/feedstock_root/build_artifacts/pandocfilters_1631603243851/work papermill==2.4.0 parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1638334955874/work patsy==0.5.3 pexpect @ 
file:///home/conda/feedstock_root/build_artifacts/pexpect_1667297516076/work phik==0.12.3 pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work Pillow==9.5.0 pkgutil_resolve_name @ file:///home/conda/feedstock_root/build_artifacts/pkgutil-resolve-name_1633981968097/work platformdirs==3.2.0 plotly==5.14.1 pluggy==1.0.0 portalocker==2.7.0 prettytable==3.7.0 prometheus-client @ file:///home/conda/feedstock_root/build_artifacts/prometheus_client_1674535637125/work prompt-toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1677600924538/work proto-plus==1.22.2 protobuf==3.20.3 psutil @ file:///home/conda/feedstock_root/build_artifacts/psutil_1666155398032/work ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl py-spy==0.3.14 pyarrow==11.0.0 pyasn1==0.4.8 pyasn1-modules==0.2.8 pycosat @ file:///home/conda/feedstock_root/build_artifacts/pycosat_1666656960991/work pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1636257122734/work pydantic==1.10.7 Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1681142969746/work PyJWT==2.6.0 pyOpenSSL @ file:///home/conda/feedstock_root/build_artifacts/pyopenssl_1680037383858/work pyparsing==3.0.9 pyrsistent==0.19.3 PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1648857264451/work python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1626286286081/work pytz==2023.3 PyWavelets==1.3.0 PyYAML==6.0 pyzmq==25.0.2 ray==2.3.1 ray-cpp==2.3.1 regex==2022.10.31 requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1680286922386/work requests-oauthlib==1.3.1 responses==0.18.0 retrying==1.3.4 rich==13.3.3 rouge-score==0.1.2 rsa==4.9 ruamel-yaml-conda @ file:///home/conda/feedstock_root/build_artifacts/ruamel_yaml_1653464404698/work sacrebleu==2.1.0 scikit-image==0.19.3 scikit-learn==1.0.2 scipy==1.7.3 seaborn==0.12.2 SecretStorage==3.3.3 Send2Trash @ file:///home/conda/feedstock_root/build_artifacts/send2trash_1628511208346/work sentencepiece==0.1.98 Shapely==1.8.5.post1 simpervisor==0.4 six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work smart-open==6.3.0 smmap==5.0.0 sniffio @ file:///home/conda/feedstock_root/build_artifacts/sniffio_1662051266223/work soupsieve==2.4 SQLAlchemy==2.0.9 sqlparse==0.4.3 starlette==0.26.1 statsmodels==0.13.5 tabulate==0.9.0 tangled-up-in-unicode==0.2.0 tenacity==8.2.2 tensorboardX==2.6 terminado @ file:///home/conda/feedstock_root/build_artifacts/terminado_1670253674810/work textwrap3==0.9.2 threadpoolctl==3.1.0 tifffile==2021.11.2 tinycss2 @ file:///home/conda/feedstock_root/build_artifacts/tinycss2_1666100256010/work tokenizers==0.12.1 toml==0.10.2 tomli==2.0.1 toolz @ file:///home/conda/feedstock_root/build_artifacts/toolz_1657485559105/work torch==1.13.1 tornado @ file:///home/conda/feedstock_root/build_artifacts/tornado_1656937818679/work tqdm==4.64.1 traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1675110562325/work transformers==4.20.1 typeguard==2.13.3 typer==0.7.0 typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1678559861143/work uritemplate==3.0.1 urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1678635778344/work uvicorn==0.21.1 virtualenv==20.21.0 visions==0.7.5 wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1673864653149/work webencodings==0.5.1 
websocket-client @ file:///home/conda/feedstock_root/build_artifacts/websocket-client_1675567828044/work widgetsnbextension==4.0.7 wrapt==1.15.0 xxhash==3.2.0 yarl==1.8.2 ydata-profiling==4.1.2 zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1677313463193/work zstandard @ file:///home/conda/feedstock_root/build_artifacts/zstandard_1655887611100/work

anshoomehra commented 1 year ago

If you want any specific log messages, please update the script FasterTransformer/examples/pytorch/t5/utils/huggingface_t5_ckpt_convert.py (and/or any other files) to add whatever log messages you want to see. Send it across and I can run it and share the logs.

byshiue commented 1 year ago

So you are not using the Docker image, yet you say you followed the scripts in the guide.

Can you try following the document step by step, including using the Docker image?

anshoomehra commented 1 year ago

@byshiue I did not claim anything of the sort, and using Docker is not mandatory.

At this stage, I have provided everything NVIDIA needs to debug this issue, and unfortunately we are going nowhere!

We were excited about FasterTransformer, but it does not look like we can benefit from it at the moment. I hope your team makes this capability easier to consume and bug-free in the future, boosting adoption.

Best

byshiue commented 1 year ago

@anshoomehra Interestingly, two other issues, https://github.com/NVIDIA/FasterTransformer/issues/307 and https://github.com/NVIDIA/FasterTransformer/issues/554, have tested Flan-UL2 without any issue in the converter. Why are you so confident that the issue is not caused by the Docker image and environment?

Using Docker is of course not mandatory. But we are trying to find the cause, so why not try the Docker image first? If it succeeds, we will have more clues about the reason.