RepositoryNotFoundError when optimizing private HF model

When trying to convert private sentence-transformer model on HF Hub with:

docker run -it --rm --gpus all \
    -v $PWD:/project ghcr.io/els-rd/transformer-deploy:0.5.1 \
    bash -c "cd /project && \
    convert_model -m \"Matthieu/paraphrase-multilingual-MiniLM-L12-v2-ftGPL-MRv2\" \
    --backend onnx \
    --task embedding \
    --seq-len 16 128 128 \
    --auth-token hf_XXX"

The model-original.onnx file has been created but it seems to fail when optimizing onnx model with mixed-precision (fp16):


=============================
== Triton Inference Server ==
=============================

NVIDIA Release 22.07 (build 41737377)
Triton Server Version 2.24.0

Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: CUDA Minor Version Compatibility mode ENABLED.
  Using driver version 510.39.01 which has support for CUDA 11.6.  This container
  was built with CUDA 11.7 and will be run in Minor Version Compatibility mode.
  CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
  with this container but was unavailable:
  [[System has unsupported display driver / cuda driver combination (CUDA_ERROR_SYSTEM_DRIVER_MISMATCH) cuInit()=803]]
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

Downloading tokenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 572/572 [00:00<00:00, 609kB/s]
Downloading tokenizer.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16.3M/16.3M [00:02<00:00, 6.15MB/s]
Downloading special_tokens_map.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 280/280 [00:00<00:00, 336kB/s]
Downloading config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 701/701 [00:00<00:00, 1.08MB/s]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.49k/1.49k [00:00<00:00, 1.95MB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 190/190 [00:00<00:00, 275kB/s]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.90k/3.90k [00:00<00:00, 5.25MB/s]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 701/701 [00:00<00:00, 1.00MB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 122/122 [00:00<00:00, 165kB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 471M/471M [01:19<00:00, 5.94MB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 53.0/53.0 [00:00<00:00, 83.2kB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 280/280 [00:00<00:00, 281kB/s]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17.1M/17.1M [00:02<00:00, 6.11MB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 572/572 [00:00<00:00, 596kB/s]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14.8M/14.8M [00:02<00:00, 6.18MB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 229/229 [00:00<00:00, 238kB/s]
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/transformers/configuration_utils.py", line 623, in _get_config_dict
    resolved_config_file = cached_path(
  File "/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py", line 284, in cached_path
    output_path = get_from_cache(
  File "/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py", line 502, in get_from_cache
    _raise_for_status(r)
  File "/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py", line 417, in _raise_for_status
    raise RepositoryNotFoundError(
transformers.utils.hub.RepositoryNotFoundError: 401 Client Error: Repository not found for url: https://huggingface.co/Matthieu/paraphrase-multilingual-MiniLM-L12-v2-ftGPL-MRv2/resolve/main/config.json. If the repo is private, make sure you are authenticated.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/convert_model", line 8, in <module>
    sys.exit(entrypoint())
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/convert.py", line 413, in entrypoint
    main(commands=args)
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/convert.py", line 349, in main
    num_attention_heads, hidden_size = get_model_size(path=commands.model)
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/backends/pytorch_utils.py", line 78, in get_model_size
    config = AutoConfig.from_pretrained(pretrained_model_name_or_path=path)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/configuration_auto.py", line 725, in from_pretrained
    config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/configuration_utils.py", line 561, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/configuration_utils.py", line 635, in _get_config_dict
    raise EnvironmentError(
OSError: Matthieu/paraphrase-multilingual-MiniLM-L12-v2-ftGPL-MRv2 is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.

It seems to be an authentification problem. Any advice?

ELS-RD / transformer-deploy

RepositoryNotFoundError when optimizing private HF model #148