Unbabel / COMET

A Neural Framework for MT Evaluation
https://unbabel.github.io/COMET/html/index.html
Apache License 2.0

504 Server error when running comet-score using multiple machines #162

Closed Smu-Tan closed 1 month ago

Smu-Tan commented 1 year ago

🐛 Bug

Hi! I get a 504 server error when running multiple comet-score scripts at once. See the traceback below:

Traceback (most recent call last):
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 261, in hf_raise_for_status
    response.raise_for_status()
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 504 Server Error: Gateway Time-out for url: https://huggingface.co/api/models/Unbabel/wmt22-comet-da/revision/main

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/comet/models/__init__.py", line 46, in download_model
    model_path = snapshot_download(
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/huggingface_hub/_snapshot_download.py", line 186, in snapshot_download
    repo_info = api.repo_info(repo_id=repo_id, repo_type=repo_type, revision=revision, token=token)
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 1868, in repo_info
    return method(
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 1678, in model_info
    hf_raise_for_status(r)
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 303, in hf_raise_for_status
    raise HfHubHTTPError(str(e), response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError: 504 Server Error: Gateway Time-out for url: https://huggingface.co/api/models/Unbabel/wmt22-comet-da/revision/main

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/comet/models/__init__.py", line 51, in download_model
    checkpoint_path = download_model_legacy(model, saving_directory)
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/comet/models/download_utils.py", line 224, in download_model_legacy
    raise Exception(
Exception: Unbabel/wmt22-comet-da is not in the available_legacy_metrics or is a valid checkpoint folder.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/stan1/anaconda3/envs/prefix_mt/bin/comet-score", line 8, in <module>
    sys.exit(score_command())
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/comet/cli/score.py", line 154, in score_command
    model_path = download_model(cfg.model, saving_directory=cfg.model_storage_path)
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/comet/models/__init__.py", line 53, in download_model
    raise KeyError(f"Model {model} not supported by COMET.")
KeyError: 'Model Unbabel/wmt22-comet-da not supported by COMET.'

To Reproduce

Here's the reproduction script template; please ignore the task and seed settings.

#!/bin/bash

RESULT_DIR=zero-shot

TASKS=(zs)
SEEDS=(1234)
SRCAR=('de' 'nl' 'sv' 'da' 'is')
TGTAR=('de' 'nl' 'sv' 'da' 'is')

# Give every (task, seed, source, target) combination a unique index and
# run only the one whose index equals SLURM_ARRAY_TASK_ID.
for (( t=0; t<${#TASKS[@]}; t++ )); do
  for (( s=0; s<${#SEEDS[@]}; s++ )); do
    first_id=$((t*${#SEEDS[@]}+s))
    for (( i=0; i<${#SRCAR[@]}; i++ )); do
      second_id=$((first_id*${#SRCAR[@]}+i))
      for (( j=0; j<${#TGTAR[@]}; j++ )); do
        third_id=$((second_id*${#TGTAR[@]}+j))

        if [ "$third_id" -eq "$SLURM_ARRAY_TASK_ID" ]; then
          SRC=${SRCAR[i]}
          TGT=${TGTAR[j]}

          # Skip pairs whose source and target language are the same.
          if [[ "$SRC" != "$TGT" ]]; then
            echo "SRC-TGT: $SRC-$TGT"

            SOURCE_SENT=${RESULT_DIR}/${SRC}-${TGT}/test-src.txt
            HYPOTHESIS=${RESULT_DIR}/${SRC}-${TGT}/test-sys.txt
            REFERENCE=${RESULT_DIR}/${SRC}-${TGT}/test-ref.txt
            comet-score -s "${SOURCE_SENT}" -t "${HYPOTHESIS}" -r "${REFERENCE}" --quiet --only_system > "${RESULT_DIR}/${SRC}-${TGT}/test_comet.txt"
          fi
        fi

      done
    done
  done
done
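
The index arithmetic in the script is easier to check outside the shell. The following is a minimal Python sketch of the same mapping, assuming the row-major layout the loops imply (task, then seed, then source, then target). `decode_task_id` and `language_pairs` are hypothetical helper names used only for illustration; they are not part of COMET or Slurm.

```python
TASKS = ["zs"]
SEEDS = [1234]
SRCAR = ["de", "nl", "sv", "da", "is"]
TGTAR = ["de", "nl", "sv", "da", "is"]


def decode_task_id(task_id):
    """Invert third_id = ((t * len(SEEDS) + s) * len(SRCAR) + i) * len(TGTAR) + j."""
    total = len(TASKS) * len(SEEDS) * len(SRCAR) * len(TGTAR)
    if not 0 <= task_id < total:
        raise ValueError(f"task_id must be in [0, {total})")
    rest, j = divmod(task_id, len(TGTAR))  # innermost loop: target language
    rest, i = divmod(rest, len(SRCAR))     # next loop out: source language
    t, s = divmod(rest, len(SEEDS))        # outer loops: task and seed
    return TASKS[t], SEEDS[s], SRCAR[i], TGTAR[j]


def language_pairs():
    """List the (task_id, src, tgt) triples for which the script runs comet-score."""
    total = len(TASKS) * len(SEEDS) * len(SRCAR) * len(TGTAR)
    pairs = []
    for task_id in range(total):
        _, _, src, tgt = decode_task_id(task_id)
        if src != tgt:  # the script skips identical source and target
            pairs.append((task_id, src, tgt))
    return pairs
```

With the arrays above there are 25 indices (0 to 24), of which 20 correspond to distinct language pairs, so up to 20 array jobs may each try to fetch the model at roughly the same time.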

Environment

OS: Linux (Slurm)
comet version: newest

ricardorei commented 1 year ago

Hmm, this seems to be a problem with downloading the model, and on the HF side. Have you tried again recently?

ricardorei commented 1 year ago

It could be that the HF Hub was down for a period.
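
If the failure is a transient Hub outage or overload, retrying the download with a growing delay is one possible mitigation. The sketch below is a generic retry wrapper using only the standard library; `retry_with_backoff` and its parameters are hypothetical and not part of COMET or huggingface_hub. It catches any exception because, as the traceback above shows, the 504 surfaces from `download_model` as a `KeyError` rather than as an HTTP error.

```python
import random
import time


def retry_with_backoff(fn, attempts=5, base_delay=2.0, max_delay=60.0, sleep=time.sleep):
    """Call fn() until it succeeds, waiting longer after each failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error unchanged
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter so that parallel jobs do not all retry in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))


# Example use (needs unbabel-comet installed and network access):
# from comet import download_model
# model_path = retry_with_backoff(lambda: download_model("Unbabel/wmt22-comet-da"))
```

Retrying only helps with temporary errors; a model that needs authentication will keep failing no matter how many attempts are made.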

haroon830 commented 1 year ago

@Smu-Tan have you solved your problem? I'm getting the same error when downloading the model.

weichuanW commented 11 months ago

@ricardorei Hi, I ran this code:

from comet import download_model, load_from_checkpoint
model_path = download_model("Unbabel/XCOMET-XL")

and get this exception:

Exception: Unbabel/XCOMET-XL is not in the available_legacy_metrics or is a valid checkpoint folder.

After checking this file, I found that available_legacy_metrics in comet/models/download_utils.py does not have the corresponding key-value pair. Can you update this file, or tell me how to download the model directly from HF?

The current version of unbabel-comet is 2.2.0. Best.

ricardorei commented 11 months ago

Hey! Hmm, this is weird. available_legacy_metrics should only be used when the model is not found on Hugging Face. What is your huggingface-hub version? Can you send me the pip freeze output?

weichuanW commented 11 months ago

OK, the following is the pip freeze list: accelerate==0.23.0 aeidon==1.12 aiofiles==23.2.1 aiohttp==3.8.6 aiosignal==1.3.1 altair==5.2.0 annotated-types==0.6.0 antlr4-python3-runtime==4.8 anyio==3.7.1 argh==0.30.2 async-timeout==4.0.3 atomicwrites==1.4.1 attrs==23.1.0 beautifulsoup4==4.12.2 bitarray==2.8.3 bitsandbytes==0.41.1 blessed==1.20.0 blis==0.7.11 catalogue==2.0.10 certifi==2022.12.7 cffi==1.16.0 chardet==5.2.0 charset-normalizer==2.0.12 cheroot==10.0.0 chinese-converter==1.1.1 click==8.1.7 cloudpathlib==0.16.0 cmake==3.25.0 colorama==0.4.6 coloredlogs==10.0 confection==0.1.3 contourpy==1.2.0 coverage==4.5.4 cycler==0.12.1 cymem==2.0.8 Cython==3.0.5 datasets==2.14.5 dill==0.3.7 distro==1.8.0 docstring-parser==0.15 docx2txt==0.8 einops==0.7.0 en-core-web-lg @ https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.0/en_core_web_lg-3.7.0-py3-none-any.whl#sha256=708da1110fbe1163d059de34a2cbedb1db65c26e1e624ca925897a2711cb7d77 en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.0/en_core_web_sm-3.7.0-py3-none-any.whl#sha256=6215d71a3212690e9aec49408a27e3fe6ad7cd6c715476e93d70dc784041e93e enlighten==1.10.1 entmax==1.1 evaluate==0.4.1 exceptiongroup==1.1.3 fairseq==0.12.2 faiss==1.7.4 fastapi==0.104.1 fastbm25==0.0.2 fastBPE==0.1.1 fastest==0.3.1 fasttext==0.9.2 ffmpy==0.3.1 filelock==3.9.0 fluent.syntax==0.19.0 fonttools==4.44.0 frozenlist==1.4.0 fsspec==2023.6.0 gcld3==3.0.13 gradio==4.8.0 gradio_client==0.7.1 h11==0.14.0 httpcore==1.0.2 httpx==0.25.2 huggingface-hub==0.16.4 humanfriendly==10.0 hydra-core==1.0.7 icu==0.0.1 idna==3.4 importlib-resources==6.1.1 iniconfig==2.0.0 iniparse==0.5 jaraco.functools==3.9.0 Jinja2==3.1.2 joblib==1.3.2 jsonargparse==3.13.1 jsonschema==4.20.0 jsonschema-specifications==2023.11.2 kiwisolver==1.4.5 langcodes==3.3.0 langdetect==1.0.9 latexcodec==2.0.1 Levenshtein==0.23.0 lightning-utilities==0.9.0 lingua-language-detector==1.3.3 lit==15.0.7 lxml==4.9.3 
markdown-it-py==3.0.0 MarkupSafe==2.1.2 matplotlib==3.8.1 mdurl==0.1.2 mistletoe==1.2.1 more-itertools==10.1.0 mpmath==1.3.0 mtdata==0.4.0 multidict==6.0.4 multiprocess==0.70.15 murmurhash==1.0.10 networkx==3.0 numpy==1.24.4 nvidia-cublas-cu11==11.10.3.66 nvidia-cuda-cupti-cu11==11.7.101 nvidia-cuda-nvrtc-cu11==11.7.99 nvidia-cuda-runtime-cu11==11.7.99 nvidia-cudnn-cu11==8.5.0.96 nvidia-cufft-cu11==10.9.0.58 nvidia-curand-cu11==10.2.10.91 nvidia-cusolver-cu11==11.4.0.1 nvidia-cusparse-cu11==11.7.4.91 nvidia-nccl-cu11==2.14.3 nvidia-nvtx-cu11==11.7.91 omegaconf==2.0.6 optimum==1.13.2 orjson==3.9.10 packaging==23.2 pandas==2.1.1 pathtools==0.1.2 peft @ git+https://github.com/huggingface/peft@56556faa17263be8ef1802c172141705b71c28dc phply==1.2.6 Pillow==9.3.0 pluggy==0.13.1 ply==3.11 polyglot==16.7.4 portalocker==2.3.0 prefixed==0.7.0 preshed==3.0.9 protobuf==4.24.4 psutil==5.9.6 py==1.11.0 pyarrow==13.0.0 pybind11==2.11.1 pybtex==0.24.0 pycld2==0.42 pycparser==2.21 pydantic==2.4.2 pydantic_core==2.10.1 pydub==0.25.1 pyenchant==3.2.2 Pygments==2.17.2 PyICU==2.11 pyparsing==3.1.1 pytest==4.6.11 pytest-cov==2.10.1 python-dateutil==2.8.2 python-Levenshtein==0.23.0 python-multipart==0.0.6 pytorch-lightning==2.1.0 pytz==2023.3.post1 PyYAML==6.0.1 rank-bm25==0.2.2 rapidfuzz==3.4.0 referencing==0.32.0 regex==2023.10.3 requests==2.28.1 responses==0.18.0 rich==13.7.0 rpds-py==0.13.2 ruamel.yaml==0.17.32 ruamel.yaml.clib==0.2.8 sacrebleu==2.3.1 sacremoses==0.0.53 safetensors==0.4.0 scikit-build==0.17.6 scipy==1.11.3 seaborn==0.13.0 semantic-version==2.10.0 sentencepiece==0.1.99 shellingham==1.5.4 shtab==1.6.4 six==1.16.0 smart-open==6.4.0 sniffio==1.3.0 soupsieve==2.5 spacy==3.7.2 spacy-language-detection==0.2.1 spacy-legacy==3.0.12 spacy-loggers==1.0.5 srsly==2.4.8 starlette==0.27.0 sympy==1.12 tabulate==0.9.0 thinc==8.2.1 tokenizers==0.14.1 tomli==2.0.1 tomlkit==0.12.0 toolz==0.12.0 torch==2.0.1 torchaudio==2.0.2+cu117 torchmetrics==0.10.3 torchvision==0.15.2+cu117 
tqdm==4.66.1 transformers==4.34.1 translate-toolkit==3.10.1 transliterate==1.10.2 triton==2.0.0 trl==0.7.4 typer==0.9.0 typing_extensions==4.8.0 tyro==0.5.17 tzdata==2023.3 unbabel-comet==2.2.0 urllib3==1.26.13 uvicorn==0.24.0.post1 vobject==0.9.6.1 wasabi==1.1.2 watchdog==0.9.0 wcwidth==0.2.8 weasel==0.3.3 websockets==11.0.3 wmtformat @ git+https://github.com/wmt-conference/wmt-format-tools.git@49983f17d8c99207c66a7f43fa49aa71d0692e48 xxhash==3.4.1 yarl==1.9.2 zhon==2.0.2

The Hugging Face Hub version is huggingface-hub==0.16.4. I upgraded it to huggingface-hub 0.19.4, but it still fails with the same error :)

The problem was solved by manually downloading the model from the Hugging Face repo. Thx.
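
For anyone following the manual route, here is a minimal sketch of locating the checkpoint inside a manually downloaded model folder. It assumes the folder keeps the layout of the Hub repository, with the weights at `checkpoints/model.ckpt`; that layout is an assumption here, and `resolve_checkpoint` is a hypothetical helper, not a COMET function.

```python
from pathlib import Path


def resolve_checkpoint(model_dir):
    """Return the .ckpt file inside a manually downloaded model folder."""
    root = Path(model_dir)
    expected = root / "checkpoints" / "model.ckpt"  # assumed repository layout
    if expected.is_file():
        return expected
    # Fall back to any checkpoint below the folder in case the layout differs.
    candidates = sorted(root.rglob("*.ckpt"))
    if not candidates:
        raise FileNotFoundError(f"no .ckpt file found under {root}")
    return candidates[0]


# Example use (needs unbabel-comet; the folder path is a placeholder):
# from comet import load_from_checkpoint
# model = load_from_checkpoint(str(resolve_checkpoint("/path/to/XCOMET-XL")))
```

Loading from a local path avoids contacting the Hub at scoring time, which also sidesteps the concurrent-download problem from the original report.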

mohataher commented 8 months ago

You have to acknowledge the model's license on the web. Then log in with the CLI before your code downloads the model.

ricardorei commented 8 months ago

I forgot about this issue. Thanks for answering, @mohataher.

laelhalawani commented 5 months ago

SOLVED - I had the same issue: 'Unbabel/wmt23-cometkiwi-da-xl' not supported by COMET. It turned out to be an issue with logging in to Hugging Face. If you have it installed, go to huggingface.co/settings/tokens to generate your token, then run huggingface-cli login and paste in the token. Now if you run the code again, it should successfully download the model.
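
On a cluster, an interactive `huggingface-cli login` is not always practical. As a rough sketch, a job could instead read the token from an environment variable and log in programmatically with `huggingface_hub.login`. The helper names below are hypothetical, and which variable names to check (`HF_TOKEN`, `HUGGING_FACE_HUB_TOKEN`) is an assumption made for this example.

```python
import os


def find_hf_token(environ=None):
    """Return the first non-empty token found in the environment, or None."""
    environ = os.environ if environ is None else environ
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):  # assumed variable names
        token = environ.get(name, "").strip()
        if token:
            return token
    return None


def login_from_env():
    """Log in to the Hugging Face Hub using a token taken from the environment."""
    token = find_hf_token()
    if token is None:
        raise RuntimeError("no Hugging Face token found in the environment")
    from huggingface_hub import login  # imported lazily; needs huggingface_hub
    login(token=token)
```

The license of a gated model still has to be accepted on the model's web page first; a token alone is not enough.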