gbouras13 / phold

Phage Annotation using Protein Structures
MIT License

Running Phold offline #44

Open aponsero opened 1 week ago

aponsero commented 1 week ago

Description

Hello, and thank you for your great tool! We are working with an HPC system in which the compute nodes are not connected to the internet for security reasons. We can install and download the tools and databases on a specific node, but the tools need to be able to run offline after installation.

I installed phold and downloaded the databases following the instructions (using the phold install command). However, when I run the tool, it fails because there is no network connection, even though all the needed files seem to be available. What would you recommend for our case?

What I Did

phold run -i NC_043029.gbk -o test_output_phold -d $DB --cpu

Error traceback:

2024-06-26 07:54:39.967 | INFO     | phold.utils.validation:instantiate_dirs:70 - Checking the output directory test_output_phold
2024-06-26 07:54:39.983 | INFO     | phold.utils.util:begin_phold:72 - phold: annotating phage genomes with protein structures
2024-06-26 07:54:39.984 | INFO     | phold.utils.util:begin_phold:74 - You are using phold version 0.1.4
2024-06-26 07:54:39.984 | INFO     | phold.utils.util:begin_phold:75 - Repository homepage is https://github.com/gbouras13/phold
2024-06-26 07:54:39.984 | INFO     | phold.utils.util:begin_phold:76 - You are running phold run
2024-06-26 07:54:39.984 | INFO     | phold.utils.util:begin_phold:77 - Listing parameters
2024-06-26 07:54:39.984 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --input NC_043029.gbk
2024-06-26 07:54:39.984 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --output test_output_phold
2024-06-26 07:54:39.984 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --threads 1
2024-06-26 07:54:39.984 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --force False
2024-06-26 07:54:39.984 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --prefix phold
2024-06-26 07:54:39.984 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --evalue 0.001
2024-06-26 07:54:39.984 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --database /scratch/PHOLD_testing/Phold_database
2024-06-26 07:54:39.984 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --batch_size 1
2024-06-26 07:54:39.984 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --sensitivity 9.5
2024-06-26 07:54:39.985 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --keep_tmp_files False
2024-06-26 07:54:39.985 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --cpu True
2024-06-26 07:54:39.985 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --omit_probs False
2024-06-26 07:54:39.985 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --finetune False
2024-06-26 07:54:39.985 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --finetune_path None
2024-06-26 07:54:39.985 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --split False
2024-06-26 07:54:39.985 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --split_threshold 60.0
2024-06-26 07:54:39.985 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --card_vfdb_evalue 1e-10
2024-06-26 07:54:39.985 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --separate False
2024-06-26 07:54:39.985 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --max_seqs 1000
2024-06-26 07:54:39.998 | INFO     | phold.utils.validation:check_dependencies:117 - Foldseek version found is v8.ef4e960
2024-06-26 07:54:39.998 | INFO     | phold.utils.validation:check_dependencies:126 - Foldseek version is ok
2024-06-26 07:54:39.998 | INFO     | phold.databases.db:validate_db:234 - Checking Phold database installation in /qib/scratch/users/zar24gir/PHOLD_testing/test_Phold
2024-06-26 07:54:40.017 | INFO     | phold.databases.db:validate_db:237 - All Phold databases files are present
2024-06-26 07:54:40.017 | INFO     | phold.io.handle_genbank:get_genbank:57 - Checking if input NC_043029.gbk is a Genbank file
2024-06-26 07:54:40.039 | INFO     | phold.utils.validation:validate_input:50 - Successfully parsed input NC_043029.gbk as a Genbank format file
2024-06-26 07:54:40.042 | INFO     | phold.features.predict_3Di:get_T5_model:121 - Using device: cpu
2024-06-26 07:54:40.043 | INFO     | phold.features.predict_3Di:get_T5_model:127 - Loading T5 from: /scratch/PHOLD_testing/Phold_database/Rostlab/ProstT5_fp16
2024-06-26 07:54:40.043 | INFO     | phold.features.predict_3Di:get_T5_model:128 - If /scratch/PHOLD_testing/Phold_database/Rostlab/ProstT5_fp16 is not found, it will be downloaded
Traceback (most recent call last):
  File "/hpc-home/micromamba/envs/pholdENV/lib/python3.11/site-packages/urllib3/connection.py", line 196, in _new_conn
    sock = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/hpc-home/micromamba/envs/pholdENV/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/hpc-home/micromamba/envs/pholdENV/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
OSError: [Errno 101] Network is unreachable
EricDeveaud commented 3 days ago

Hello

I see the same problem when network restrictions apply.

I installed phold from the tagged archive https://github.com/gbouras13/phold/archive/refs/tags/v0.1.4.tar.gz and ran phold install to fetch the database.

When the network is enabled, phold run -i ${DAT}/NC_043029_pharokka1.4.1.gbk -o temp -t `nproc` -f runs as expected.

But when the network is disabled (I usually run my test suite with, then without, network access), it fails:

build-nv [rpm]:phold/0.1.4 > sudo -E unshare -n phold run -i ../../datas/phold/NC_043029_pharokka1.4.1.gbk -o temp -t `nproc`  -f
2024-07-02 17:08:09.939 | INFO     | phold.utils.validation:instantiate_dirs:70 - Checking the output directory temp
2024-07-02 17:08:09.939 | INFO     | phold.utils.validation:instantiate_dirs:76 - --force was specified even though the output directory does not already exist. Continuing

.______    __    __    ______    __       _______
|   _  \  |  |  |  |  /  __  \  |  |     |       \
|  |_)  | |  |__|  | |  |  |  | |  |     |  .--.  |
|   ___/  |   __   | |  |  |  | |  |     |  |  |  |
|  |      |  |  |  | |  `--'  | |  `----.|  '--'  |
| _|      |__|  |__|  \______/  |_______||_______/

2024-07-02 17:08:09.947 | INFO     | phold.utils.util:begin_phold:72 - phold: annotating phage genomes with protein structures
2024-07-02 17:08:09.947 | INFO     | phold.utils.util:begin_phold:74 - You are using phold version 0.1.4
2024-07-02 17:08:09.948 | INFO     | phold.utils.util:begin_phold:75 - Repository homepage is https://github.com/gbouras13/phold
2024-07-02 17:08:09.948 | INFO     | phold.utils.util:begin_phold:76 - You are running phold run
2024-07-02 17:08:09.948 | INFO     | phold.utils.util:begin_phold:77 - Listing parameters
2024-07-02 17:08:09.948 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --input ../../datas/phold/NC_043029_pharokka1.4.1.gbk
2024-07-02 17:08:09.948 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --output temp
2024-07-02 17:08:09.948 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --threads 64
2024-07-02 17:08:09.949 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --force True
2024-07-02 17:08:09.949 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --prefix phold
2024-07-02 17:08:09.949 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --evalue 0.001
2024-07-02 17:08:09.949 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --database None
2024-07-02 17:08:09.949 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --batch_size 1
2024-07-02 17:08:09.949 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --sensitivity 9.5
2024-07-02 17:08:09.950 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --keep_tmp_files False
2024-07-02 17:08:09.950 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --cpu False
2024-07-02 17:08:09.950 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --omit_probs False
2024-07-02 17:08:09.950 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --finetune False
2024-07-02 17:08:09.950 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --finetune_path None
2024-07-02 17:08:09.950 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --split False
2024-07-02 17:08:09.950 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --split_threshold 60.0
2024-07-02 17:08:09.951 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --card_vfdb_evalue 1e-10
2024-07-02 17:08:09.951 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --separate False
2024-07-02 17:08:09.951 | INFO     | phold.utils.util:begin_phold:79 - Parameter: --max_seqs 1000
2024-07-02 17:08:09.972 | INFO     | phold.utils.validation:check_dependencies:117 - Foldseek version found is v8.ef4e960
2024-07-02 17:08:09.973 | INFO     | phold.utils.validation:check_dependencies:126 - Foldseek version is ok
2024-07-02 17:08:09.973 | INFO     | phold.databases.db:validate_db:234 - Checking Phold database installation in /opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/phold/database
2024-07-02 17:08:09.974 | INFO     | phold.databases.db:validate_db:237 - All Phold databases files are present
2024-07-02 17:08:09.974 | INFO     | phold.io.handle_genbank:get_genbank:57 - Checking if input ../../datas/phold/NC_043029_pharokka1.4.1.gbk is a Genbank file
2024-07-02 17:08:09.981 | INFO     | phold.utils.validation:validate_input:50 - Successfully parsed input ../../datas/phold/NC_043029_pharokka1.4.1.gbk as a Genbank format file
/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/torch/cuda/__init__.py:118: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
2024-07-02 17:08:09.998 | WARNING  | phold.features.predict_3Di:get_T5_model:115 - No available GPU was found, but --cpu was not specified
2024-07-02 17:08:09.998 | WARNING  | phold.features.predict_3Di:get_T5_model:118 - ProstT5 will be run with CPU only
2024-07-02 17:08:09.999 | INFO     | phold.features.predict_3Di:get_T5_model:121 - Using device: cpu
2024-07-02 17:08:09.999 | INFO     | phold.features.predict_3Di:get_T5_model:127 - Loading T5 from: /opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/phold/database/Rostlab/ProstT5_fp16
2024-07-02 17:08:09.999 | INFO     | phold.features.predict_3Di:get_T5_model:128 - If /opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/phold/database/Rostlab/ProstT5_fp16 is not found, it will be downloaded
Traceback (most recent call last):
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/urllib3/connection.py", line 196, in _new_conn
    sock = connection.create_connection(
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/urllib3/util/connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/opt/gensoft/adm/Python/3.8/lib/python3.8/socket.py", line 918, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 789, in urlopen
    response = self._make_request(
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 490, in _make_request
    raise new_e
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 466, in _make_request
    self._validate_conn(conn)
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1095, in _validate_conn
    conn.connect()
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/urllib3/connection.py", line 615, in connect
    self.sock = sock = self._new_conn()
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/urllib3/connection.py", line 203, in _new_conn
    raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x7f4994567760>: Failed to resolve 'huggingface.co' ([Errno -2] Name or service not known)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 843, in urlopen
    retries = retries.increment(
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/urllib3/util/retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /Rostlab/ProstT5_fp16/resolve/main/model.safetensors (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f4994567760>: Failed to resolve 'huggingface.co' ([Errno -2] Name or service not known)"))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/gensoft/exe/phold/0.1.4/bin/phold", line 8, in <module>
    sys.exit(main())
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/phold/__init__.py", line 1355, in main
    main_cli()
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/phold/__init__.py", line 281, in run
    subcommand_predict(
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/phold/subcommands/predict.py", line 125, in subcommand_predict
    prediction_success = get_embeddings(
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/phold/features/predict_3Di.py", line 359, in get_embeddings
    model, vocab = get_T5_model(model_dir, model_name, cpu)
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/phold/features/predict_3Di.py", line 129, in get_T5_model
    model = T5EncoderModel.from_pretrained(model_name, cache_dir=f"{model_dir}/").to(
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3494, in from_pretrained
    if not has_file(pretrained_model_name_or_path, safe_weights_name, **has_file_kwargs):
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/transformers/utils/hub.py", line 655, in has_file
    response = get_session().head(
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/requests/sessions.py", line 624, in head
    return self.request("HEAD", url, **kwargs)
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/huggingface_hub/utils/_http.py", line 66, in send
    return super().send(request, *args, **kwargs)
  File "/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/requests/adapters.py", line 700, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: (MaxRetryError('HTTPSConnectionPool(host=\'huggingface.co\', port=443): Max retries exceeded with url: /Rostlab/ProstT5_fp16/resolve/main/model.safetensors (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f4994567760>: Failed to resolve \'huggingface.co\' ([Errno -2] Name or service not known)"))'), '(Request ID: fa01c838-513d-4f96-b9f7-5a54affa6401)')

When phold install is run ab initio, ProstT5_fp16 is downloaded:

2024-07-02 17:11:30.385 | INFO     | phold:install:1097 - Downloading the Phold database into the default directory /opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/phold/database
2024-07-02 17:11:30.385 | INFO     | phold:install:1104 - Checking that the Rostlab/ProstT5_fp16 ProstT5 model is available in /opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/phold/database
2024-07-02 17:11:30.385 | INFO     | phold.features.predict_3Di:get_T5_model:121 - Using device: cpu
2024-07-02 17:11:30.385 | INFO     | phold.features.predict_3Di:get_T5_model:127 - Loading T5 from: /opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/phold/database/Rostlab/ProstT5_fp16
2024-07-02 17:11:30.385 | INFO     | phold.features.predict_3Di:get_T5_model:128 - If /opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/phold/database/Rostlab/ProstT5_fp16 is not found, it will be downloaded
config.json: 100%|██████████████████████████████████████████████████████| 733/733 [00:00<00:00, 888kB/s]
pytorch_model.bin: 100%|████████████████████████████████████████████| 5.64G/5.64G [00:11<00:00, 503MB/s]
tokenizer_config.json: 100%|████████████████████████████████████████| 2.40k/2.40k [00:00<00:00, 900kB/s]
spiece.model: 100%|███████████████████████████████████████████████████| 238k/238k [00:00<00:00, 209MB/s]
added_tokens.json: 100%|████████████████████████████████████████████████| 283/283 [00:00<00:00, 466kB/s]
special_tokens_map.json: 100%|█████████████████████████████████████| 2.20k/2.20k [00:00<00:00, 3.79MB/s]

And it is indeed present, see:

build-nv [rpm]:phold/phold-0.1.4 > ls -R /opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/phold/database/models--Rostlab--ProstT5_fp16/
/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/phold/database/models--Rostlab--ProstT5_fp16/:
blobs  refs  snapshots

/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/phold/database/models--Rostlab--ProstT5_fp16/blobs:
2c19eb6e3b583f52d34b903b5978d3d30b6b7682
60fe6bb247c90b8545d7b73820cd796ce6dcbd59
6fc7be92c58e238f20a6cdea5a87b123a4ad35e2
74da7b4afcde53faa570114b530c726135bdfcdb813dec3abfb27f9d44db7324
b1a9ffcef73280cc57f090ad6446b4116b574b6c75d83ccc32778282f7f00855
e9322396e6e75ecf8da41a9527e24dfa4eeea505

/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/phold/database/models--Rostlab--ProstT5_fp16/refs:
main

/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/phold/database/models--Rostlab--ProstT5_fp16/snapshots:
07a6547d51de603f1be84fd9f2db4680ee535a86

/opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/phold/database/models--Rostlab--ProstT5_fp16/snapshots/07a6547d51de603f1be84fd9f2db4680ee535a86:
added_tokens.json  pytorch_model.bin        spiece.model
config.json        special_tokens_map.json  tokenizer_config.json

But the directory name does not match the one phold expects. As far as I understand, phold tries to load from database/Rostlab/ProstT5_fp16, which does not exist.
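The mismatch comes from the Hugging Face cache layout visible in the listing above: the repo id Rostlab/ProstT5_fp16 is stored under a folder named models--Rostlab--ProstT5_fp16, with the actual files in a snapshots/<revision> subdirectory. As a minimal sketch (the helper names `hub_cache_folder` and `find_snapshot` are hypothetical and not part of phold), the cached copy could be located like this:

```python
from pathlib import Path
from typing import Optional


def hub_cache_folder(repo_id: str) -> str:
    """Folder name the Hugging Face cache uses for a model repo id."""
    return "models--" + repo_id.replace("/", "--")


def find_snapshot(cache_dir: str, repo_id: str) -> Optional[Path]:
    """Return the first snapshot directory that holds a config.json, or None."""
    snapshots = Path(cache_dir) / hub_cache_folder(repo_id) / "snapshots"
    if not snapshots.is_dir():
        return None
    for snapshot in sorted(snapshots.iterdir()):
        if (snapshot / "config.json").is_file():
            return snapshot
    return None
```

Called as `find_snapshot(database_dir, "Rostlab/ProstT5_fp16")`, this would return the 07a6547d... snapshot directory shown in the listing, or None if the model was never downloaded.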

EricDeveaud commented 3 days ago

OK, finally got it. This is not quite ;-) a phold problem; it is related to transformers.

Setting 'TRANSFORMERS_OFFLINE=True' in the environment solved the problem. May I suggest that phold check whether the necessary files for Rostlab/ProstT5_fp16 are available in the database directory before trying to load the model?
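For anyone scripting this workaround, here is a minimal sketch of launching phold with the offline variable set. The wrapper names `offline_env` and `run_phold_offline` are hypothetical. The value "1" is used in place of the "True" from this thread, and HF_HUB_OFFLINE (the huggingface_hub counterpart of the variable) is added as an assumption that it does no harm; only TRANSFORMERS_OFFLINE was actually tested above.

```python
import os
import subprocess
from typing import Dict, List, Optional


def offline_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and switch the Hugging Face libraries to offline mode."""
    env = dict(os.environ if base is None else base)
    env["TRANSFORMERS_OFFLINE"] = "1"  # read by transformers
    env["HF_HUB_OFFLINE"] = "1"        # read by huggingface_hub (assumed helpful)
    return env


def run_phold_offline(phold_args: List[str]) -> int:
    """Run `phold` with the given arguments in the offline environment."""
    completed = subprocess.run(["phold", *phold_args], env=offline_env())
    return completed.returncode
```

For example, `run_phold_offline(["run", "-i", "NC_043029.gbk", "-o", "test_output_phold", "--cpu"])` would mirror the command from the original report. The model must already have been fetched with phold install on a node that has network access.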

If the necessary files are already downloaded and cached (phold install performs this task), then pass the local_files_only=True argument to T5EncoderModel.from_pretrained.

Something like this in src/phold/features/predict_3Di.py:

    # load (needs `import os` at the top of predict_3Di.py)
    logger.info(f"Loading {model_name} T5 model from: {model_dir}")
    # phold install caches the model under models--Rostlab--ProstT5_fp16
    if os.path.isdir(os.path.join(model_dir, "models--Rostlab--ProstT5_fp16")):
        # already cached: load from disk without contacting huggingface.co
        localfile = True
        download = False
    else:
        logger.info(f"{model_name} is not found in {model_dir}, it will be downloaded")
        localfile = False
        download = True
    model = T5EncoderModel.from_pretrained(
        model_name,
        cache_dir=f"{model_dir}/",
        force_download=download,
        local_files_only=localfile,
    ).to(device)
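The same decision can also be isolated in a small helper that does not hard-code the model name. This is only a sketch: `pretrained_kwargs` is a hypothetical name, and it assumes the cache folder is always models-- followed by the repo id with "/" replaced by "--", as in the directory listing earlier in this thread. It leaves out force_download, on the assumption that from_pretrained fetches missing files by default whenever local_files_only is False.

```python
import os
from typing import Any, Dict


def pretrained_kwargs(model_dir: str, model_name: str) -> Dict[str, Any]:
    """Keyword arguments for from_pretrained: local only if cached, else download."""
    cache_folder = "models--" + model_name.replace("/", "--")
    is_cached = os.path.isdir(os.path.join(model_dir, cache_folder))
    return {
        "cache_dir": model_dir,
        # True: never contact the hub; False: allow the download
        "local_files_only": is_cached,
    }
```

It would be used as `T5EncoderModel.from_pretrained(model_name, **pretrained_kwargs(model_dir, model_name))`. Because it only checks that the directory exists, a partially downloaded cache would still be treated as complete.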

With this change to predict_3Di.py in place, phold logs the following when the model is present:

2024-07-02 19:42:55.717 | INFO     | phold.features.predict_3Di:get_T5_model:129 - Loading Rostlab/ProstT5_fp16 T5 model from: /opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/phold/database
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-07-02 19:42:58.628 | INFO     | phold.features.predict_3Di:get_T5_model:143 - Rostlab/ProstT5_fp16 loaded
2024-07-02 19:42:58.636 | INFO     | phold.features.predict_3Di:get_embeddings:367 - Beginning ProstT5 predictions

And when the model is not present (I just removed it with rm):

2024-07-02 19:45:36.281 | INFO     | phold.features.predict_3Di:get_T5_model:122 - Using device: cuda:0
2024-07-02 19:45:36.282 | INFO     | phold.features.predict_3Di:get_T5_model:133 - Rostlab/ProstT5_fp16 is not found in /opt/gensoft/exe/phold/0.1.4/venv/lib/python3.8/site-packages/phold/database/, it will be downloaded
config.json: 100%|██████████████████████████████████████████████████████| 733/733 [00:00<00:00, 360kB/s]
config.json: 100%|█████████████████████████████████████████████████████| 733/733 [00:00<00:00, 1.23MB/s]
pytorch_model.bin: 100%|████████████████████████████████████████████| 5.64G/5.64G [00:12<00:00, 469MB/s]
tokenizer_config.json: 100%|███████████████████████████████████████| 2.40k/2.40k [00:00<00:00, 2.72MB/s]
spiece.model: 100%|███████████████████████████████████████████████████| 238k/238k [00:00<00:00, 222MB/s]
added_tokens.json: 100%|████████████████████████████████████████████████| 283/283 [00:00<00:00, 352kB/s]
special_tokens_map.json: 100%|█████████████████████████████████████| 2.20k/2.20k [00:00<00:00, 3.41MB/s]

regards

Eric