Hello, I encountered an issue when trying to use the AIMv2 model (`apple/aimv2-large-patch14-224`) for multimodal tasks involving text tokenization. The error suggests that the tokenizer files are missing from the repository or that the tokenizer is improperly configured.
Code causing the issue:

```python
from transformers import CLIPTokenizer

text_tokenizer = CLIPTokenizer.from_pretrained("apple/aimv2-large-patch14-224")
```
Error:

```
Traceback (most recent call last):
  File "/media/syun/ssd02/python_learning/apple/qiita_project_AIMv2/aimv2-large-patch14-224/test.py", line 7, in <module>
    text_tokenizer = CLIPTokenizer.from_pretrained("apple/aimv2-large-patch14-224")
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/syun/ssd02/python_learning/apple/qiita_project_AIMv2/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2197, in from_pretrained
    raise EnvironmentError(
OSError: Can't load tokenizer for 'apple/aimv2-large-patch14-224'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'apple/aimv2-large-patch14-224' is the correct path to a directory containing all relevant files for a CLIPTokenizer tokenizer.
```
Steps to reproduce:
1. Install the `transformers` library.
2. Call `CLIPTokenizer.from_pretrained("apple/aimv2-large-patch14-224")` to load the tokenizer, as in the script below.
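For completeness, a self-contained reproduction script (it assumes only that `transformers` is installed; the version print is just for bookkeeping):

```python
# Self-contained reproduction.
# Setup: pip install transformers
import transformers
from transformers import CLIPTokenizer

print("transformers version:", transformers.__version__)

# Fails with OSError: Can't load tokenizer for 'apple/aimv2-large-patch14-224'
text_tokenizer = CLIPTokenizer.from_pretrained("apple/aimv2-large-patch14-224")
```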
Expected behavior:
The tokenizer should load without errors so that text can be tokenized for multimodal tasks.
Observed behavior:
An error is raised indicating that the tokenizer cannot be found.
Possible explanation:
It seems the repository at `apple/aimv2-large-patch14-224` on Hugging Face may not include the tokenizer files that `CLIPTokenizer` requires (e.g., vocab.json and merges.txt). The snippet below checks this directly.
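One way to verify is to list the files the repo actually ships, using the public `list_repo_files` helper from `huggingface_hub`:

```python
# Inspect which files the repo ships, to confirm whether the
# tokenizer files a CLIPTokenizer needs are present at all.
from huggingface_hub import list_repo_files

files = list_repo_files("apple/aimv2-large-patch14-224")
print(files)

# CLIPTokenizer needs at least vocab.json and merges.txt.
needed = {"vocab.json", "merges.txt"}
missing = needed - set(files)
print("missing tokenizer files:", missing or "none")
```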
Questions:
1. Does the AIMv2 model (`apple/aimv2-large-patch14-224`) include support for text tokenization?
2. If not, is there a recommended tokenizer or preprocessing pipeline for multimodal tasks using this model? A sketch of what I have in mind follows.
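In the meantime, this is the direction I'm considering as a workaround. It is only a sketch under two assumptions I'd like confirmed: (a) that the checkpoint is vision-only and loads via `AutoModel` with `trust_remote_code=True`, and (b) that pairing it with an off-the-shelf CLIP tokenizer (`openai/clip-vit-large-patch14` here, chosen purely as an illustration) is sensible at all, which it may not be, since AIMv2 was presumably not trained against that vocabulary:

```python
# Hypothetical workaround sketch, NOT a confirmed recipe.
from transformers import AutoImageProcessor, AutoModel, CLIPTokenizer

# Assumption: the AIMv2 repo provides the vision encoder via remote code.
processor = AutoImageProcessor.from_pretrained("apple/aimv2-large-patch14-224")
model = AutoModel.from_pretrained(
    "apple/aimv2-large-patch14-224", trust_remote_code=True
)

# Assumption: borrowing a tokenizer from an unrelated CLIP checkpoint;
# its vocabulary was not trained with AIMv2, so alignment is not guaranteed.
text_tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
tokens = text_tokenizer("a photo of a dog", return_tensors="pt")
```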
Thank you for your assistance!