apple / ml-aim

This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.

Missing Tokenizer for `apple/aimv2-large-patch14-224` #20

Closed syun88 closed 5 days ago

syun88 commented 5 days ago

Hello, I encountered an issue when trying to use the AIMv2 model (apple/aimv2-large-patch14-224) for multimodal tasks that involve text tokenization. The error suggests that the tokenizer files are missing from the repository or that the tokenizer is improperly configured.

Code causing the issue:

```python
from transformers import CLIPTokenizer

text_tokenizer = CLIPTokenizer.from_pretrained("apple/aimv2-large-patch14-224")
```

Error:

```
Traceback (most recent call last):
  File "/media/syun/ssd02/python_learning/apple/qiita_project_AIMv2/aimv2-large-patch14-224/test.py", line 7, in <module>
    text_tokenizer = CLIPTokenizer.from_pretrained("apple/aimv2-large-patch14-224")
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/syun/ssd02/python_learning/apple/qiita_project_AIMv2/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2197, in from_pretrained
    raise EnvironmentError(
OSError: Can't load tokenizer for 'apple/aimv2-large-patch14-224'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'apple/aimv2-large-patch14-224' is the correct path to a directory containing all relevant files for a CLIPTokenizer tokenizer.
```

Steps to reproduce:

  1. Install the transformers library.
  2. Use CLIPTokenizer.from_pretrained("apple/aimv2-large-patch14-224") to load the tokenizer.

Expected behavior: The tokenizer should be loaded without errors, allowing text to be tokenized for multimodal tasks.

Observed behavior: An error is raised indicating that the tokenizer cannot be found.

Possible explanation: It seems the repository at apple/aimv2-large-patch14-224 on Hugging Face may not include the tokenizer files required for CLIPTokenizer.
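One way to confirm this locally is to check whether a downloaded checkpoint directory actually contains the tokenizer files a CLIP-style tokenizer expects. This is a minimal sketch, assuming the standard CLIP tokenizer layout (a slow tokenizer needs `vocab.json` plus `merges.txt`; a fast tokenizer ships a single `tokenizer.json`); the helper name is mine, not part of the repository:

```python
import os

# Files a CLIP-style tokenizer load looks for (assumption based on the
# standard CLIP tokenizer layout, not on this specific repository).
SLOW_FILES = ("vocab.json", "merges.txt")
FAST_FILE = "tokenizer.json"

def has_tokenizer_files(checkpoint_dir: str) -> bool:
    """Return True if a local checkpoint directory contains the files
    needed to instantiate a CLIP-style tokenizer."""
    names = set(os.listdir(checkpoint_dir))
    return FAST_FILE in names or all(f in names for f in SLOW_FILES)
```

Running this against my local copy of apple/aimv2-large-patch14-224 returns False, which is consistent with the OSError above.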

Questions:

  1. Does the AIMv2 model (apple/aimv2-large-patch14-224) include support for text tokenization?
  2. If not, is there a recommended tokenizer or preprocessing pipeline for multimodal tasks using this model?
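As a stopgap while these questions are open, I am wrapping the load in a fallback so experiments can continue. The fallback repo id below (openai/clip-vit-base-patch32) is my own assumption, not an officially recommended pairing for AIMv2:

```python
from typing import Callable

def load_tokenizer_with_fallback(load: Callable, primary: str, fallback: str):
    """Try the model's own repo first; if its tokenizer files are absent
    (transformers raises OSError), load from a fallback repo instead."""
    try:
        return load(primary)
    except OSError:
        return load(fallback)

# Usage sketch (requires network access, so not executed here):
# from transformers import CLIPTokenizer
# tok = load_tokenizer_with_fallback(
#     CLIPTokenizer.from_pretrained,
#     "apple/aimv2-large-patch14-224",
#     "openai/clip-vit-base-patch32",  # assumed stand-in, not official
# )
```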

Environment:

Thank you for your assistance!