EvilBT / ComfyUI_SLK_joy_caption_two


Converting from Tiktoken failed #41

Open hnauto opened 6 days ago

hnauto commented 6 days ago

Issue Description

When the node attempts to convert a slow tokenizer to a fast tokenizer via the transformers library, loading fails because the conversion from Tiktoken cannot be completed. The error message suggests that, if a converter for SentencePiece is available, a model path containing a SentencePiece tokenizer.model file should be provided instead.
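For context, the failure can be reproduced outside ComfyUI with a few lines. This is a minimal sketch based on the traceback below; BASE_MODEL_PATH is a placeholder for the node's model directory, not the exact value from my install:

```python
# Minimal reproduction sketch; BASE_MODEL_PATH is a placeholder for the
# directory the node resolves (models/Joy_caption_two by default).
import os
from transformers import AutoTokenizer

BASE_MODEL_PATH = os.path.join("models", "Joy_caption_two")

# This mirrors the call in joy_caption_two_node.py. With use_fast=True,
# transformers typically falls back to a tiktoken-based conversion when it
# cannot find a usable tokenizer.json, which raises the error shown below.
tokenizer = AutoTokenizer.from_pretrained(
    os.path.join(BASE_MODEL_PATH, "text_model"), use_fast=True
)
```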

Error Log


2024-10-21 19:13:07,333 - root - ERROR - !!! Exception during processing !!! Converting from Tiktoken failed, if a converter for SentencePiece is available, provide a model path with a SentencePiece tokenizer.model file.Currently available slow->fast convertors: ['AlbertTokenizer', 'BartTokenizer', 'BarthezTokenizer', 'BertTokenizer', 'BigBirdTokenizer', 'BlenderbotTokenizer', 'CamembertTokenizer', 'CLIPTokenizer', 'CodeGenTokenizer', 'ConvBertTokenizer', 'DebertaTokenizer', 'DebertaV2Tokenizer', 'DistilBertTokenizer', 'DPRReaderTokenizer', 'DPRQuestionEncoderTokenizer', 'DPRContextEncoderTokenizer', 'ElectraTokenizer', 'FNetTokenizer', 'FunnelTokenizer', 'GPT2Tokenizer', 'HerbertTokenizer', 'LayoutLMTokenizer', 'LayoutLMv2Tokenizer', 'LayoutLMv3Tokenizer', 'LayoutXLMTokenizer', 'LongformerTokenizer', 'LEDTokenizer', 'LxmertTokenizer', 'MarkupLMTokenizer', 'MBartTokenizer', 'MBart50Tokenizer', 'MPNetTokenizer', 'MobileBertTokenizer', 'MvpTokenizer', 'NllbTokenizer', 'OpenAIGPTTokenizer', 'PegasusTokenizer', 'Qwen2Tokenizer', 'RealmTokenizer', 'ReformerTokenizer', 'RemBertTokenizer', 'RetriBertTokenizer', 'RobertaTokenizer', 'RoFormerTokenizer', 'SeamlessM4TTokenizer', 'SqueezeBertTokenizer', 'T5Tokenizer', 'UdopTokenizer', 'WhisperTokenizer', 'XLMRobertaTokenizer', 'XLNetTokenizer', 'SplinterTokenizer', 'XGLMTokenizer', 'LlamaTokenizer', 'CodeLlamaTokenizer', 'GemmaTokenizer', 'Phi3Tokenizer']
2024-10-21 19:13:07,337 - root - ERROR - Traceback (most recent call last):
  File "D:\DEV\ComfyUI-aki-v1.4\python\lib\site-packages\transformers\convert_slow_tokenizer.py", line 1592, in convert_slow_tokenizer
    ).converted()
  File "D:\DEV\ComfyUI-aki-v1.4\python\lib\site-packages\transformers\convert_slow_tokenizer.py", line 1489, in converted
    tokenizer = self.tokenizer()
  File "D:\DEV\ComfyUI-aki-v1.4\python\lib\site-packages\transformers\convert_slow_tokenizer.py", line 1482, in tokenizer
    vocab_scores, merges = self.extract_vocab_merges_from_model(self.vocab_file)
  File "D:\DEV\ComfyUI-aki-v1.4\python\lib\site-packages\transformers\convert_slow_tokenizer.py", line 1458, in extract_vocab_merges_from_model
    bpe_ranks = load_tiktoken_bpe(tiktoken_url)
  File "D:\DEV\ComfyUI-aki-v1.4\python\lib\site-packages\tiktoken\load.py", line 144, in load_tiktoken_bpe
    contents = read_file_cached(tiktoken_bpe_file, expected_hash)
  File "D:\DEV\ComfyUI-aki-v1.4\python\lib\site-packages\tiktoken\load.py", line 48, in read_file_cached
    cache_key = hashlib.sha1(blobpath.encode()).hexdigest()
AttributeError: 'NoneType' object has no attribute 'encode'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\DEV\ComfyUI-aki-v1.4\execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "D:\DEV\ComfyUI-aki-v1.4\execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "D:\DEV\ComfyUI-aki-v1.4\execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "D:\DEV\ComfyUI-aki-v1.4\execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "D:\DEV\ComfyUI-aki-v1.4\custom_nodes\ComfyUI_SLK_joy_caption_two\joy_caption_two_node.py", line 351, in generate
    joy_two_pipeline.loadLLM()
  File "D:\DEV\ComfyUI-aki-v1.4\custom_nodes\ComfyUI_SLK_joy_caption_two\joy_caption_two_node.py", line 228, in loadLLM
    self.llm = JoyLLM()
  File "D:\DEV\ComfyUI-aki-v1.4\custom_nodes\ComfyUI_SLK_joy_caption_two\joy_caption_two_node.py", line 152, in __init__
    tokenizer = AutoTokenizer.from_pretrained(os.path.join(BASE_MODEL_PATH, "text_model"), use_fast=True)
  File "D:\DEV\ComfyUI-aki-v1.4\python\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 907, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "D:\DEV\ComfyUI-aki-v1.4\python\lib\site-packages\transformers\tokenization_utils_base.py", line 2208, in from_pretrained
    return cls._from_pretrained(
  File "D:\DEV\ComfyUI-aki-v1.4\python\lib\site-packages\transformers\tokenization_utils_base.py", line 2442, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "D:\DEV\ComfyUI-aki-v1.4\python\lib\site-packages\transformers\tokenization_utils_fast.py", line 138, in __init__
    fast_tokenizer = convert_slow_tokenizer(self, from_tiktoken=True)
  File "D:\DEV\ComfyUI-aki-v1.4\python\lib\site-packages\transformers\convert_slow_tokenizer.py", line 1594, in convert_slow_tokenizer
    raise ValueError(
ValueError: Converting from Tiktoken failed, if a converter for SentencePiece is available, provide a model path with a SentencePiece tokenizer.model file.Currently available slow->fast convertors: ['AlbertTokenizer', 'BartTokenizer', 'BarthezTokenizer', 'BertTokenizer', 'BigBirdTokenizer', 'BlenderbotTokenizer', 'CamembertTokenizer', 'CLIPTokenizer', 'CodeGenTokenizer', 'ConvBertTokenizer', 'DebertaTokenizer', 'DebertaV2Tokenizer', 'DistilBertTokenizer', 'DPRReaderTokenizer', 'DPRQuestionEncoderTokenizer', 'DPRContextEncoderTokenizer', 'ElectraTokenizer', 'FNetTokenizer', 'FunnelTokenizer', 'GPT2Tokenizer', 'HerbertTokenizer', 'LayoutLMTokenizer', 'LayoutLMv2Tokenizer', 'LayoutLMv3Tokenizer', 'LayoutXLMTokenizer', 'LongformerTokenizer', 'LEDTokenizer', 'LxmertTokenizer', 'MarkupLMTokenizer', 'MBartTokenizer', 'MBart50Tokenizer', 'MPNetTokenizer', 'MobileBertTokenizer', 'MvpTokenizer', 'NllbTokenizer', 'OpenAIGPTTokenizer', 'PegasusTokenizer', 'Qwen2Tokenizer', 'RealmTokenizer', 'ReformerTokenizer', 'RemBertTokenizer', 'RetriBertTokenizer', 'RobertaTokenizer', 'RoFormerTokenizer', 'SeamlessM4TTokenizer', 'SqueezeBertTokenizer', 'T5Tokenizer', 'UdopTokenizer', 'WhisperTokenizer', 'XLMRobertaTokenizer', 'XLNetTokenizer', 'SplinterTokenizer', 'X
EvilBT commented 6 days ago

Make sure that the Joy-Caption-alpha-two model files have been completely downloaded and are placed in the correct directory (models/Joy_caption_two). Also, double-check that the BASE_MODEL_PATH variable in joy_caption_two_node.py is pointing to the directory where the model files are located.
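A quick way to verify this is a small check script. This is a sketch; the expected file names are assumptions for a typical text_model folder, not a definitive list:

```python
# Sketch: check that the files the fast-tokenizer path needs actually exist.
# File names here are assumptions for a typical text_model folder.
import os

BASE_MODEL_PATH = os.path.join("models", "Joy_caption_two")  # adjust if needed
text_model_dir = os.path.join(BASE_MODEL_PATH, "text_model")

for name in ("tokenizer.json", "tokenizer_config.json", "config.json"):
    path = os.path.join(text_model_dir, name)
    print(("OK      " if os.path.isfile(path) else "MISSING ") + path)
```

A missing or zero-byte tokenizer.json would be consistent with the traceback above, where the vocab file path ends up as None inside tiktoken's load_tiktoken_bpe.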

Suggestion 3:

Older versions of the Transformers library may not support tiktoken encoding, or may fail during the conversion process. Try upgrading Transformers to the latest version, or install the specific version pinned in requirements.txt.
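For example, you can confirm the installed version before and after upgrading; no minimum version is pinned here, so check requirements.txt for the exact one:

```python
# Sketch: print the installed transformers version.
# Upgrade with:  pip install -U transformers
# (or install the exact version listed in requirements.txt).
import transformers

print("transformers", transformers.__version__)
```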

If you have already checked these suggestions, please provide more information so I can assist you further, for example (a small script for collecting most of it follows this list):

The version of ComfyUI you are using.

The version of Python you are using.

The specific download source for the model.
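ComfyUI's own version has to be read from its UI or startup log, but the rest can be printed with a sketch like this (not part of the node):

```python
# Sketch: gather environment details for the bug report.
import platform
import sys

print("Python:", sys.version)
print("OS:", platform.platform())
try:
    import transformers
    print("transformers:", transformers.__version__)
except ImportError:
    print("transformers: not installed")
```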