Closed AmgadHasan closed 4 months ago
@AmgadHasan, hello. Currently, most FW converted models have a tokenizer.json file in the model path and no tokenizer_config.json file, so I believe your case is not very common. If your large v3 model is missing the tokenizer.json file, I think adding this file to the path is simpler than changing the FW code to handle it (which would also require adding the tokenizer_config.json file).
For the conversion command, you can add the option --copy_files tokenizer.json to include the tokenizer file in the model path.
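For a model that has already been converted, the "add the file to the path" suggestion could be sketched roughly as follows. This assumes a local copy of tokenizer.json is available somewhere (for example, next to the original checkpoint). The helper name ensure_tokenizer_file and both directory arguments are hypothetical and not part of FW.

```python
import shutil
from pathlib import Path


def ensure_tokenizer_file(model_dir, source_dir, filename="tokenizer.json"):
    """Copy `filename` from source_dir into model_dir if it is missing.

    Hypothetical helper for illustration only.
    Returns True if a copy was made, False if the file was already there.
    """
    target = Path(model_dir) / filename
    if target.is_file():
        # Nothing to do: the converted model already ships a tokenizer.
        return False

    source = Path(source_dir) / filename
    if not source.is_file():
        raise FileNotFoundError(f"{source} does not exist")

    shutil.copy2(source, target)
    return True
```

Calling it a second time on the same directory is a no-op, so it is safe to run before every model load.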
Check for a tokenizer_config.json file in the model directory. If present, infer the whisper model type from the tokenizer config and download the correct tokenizer for this model type. This change is made to support inferring whisper-large-v3 from the tokenizer config.

Previously, if there was no tokenizer.json file but there was a tokenizer_config.json file, the loader ignored this file and automatically downloaded the tokenizer from whisper-tiny.en or whisper-tiny. This caused issues for models derived from whisper-large-v3, whose tokenizer differs from the one used by these two models. For example, the model would translate even when the user specified the task as "transcribe", because the task tokens have different token IDs in large-v3.
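The lookup order described above can be sketched as below. This is an illustrative approximation and not the actual FW loader code. The function name pick_tokenizer_source is hypothetical, and so is the detection rule: the sketch assumes that tokenizer_config.json contains an added_tokens_decoder mapping and that the presence of the "<|yue|>" language token indicates a large-v3 tokenizer. The real change may infer the model type differently.

```python
import json
from pathlib import Path

# Assumption for this sketch: only the large-v3 tokenizer defines this token.
LARGE_V3_MARKER = "<|yue|>"


def pick_tokenizer_source(model_dir, is_multilingual=True):
    """Return ("file", path) or ("hub", repo_id) for the tokenizer to load.

    Hypothetical helper that mirrors the lookup order described above.
    """
    model_dir = Path(model_dir)

    # 1. A local tokenizer.json always wins.
    tokenizer_file = model_dir / "tokenizer.json"
    if tokenizer_file.is_file():
        return ("file", str(tokenizer_file))

    # 2. Otherwise try to infer the model type from tokenizer_config.json.
    config_file = model_dir / "tokenizer_config.json"
    if config_file.is_file():
        config = json.loads(config_file.read_text(encoding="utf-8"))
        added = config.get("added_tokens_decoder", {})
        contents = {
            entry.get("content")
            for entry in added.values()
            if isinstance(entry, dict)
        }
        if LARGE_V3_MARKER in contents:
            return ("hub", "openai/whisper-large-v3")

    # 3. Previous behaviour: fall back to the tiny tokenizers.
    repo = "openai/whisper-tiny" if is_multilingual else "openai/whisper-tiny.en"
    return ("hub", repo)
```

The fallback in step 3 is kept so that models without either file behave as before; only directories whose config looks like large-v3 take the new path.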