I tried the same thing: CTranslate2 works without any issues for this T5 model: https://huggingface.co/declare-lab/flan-sharegpt-xl. But it breaks for fastchat's model.
@guillaumekln any feedback regarding this model?
Best
The tokenizer from the fine-tuned model looks broken to me. See the tokenization difference with the base tokenizer and the different padding token:
>>> import transformers
>>> tokenizer = transformers.T5Tokenizer.from_pretrained("google/flan-t5-xl")
>>> tokenizer.convert_ids_to_tokens(tokenizer.encode("Quelle caractéristique possède Cérès qui rendrait la vie extraterrestre possible ?"))
['▁Quelle', '▁caractéris', 'tique', '▁possède', '▁C', 'é', 'r', 'ès', '▁qui', '▁rend', 'rait', '▁la', '▁vie', '▁extra', 'ter', 'rest', 're', '▁possible', '▁', '?', '</s>']
>>> tokenizer.pad_token
'<pad>'
>>> tokenizer = transformers.T5Tokenizer.from_pretrained("lmsys/fastchat-t5-3b-v1.0")
>>> tokenizer.convert_ids_to_tokens(tokenizer.encode("Quelle caractéristique possède Cérès qui rendrait la vie extraterrestre possible ?"))
['▁Quelle', ' ', '▁caractéris', 'tique', ' ', '▁possède', ' ', '▁C', 'é', 'r', 'ès', ' ', '▁qui', ' ', '▁rend', 'rait', ' ', '▁la', ' ', '▁vie', ' ', '▁extra', 'ter', 'rest', 're', ' ', '▁possible', ' ', '▁', '?', '</s>']
>>> tokenizer.pad_token
'[PAD]'
I get the expected output after making these changes:
1) Change [PAD] to <pad> in the file config.json from the converted model directory.
2) Load the tokenizer from the base model: T5Tokenizer.from_pretrained("google/flan-t5-xl")
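A minimal sketch of the first change, assuming the converted model lives in a directory named fastchat-t5-3b-ct2 (the directory name is a placeholder):

import json

config_path = "fastchat-t5-3b-ct2/config.json"  # converted model directory (placeholder)

with open(config_path) as f:
    config = json.load(f)

# Replace any token entry still set to the broken "[PAD]" with the standard T5 "<pad>".
for key, value in config.items():
    if value == "[PAD]":
        config[key] = "<pad>"

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)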
Hi @guillaumekln thanks for your feedback!
I have two questions regarding the generation options:
1) How do sampling_topk and sampling_temperature affect the generated output?
2) How can I specify in the prompt to leave a blank answer when the answer cannot be found in the context?
sampling_topk enables random sampling: the next token is sampled from the K most probable candidates instead of taken greedily, and sampling_temperature rescales the distribution before sampling (higher values make the output more random). You may want to read this document: https://huggingface.co/blog/how-to-generate
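For example, with CTranslate2's translate_batch (a sketch; the model directory name is a placeholder and the parameter values are arbitrary):

import ctranslate2
import transformers

tokenizer = transformers.T5Tokenizer.from_pretrained("lmsys/fastchat-t5-3b-v1.0")
translator = ctranslate2.Translator("fastchat-t5-3b-ct2")  # converted model directory (placeholder)

prompt = "Quelle caractéristique possède Cérès qui rendrait la vie extraterrestre possible ?"
source = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

# beam_size=1 with sampling_topk > 1 enables random sampling: each next token
# is drawn from the 40 most probable candidates, after the distribution is
# rescaled by the temperature (lower than 1 = more deterministic).
results = translator.translate_batch(
    [source],
    beam_size=1,
    sampling_topk=40,
    sampling_temperature=0.7,
    max_decoding_length=256,
)
output_tokens = results[0].hypotheses[0]
print(tokenizer.decode(
    tokenizer.convert_tokens_to_ids(output_tokens),
    spaces_between_special_tokens=False,
    skip_special_tokens=True,
))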
Thanks for the tips!
It works, thanks. Changing [PAD] to <pad> did the trick.
@guillaumekln This output from the fastchat-t5-3b tokenizer is expected. The fastchat tokenizer explicitly encodes whitespace as a workaround for Flan T5's inability to represent multiple consecutive whitespaces. The fastchat tokenizer also adds tokens for linebreaks (\n) and other characters that are ignored by Flan T5's default tokenizer. See: https://github.com/lm-sys/FastChat/issues/1022#issuecomment-1540666091
So using the Flan T5 tokenizer doesn't actually fully solve the problem, since the fastchat model would no longer recognize multiple whitespaces, linebreaks, and other characters.
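A small sketch to make that difference visible (the comments describe the expected behavior; this is not captured output):

import transformers

base = transformers.T5Tokenizer.from_pretrained("google/flan-t5-xl")
fine_tuned = transformers.T5Tokenizer.from_pretrained("lmsys/fastchat-t5-3b-v1.0")

text = "def f():\n    return  1"

# The base Flan T5 tokenizer drops the linebreak and collapses the run of
# spaces, while the fastchat tokenizer emits dedicated tokens for them.
print(base.convert_ids_to_tokens(base.encode(text)))
print(fine_tuned.convert_ids_to_tokens(fine_tuned.encode(text)))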
It would be great if the fastchat model could be fully supported by ctranslate2.
Changing [PAD] to <pad> in config.json seems enough to fix the converted model. It works with the fastchat-t5-3b tokenizer (no need to use the flan-t5-xl tokenizer).
Don't forget to use the following parameters when decoding: text_output = tokenizer.decode(tokenizer.convert_tokens_to_ids(output_tokens), spaces_between_special_tokens=False, skip_special_tokens=True)
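(As I understand it, spaces_between_special_tokens=False is needed because the whitespace and linebreak entries are added tokens, and the default decoding would otherwise join them to the surrounding text with extra spaces, while skip_special_tokens=True simply drops <pad> and </s> from the output.)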
Hi @filipemesquita it seems however that the fastchat-t5-3b tokenizer isn't a fast tokenizer as flan-t5-xl's is. This could decrease inference performance.
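A quick way to check (a sketch; the values in the comments are what I would expect, not confirmed output):

import transformers

# .is_fast reports whether the tokenizer is backed by the Rust "tokenizers" library.
base = transformers.AutoTokenizer.from_pretrained("google/flan-t5-xl")
fine_tuned = transformers.AutoTokenizer.from_pretrained("lmsys/fastchat-t5-3b-v1.0")

print(base.is_fast)        # expected: True
print(fine_tuned.is_fast)  # expected: False (slow SentencePiece implementation)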
Tokenization happens before and after the inference and is about 1000x faster, so even if there is a decrease, it would only be around 0.1%.
I agree that tokenization performance is not a significant portion of the overall inference time. I think the main negative impact of using the tokenizer from fastchat-t5-3b is that it generates tokens for whitespace, which decreases the total capacity for useful tokens in the context (input tokens).
But in my experiments, the quality of the output is affected by using the tokenizer from flan-t5-xl. So if you are looking for quality similar to the model at https://chat.lmsys.org/, you probably want to use the tokenizer created specifically for fastchat-t5-3b.
Hi,
I tried to convert and use the lmsys/fastchat-t5-3b-v1.0 model, which is an open-source chatbot trained by fine-tuning Flan-t5-xl (3B parameters) on user-shared conversations collected from ShareGPT. I converted with the default precision and with int8 quantization, without any error message (see the sketch below).
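A sketch of the conversions, using CTranslate2's Python converter API (the output directory names are placeholders):

import ctranslate2

converter = ctranslate2.converters.TransformersConverter("lmsys/fastchat-t5-3b-v1.0")
converter.convert("fastchat-t5-3b-ct2")                            # default precision
converter.convert("fastchat-t5-3b-ct2-int8", quantization="int8")  # int8 quantization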
But when trying to use the converted models for generation, I got the following inconsistent output:
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
While with the Hugging Face model I got:
["Cérès pourrait héberger un océan d'eau liquide.\n"]
Any advice on this error?