fastnlp / CPT

CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

Why are there spaces between Chinese characters in the output? How can I fix it? #75

Open SupetZYK opened 11 months ago

SupetZYK commented 11 months ago
from transformers import BertTokenizer, BartForConditionalGeneration, Text2TextGenerationPipeline
# Load the tokenizer and model from the same checkpoint
tokenizer = BertTokenizer.from_pretrained("fnlp/bart-large-chinese")
model = BartForConditionalGeneration.from_pretrained("fnlp/bart-large-chinese")
text2text_generator = Text2TextGenerationPipeline(model, tokenizer)
# Fill in the [MASK] token with greedy decoding
text2text_generator("北京是[MASK]的首都", max_length=50, do_sample=False)
# Output:
    [{'generated_text': '北 京 是 中 华 人 民 共 和 国 的 首 都'}]

How can I remove the spaces between the Chinese characters?

shivanraptor commented 1 month ago

Just remove the spaces from each generated_text string with .replace(' ', ''). Easy.
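
For context, the spaces most likely appear because BertTokenizer joins the decoded tokens with a single space, and each Chinese character is its own token. A plain .replace(' ', '') also deletes spaces between any English words in the output. If that matters, here is a minimal sketch of a narrower post-processing step. The helper name remove_cjk_spaces and the character range it uses are assumptions for illustration, not part of CPT or transformers:

```python
import re

# Assumption: only the basic CJK Unified Ideographs block is treated as "Chinese".
# Extend the range if the output contains full-width punctuation or rarer characters.
_CJK = r"\u4e00-\u9fff"

# Whitespace that has a CJK character on both sides.
_SPACE_BETWEEN_CJK = re.compile(rf"(?<=[{_CJK}])\s+(?=[{_CJK}])")


def remove_cjk_spaces(text: str) -> str:
    """Drop whitespace between adjacent CJK characters; keep all other spaces."""
    return _SPACE_BETWEEN_CJK.sub("", text)


# The string below is the generated_text value reported in this issue.
generated = "北 京 是 中 华 人 民 共 和 国 的 首 都"
cleaned = remove_cjk_spaces(generated)
print(cleaned)
```

With the pipeline from the original post, this would be applied to each item's 'generated_text' value. Spaces between Latin words, such as in "hello world", are left untouched.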