Kevin-Patyk opened this issue 2 years ago

Hello,

BigBird Pegasus, when creating summaries of text, repeats the same sentence over and over. I have tried several texts on the Hugging Face model hub, and there is a matching issue posted on Stack Overflow (https://stackoverflow.com/questions/68911203/big-bird-pegasus-summarization-output-is-repeating-itself). I have also attached some screenshots from the Hugging Face hub.

I am doing text summarization for my thesis and I am not sure why this is happening, but apparently it has been an issue for six months. Is there a way to prevent this from happening?

Thank you.

---

Hello @Kevin-Patyk, do you need to preprocess the text inputs before tokenization? I tested by feeding the raw text to the tokenizer and summarizing, but the output was not good; maybe the input text needs preprocessing before summarization:
```python
import torch
from transformers import AutoTokenizer, BigBirdPegasusForConditionalGeneration

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-arxiv")
model = BigBirdPegasusForConditionalGeneration.from_pretrained(
    "google/bigbird-pegasus-large-arxiv", attention_type="original_full"
)
model = model.to(device)

# text_ip holds the raw article text to summarize
inputs = tokenizer(text_ip, return_tensors="pt", truncation=True).to(device)
prediction = model.generate(**inputs)  # default output max_length is 256
prediction = tokenizer.batch_decode(prediction)
print(prediction)
```
Here is the output:
```
['<s> the problem of machine learning is to find a way to learn from data.<n> this paper studies the problem of finding a way to learn a way to learn a way to learn a way to learn a way to learn.<n> we study the problem of finding a way to learn a way to learn a way to learn.<n> we study the problem of finding a way to learn a way to learn a way to learn.<n> we study the problem of finding a way to learn a way to learn a way to learn.<n> we study the problem of finding a way to learn a way to learn.<n> we study the problem of finding a way to learn a way to learn.<n> we study the problem of finding a way to learn a way to learn.<n> we study the problem of finding a way to learn a way to learn.<n> we study the problem of finding a way to learn a way to learn.<n> we study the problem of finding a way to learn a way to learn.<n> we study the problem of finding a way to learn a way to learn.<n> we study the problem of finding a way to learn a way to learn.<n> we study the problem of finding a way to']
```
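One thing that sometimes reduces this kind of looping is tuning the decoding parameters of `generate`. This is only a sketch, not a confirmed fix for this model; the specific values below are guesses:

```python
# Sketch: beam search plus an n-gram repetition block. num_beams,
# no_repeat_ngram_size, repetition_penalty, and max_length are standard
# transformers generate() arguments; the values here are untested guesses.
prediction = model.generate(
    **inputs,
    num_beams=5,               # beam search instead of greedy decoding
    no_repeat_ngram_size=3,    # forbid any 3-gram from appearing twice
    repetition_penalty=1.2,    # down-weight tokens that were already generated
    max_length=256,
)
print(tokenizer.batch_decode(prediction, skip_special_tokens=True))
```

It may also be worth letting the checkpoint use its default block_sparse attention instead of `attention_type="original_full"`, so long arXiv-style inputs are handled the way the model was trained, though I have not verified that this affects the repetition.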