I'm trying to run this:

from transformers import pipeline

triplet_extractor = pipeline('text2text-generation', model='Babelscape/rebel-large', tokenizer='Babelscape/rebel-large')

# We need to use the tokenizer manually since we need special tokens.
my_text = "Punta Cana is a resort town in the municipality of Higuey, in La Altagracia Province, the easternmost province of the Dominican Republic"
extracted_text = triplet_extractor.tokenizer.batch_decode([triplet_extractor(my_text,
                                                                              return_tensors=True,
                                                                              return_text=False)[0]["generated_token_ids"]])
print(extracted_text[0])
I get this error:
Traceback (most recent call last):
  File "c:\2023\tsg_nlp\rebel\rebel_demo.py", line 10, in <module>
    extracted_text = triplet_extractor.tokenizer.batch_decode([triplet_extractor(my_text,
  File "C:\Users\reuve\anaconda3\lib\site-packages\transformers\tokenization_utils_base.py", line 3265, in batch_decode
    return [
  File "C:\Users\reuve\anaconda3\lib\site-packages\transformers\tokenization_utils_base.py", line 3266, in <listcomp>
    self.decode(
  File "C:\Users\reuve\anaconda3\lib\site-packages\transformers\tokenization_utils_base.py", line 3304, in decode
    return self._decode(
  File "C:\Users\reuve\anaconda3\lib\site-packages\transformers\tokenization_utils_fast.py", line 547, in _decode
    text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
TypeError: Can't convert {'output_ids': [[[0, 50267, 221, 20339, 2615, 102, 1437, 50266, 1587, 7330, 1073, 13249, 493, 16517, 1437, 50265, 2034, 11, 5, 6833, 15752, 10014, 1437, 50266, 18978, 3497, 1437, 50265, 247, 1437, 50267, 19664, 1780, 219, 1437, 50266, 1587, 7330, 1073, 13249, 493, 16517, 1437, 50265, 2034, 11, 5, 6833, 15752, 10014, 1437, 50266, 18978, 3497, 1437, 50265, 247, 1437, 50267, 1587, 7330, 1073, 13249, 493, 16517, 1437, 50266, 18978, 3497, 1437, 50265, 247, 1437, 50267, 18978, 3497, 1437, 50266, 1587, 7330, 1073, 13249, 493, 16517, 1437, 50265, 6308, 6833, 15752, 10014, 2]]]} to Sequence
What should I do?
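From the shape shown in the TypeError, the value that reaches batch_decode is the whole {'output_ids': [[[...]]]} dict rather than a plain list of token IDs, so the generated IDs seem to be nested one level deeper than the model-card snippet expects. Here is a minimal sketch of a workaround I'm considering, assuming that shape; the extra ["output_ids"][0][0] indexing is only inferred from the error output above, not from documented pipeline behaviour:

from transformers import pipeline

triplet_extractor = pipeline('text2text-generation', model='Babelscape/rebel-large', tokenizer='Babelscape/rebel-large')

my_text = "Punta Cana is a resort town in the municipality of Higuey, in La Altagracia Province, the easternmost province of the Dominican Republic"

# In this transformers version the pipeline item apparently holds a dict like
# {'output_ids': [[[token ids]]]} (inferred from the traceback), so unwrap it first.
output = triplet_extractor(my_text, return_tensors=True, return_text=False)
generated = output[0]["generated_token_ids"]
if isinstance(generated, dict):       # newer output shape seen in the error
    token_ids = generated["output_ids"][0][0]
else:                                 # older shape assumed by the model card
    token_ids = generated

extracted_text = triplet_extractor.tokenizer.batch_decode([token_ids])
print(extracted_text[0])

Would that be the right way to handle it, or is there a supported option I'm missing?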