Closed haven-jeon closed 3 years ago
6b6753fbfa0197fa418c3600e3f8aeb43d1675ca
>>> from kobart import get_kobart_tokenizer
>>> kobart_tokenizer = get_kobart_tokenizer()
using cached model
>>> kobart_tokenizer.tokenize("ab헣㉿cde")
['▁', 'ab', '<unk>', '<unk>', 'c', 'd', 'e']
As shown above, there is a bug where the tokenizer drops tokens (mapping them to `<unk>`).
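For context, the output above is the shape a greedy longest-match subword tokenizer produces when characters are missing from its vocabulary: each out-of-vocabulary character surfaces as `<unk>`. A minimal sketch (toy vocabulary and matcher; not KoBART's actual SentencePiece model) that reproduces the reported output:

```python
def tokenize(text, vocab, unk="<unk>"):
    """Greedy longest-match segmentation; OOV characters become `unk`."""
    tokens, i = [], 0
    while i < len(text):
        # try the longest possible piece starting at i
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # no vocabulary piece covers this character
            tokens.append(unk)
            i += 1
    return tokens

# toy vocab lacking 헣 and ㉿, mirroring the report
vocab = {"▁", "ab", "c", "d", "e"}
print(tokenize("▁ab헣㉿cde", vocab))
# ['▁', 'ab', '<unk>', '<unk>', 'c', 'd', 'e']
```

Whether such characters should instead be byte-fallback-encoded (so no information is lost on decode) depends on how the underlying SentencePiece model was trained.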