Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
Wordpiece tokenizer tokenize_with_span(text) method should return the span begin and end relative to the text rather than the token. Since whitespace_tokenize is called on input text within the function to get the tokens, we should return span begin and end relative to the text directly to avoid another round of tokenization.
Wordpiece tokenizer
tokenize_with_span(text)
method should return the span begin and end relative to the text rather than the token. Sincewhitespace_tokenize
is called on inputtext
within the function to get the tokens, we should return span begin and end relative to the text directly to avoid another round of tokenization.