Closed le1nux closed 3 months ago
I added more test cases covering the relevant combinations of `max_length`, `padding`, and `truncation` for the single-document case.
The multi-document case is currently not supported and not needed so far; see `def tokenize(self, text: str) -> List[int]:`.
By default, we no longer specify the tokenizer's `max_length`, and we now set `truncation` and `padding` to `False`.
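To illustrate how these three options interact in the single-document case, here is a minimal sketch. It uses a toy whitespace tokenizer with an illustrative `vocab` dict standing in for the real tokenizer, so the names and logic are assumptions, not the PR's actual implementation; with the new defaults (`truncation=False`, `padding=False`, no `max_length`) the full token sequence is returned unchanged.

```python
from typing import Dict, List, Optional


def tokenize(
    text: str,
    vocab: Dict[str, int],
    max_length: Optional[int] = None,
    truncation: bool = False,
    padding: bool = False,
    pad_id: int = 0,
) -> List[int]:
    """Toy single-document tokenizer sketching how max_length,
    truncation, and padding combine (whitespace splitting stands in
    for a real subword tokenizer)."""
    ids = [vocab[token] for token in text.split()]
    # Truncation only applies when a max_length is given.
    if truncation and max_length is not None:
        ids = ids[:max_length]
    # Padding likewise only applies up to a given max_length.
    if padding and max_length is not None and len(ids) < max_length:
        ids = ids + [pad_id] * (max_length - len(ids))
    return ids
```

With the defaults, `tokenize("a b c", vocab)` returns all three token ids untouched; passing `max_length=2, truncation=True` cuts the sequence to two ids, and `max_length=4, padding=True` pads a shorter sequence with `pad_id`.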