Modalities / modalities

Modalities, a PyTorch-native framework for distributed and reproducible foundation model training.
MIT License

Tokenizer remove max length flag #152

Closed le1nux closed 3 months ago

le1nux commented 3 months ago

By default, we no longer specify the tokenizer's max_length, and truncation and padding are now both set to false.

le1nux commented 3 months ago

I added more test cases covering the relevant combinations of max_length, padding, and truncation for the single-document case. The multi-document case is currently not supported and has not been needed so far, since tokenize accepts a single string; see def tokenize(self, text: str) -> List[int]:
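
For readers who want to see how these three settings interact in the single-document case, here is a rough, self-contained sketch. ToyTokenizer, its whitespace splitting, the vocab, pad_id, and unk_id are all hypothetical and are not the Modalities wrapper or the Hugging Face API. It also assumes that padding fills up to max_length and that enabling truncation or padding without a max_length is an error, which may differ from the real behaviour.

```python
from typing import Dict, List, Optional


class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer wrapper (not the Modalities code)."""

    def __init__(
        self,
        vocab: Dict[str, int],
        max_length: Optional[int] = None,  # unset by default, as in this PR
        truncation: bool = False,          # off by default, as in this PR
        padding: bool = False,             # off by default, as in this PR
        pad_id: int = 0,                   # assumed padding token id
        unk_id: int = 1,                   # assumed unknown token id
    ) -> None:
        # Assumption: both options need a target length to be meaningful.
        if (truncation or padding) and max_length is None:
            raise ValueError("max_length is required for truncation or padding")
        self.vocab = vocab
        self.max_length = max_length
        self.truncation = truncation
        self.padding = padding
        self.pad_id = pad_id
        self.unk_id = unk_id

    def tokenize(self, text: str) -> List[int]:
        # Single-document case only: one string in, one list of ids out.
        ids = [self.vocab.get(token, self.unk_id) for token in text.split()]
        if self.truncation and len(ids) > self.max_length:
            ids = ids[: self.max_length]
        if self.padding and len(ids) < self.max_length:
            ids = ids + [self.pad_id] * (self.max_length - len(ids))
        return ids


vocab = {"a": 2, "b": 3, "c": 4}

default_ids = ToyTokenizer(vocab).tokenize("a b c")
truncated_ids = ToyTokenizer(vocab, max_length=2, truncation=True).tokenize("a b c")
padded_ids = ToyTokenizer(vocab, max_length=5, padding=True).tokenize("a b c")
```

With the defaults, the output length simply follows the input; the sequence only changes length once max_length is combined with truncation or padding.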