Closed gokceuludogan closed 6 months ago
`tokenizer_function` cannot differentiate between model classes such as BERT and T5, so it applies the same procedure to both. However, T5 requires an [EOS] token, while BERT has no such token. https://github.com/boun-tabi-LMG/turkish-lm-tuner/blob/3e97efddbec2a834b1e13cdfc3f9dec4f15b820a/turkish_lm_tuner/dataset_processor.py#L111-L125
Addressed in commit https://github.com/boun-tabi-LMG/turkish-lm-tuner/pull/36/commits/c90605b5b96ad9c8b3284034c4cbc7e4430ca39f.
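For anyone hitting the same problem, here is a minimal sketch of one way to make the preprocessing model-aware. It is not taken from the linked commit. It assumes the tokenizer exposes an `eos_token` attribute that is `None` when the model has no EOS token (as Hugging Face tokenizers do); `StubTokenizer` and `prepare_text` are hypothetical names used only so the example runs without downloading a model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubTokenizer:
    """Hypothetical stand-in that exposes only an `eos_token` attribute."""
    eos_token: Optional[str]


def prepare_text(text: str, tokenizer) -> str:
    """Append the EOS token only if the tokenizer defines one.

    Assumption: a missing EOS token is reported as `eos_token is None`.
    """
    eos = getattr(tokenizer, "eos_token", None)
    if eos is None or text.endswith(eos):
        # BERT-like tokenizer, or EOS already present: leave the text as-is.
        return text
    return text + eos


t5_like = StubTokenizer(eos_token="</s>")   # T5-style: has an EOS token
bert_like = StubTokenizer(eos_token=None)   # BERT-style: no EOS token

print(prepare_text("merhaba", t5_like))
print(prepare_text("merhaba", bert_like))
```

Checking the tokenizer's attribute rather than the model name avoids hard-coding a list of model classes, though an explicit per-model branch would work as well.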