Closed zeynepyirmibes closed 1 year ago
We will delete all hyphens (-) at the end of a line, add a flag to that line and concatenate with the next line without space.
This may be helpful: https://huggingface.co/learn/nlp-course/chapter6/4 , recommended by @onurgu .
Cleaning and standardizing punctuation is implemented in normalize.py
We should standardize non-standard punctuation and remove end-of-sentence hyphens (bilgi veril- mektedir.).