EthioNLP / afri-sft-data

This repository generates instruction-tuning datasets from different source datasets.

Text cleaning #3

Open IsraelAbebe opened 8 months ago

IsraelAbebe commented 8 months ago

The current instruction dataset doesn't have cleaned text.

ymitiku commented 8 months ago

The current instruction dataset doesn't have cleaned text.

  • [ ] Should we include that?

Could you elaborate on what you mean by text cleaning? Is this preprocessing text or something else? An example would be great.

IsraelAbebe commented 8 months ago

Yes, I saw some emojis in some generations and was wondering what the cause was.
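
For illustration only, here is a minimal sketch of what one such cleaning step could look like in Python: detecting and stripping emojis from a text field before it goes into the instruction dataset. The names `EMOJI_PATTERN`, `contains_emoji`, and `strip_emojis` are hypothetical and are not taken from this repository. The Unicode ranges are an assumption that covers the common emoji blocks, not every emoji, and they also match some non-emoji symbols such as check marks.

```python
import re

# Hypothetical pattern (an assumption, not part of this repository).
# It covers the most common emoji blocks only. Joiner and variation-selector
# characters are deliberately left alone, since some scripts use them in
# ordinary text.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\U0001F1E6-\U0001F1FF"  # regional indicator letters used for flags
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]+"
)


def contains_emoji(text: str) -> bool:
    """Return True if the text has at least one character in the ranges above."""
    return EMOJI_PATTERN.search(text) is not None


def strip_emojis(text: str) -> str:
    """Remove matched characters, then collapse the runs of spaces left behind."""
    without_emojis = EMOJI_PATTERN.sub("", text)
    return re.sub(r" {2,}", " ", without_emojis).strip()


if __name__ == "__main__":
    sample = "ሰላም 🌍 ዓለም"
    print(contains_emoji(sample))
    print(strip_emojis(sample))
```

Running `contains_emoji` over the source datasets first would show whether the emojis come from the input text or only appear in generations, which is the open question here. Ethiopic characters fall outside the listed ranges, so they pass through unchanged.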