Is there any script or function for preprocessing the text data?
Is it okay to use a training file that looks like this:
"abc"\n "def"\n "ghi"\n
Or should it be something like
{"text":"abc"}\n {"text":"def"}\n {"text":"ghi"}\n
So, can it be raw text separated by "\n", or should I convert it into JSONL with a single field "text"? I've seen the note "We support three file formats.." but can't find any examples or preprocessors.
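In case JSONL is the right answer, this is the conversion I would try. It is only a sketch: the function name `raw_text_to_jsonl` and the file names are my own, and I am assuming the loader wants one JSON object per line with a single "text" field, which I have not been able to confirm.

```python
import json


def raw_text_to_jsonl(src_path, dst_path, field="text"):
    """Write each non-empty line of src_path as one JSON object per line.

    Hypothetical helper: the single-field layout {"text": ...} is an
    assumption about what the training script expects.
    """
    count = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            # Drop the trailing newline but keep any inner whitespace.
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # skip blank lines
            # json.dumps escapes quotes and backslashes inside the text.
            dst.write(json.dumps({field: line}, ensure_ascii=False) + "\n")
            count += 1
    return count
```

I would call it as `raw_text_to_jsonl("train.txt", "train.jsonl")`, where both file names are placeholders.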
Thanks for the help!