Closed andreaskoepf closed 1 year ago
Text datasets like fanfics contain long entries. This PR splits dataset entries that exceed the specified `max_chunk_size` into multiple smaller entries.
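
As a rough illustration of the splitting behaviour described above, here is a minimal sketch. It is not the code from this PR: the function name `split_long_entries` is hypothetical, and it assumes `max_chunk_size` is measured in characters and that chunks are consecutive and non-overlapping. The actual implementation may count tokens or split on different boundaries.

```python
from typing import Iterable, Iterator


def split_long_entries(entries: Iterable[str], max_chunk_size: int) -> Iterator[str]:
    """Yield each entry unchanged if it fits, otherwise as consecutive pieces.

    Hypothetical helper: assumes max_chunk_size is a character count.
    """
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")
    for text in entries:
        if len(text) <= max_chunk_size:
            # Short entries pass through untouched.
            yield text
            continue
        # Slice the long entry into consecutive, non-overlapping pieces;
        # the last piece may be shorter than max_chunk_size.
        for start in range(0, len(text), max_chunk_size):
            yield text[start:start + max_chunk_size]


chunks = list(split_long_entries(["short", "a" * 25], max_chunk_size=10))
```

With this sketch, concatenating the pieces of a split entry reproduces the original text, so no content is lost. Splitting at fixed offsets can cut words or sentences in half, though; a real dataset loader might prefer to break at whitespace or paragraph boundaries.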