Adds utilities to format datasets into a single sequence. Retains the current functionality in main. Restructures the code for modularity and to make it easier to add new dataset formats.
Utility functions:
get_data_collator - returns the data collator as needed
format_dataset - formats a dataset into a single sequence using a template
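
To illustrate what formatting to a single sequence with a template means, here is a minimal sketch. It is not the code in this PR: the function name `format_to_single_sequence`, its parameters, the `{field}` placeholder syntax, and the `formatted` output field are all assumptions made for the example.

```python
from typing import Dict, List


def format_to_single_sequence(
    rows: List[Dict[str, str]],
    template: str,
    output_field: str = "formatted",  # hypothetical name for the merged column
) -> List[Dict[str, str]]:
    """Render each row through `template` and keep only the merged text."""
    formatted = []
    for row in rows:
        # format_map fills {column} placeholders from the row and raises
        # KeyError if the template names a column the row does not have.
        text = template.format_map(row)
        formatted.append({output_field: text})
    return formatted


rows = [
    {"input": "2 + 2", "output": "4"},
    {"input": "capital of France", "output": "Paris"},
]
template = "### Input: {input}\n### Response: {output}"
result = format_to_single_sequence(rows, template)
print(result[0]["formatted"])
```

Each row's separate columns collapse into one text field, which is the shape a single-sequence trainer would consume.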
Related issue number
How to verify the PR
Unit tests passed
Was the PR tested
[ ] I have added >=1 unit test(s) for every new method I have added.