Objective:
Clean up text data from Hugging Face (see link here) so that it can be used for deep learning model training.
Tasks:
[x] #10
[x] Remove excess information. Keep only the text and the scores for all categories; drop the target columns, annotator information, etc. (Maybe keep the comment ID.)
[x] Rescale all annotated scores from the range [0, 5] to the range [0, 1]
[x] Write a custom batching function so that tweets of similar length end up in the same batch, with pad tokens added to the shorter tweets in each batch (see Tutorial 4 code)
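
The tasks above could be sketched roughly as follows. This is a minimal illustration, not the Tutorial 4 code: the column names (`comment_id`, `text`, `respect`, `insult`), the pad id `0`, and both helper functions are hypothetical, and it assumes the scores are rescaled linearly by dividing by the maximum score of 5.

```python
from typing import Dict, List, Sequence

PAD_ID = 0  # hypothetical pad token id; use the tokenizer's real pad id


def clean_record(
    record: Dict,
    keep: Sequence[str],
    score_cols: Sequence[str],
    max_score: float = 5.0,
) -> Dict:
    """Keep only the `keep` columns plus the score columns, rescaled to [0, 1]."""
    out = {col: record[col] for col in keep}
    for col in score_cols:
        # Assumed linear rescaling: 0 stays 0, max_score becomes 1.
        out[col] = record[col] / max_score
    return out


def length_bucketed_batches(
    token_ids: List[List[int]],
    batch_size: int,
    pad_id: int = PAD_ID,
) -> List[List[List[int]]]:
    """Group sequences of similar length and pad each batch to its longest one."""
    # Sort indices by sequence length so neighbours have similar lengths.
    order = sorted(range(len(token_ids)), key=lambda i: len(token_ids[i]))
    batches = []
    for start in range(0, len(order), batch_size):
        chunk = [token_ids[i] for i in order[start:start + batch_size]]
        width = max(len(seq) for seq in chunk)
        # Append pad tokens to the shorter sequences in this batch only.
        batches.append([seq + [pad_id] * (width - len(seq)) for seq in chunk])
    return batches


# Example with made-up data.
record = {"comment_id": 1, "text": "hi", "respect": 5.0, "insult": 2.5, "annotator_id": 9}
cleaned = clean_record(record, keep=["comment_id", "text"], score_cols=["respect", "insult"])
batches = length_bucketed_batches([[1, 2, 3], [4], [5, 6], [7, 8, 9, 10]], batch_size=2)
```

Because each batch is padded only to its own longest tweet, short tweets are not padded out to the length of the longest tweet in the whole dataset. Sorting by length makes the batch order deterministic, so shuffling the list of batches before each epoch is probably worthwhile.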