AngelinaZhai / epai-sentiment-of-tweets

Backend: Data Pre-processing #4

Closed AngelinaZhai closed 1 year ago

AngelinaZhai commented 1 year ago

Objective: Clean up text data from Hugging Face (see link here) so that it can be used for deep learning model training.

Tasks:

[x] #10
[x] Remove excess information. We only want the text and the scores for all the categories, so drop the target columns, annotator information, etc. (Maybe keep the comment id.)
[x] Rescale all annotated scores from (0,5) to (0,1)
[x] Write a custom batching function so that tweets of similar length end up in the same batch (see Tutorial 4 code); within each batch, shorter tweets are padded with pad tokens to match the longest one
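
For reference, the three data tasks above could be sketched roughly as follows. This is a minimal, dependency-free illustration, not the code from Tutorial 4 or from this repo. The column names (`comment_id`, `text`, `sentiment`, `annotator_age`), the pad id `0`, and the helper names are all assumptions made up for the example. It also assumes the scores start at 0, so rescaling is a plain division by 5.

```python
from typing import Dict, List, Sequence

PAD_ID = 0  # hypothetical pad-token id; use whatever the real vocabulary defines


def keep_columns(record: Dict, keep: Sequence[str]) -> Dict:
    """Return a copy of `record` holding only the columns listed in `keep`."""
    return {key: record[key] for key in keep if key in record}


def rescale(score: float, old_max: float = 5.0) -> float:
    """Map a score from the 0..old_max range onto 0..1 (assumes the minimum is 0)."""
    return score / old_max


def length_bucketed_batches(
    token_ids: List[List[int]], batch_size: int, pad_id: int = PAD_ID
) -> List[List[List[int]]]:
    """Group sequences of similar length and pad each batch to its longest member."""
    # Sort indices by sequence length so neighbours have similar lengths.
    order = sorted(range(len(token_ids)), key=lambda i: len(token_ids[i]))
    batches = []
    for start in range(0, len(order), batch_size):
        chunk = [token_ids[i] for i in order[start:start + batch_size]]
        width = max(len(seq) for seq in chunk)
        # Pad on the right so every sequence in the batch has the same length.
        batches.append([seq + [pad_id] * (width - len(seq)) for seq in chunk])
    return batches


# Toy record with made-up column names, for illustration only.
raw = {"comment_id": 7, "text": "hello", "sentiment": 2.5, "annotator_age": 30}
clean = keep_columns(raw, ["comment_id", "text", "sentiment"])
clean["sentiment"] = rescale(clean["sentiment"])

# Toy token-id sequences of different lengths.
batches = length_bucketed_batches([[1, 2, 3], [4], [5, 6], [7, 8, 9, 10]], batch_size=2)
```

Sorting by length before chunking keeps the amount of padding per batch small. A real version would also need to carry the labels along with each sequence and would probably shuffle the batches between epochs.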