FacVain / dil-asistanim

1 stars 0 forks source link

Train an XMLRoBERTa model for Tone Analysis #25

Open FacVain opened 7 months ago

FacVain commented 7 months ago

Task Description

Train an XMLRoBERTa model for Tone Analysis with Turkish Tweets 5 Label Tone Analysis Dataset. XMLRoBERTa model should be able to correctly classify Turkish Tweets into one of the 5 classes given as : ["Kızgın", "Korku", "Mutlu", "Sürpriz", "Üzgün"]

Implementation Details

Train a transformers SequenceClassification model with pretrained Twitter XMLRoBERTa Base model. Use the same train and test datasets we used to fine-tune gpt-3.5-turbo model. Use tokenizer with max_length = 256 and max length padding. Use the Hugging Face Transformers for training.

Design and Tasks

Develop a Python script that tokenizes tweet-tone data received in CSV format via Kaggle to the input tensors for the XMLRoBERTa model. Use the same train and test datasets we used to fine-tune gpt-3.5-turbo model for a fair comparison. Develop a Python script that creates a 5-label SequenceClassification model with pretrained Twitter XMLRoBERTa Base model. Develop a training script using the Hugging Face Transformers library. Determine the best parameters for training.

Acceptance Criteria

The model should be able to provide successful results for a 5-class classification task.