Fine-tune the XLM-RoBERTa Sentiment model for sentiment analysis

Task Description

Fine-tune an XLM-RoBERTa model for sentiment analysis on the Turkish Sentiment Analysis Data dataset we created previously. The model should correctly classify Turkish sentences into one of three classes: ["Pozitif", "Nötr", "Negatif"] (positive, neutral, negative).
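A minimal sketch of the label mapping, assuming the dataset stores the class names exactly as written above. The id order Negatif=0, Nötr=1, Pozitif=2 is a choice made here, not something the task specifies: it is meant to line up with a negative/neutral/positive classification head, so check it against the checkpoint's `config.id2label` before relying on it.

```python
# Class names as given in the task; the id order is an assumption chosen to
# match a negative/neutral/positive classification head.
LABELS = ["Negatif", "Nötr", "Pozitif"]

label2id = {name: idx for idx, name in enumerate(LABELS)}
id2label = {idx: name for name, idx in label2id.items()}


def encode_label(name: str) -> int:
    """Map a Turkish class name to its integer id, rejecting unknown names."""
    name = name.strip()
    if name not in label2id:
        raise ValueError(f"Unknown label {name!r}; expected one of {LABELS}")
    return label2id[name]


if __name__ == "__main__":
    print(encode_label("Pozitif"))  # 2
    print(id2label[0])              # Negatif
```

Failing loudly on an unknown label catches spelling variants (for example "Notr" without the diacritic) before they silently turn into a wrong class.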

Implementation Details

Train a Transformers SequenceClassification model initialized from the pretrained Twitter XLM-RoBERTa Sentiment model. Use the same train and test datasets that were used to fine-tune the gpt-3.5-turbo model. Tokenize with max_length = 256 and max-length padding. Use Hugging Face Transformers for training.
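A minimal sketch of the tokenizer and model setup. The Hub id `cardiffnlp/twitter-xlm-roberta-base-sentiment` is an assumption about which checkpoint "Twitter XLM-RoBERTa Sentiment" refers to, and the `LABELS` order is likewise assumed. Assuming that checkpoint already has a 3-way (negative/neutral/positive) head, `num_labels=3` reuses it instead of triggering a size mismatch, and `id2label`/`label2id` only rename the classes.

```python
# Assumed Hub id for the "Twitter XLM-RoBERTa Sentiment" checkpoint.
MODEL_NAME = "cardiffnlp/twitter-xlm-roberta-base-sentiment"

# Tokenizer settings from the task: fixed length 256 with max-length padding;
# truncation keeps longer sentences from exceeding that length.
TOKENIZER_KWARGS = {"max_length": 256, "padding": "max_length", "truncation": True}

LABELS = ["Negatif", "Nötr", "Pozitif"]  # id order is an assumption


def load_tokenizer_and_model(model_name: str = MODEL_NAME):
    """Load the tokenizer and a 3-label sequence-classification model."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=len(LABELS),
        id2label=dict(enumerate(LABELS)),
        label2id={name: i for i, name in enumerate(LABELS)},
    )
    return tokenizer, model


def tokenize_texts(tokenizer, texts):
    """Turn sentences into PyTorch tensors, each row padded/truncated to 256 tokens."""
    return tokenizer(list(texts), return_tensors="pt", **TOKENIZER_KWARGS)
```

Usage, once the checkpoint is available: `tok, model = load_tokenizer_and_model()` followed by `batch = tokenize_texts(tok, ["Bu film harikaydı."])`; `batch["input_ids"]` should then have shape `(1, 256)`.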

Design and Tasks

Develop a Python script that tokenizes the sentiment dataset (distributed as CSV via Kaggle) into input tensors for the XLM-RoBERTa model. For a fair comparison, use the same train and test datasets that were used to fine-tune the gpt-3.5-turbo model. Develop a Python script that creates a 3-label SequenceClassification model from the pretrained Twitter XLM-RoBERTa Sentiment model. Develop a training script using the Hugging Face Transformers library, and determine the best training hyperparameters.
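A sketch of the loading and training loop, assuming the Kaggle CSVs have `text` and `label` columns holding the Turkish class names, a hypothetical file name `train.csv`, and the assumed checkpoint id below. The learning-rate/epoch grid, batch size 16, weight decay 0.01 and 10% validation split are illustrative defaults, not tuned values. Candidates are scored on a slice held out from the training data, so the shared test set stays untouched for the final comparison with gpt-3.5-turbo.

```python
import csv
import itertools
import random

LABELS = ["Negatif", "Nötr", "Pozitif"]  # id order is an assumption
LABEL2ID = {name: i for i, name in enumerate(LABELS)}
MODEL_NAME = "cardiffnlp/twitter-xlm-roberta-base-sentiment"  # assumed checkpoint


def read_csv(path, text_col="text", label_col="label"):
    """Read (sentences, label ids) from a CSV; the column names are assumptions."""
    texts, labels = [], []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            texts.append(row[text_col])
            labels.append(LABEL2ID[row[label_col].strip()])
    return texts, labels


def split_validation(texts, labels, fraction=0.1, seed=42):
    """Hold out part of the training set so the test set is not used for tuning."""
    idx = list(range(len(texts)))
    random.Random(seed).shuffle(idx)
    n_val = max(1, int(len(idx) * fraction))
    train_idx, val_idx = idx[n_val:], idx[:n_val]
    train = ([texts[i] for i in train_idx], [labels[i] for i in train_idx])
    val = ([texts[i] for i in val_idx], [labels[i] for i in val_idx])
    return train, val


class EncodedDataset:
    """Map-style dataset yielding the dict format the Trainer's collator expects."""

    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.encodings.items()}
        item["labels"] = self.labels[i]
        return item


def compute_accuracy(eval_pred):
    """Trainer metric hook: argmax over logits, compared with gold label ids."""
    logits, labels = eval_pred
    preds = logits.argmax(axis=-1)
    return {"accuracy": float((preds == labels).mean())}


def train_one(train, val, lr, epochs, batch_size=16):
    """Fine-tune one candidate configuration and return (val accuracy, trainer)."""
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tok = AutoTokenizer.from_pretrained(MODEL_NAME)

    def enc(texts):
        return tok(texts, max_length=256, padding="max_length",
                   truncation=True, return_tensors="pt")

    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=len(LABELS),
        id2label=dict(enumerate(LABELS)), label2id=LABEL2ID)
    args = TrainingArguments(
        output_dir=f"out/lr{lr}_ep{epochs}", learning_rate=lr,
        num_train_epochs=epochs, per_device_train_batch_size=batch_size,
        weight_decay=0.01, save_strategy="no", report_to="none")
    trainer = Trainer(model=model, args=args,
                      train_dataset=EncodedDataset(enc(train[0]), train[1]),
                      eval_dataset=EncodedDataset(enc(val[0]), val[1]),
                      compute_metrics=compute_accuracy)
    trainer.train()
    return trainer.evaluate()["eval_accuracy"], trainer


def grid(learning_rates=(1e-5, 2e-5, 3e-5), epochs=(2, 3)):
    """Candidate (learning rate, epochs) pairs; the values are illustrative."""
    return list(itertools.product(learning_rates, epochs))


if __name__ == "__main__":
    texts, labels = read_csv("train.csv")  # hypothetical file name
    train, val = split_validation(texts, labels)
    results = [(train_one(train, val, lr, ep)[0], lr, ep) for lr, ep in grid()]
    print("best (val accuracy, lr, epochs):", max(results))
```

A plain grid keeps the search transparent and dependency-free; the Trainer's built-in hyperparameter search is an alternative if an extra search backend is acceptable. After picking the best pair, retraining on the full training set before scoring on the test set is a common final step.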

Acceptance Criteria

The fine-tuned model should show a significant improvement on the 3-label classification task.
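The criterion above does not name a metric. One way to make "significant improvement" checkable, assuming accuracy and macro-averaged F1 are acceptable measures, is to score the fine-tuned model's test-set predictions side by side with a baseline's (for instance the pretrained model before fine-tuning, or the fine-tuned gpt-3.5-turbo outputs). This pure-Python sketch computes only the point metrics; it does not include a statistical significance test.

```python
def accuracy(y_true, y_pred):
    """Fraction of test sentences whose predicted class matches the gold class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=("Negatif", "Nötr", "Pozitif")):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN).

    Every class counts equally, so a model cannot score well by ignoring the
    rarest class. A class absent from both gold and predictions scores 0.0.
    """
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def compare(y_true, baseline_pred, finetuned_pred):
    """Report both metrics for the baseline and the fine-tuned model."""
    return {
        name: {"accuracy": accuracy(y_true, pred), "macro_f1": macro_f1(y_true, pred)}
        for name, pred in (("baseline", baseline_pred), ("fine-tuned", finetuned_pred))
    }
```

For example, with gold labels `["Pozitif", "Nötr", "Negatif", "Pozitif"]`, baseline predictions `["Pozitif", "Pozitif", "Negatif", "Nötr"]` and fine-tuned predictions equal to the gold labels, `compare` reports 0.5 accuracy and 0.5 macro-F1 for the baseline and 1.0 for both metrics for the fine-tuned model.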

FacVain / dil-asistanim