CahidArda closed this issue 5 months ago.
"Decontextualized" simply stands for "aggregated", which is actually one of the decontextualization methods.
Slightly updated the settings for Sentiment Analysis.
For the sentiment analysis tasks, we configured the maximum number of epochs to 15. Additionally, we implemented an early stopping criterion: if the error on the validation set begins to increase, we halt the training phase.
To be consistent across the NLP tasks in terms of the number of training epochs, I believe it is better to increase the number of epochs from 5 to 15 with an early stopping criterion. As a result of this update, we have to run X2Static BERT not only for the Twitter dataset but for all three NLP tasks, setting the random seed to 7 with 15 epochs.
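The early stopping criterion described above can be sketched as follows. This is a minimal illustration, not the project's actual training loop; the function name and the `train_step`/`eval_error` callbacks are hypothetical, and it uses the simplest form of the rule (stop as soon as validation error rises, with no patience window):

```python
def train_with_early_stopping(train_step, eval_error, max_epochs=15):
    """Run up to max_epochs; stop once validation error starts to increase.

    train_step(epoch) -- runs one training epoch (hypothetical callback)
    eval_error()      -- returns the current error on the validation set
    Returns the best (lowest) validation error observed.
    """
    best_error = float("inf")
    for epoch in range(max_epochs):
        train_step(epoch)        # one pass over the training set
        error = eval_error()     # error on the validation set
        if error > best_error:   # validation error began to increase: halt
            break
        best_error = error
    return best_error
```

With `max_epochs=15` this matches the setting used for the sentiment analysis runs; in practice a patience of a few epochs is often added so a single noisy validation measurement does not end training prematurely.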
Updated the NLP code. Provide the path of the X2Static model:
"x2_bert": {
"model": os.path.join(FOLDER, ""),
"dim": 768,
"binary": False,
"no_header": False,
},
Then you can run the code with the following command for the third dataset (number of epochs = 15 with early stopping):
python sentiment.py -d 3 -e 15 -w x2_bert
Added a section on generalizing to other languages. Added examples of word embedding usage to the conclusion section.
Successfully passed Revision I; closing the issue.
Things to do:
- Karahan:
- Arda:
- typos:
- If we have time:
- Errand: turkish-text-tokenized.txt (or remove the second solution altogether)

Important: ESwA requires us to submit all the tables individually as well. Don't forget to update the relevant tables and add references if necessary (e.g. for the new Turkish Twitter dataset).