Turkish-Word-Embeddings / Word-Embeddings-Repository-for-Turkish

A comprehensive word embedding repository for the Turkish language.
MIT License

Revision I #18

Closed CahidArda closed 5 months ago

CahidArda commented 7 months ago

The doc issue

Things to do:

Karahan:

Arda:

typos:

If we have time:

Errand:

Important: ESwA requires us to submit all the tables individually as well. Don't forget to update the relevant tables and add references if necessary (e.g. for the new Turkish Twitter Dataset)


KarahanS commented 7 months ago

Here, "decontextualized" simply stands for "aggregated", which is actually one of the decontextualization methods.
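As a sketch of what the "aggregated" strategy means in practice (a hypothetical helper, not the repository's actual code): contextual token vectors for the same word are averaged across all of its occurrences to obtain one static vector per word.

```python
from collections import defaultdict

def decontextualize(contextual_vectors):
    """Aggregate contextual token vectors into one static vector per
    word by averaging over all occurrences (the "aggregated" strategy).

    `contextual_vectors` is a list of (word, vector) pairs; vectors are
    plain Python lists here for illustration.
    """
    sums = {}
    counts = defaultdict(int)
    for word, vec in contextual_vectors:
        if word not in sums:
            sums[word] = list(vec)
        else:
            sums[word] = [a + b for a, b in zip(sums[word], vec)]
        counts[word] += 1
    return {w: [x / counts[w] for x in s] for w, s in sums.items()}

# "bank" appears in two contexts; its static vector is the mean of both.
occurrences = [("bank", [1.0, 0.0]), ("bank", [0.0, 1.0]), ("tree", [2.0, 2.0])]
print(decontextualize(occurrences)["bank"])  # -> [0.5, 0.5]
```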

KarahanS commented 7 months ago

Slightly updated the settings for Sentiment Analysis.

For the sentiment analysis tasks, we configured the maximum number of epochs to 15. Additionally, we implemented an early stopping criterion: if the error on the validation set begins to increase, we halt the training phase.

To be consistent across the NLP tasks in terms of the number of training epochs, I believe it is better to increase the number of epochs from 5 to 15 with an early stopping criterion. As a result of this update, we have to rerun X2Static BERT not only on the Twitter dataset but on all three NLP tasks, setting the random seed to 7 with 15 epochs.
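The stopping rule described above can be sketched as follows (a minimal hypothetical helper, not the actual `sentiment.py` implementation; it assumes a patience of one epoch):

```python
def train_with_early_stopping(val_losses, max_epochs=15, patience=1):
    """Return the epoch at which training halts.

    Training stops once the validation loss has failed to improve on
    the best value seen so far for `patience` consecutive epochs,
    otherwise it runs for at most `max_epochs` epochs.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch  # validation error started to increase
    return min(len(val_losses), max_epochs)

# Validation loss rises after epoch 4, so training halts at epoch 5.
print(train_with_early_stopping([0.9, 0.7, 0.6, 0.55, 0.58, 0.62]))  # -> 5
```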

KarahanS commented 7 months ago

Updated the NLP code. Provide the path to the X2Static model in the embedding configuration:

    "x2_bert": {
        "model": os.path.join(FOLDER, ""),  # fill in the X2Static model filename
        "dim": 768,
        "binary": False,
        "no_header": False,
    },

Then you can run the code with the following command for the third dataset (number of epochs = 15 with early stopping):

 python sentiment.py -d 3 -e 15 -w x2_bert 
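The `dim` and `no_header` fields of the configuration above can be illustrated with a minimal reader for word2vec-style text embeddings (a hypothetical stand-in for a full loader such as gensim's `KeyedVectors.load_word2vec_format`, shown here only to clarify what the fields control):

```python
import io

def load_text_embeddings(stream, dim, no_header=False):
    """Minimal reader for word2vec-style text embeddings.

    `dim` is the expected vector size; when `no_header` is False the
    first line is a "<vocab_size> <dim>" header and is skipped.
    """
    vectors = {}
    if not no_header:
        next(stream)  # skip the "vocab_size dim" header line
    for line in stream:
        parts = line.rstrip().split(" ")
        word, values = parts[0], [float(x) for x in parts[1:]]
        assert len(values) == dim, f"expected {dim} dims, got {len(values)}"
        vectors[word] = values
    return vectors

# Two 3-dimensional vectors preceded by a header line.
sample = io.StringIO("2 3\nkedi 0.1 0.2 0.3\nköpek 0.4 0.5 0.6\n")
emb = load_text_embeddings(sample, dim=3)
print(len(emb), emb["kedi"])  # -> 2 [0.1, 0.2, 0.3]
```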
CahidArda commented 7 months ago

Added a section on generalizing to other languages. Added examples of word embedding usage in the conclusion section.

KarahanS commented 5 months ago

Successfully passed Revision I; closing the issue.