a1da4 / paper-survey

Summary of machine learning papers

Reading: Diachronic degradation of language models: Insights from social media #123

Open a1da4 opened 4 years ago

a1da4 commented 4 years ago

0. Paper

@inproceedings{jaidka-etal-2018-diachronic,
    title = "Diachronic degradation of language models: Insights from social media",
    author = "Jaidka, Kokil and Chhaya, Niyati and Ungar, Lyle",
    booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
    month = jul,
    year = "2018",
    address = "Melbourne, Australia",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P18-2032",
    doi = "10.18653/v1/P18-2032",
    pages = "195--200",
    abstract = "Natural languages change over time because they evolve to the needs of their users and the socio-technological environment. This study investigates the diachronic accuracy of pre-trained language models for downstream tasks in machine learning and user profiling. It asks the question: given that the social media platform and its users remain the same, how is language changing over time? How can these differences be used to track the changes in the affect around a particular topic? To our knowledge, this is the first study to show that it is possible to measure diachronic semantic drifts within social media and within the span of a few years.",
}

1. What is it?

They measure language change over a short time span (a few years) using social media (SNS) datasets.

2. What is novel compared to previous work?

Instead of diachronic corpora covering long time spans, they run their experiments on SNS datasets covering a short time span.

3. What is the key to the technology and techniques?

3.1 TASK1: Train an (Age | Gender) predictor.

3.2 TASK2: Obtain context words via word embedding

Train a word2vec skip-gram model with negative sampling. To obtain context words, they draw on the gender-bias analysis of Garg2017.
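As a rough illustration of what "obtaining context words" from embeddings can look like, here is a minimal sketch that ranks the nearest neighbours of a target word by cosine similarity in two period-specific embedding tables. The toy 2-d vectors, the word names, and the helper `context_words` are all hypothetical and not taken from the paper; in practice the vectors would come from word2vec models trained on each time slice.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def context_words(embeddings, target, k=2):
    """Return the k words closest to `target` by cosine similarity."""
    scores = [
        (cosine(embeddings[target], vec), word)
        for word, vec in embeddings.items()
        if word != target
    ]
    scores.sort(reverse=True)  # highest similarity first
    return [word for _, word in scores[:k]]


# Hypothetical embeddings for an early and a late time slice (made-up numbers).
emb_early = {"topic": [1.0, 0.0], "a": [0.9, 0.1], "b": [0.0, 1.0], "c": [0.5, 0.5]}
emb_late = {"topic": [0.0, 1.0], "a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.5, 0.5]}

print(context_words(emb_early, "topic"))
print(context_words(emb_late, "topic"))
```

Neighbours are compared within each embedding space separately, so this sketch does not need the two spaces to be aligned with each other.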

4. How did they evaluate it?

4.1 TASK1: (Age | Gender) prediction

On the one hand, the lower-left region shows over-estimation (predicted-age > actual-age), which means that young users use words from topics associated with older users, such as swearing, tiredness and sleep.

On the other hand, the upper-right region shows under-estimation (predicted-age < actual-age), which means that early SNS users have grown older.
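The two cases above depend only on the sign of the prediction error. A minimal sketch, where the function name `estimation_type` and the example ages are hypothetical:

```python
def estimation_type(predicted_age, actual_age):
    """Classify a prediction by the sign of (predicted_age - actual_age)."""
    error = predicted_age - actual_age
    if error > 0:
        return "over-estimation"   # predicted-age > actual-age
    if error < 0:
        return "under-estimation"  # predicted-age < actual-age
    return "exact"


# Made-up ages, for illustration only.
print(estimation_type(30, 22))
print(estimation_type(25, 33))
```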

Screenshot 2020-09-17 2 20 25

4.2 TASK2: Obtain context words for (LGBTQ issues, Positive emotion)

Screenshot 2020-09-17 2 26 50

5. Is there a discussion?

6. Which paper should I read next?

a1da4 commented 4 years ago

#126 Garg2017, which proposed the 'relative norm distance' metric
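For reference, a minimal sketch of relative norm distance, assuming the definition is the sum, over a set of neutral words, of the Euclidean distance to the mean vector of group 1 minus the distance to the mean vector of group 2. The function names and toy vectors below are hypothetical; check the definition against Garg2017 before relying on it.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def relative_norm_distance(neutral_vectors, group1_vectors, group2_vectors):
    """Sum over neutral words of ||v - v1|| - ||v - v2||.

    v1 and v2 are the mean vectors of the two groups. Under this assumed
    definition, a negative value means the neutral words sit closer to group 1.
    """
    v1 = mean_vector(group1_vectors)
    v2 = mean_vector(group2_vectors)
    return sum(l2_distance(v, v1) - l2_distance(v, v2) for v in neutral_vectors)


# Made-up 2-d vectors, for illustration only.
group1 = [[1.0, 0.0]]
group2 = [[0.0, 1.0]]
print(relative_norm_distance([[1.0, 0.0]], group1, group2))  # negative: closer to group 1
print(relative_norm_distance([[0.5, 0.5]], group1, group2))  # equidistant from both groups
```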