UChicago-Computational-Content-Analysis / Readings-Responses-2023


7. Accounting for Context - [E1] 1.  Palakodety, Shriphani, Ashiqur R. KhudaBukhsh, Jaime G. Carbonell. #20

Open JunsolKim opened 2 years ago

JunsolKim commented 2 years ago

Post questions here for this week's exemplary readings: 1.  Palakodety, Shriphani, Ashiqur R. KhudaBukhsh, Jaime G. Carbonell. “Mining Insights from Large-Scale Corpora Using Fine-Tuned Language Models”. Frontiers in Artificial Intelligence and Applications, Volume 325: ECAI 2020

pranathiiyer commented 2 years ago

As with many projects that deal with multilingual data, the authors report an out-of-vocabulary rate of almost 75% when their corpus is compared against a pre-trained BERT vocabulary. How do we deal with out-of-vocabulary problems when working across different languages, or different dialects of the same language?
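One standard answer is subword tokenization: rare or code-mixed words are split into smaller pieces that are in the vocabulary, so only truly unsegmentable strings become OOV. Below is a minimal sketch of greedy longest-match WordPiece-style segmentation; the vocabulary and words are hypothetical toys, not the paper's actual model (real BERT vocabularies have ~30k learned pieces).

```python
# Toy sketch: how subword tokenization reduces out-of-vocabulary (OOV) rates.
# The vocabulary and words below are hypothetical; a real BERT model uses a
# learned WordPiece vocabulary, not this hand-picked set.

def wordpiece_tokenize(word, vocab):
    """Greedy longest-match-first WordPiece segmentation.
    Returns None if the word cannot be segmented (a true OOV)."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            # Non-initial pieces carry the '##' continuation prefix.
            cand = word[start:end] if start == 0 else "##" + word[start:end]
            if cand in vocab:
                piece = cand
                break
            end -= 1
        if piece is None:
            return None  # would map to [UNK] in practice
        pieces.append(piece)
        start = end
    return pieces

vocab = {"nam", "##aste", "desh", "##bhakt", "good", "##s"}
print(wordpiece_tokenize("namaste", vocab))    # ['nam', '##aste']
print(wordpiece_tokenize("deshbhakt", vocab))  # ['desh', '##bhakt']
print(wordpiece_tokenize("xyz", vocab))        # None -> falls back to [UNK]
```

For dialects or romanized text (as in the paper's Hinglish comments), fine-tuning on in-domain data, or retraining the tokenizer on the target corpus, are the usual remedies.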

isaduan commented 2 years ago

Should we be surprised that, even though BERT cannot account for negation (e.g., it cannot separate 'not good' from 'good'), removing valence shifters does not change the results?
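For a model that scores sentences as a bag of words, this is arguably unsurprising: if negators contribute nothing to the score, deleting them cannot change it. A minimal sketch, with a made-up lexicon and shifter list:

```python
# Toy sketch: a bag-of-words sentiment score ignores negation, so deleting
# valence shifters ("not", "never", ...) cannot change its output.
# The lexicon values and shifter set here are made up for illustration.

LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0}
SHIFTERS = {"not", "never", "no"}

def bow_score(tokens):
    """Sum lexicon weights; words outside the lexicon contribute 0."""
    return sum(LEXICON.get(t, 0.0) for t in tokens)

sent = "the video is not good".split()
stripped = [t for t in sent if t not in SHIFTERS]

print(bow_score(sent))      # 1.0 -- 'not' contributes nothing
print(bow_score(stripped))  # 1.0 -- identical after removing shifters
```

The interesting question is whether a contextual model like BERT, which *could* in principle use negation, behaves the same way in practice.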

GabeNicholson commented 2 years ago

In one of their robustness checks for the religious-hatred section, they compare the number of hate words associated with religious tokens against the baseline number of hate words in randomly sampled 4-gram sequences. I think it would have been more accurate to sample from parts of the corpus that are also derogatory toward the other side, e.g., from tokens that explicitly mention the other party.
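The comparison being questioned can be sketched as follows: measure the hate-lexicon hit rate in 4-grams that contain a target token versus 4-grams sampled uniformly at random. All tokens and lexicon entries here are hypothetical stand-ins, not the paper's data.

```python
# Toy sketch of the robustness check discussed above: compare the rate of
# hate-lexicon hits in 4-grams containing a target (e.g. religious) token
# against 4-grams sampled at random. Corpus and lexicons are made up.
import random

def four_grams(tokens):
    return [tokens[i:i + 4] for i in range(len(tokens) - 3)]

def hate_rate(grams, hate_lexicon):
    """Fraction of 4-grams containing at least one hate-lexicon word."""
    hits = sum(any(t in hate_lexicon for t in g) for g in grams)
    return hits / len(grams) if grams else 0.0

random.seed(0)
corpus = ("groupA hateword1 text filler groupB neutral words more "
          "filler filler groupA filler hateword2 neutral filler").split()
hate = {"hateword1", "hateword2"}
targets = {"groupA", "groupB"}

grams = four_grams(corpus)
target_grams = [g for g in grams if any(t in targets for t in g)]
random_grams = random.sample(grams, k=len(target_grams))

print(hate_rate(target_grams, hate), hate_rate(random_grams, hate))
```

The suggestion in the comment amounts to replacing the uniform `random_grams` baseline with 4-grams drawn only from passages that are already derogatory toward the opposing party.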

sudhamshow commented 2 years ago

I am curious about the legitimacy of using pre-trained models for mining niche insights (extremely specific relational questions like 'Dante was born in _____'). Doesn't this lead to memorisation and result in overfitting on the training data?
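The probing style at issue is cloze completion: ask the model to fill a masked slot and read off its top predictions. The paper does this with a fine-tuned BERT's masked-token distribution; as a runnable stand-in, here is a toy count-based analogue over a made-up corpus, which makes the memorisation worry concrete: the "insight" is exactly whatever the training text said most often.

```python
# Toy analogue of cloze-style probing: rank fillers for a '[MASK]' slot by
# counting completions in a corpus. The paper instead reads the masked-token
# predictions of a fine-tuned BERT; the corpus below is made up.
from collections import Counter

corpus = [
    "dante was born in florence",
    "dante was born in florence",
    "dante was born in italy",
]

def cloze(template, corpus):
    """Rank candidate fillers for the single '[MASK]' slot in `template`."""
    pre, post = template.split("[MASK]")
    pre, post = pre.strip(), post.strip()
    counts = Counter()
    for sent in corpus:
        if sent.startswith(pre) and sent.endswith(post):
            middle = sent[len(pre):len(sent) - len(post)].strip()
            if middle:
                counts[middle] += 1
    return counts.most_common()

print(cloze("dante was born in [MASK]", corpus))
# [('florence', 2), ('italy', 1)]
```

For corpus mining, this memorisation is arguably the point, since the goal is to summarize what the corpus says; the overfitting concern bites when such completions are treated as world knowledge rather than corpus statistics.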

Emily-fyeh commented 2 years ago

I like how this paper addresses a possible solution to negation and adapts BERT to short, messy YouTube comments. I would also like to know how these comments relate to the theme and timeline of the original videos, as well as to their context (neighboring comments).