HyunkuKwon opened 3 years ago
These papers helped me better understand what's going on under the hood in BERT (and why it takes such a long time to run on my laptop!). Rogers et al. (2020) de-mystified for me some of the black-box haziness surrounding BERT. I'm interested in hearing about applications of BERT in public policy. How could BERT be leveraged, for example, to flag students most in need of educational support?
In the first week, you mentioned that SVD on a word embedding can outperform an LDA topic model. I think that may have been a reference to "discourse atoms." Could you provide some detail on this, or other ways in which neural networks can perform topic modeling?
Also, are discourse atoms available in gensim or another accessible library?
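(I'm not aware of a gensim implementation of discourse atoms, but the underlying idea, sparse dictionary learning over the word-embedding matrix as in Arora et al.'s "atoms of discourse," can be sketched with scikit-learn. The toy random vectors below stand in for real embeddings; names and parameters are illustrative only.)

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Toy stand-in for a word-embedding matrix: 200 "words" in 50 dimensions.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 50))

# Discourse atoms are recovered by sparse dictionary learning:
# each word vector is approximated as a sparse combination of a
# small set of shared "atom" directions, which play the role of topics.
learner = DictionaryLearning(n_components=10, transform_n_nonzero_coefs=3,
                             max_iter=20, random_state=0)
codes = learner.fit_transform(embeddings)  # sparse word-to-atom loadings
atoms = learner.components_                # atom directions (10 x 50)

print(codes.shape, atoms.shape)
```

With real embeddings, the nearest-neighbor words of each atom direction are inspected the way one inspects top words of an LDA topic.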
The authors of the paper argue that the main limitation of existing techniques is that the standard language model is unidirectional, which limits the choice of architectures that can be used during pre-training. In the paper, the authors improve on architecture-based tuning by proposing BERT, Bidirectional Encoder Representations from Transformers.
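(The bidirectionality point can be made concrete: BERT's masked-language-model objective hides some tokens and predicts them from context on both sides, whereas a left-to-right LM can only condition on the left. A minimal sketch of the masking step; the 15% mask rate follows the paper, but the function itself is illustrative, not from any library.)

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=1):
    """Hide ~15% of tokens; the model must recover each hidden token
    using BOTH its left and right context, unlike a left-to-right LM."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            targets.append(tok)   # prediction target at this position
        else:
            masked.append(tok)
            targets.append(None)  # no loss at unmasked positions
    return masked, targets

sentence = "the man went to the store to buy milk".split()
masked, targets = mask_tokens(sentence)
print(masked)
```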
I’ve heard a lot about BERT but had not seen many details about it before. Apparently, BERT does very well in many respects, but, like many other NLP models, it demands substantial computational resources. So, instead of buying more computational equipment, are there effective algorithms that can greatly optimize the computation? Or are there promising ideas, even if not yet formalized, that could enable more efficient implementations of BERT?
Could we infuse pragmatic inferences into the self-attention heads? Some pragmatic inferences are hard even for humans, but others might be easy enough to learn from a context corpus. I wonder whether BERT models could eventually pick up simple pragmatic inferences, given a huge corpus (e.g., Wikipedia + news databases + some social media text, for English BERT).
1) BERT and other transformer-based models have dominated NLP benchmarks for the last two years. In which areas of social content analysis do you see RNNs as still competitive, or even holding an edge, if at all?
2) What are some transformer models that can get around BERT's 512-token limit? This seems especially important since a lot of social content analysis involves large corpora.
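(One workaround that needs no new architecture is sliding overlapping windows over the token sequence and pooling the per-window outputs; models like Longformer and BigBird instead modify the attention pattern itself to handle long inputs directly. A minimal sketch of the windowing idea; the function name and parameters are illustrative.)

```python
def sliding_windows(token_ids, max_len=512, stride=256):
    """Split a long token sequence into overlapping windows so each
    chunk fits BERT's 512-token limit; the overlap preserves context
    around the window boundaries."""
    windows = []
    start = 0
    while start < len(token_ids):
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break
        start += stride
    return windows

doc = list(range(1200))  # stand-in for a tokenized long document
chunks = sliding_windows(doc)
print([len(c) for c in chunks])
```

Each chunk would then be encoded separately and the document representation obtained by pooling (e.g., averaging) the chunk embeddings.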
Where do we go from here? In other words, is BERT (or at least the architecture of deep learning it represents) the future of machine learning/computational sociolinguistics, with only minor refinements for the foreseeable future, or does it represent just one step into a whole class of possibly even better deep learning algorithms?
Could you clarify more about the pre-training and fine-tuning? And also, could you introduce more on how such unsupervised learning can be used for the topic modelling?
It's interesting to see an overview paper of the BERT model. As the review points out, BERT cannot reason about the relationship between properties and affordances; it can, however, guess at that relationship. I'm not sure how important reasoning ability really is in this area. Stereotyped associations are usually what the model falls back on to infer such relationships; is there any new way to deal with this problem? Or are there other directions we could focus on to address this shortcoming?
In reading 1, Goodfellow et al., the authors still focused on RNNs, while at present RNNs have been almost completely displaced by transformer-based architectures such as BERT. That happened in only about four years; the research cycle in AI and deep learning iterates really fast. What do you think is the key feature of the transformer architecture that enables it to beat the traditional RNN? Thanks!
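(One commonly cited answer: self-attention relates every pair of positions in a single matrix operation, so the whole sequence is processed in parallel and long-range dependencies are one step apart, rather than being threaded through a recurrent state as in an RNN. A minimal NumPy sketch of scaled dot-product self-attention, with toy weights and dimensions:)

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a whole sequence at once."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])             # all pairwise similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)      # softmax over keys
    return weights @ V                                 # context-mixed values

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))                  # 6 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)
```

An RNN would need six sequential steps to let the last token see the first; here every token attends to every other in one shot, which is also what makes transformers so parallelizable on GPUs.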
Is it generally true that BERT requires more computational power than other methods?
This week we learned about a series of deep learning models such as biLMs, ELMo, and BERT. They work well on some prediction tasks and may work less well on others. How should we choose among them? Also, I am a little concerned about the lack of interpretability of the hidden layers in deep learning methods. I think the more complex the model is, the harder it is to interpret. What do you think about this question?
BERT is slow to train. Are there any alternatives? Or is BERT always the better choice, given its excellent performance?
Given BERT's computational cost, how widely accessible is this method beyond academic research institutions and tech companies?
Are there any other concerns with BERT besides its slow speed? How should we choose among the various deep learning models?
Given deep learning's demand for high GPU performance, what are some services or applications that social scientists use to perform large-scale natural language processing?
It was interesting to read some efforts from Rogers et al. to understand the “why” of how BERT works. For social science research, what kinds of “why” questions are important to answer, and in what cases can prediction be useful on its own?
As several other people mentioned, one of the biggest limitations seems to be performance. In general, do you think the limitations on the hardware side are crippling innovation on the software side?
Despite its claimed performance, is BERT (or any other giant transformer-based pre-trained network) actually useful in industry or for productivity purposes? If not, those claims to fame about accuracy and benchmarks really don't mean much.
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep learning. Chapter 12.4 “Applications: Natural Language Processing.” MIT press: 456-473.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. “BERT: Pre-training of deep bidirectional transformers for language understanding.” arXiv:1810.04805. (Also check out: “Deep contextualized word representations” by Peters et al. from AllenAI and the University of Washington.)
Rogers, Anna, Olga Kovaleva, and Anna Rumshisky. 2020. “A Primer in BERTology: What We Know About How BERT Works.” arXiv:2002.12327.