Computational-Content-Analysis-2020 / Readings-Responses

Repository for organising "exemplary" readings and posting responses.

Images, Art & Video - LeCun, Bengio & Hinton 2015 #48

Open jamesallenevans opened 4 years ago

jamesallenevans commented 4 years ago

LeCun, Yann, Yoshua Bengio & Geoffrey Hinton. 2015. “Deep Learning.” Nature 521: 436-444.

lkcao commented 4 years ago

I notice that CNN and RNN are often applied to different scenarios. Can we say that when analyzing images, CNN is the norm, and when analyzing sequences, RNN is the norm? Is RNN the most commonly used technique in NLP?

arun-131293 commented 4 years ago

> I notice that CNN and RNN are often applied to different scenarios. Can we say that when analyzing images, CNN is the norm, and when analyzing sequences, RNN is the norm? Is RNN the most commonly used technique in NLP?

Yes, CNNs are the norm for images for a variety of reasons, including the fact that they are very good at handling high-dimensional inputs like images, where each pixel can contribute one to three features; this is in comparison to a text input, where the number of input features is much lower (typically the length of a sentence or a document). CNNs can extract progressively more abstract representations of the image and figure out which combinations of those abstract features are useful for the task (like object detection in the case of classification) without using a large number of parameters (weights). That matters because training speed is a function of the number of parameters, and very high-parameter neural nets are very slow to train or not feasible to train at all.
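
To make the parameter-sharing point concrete, here is a minimal sketch (assuming PyTorch; the layer sizes are illustrative, not taken from the paper) of a small ConvNet whose stacked layers trade spatial resolution for increasingly abstract channels while keeping the weight count tiny relative to the input dimensionality:

```python
import torch
import torch.nn as nn

# Each block halves the spatial resolution while increasing the number of
# channels, so later layers describe larger, more abstract image regions.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # -> 16 x 112 x 112
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # -> 32 x 56 x 56
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # -> 64 x 28 x 28
    nn.AdaptiveAvgPool2d(1),   # pool each feature map down to a single number
    nn.Flatten(),
    nn.Linear(64, 10),         # 10 hypothetical object classes
)

x = torch.randn(8, 3, 224, 224)                        # a batch of 8 RGB images
print(model(x).shape)                                  # torch.Size([8, 10])
print(sum(p.numel() for p in model.parameters()))      # ~24k weights in total
```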

Plain RNNs are no longer the commonly used architecture in NLP and were never particularly dominant, but modifications of RNNs like LSTMs/GRUs are standard and have been since around 2016. The reason is that plain RNNs aren't great at capturing long-distance dependencies, which are important for many sequence-to-sequence NLP tasks (their memory is short-term). LSTMs/GRUs, on the other hand, capture both long-term and short-term dependencies and are used in cutting-edge architectures like ELMo, which we saw last class.
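
For completeness, here is a minimal sketch of the kind of gated recurrent model (an LSTM, in PyTorch; the vocabulary size, hidden size, and labels are made up for illustration) that carries information across long spans of a sequence through its hidden state:

```python
import torch
import torch.nn as nn

# A toy sequence classifier: embed tokens, run an LSTM over the sequence,
# and classify from the final hidden state.
class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=100, hidden_dim=128, n_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_labels)

    def forward(self, token_ids):
        embedded = self.embed(token_ids)        # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(embedded)       # h_n: (1, batch, hidden_dim)
        return self.out(h_n[-1])                # (batch, n_labels)

model = LSTMClassifier()
fake_batch = torch.randint(0, 10_000, (4, 50))  # 4 sequences of 50 token ids
print(model(fake_batch).shape)                  # torch.Size([4, 2])
```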

katykoenig commented 4 years ago

The authors compare feature weights to "knobs" on the "input-output function of the machine" and go on to note that in most deep-learning applications, there are many (possibly hundreds of millions) of these weights. Is it then possible or ever worthwhile to analyze the importance of all weights the way one would compare regression coefficients? Obviously, it is possible to grab information regarding a certain feature's weight, but are there systems to analyze all weights in a deep learning network?
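
One thing you can always do is enumerate the weight tensors and summarize them layer by layer, although, unlike regression coefficients, a weight behind a hidden layer is not tied to a single input feature. A toy sketch (assuming PyTorch; the network is hypothetical):

```python
import torch.nn as nn

# Summarizing the weight tensors layer by layer is the closest analogue
# to scanning a table of regression coefficients.
net = nn.Sequential(nn.Linear(300, 64), nn.ReLU(), nn.Linear(64, 2))

for name, param in net.named_parameters():
    print(f"{name:<10} shape={tuple(param.shape)} "
          f"mean_abs={param.abs().mean().item():.4f}")

# Because individual weights lose their one-to-one link to input features,
# practitioners usually turn to attribution methods (saliency maps,
# permutation importance) rather than reading the raw weights directly.
```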

tzkli commented 4 years ago

This is a great overview of the state of deep learning. The authors predict that "major progress in artificial intelligence will come about through systems that combine representation learning with complex reasoning" (p. 442). But it seems the approach we use today still relies largely, if not entirely, on statistical learning. What advances have computer scientists and linguists made in combining the symbolic and statistical approaches to NLP?

rkcatipon commented 4 years ago

Towards the end of the reading I was struck by the statement:

"Natural language understanding is another area in which deep learning is poised to make a large impact over the next few years. We expect systems that use RNNs to understand sentences or whole documents will become much better when they learn strategies for selectively attending to one part at a time."

What does it mean for a machine to understand? Does that imply an ability to interpret and even to hold meaning? Could it be that machines are moving towards semiosis, with spontaneous symbol creation and the ability to interpret those symbols?

luxin-tian commented 4 years ago

This paper reviews the development history of deep learning. As the authors mention, a general approach for supervised learning is to tune the parameters to minimize the distance between the predictions and the actual observations. While this paper is a brief review of the technology without digging into the algorithms and the math, I wonder how the number of parameters in a neural network can be optimally chosen. That is, how many layers are best for the classification task, and how many neurons are best for each hidden layer? Would the tuning of parameters vary with different tasks, and if so, how? Furthermore, can the choice of the number of parameters be made automatically in the learning process, or does it require pre-definition? (PS: I found a blog post about model selection here.)
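
As far as I understand, the number of layers and neurons is usually treated as a hyperparameter and chosen by searching over candidate configurations against held-out data, rather than being learned directly. A minimal sketch with scikit-learn (synthetic data and made-up layer sizes, purely for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic data standing in for a real labelled corpus.
X, y = make_classification(n_samples=1000, n_features=50, random_state=0)

# Treat the width and depth of the hidden layers as hyperparameters and let
# cross-validated search pick the configuration with the best held-out score.
search = GridSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    param_grid={"hidden_layer_sizes": [(32,), (64,), (64, 32), (128, 64, 32)]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```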

laurenjli commented 4 years ago

I'm interested in the practice of generating more training data from existing images when training computer vision models. The authors explain the process well, and it makes sense that small randomized perturbations to an image would help boost the size of the training set without compromising the data. Are there also ways to do this for other types of data sets (e.g., text)?
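
A rough sketch of what this can look like in practice, using torchvision transforms on the image side and a hypothetical word-dropout helper as one naive analogue for text (the assumption being that a document's label survives small edits):

```python
import random
from torchvision import transforms

# Image side: small random, label-preserving perturbations.
augment_image = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

# Text side: randomly drop words, assuming the label survives small edits.
def augment_text(sentence, drop_prob=0.1):
    tokens = sentence.split()
    kept = [t for t in tokens if random.random() > drop_prob]
    return " ".join(kept) if kept else sentence

print(augment_text("deep learning methods need a lot of training data"))
```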

di-Tong commented 4 years ago

The authors hold that unsupervised learning had a catalytic effect in reviving interest in deep learning, yet this review does not focus on that direction. Could you give us more examples of tasks that unsupervised deep learning can contribute to in social science settings?

chun-hu commented 4 years ago

I'm wondering if we could go into more detail on the hyperparameter tuning process for deep learning algorithms. Also, as others have mentioned, how do the weights matter when we have millions of parameters in our model?

sunying2018 commented 4 years ago

I am interested in the architecture of ConvNets. As mentioned in the article, "the role of the convolutional layer is to detect local conjunctions of features from the previous layer." I am wondering what the advantage of this architecture is compared with the weights used in a simple neural network?
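
One way to see the advantage is to compare parameter counts: a convolutional layer reuses the same small kernel at every spatial location, while a fully connected layer needs a separate weight for every pixel. A minimal sketch (assuming PyTorch; the 224x224 image size is illustrative):

```python
import torch
import torch.nn as nn

# A 3-channel 224 x 224 image flattened has 3 * 224 * 224 = 150,528 inputs.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
dense = nn.Linear(in_features=3 * 224 * 224, out_features=16)

print(sum(p.numel() for p in conv.parameters()))   # 448 weights (3*3*3*16 + 16)
print(sum(p.numel() for p in dense.parameters()))  # 2,408,464 weights

# The convolution applies the same 3x3 kernels everywhere, detecting the same
# local feature wherever it appears, while the dense layer must learn a
# separate weight for every pixel position.
x = torch.randn(1, 3, 224, 224)
print(conv(x).shape)                                # torch.Size([1, 16, 224, 224])
```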

bjcliang-uchi commented 4 years ago

I am wondering how the algorithms know what the "right" translation is from image to text, since different people can give different descriptions of the same image, especially when it is seen from different aspects (Wittgenstein).

heathercchen commented 4 years ago

This paper discusses fundamental problems of deep learning. I am wondering what the criterion is for choosing the number of layers when using deep learning for image analysis. Does each layer need to have a plausible meaning, or do we decide on the number of layers based on what produces the best outcome?

deblnia commented 4 years ago

I'm interested in, if wary of, artificial intelligence in the wild as these authors propose (although what they specifically propose is "combining representation learning with complex reasoning"). These authors focus almost exclusively on representational machine learning (which does, as output, resemble some vague approximation of "learning") because, as they note, data is processed through multiple layers of abstraction. Is this an exhaustive taxonomy of ML techniques? I'm thinking specifically of techniques like translational machine learning, which seems to be an application of representational learning.

sanittawan commented 4 years ago

I am aware that interpretability and causality may not be priorities for deep learning models, as there is no discussion of such issues in the reading, but I can't resist wondering whether social scientists and/or computer scientists have made progress on these topics.

alakira commented 4 years ago

Usually, we split an image into its RGB color channels and then use them as inputs. I wonder how this split affects performance, and how else we could preprocess the image to improve it.
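
A small sketch of that preprocessing (assuming NumPy and Pillow; "example.jpg" is a placeholder path):

```python
import numpy as np
from PIL import Image

# Each colour channel is a 2-D array of pixel intensities that becomes one
# input plane of the network.
img = np.array(Image.open("example.jpg").convert("RGB"), dtype=np.float32)
red, green, blue = img[..., 0], img[..., 1], img[..., 2]
print(img.shape, red.shape)        # e.g. (224, 224, 3) (224, 224)

# Common extra preprocessing: rescale to [0, 1] and standardise each channel,
# which tends to make gradient-based training more stable.
img /= 255.0
img = (img - img.mean(axis=(0, 1))) / (img.std(axis=(0, 1)) + 1e-8)
```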

YanjieZhou commented 4 years ago

So can I conclude that CNNs are most useful in scenarios where images serve as the research objects? I am wondering whether there are other areas where CNNs can come in very handy.

wunicoleshuhui commented 4 years ago

I'm still quite confused about the details of this approach. Why do we combine RNNs with ConvNets in processing images?
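
My understanding is that the two networks play complementary roles in image captioning: the ConvNet summarizes the image, and the recurrent network generates the text conditioned on that summary. A toy sketch (assuming PyTorch; all sizes are illustrative):

```python
import torch
import torch.nn as nn

# The ConvNet compresses the image into one feature vector ("what is in the
# picture"); the recurrent decoder emits caption words one at a time
# conditioned on that vector ("how to say it").
class CaptionModel(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(32, hidden_dim),
        )
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.vocab_out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, caption_ids):
        h0 = self.encoder(images).unsqueeze(0)        # image summary seeds the RNN state
        out, _ = self.decoder(self.embed(caption_ids), h0)
        return self.vocab_out(out)                    # a word distribution per position

model = CaptionModel()
images = torch.randn(2, 3, 64, 64)
captions = torch.randint(0, 5000, (2, 12))
print(model(images, captions).shape)                  # torch.Size([2, 12, 5000])
```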

ziwnchen commented 4 years ago

Based on previous comments, it seems to me that when talking about deep learning, what social scientists care about is still "interpretability": the interpretability of the chosen method, of the deep model's architecture design, and, most importantly, of the results. The deep learning community does not worry too much about interpretability because it usually has fixed, objective criteria (e.g., BLEU score, ImageNet evaluation). However, it might be hard for the social science community to produce such a clear evaluation framework. From this perspective, is it valid to say that the way the social science community applies deep models may be quite different from computer science? While the latter focuses more on direct prediction/generation, the former may use the deep model more as a part of the experiment design?
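
As a hypothetical illustration of that "part of the experiment design" use: a pretrained network can act as a measurement instrument whose penultimate-layer activations become variables for a conventional statistical analysis (sketch assumes PyTorch; the network and data are stand-ins):

```python
import torch
import torch.nn as nn

# Pretend this network was trained elsewhere; here it only produces features.
pretrained = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(32, 64),
)
pretrained.eval()

images = torch.randn(100, 3, 64, 64)    # stand-in for a corpus of images
with torch.no_grad():
    features = pretrained(images)       # (100, 64) matrix of derived variables
print(features.shape)
# These columns can then enter a regression or clustering analysis downstream,
# rather than the network's own predictions being the end product.
```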

kdaej commented 4 years ago

Whenever deep learning is discussed, many people fear the black box of the algorithm, complaining that nobody knows what is happening. For a social scientist, it is important not only to make predictions but also to illuminate the underlying phenomenon. In this situation, what would be the role of social scientists who would like to incorporate deep learning techniques?

VivianQian19 commented 4 years ago

LeCun et al. provide a great overview of the major machine learning architectures and discuss how convolutional neural networks have been used in various contexts. The combination of ConvNets with recurrent neural networks for image captioning is interesting and seems promising for industry; however, I'm curious about how deep learning and ConvNets have been used in social science research.

cytwill commented 4 years ago

@katykoenig 's question is interesting to me. Since the authors mention the function of the weights in machine learning models, we do need more insight into them. In particular, for different tasks (classification, feature extraction, or signal transformation), do these weights change according to the same criteria or different ones?