Thinking-with-Deep-Learning-Spring-2022 / Readings-Responses

You can post your reading responses in this repository.

Deep Learning? -Orientation #1

lkcao opened this issue 2 years ago

lkcao commented 2 years ago

Post your question here about the orienting readings: “Preface: How to Think with Deep Learning”, “Deep Learning?” & “Deep Learning as Machine Learning and More” (Thinking with Deep Learning, Preface & Chapters 1 & 2).

pranathiiyer commented 2 years ago

I guess we'll study more about this over the course of the quarter, but this is something I was curious about as I read these chapters. The text says that adding a layer can often increase the performance of the model, but how do we know what function each hidden layer is performing, or whether a hidden layer is working the way we thought it would? I understand that deep learning models can be black boxes, but how do we know how many hidden layers make sense, given that training can be so computationally intensive? Is it just something you learn with experience?
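To make this concrete, here is a minimal PyTorch-style sketch of the kind of experiment I imagine, varying the number of hidden layers (the widths and depths are arbitrary placeholders, not from the text):

```python
import torch.nn as nn

def make_mlp(depth, width=64, in_dim=100, out_dim=10):
    """Build a simple feedforward net with `depth` hidden layers."""
    layers = [nn.Linear(in_dim, width), nn.ReLU()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, out_dim))
    return nn.Sequential(*layers)

# Compare, say, 1 vs. 3 vs. 5 hidden layers on held-out data
models = {d: make_mlp(depth=d) for d in (1, 3, 5)}
```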

ValAlvernUChic commented 2 years ago

In Chapter 1's section on outputs, there is a distinction between multi-label classification with multiple sigmoids and multi-class classification with a softmax. How do these two differ in terms of probability outputs, and when should we choose one over the other? Thanks all!
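To make my question concrete, here is a small PyTorch sketch of how I currently understand the two output setups (the logits are made up):

```python
import torch

logits = torch.tensor([1.2, -0.3, 0.8])

# Multi-label: one sigmoid per class; probabilities are independent
# and need not sum to 1 (an example can belong to several classes).
multi_label_probs = torch.sigmoid(logits)

# Multi-class: a softmax over all classes; probabilities compete
# and sum to 1 (an example belongs to exactly one class).
multi_class_probs = torch.softmax(logits, dim=0)

print(multi_label_probs.sum())   # generally != 1
print(multi_class_probs.sum())   # == 1
```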

BaotongZh commented 2 years ago

Several questions came into my mind after reading those chapters.

First, there are many options mentioned in Chapter 1 for neural network architecture (i.e., number of hidden layers, number of neurons in each layer, activation functions, loss function, optimizer). The book says we should experiment with these hyperparameters to make an optimal decision when training a model, but I was wondering: are there any general methods we could follow when architecting our models to achieve good results, or any prior experience we can draw on when setting them up?

Second, I am a little bit confused by the book's statement that we could train some of the layers in our model to do feature engineering. Does this mean we could just throw the raw data into the black box and good results come out? If so, could you explain more about it, or will we learn about it in the near future? :)

sabinahartnett commented 2 years ago

I have a specific question about one of the images included in the text: (figure 1-5)

[Screenshot of Figure 1-5 from the text]

I was hoping you could expand on the projections and the geometric space in which they exist. From the reading, I'm assuming they are representative of the classification of each of the data points (you write in the text: 'neural network knowledge representations'). How do you map these representations, and what steps does it take to get to these projections?

borlasekn commented 2 years ago

I just had a general question about the applications of Deep Learning, although I am sure we will get to this later in the class. In particular, reading about how Neural Networks have grown to represent knowledge in Chapter One, I thought about an article from 2018 reporting that Neural Networks had much higher error rates for Black women than for other demographics. Thus, the knowledge represented by Neural Networks, especially in applications involving people (hiring, photo identification, etc.), could be flawed. In our construction of Neural Networks, should we be doing checks at each stage to ensure our models are not based on discriminatory practices or data? If so, what checks would we complete? Is it impossible to eradicate discrimination in Neural Network models, because so much of these models, as stated in the preface, is hidden in "black boxes"?

For reference, I looked and found the article: https://news.mit.edu/2018/study-finds-gender-skin-type-bias-artificial-intelligence-systems-0212

linhui1020 commented 2 years ago

Unlike traditional machine learning, where we choose among a more limited set of tools, deep learning relies on many different combinations of loss functions and activation functions, as well as decisions about how deep the network should be. How do we find the best-suited model, with the most explanatory power, for a given dataset?

yujing-syj commented 2 years ago

First, I have a question about nn.Linear(in_features, out_features): how can we choose the out_features for the model? Also, I used different loss functions with the same model, but there was no difference in the accuracy score; I am curious why this happened. Beyond that, could you please explain more about each step of building the model, since I am still confused about some of the details in choosing the parameters? Thanks!
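For reference, here is a minimal sketch of how I currently read nn.Linear (the feature sizes below are arbitrary examples, not from the text):

```python
import torch.nn as nn

# in_features must match the dimensionality of the input;
# out_features of the final layer is set by the task (e.g., number
# of classes), while out_features of hidden layers is a
# hyperparameter we choose ourselves.
model = nn.Sequential(
    nn.Linear(20, 64),   # 20 input features -> 64 hidden units (our choice)
    nn.ReLU(),
    nn.Linear(64, 3),    # 3 outputs, e.g., one per class
)
```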

javad-e commented 2 years ago

Chapter 1 mentions downsampling and pooling layers a couple of times. For example, "pooling layers that shrink or downsample the network, where units take in more connections than they give out". Could you please explain how this downsampling could be optimally achieved without figuring out the whole network first? Since the objective is to increase speed, I think forming the complete network prior to downsampling would not be an option here.
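For concreteness, here is a tiny PyTorch sketch of the kind of pooling layer I take this passage to mean (the tensor sizes are made up):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 32, 32)   # one 8-channel, 32x32 feature map

# A 2x2 max-pooling layer halves the spatial resolution, so each
# downstream unit takes in more connections than it gives out.
pool = nn.MaxPool2d(kernel_size=2)
print(pool(x).shape)            # torch.Size([1, 8, 16, 16])
```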

isaduan commented 2 years ago

I appreciate how these chapters give an intuitive understanding of what deep learning is and how it fits into other machine learning 'perspectives' and pipelines.

A broad question: given how fast the field is moving, what tips would you give social science practitioners for keeping up with the ever-changing state of the art?

A narrow question: Could you explain relative entropy or KL divergence loss?
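In case it helps frame the narrow question, here is my rough understanding of relative entropy, sketched in PyTorch with made-up distributions:

```python
import torch
import torch.nn.functional as F

p = torch.tensor([0.7, 0.2, 0.1])   # "true" distribution
q = torch.tensor([0.5, 0.3, 0.2])   # model's distribution

# KL(p || q) = sum_i p_i * log(p_i / q_i): the extra information (in nats)
# needed to encode samples from p using a code optimized for q.
kl_manual = (p * (p / q).log()).sum()

# PyTorch's kl_div takes the input as log-probabilities and the target
# as probabilities, and computes KL(target || input).
kl_torch = F.kl_div(q.log(), p, reduction="sum")

print(kl_manual, kl_torch)          # both ~0.085
```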

JadeBenson commented 2 years ago

I think these chapters are exciting and give an overview of the seemingly endless possibilities of deep learning. I am curious about their limitations, though, and how we can thoughtfully apply these algorithms in our specific contexts. At the end of the preface, these limitations are explicitly stated: we need to consider both the robustness of the data and the model objectives to determine "whether it makes sense to invest time and resources to build a deep learning model or not." How will we know if our data and objective are well-suited to deep learning methods? Do you think there are any questions that are inherently unable to be answered with these techniques, or is it just a matter of collecting more/better data and feature engineering? I've primarily seen critiques of deep learning models because they are "black boxes," but as you explain, this is a little simplistic and ungenerous. I believe that an important component of "thinking with" deep learning is also recognizing its limitations and potential pitfalls, to either mitigate them or apply different strategies. What do you think are valid critiques of these methods?

hsinkengling commented 2 years ago

A technical question: given the complexity and computational costs of building a high-performance neural network, how could we "scale down" the costs of training a neural network for less complex, individualized input data? Or is contemporary deep learning mostly useful in its capacity for training multi-million-dollar, all-purpose text/image/audio models?

min-tae1 commented 2 years ago

The definition of loss functions in chapter 1 suggests that a known truth is required to evaluate the success of a deep learning model. Is there a case in which there is no known truth to evaluate? For instance, AlphaGo seems to suggest new strategies for playing the game of Go. In this case, could one argue that there is no known truth, since strategies of the past may be insufficient to evaluate AlphaGo's strategy?

sudhamshow commented 2 years ago

A couple of general questions regarding deep neural networks -

  1. Multilayer feedforward neural networks are sometimes deemed universal function approximators (Kurt Hornik et al., 1989). How do deep neural networks achieve this complexity without overfitting?

  2. If they can act as universal approximators, wouldn't that void the 'no free lunch' theorem? A further question on the no-free-lunch theorem: at what level of task abstraction do the models begin to fail? Is it across completely different sets of tasks (image vs. text vs. audio), or across subtasks within a genre (e.g., text classification vs. text generation)?

  3. The paragraph on loss functions (Evans and Desikan, Chapter 1) also mentions the use of MAE as a loss function. Does the optimizer use a gradient-descent-like technique? If so, how is differentiation handled at the minimum, where the absolute value is not differentiable? (See the sketch after this list.)
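To make question 3 concrete, here is a toy example of what I mean; as far as I understand, autograd falls back on a subgradient at the kink:

```python
import torch

pred = torch.tensor([2.0, 0.5], requires_grad=True)
target = torch.tensor([1.0, 0.5])

# MAE (L1) loss: mean of |pred - target|. It is not differentiable
# exactly where pred == target; I believe PyTorch uses the
# subgradient 0 at that point.
loss = torch.nn.functional.l1_loss(pred, target)
loss.backward()
print(pred.grad)   # tensor([0.5000, 0.0000])
```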

Hongkai040 commented 2 years ago

A specific question: in Chapter 1, it says that 'differentiability is desirable, but may not be necessary for strong performance.' So why do we need this property if we can have strong performance without it? I am not sure I understand this clearly.

And I am somewhat doubtful about the claim at the beginning of 'What Makes a Network Deep' in Chapter 1: 'The number of layers determines the depth of the network, and neural networks seemingly perform better with more layers.' In the Computational Content Analysis class last quarter, I ran several trials changing the number of LSTM layers in Keras for text classification. Performance did not always increase with more layers; it seemed to saturate. Could you please clarify this claim?

chentian418 commented 2 years ago

I have a question about neural network implementation. Since we will focus on Keras and PyTorch rather than TensorFlow in this class, what are some pros and cons of using higher-level versus lower-level packages? And what categories of datasets does each work best with? Thanks!

chuqingzhao commented 2 years ago

I would like to add one question about geometric projection: projecting neural network knowledge representations onto a low-dimensional space can increase the interpretability of the data. Given that dimension-reduction algorithms such as PCA and t-SNE transform vectors from a high-dimensional to a low-dimensional space, how can we measure the information lost in the transformation? And how do we know how much we can trust the geometric representations in low dimensions?
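For instance, with PCA one can at least inspect the explained variance of the projection; I am not sure what the analogue would be for t-SNE, which is part of my question. A rough sketch with fake embeddings:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 128))   # stand-in for network representations

pca = PCA(n_components=2)
projected = pca.fit_transform(embeddings)

# Fraction of total variance retained by the 2-D projection: one (linear)
# notion of how much information the low-dimensional picture keeps.
print(pca.explained_variance_ratio_.sum())
```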

Jinglan930 commented 2 years ago

What I really appreciate is that these two chapters provide a comprehensive learning framework for machine learning and deep learning. While the course may go further into related issues later, one of my current quandaries is which statistical model to apply under which specific circumstances, since so many options are introduced. In addition, I am a bit confused as to which steps of the machine learning pipeline we can (or should) apply deep learning to. Although the authors suggest many possibilities at the end of Chapter 2, I may need a concrete example to understand this application and replacement.

Yaweili19 commented 2 years ago

I really enjoyed these introductions: they provide a high-level framework for a quick overview and list and compare most mainstream programming implementations of deep learning. I expect to refer to these chapters a lot in the coming weeks.

About Deep Learning as Machine Learning and More: Figure 1-3 in Chapter 1 says that the neural network contains 9 component models. What is the 9th model? (I assume models 1-8 would be the 4x2 hypotheses.) Would it be the entire model/analysis pipeline/network? If not, what would it be?

Emily-fyeh commented 2 years ago

I would like to know whether researchers generally compare the performance of selected (simpler) ML models against complex deep learning models. Is there a typical type of research topic that embraces deep learning models more than other fields do?

thaophuongtran commented 2 years ago

Questions from Chapter 2, Deep Learning as Machine Learning and More: With different ML models available, how can researchers evaluate which models are appropriate to implement? In addition, can deep learning be a component within ensemble methods? How important is it to understand all the mathematics and statistics underlying these models?