In the first paper, "Reducing the Dimensionality of Data with Neural Networks", I did not understand how they were able to use these autoencoders on text documents, as they mention towards the end. I also did not fully grasp the technical details of the paper, so I would love to know what people think!
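To make my question concrete, here is a rough sketch of what I imagine applying an autoencoder to text looks like: each document becomes a fixed-length word-count vector, which can then be compressed like any other input. This is only my toy reconstruction, not the paper's RBM-pretrained setup:

```python
# Toy sketch (my assumption, not the paper's exact method): documents as
# bag-of-words count vectors, compressed by a small autoencoder.
import torch
import torch.nn as nn
from sklearn.feature_extraction.text import CountVectorizer

docs = ["deep learning reduces dimensionality",
        "autoencoders compress documents into codes",
        "word counts form the input vector"]

# One row per document, one column per vocabulary term.
X = CountVectorizer().fit_transform(docs).toarray()
X = torch.tensor(X, dtype=torch.float32)
vocab_size = X.shape[1]

# Encoder squeezes counts into a small code; decoder reconstructs the counts.
model = nn.Sequential(
    nn.Linear(vocab_size, 8), nn.Sigmoid(),   # encoder
    nn.Linear(8, 2), nn.Sigmoid(),            # 2-d document code
    nn.Linear(2, 8), nn.Sigmoid(),            # decoder
    nn.Linear(8, vocab_size),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), X)  # reconstruction error
    loss.backward()
    opt.step()
```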
The last paper, "Can x2vec save lives? Integrating graph and language embeddings for automatic mental health classification", provides us a way to deal with complex sparse data, I was just wondering what are the differences between integrated model and ensembles of models since it seems that both methods are about making decision by several independent models(ensembles for sure is built by independent models, and the each of the MP2V and D2V "is in fact learning to represent non-redundant information."
In "The unreasonable effectiveness of deep learning in artificial intelligence", on the third page of the PDF, there is a paragraph filled with "why" questions. I spent a good portion of my undergraduate career in mathematics and statistics theory classes, writing proofs, and this seems to be a break from needing to know why before making conclusions. I know that their was a rise in critique of needing "proofs" behind social theory in the twentieth century, but it wasn't a complete break. That being said, how is Deep Learning with these unanswered "why" questions changing the dynamics of "truth" and "facts" in academic theory?
In "Reducing the Dimensionality of Data with Neural Networks", the authors say we can keep adding pre-trained RBM layers, since adding another layer "always improves a lower bound on the log probability". What considerations should guide the optimal number of these layers?
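For context on what "adding layers" looks like in practice, here is a minimal sketch of greedy layer-wise pre-training with scikit-learn's BernoulliRBM on toy data in [0, 1]; how many layers and units to use is exactly the open question.

```python
# Minimal sketch of greedy layer-wise pre-training: each RBM is trained on
# the hidden activations of the layer below it, and the stack can in
# principle be extended one layer at a time.
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
X = rng.random((500, 64))          # toy data scaled to [0, 1]

layer_sizes = [32, 16, 8]          # the number of layers/units is the open choice
rbms, H = [], X
for n_hidden in layer_sizes:
    rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.05, n_iter=20, random_state=0)
    H = rbm.fit_transform(H)       # hidden activations feed the next layer
    rbms.append(rbm)
```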
Farrell et al. (2021) appears to be an incredibly important paper, and I am still trying to digest the potential implications. Let us assume future research indeed verifies the reliability of deep learning models for inference. I understand the authors are careful to avoid hyperbole, but are we then talking about potentially replacing some traditional econometric models with deep learning?
My question is about "A Unified Approach to Interpreting Model Predictions". In this paper, the authors present a unified framework for interpreting predictions by assigning importance values to features. I am curious about what are the common methods (Mathematical and Algorithmic methods) to increase the interpretation of complex model predictions. With the repid optimization of algorithm, is there a common-used platform for us to get access to the most pioneer and improved deep learning algorithm?
In "Can x2vec save lives?": "graph and language data generate relatively accurate results (69% and 76%, respectively) but when integrated, both data produce highly accurate predictions (90%)." How should we interpret these model performances, in the context of the substantive field of psychology? Does this mean that some signals revealed by one's networks are augmented by signals from how one speaks and vice versa? Can we interrogate the interaction of those signals further?
Farrell et al.'s paper compares the predictive performance and effectiveness of traditional methods and deep learning in different areas. I am quite interested in why they raise high-dimensional data as a further challenge for deep learning, since high-dimensional data might actually be better suited to deep learning, in the sense that we would not have to perform dimension reduction and risk losing important information (or would we)?
Like @borlasekn, the paragraph of why questions in 'unreasonable effectiveness of deep learning in artificial intelligence' caught my attention, too. These sorts of questions are what really excite me about this field due to their mix of mathematics, computer science, and philosophy. I keep thinking about: 'How large is the set of all good solutions to a problem?' This seems at once infinite and singular - how do we quantify "good"? Aren't there just as many variables as the problem itself by which we could evaluate this? Can this be an objective identification or will it always include some ideological weighting of which features are more important?
I was really interested in the emphasis on connectionist models in Sejnowski (2020). It seems that a lot of the innovations in the field of deep learning have been inspired by nature and cognitive modeling, and the paper also frames potential future directions for the field as nature-inspired. Although it is intuitive that our high-capacity brains might provide a good framework for computational processing, I am continuously shocked by the influence of cognitive modeling on neural network performance. Are there strong opinions in the field that counter this connectionist view? Is it possible that basing model architecture on another object may be limiting (in trying to mirror the brain, might we be missing potential avenues of exploration)?
The article "Can x2vec save lives?" is particularly interesting to me since it combines two different embeddings from two types of data. This kind of concatenation makes sense theoretically because it imitates human interaction more faithfully, which includes both networks and language. Meanwhile, the author also admits that a single embedding model might be enough to generate accurate results (though at higher computational cost). I'm curious about this trade-off. When constructing a complex model like this, how far should we go in reconstructing real-life interactions in the model, versus building a model around one critical aspect of human interaction (such as language embedding only)? What are the main criteria or standards we can follow?
Sejnowski's work portrays how neuroscience and mathematics contributed to the development of deep learning. Are there no conflicts between the two disciplines? Does this mean that the brain can be reduced to its mathematical functions? If not, how might neuroscience and mathematics conflict, and what are the implications for the future or the current state of deep learning?
I am especially interested in the article "Deep Neural Networks for Estimation and Inference". Many economists are interested in applying machine learning techniques to their own research, but most of them worry about the lack of interpretability of the models, which has drawn attention to methods like double machine learning. The article shows that neural networks can be a well-performing way to identify treatment effects. I am wondering whether there is a possibility that we could estimate multiple effects at the same time with a single model, just as we do with regressions? For this purpose, we would need a more complex model and a massive amount of computation. Is this trade-off really worth it?
If I read it correctly, the Lundberg and Lee (2017) reading suggests that we could construct simpler "explanation" models that approximate deep learning models with linear forms for better interpretation. While they offer measures of accuracy and consistency, I'm curious by what order of magnitude the original model differs from the typical explanation model. If it's sufficiently accurate and consistent, wouldn't a linear "explanation" model be just as useful as an accessible proxy model for wider usage?
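For reference, if I'm reading the paper's notation correctly, the additive explanation model they study has the local linear form

$$
g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i,
$$

where $z' \in \{0,1\}^M$ marks which of the $M$ simplified input features are present and $\phi_i$ is the importance attributed to feature $i$.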
The Lundberg and Lee (2017) paper especially stood out to me, as someone who has experience working with many feature variables and having trouble deciding the best way to interpret results. I think that offering a standard is a very interesting proposition, and the three proposed properties that contribute to creating an interpretable explanation model are interesting and intuitive factors to consider about model construction and output. I mostly wonder when one should decide to use these methods to ease the interpretability of their model. How complex does a model have to be for the computation and knowledge required to create an explanation model to become "worth it"?
In "Can x2vec save lives?", it is inspiring that the author combines two embedding models to classify user communities on Reddit. Since the author mentions various kinds of embedding models, especially multitask embeddings, social science researchers could apply those multi-modal approaches to understand human behavior better. The paper illustrates the improvement in accuracy from combining two models; I wonder whether we can combine more than two embedding models. How would the accuracy improve? Would there be a trade-off between the number of tasks and accuracy?
The article "Can x2vec save lives?" is very appealing to me because it tries to link language embedding and social network analysis. It seems to fit well with some of the mainstream social media platforms that are designed around text posts and social connections, such as Facebook, Twitter, and Reddit. However, I am a bit concerned about applying this approach to a highly sensitive topic like suicide prediction. Technology companies can, and should, develop models to try to save lives, and 90% can already be considered high predictive accuracy; but when we consider that this is a matter of human life, the issue may still be controversial, or at least worthy of further thought. I am not quite sure what attitude I should take towards this application of the technique. I am also wondering whether the methodology developed in this article could be used to predict other fringe behaviors, such as racism or hate speech. Happy to hear what people think!
My comments are with reference to the first paper (Hinton et al., "Reducing the Dimensionality of Data with Neural Networks").
It is fascinating to see autoencoder-esque models pop up in a variety of applications. I wanted to draw parallels between the autoencoder discussed in the article, used for dimensionality reduction, and a word embedding model like word2vec. Is this comparison appropriate?
Both create dense vector representations (thereby reducing data dimension). Both provide a bidirectional mapping between data and code spaces (skip-gram and CBOW models for word2vec). The loss functions also seem eerily similar: word2vec tries to increase the similarity (dot product) between words in the context window and decrease it for the rest (negative sampling), while the method discussed in the paper does a similar optimization using the energy of pixel and feature configurations. With all these similarities, could the representations obtained using the method described extend beyond data compression, perhaps to something like the word associations available with word2vec embeddings?
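As a small sketch of the word-association side of this comparison, using gensim's word2vec implementation (skip-gram with negative sampling) on toy sentences:

```python
# Toy word2vec example: dense vectors and word-association queries.
from gensim.models import Word2Vec

sentences = [
    ["autoencoders", "compress", "images", "into", "codes"],
    ["word2vec", "embeds", "words", "into", "dense", "vectors"],
    ["dense", "codes", "and", "dense", "vectors", "reduce", "dimensionality"],
]
model = Word2Vec(sentences, vector_size=16, window=2, sg=1, negative=5,
                 min_count=1, epochs=50)
print(model.wv["codes"])               # dense representation, like an autoencoder code
print(model.wv.most_similar("codes"))  # word associations from the embedding space
```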
In the Science paper "Reducing the Dimensionality of Data with Neural Networks", the authors claimed in 2006 that they had a better way of reducing dimensionality than PCA. To the best of my knowledge, we are still using PCA (and perhaps still as a primary approach) for dimension reduction now, in 2022. Why is this the case?
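Part of what prompts my question is how little effort linear PCA takes in practice, for example:

```python
# Not an answer, just the practical contrast: PCA is a deterministic two-liner
# with no architecture or training choices, unlike a deep autoencoder.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)            # 64-dimensional images
X_reduced = PCA(n_components=2).fit_transform(X)
```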
In "Reducing the Dimensionality of Data with Neural Networks":
I really liked the conciseness of this paper; however, I'm having some difficulties understanding the steps, especially the "pre-training" step. Is pre-training a new method, or is it widely used in the deep learning field? What is the purpose of this step, and how does it support or lead to "unrolling" and "fine-tuning"?
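To check whether I understand the pipeline, here is my rough sketch of the three stages on toy data, using scikit-learn's BernoulliRBM for the pre-training and PyTorch for the fine-tuning (biases and the paper's exact training details are omitted):

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
X = rng.random((300, 64)).astype(np.float32)   # toy data in [0, 1]

# (1) Pre-training: fit one RBM per layer, each on the layer below's activations.
sizes, rbms, H = [32, 8], [], X
for n in sizes:
    rbm = BernoulliRBM(n_components=n, n_iter=20, random_state=0).fit(H)
    H = rbm.transform(H)
    rbms.append(rbm)

# (2) Unrolling: the encoder reuses the RBM weights, the decoder their transposes.
enc = [nn.Linear(r.components_.shape[1], r.components_.shape[0]) for r in rbms]
dec = [nn.Linear(r.components_.shape[0], r.components_.shape[1]) for r in reversed(rbms)]
for layer, r in zip(enc, rbms):
    layer.weight.data = torch.tensor(r.components_, dtype=torch.float32)
for layer, r in zip(dec, reversed(rbms)):
    layer.weight.data = torch.tensor(r.components_.T, dtype=torch.float32)
stack = []
for layer in enc + dec:
    stack += [layer, nn.Sigmoid()]
autoencoder = nn.Sequential(*stack[:-1])       # linear output for the reconstruction

# (3) Fine-tuning: backpropagate reconstruction error through the whole stack.
Xt = torch.tensor(X)
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    nn.functional.mse_loss(autoencoder(Xt), Xt).backward()
    opt.step()
```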
In terms of the paper "Reducing the Dimensionality of Data with Neural Networks", I was wondering when nonlinear dimensionality reduction techniques based on training a multilayer neural network make more sense in a social science context than linear techniques like PCA?
In the article "A Unified Approach to Interpreting Model Predictions", the authors purposed a perpetual explanation model to interpret the ML model result with SHAP values. To what extent can we rely on this aggregate measurement? The article also mentions that the order of the features in the explaining models matters. Is there any other approach that can help us decide which feature order would be optimal?
In "Reducing the Dimensionality of Data with Neural Networks", the authors used several specific autoencoders such as 784-1000-500-250-30 autoencoder or 625-2000-1000-500-30 autoencoder. What do these numbers indicate and what are these autoencoders referring to?
In “Reducing the Dimensionality of Data with Neural Networks" , the authors suggest that implementing interlayer learning algorithms can effectively pertain weights of deep autoencoders, and can be repeatedly as many times as desired. However, will reappearing too many times of layer-by-layer learning leading to overfitting of models? If not, why?
Pose a question about one of the following possibility readings: "Reducing the Dimensionality of Data with Neural Networks", G. E. Hinton and R. R. Salakhutdinov, 2006, Science 313(5786):504-507; "Deep Neural Networks for Estimation and Inference", M. H. Farrell, T. Liang, and S. Misra, 2021, Econometrica 89(1); "The unreasonable effectiveness of deep learning in artificial intelligence", T. J. Sejnowski, 2020, PNAS 1907373117; "A Unified Approach to Interpreting Model Predictions", S. Lundberg and S.-I. Lee, 2017, NeurIPS; OR "Can x2vec save lives? Integrating graph and language embeddings for automatic mental health classification", A. Ruch, 2020, J. Phys. Complex.