Open JunsolKim opened 3 months ago
We have done some work on reducing the dimensionality of neural networks before, so this part of chapter 5 caught my attention: "text data are high-dimensional in that the meaning underlying each word, as a string of characters, is the same or different largely not based on components within the word itself (e.g, the distribution or ordering of letters), but the underlying definition."
On a practical level, what is happening when we reduce the dimensionality of text data? Are we essentially losing entire alternative definitions of these words?
In “The Datome - Finding, Wrangling and Encoding Everything as Data”, I would like to learn more about the application of deep learning models in social network analysis. Given the complexity and dynamic nature of social networks, what are the limitations of using graph neural networks to analyze social structures, and how might these limitations affect the interpretation of social ties, influence, and community detection?
In Chapter 7, a data augmentation approach called Mixup is introduced. This technique reduces overfitting by combining random pairs of training features along with their associated labels. I was wondering: can this method be applied to any type of structured prediction problem? Might it work better on some kinds of learning problems than on others? In addition, could any aspect of this technique also help with training in an unlabeled setting?
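For concreteness, here is a minimal sketch of what Mixup does on a toy batch (the batch, the one-hot labels, and the Beta parameter `alpha=0.2` are all made up for illustration; the mixing rule itself follows the technique as the chapter describes it):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training batch: 4 examples, 3 features, one-hot labels for 2 classes.
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
y = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)

def mixup_batch(X, y, alpha=0.2, rng=rng):
    """Mix each example with a randomly chosen partner.

    lam is drawn from Beta(alpha, alpha); features and (one-hot)
    labels are combined with the same coefficient.
    """
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(X))
    X_mix = lam * X + (1 - lam) * X[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    return X_mix, y_mix

X_mix, y_mix = mixup_batch(X, y)
# Mixed labels are convex combinations, so each row still sums to 1.
print(np.allclose(y_mix.sum(axis=1), 1.0))  # True
```

Note that because the labels become soft (e.g., 0.7 cat / 0.3 dog), the mixing step itself presupposes labels, which is part of why the unlabeled-setting question above is interesting.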
I was still very confused about the "human-in-the-loop" concept after reading the relevant section in Chapter 7. How is annotated data different from any other data? What kind of annotation could help "compensate for biases and direct the ongoing human annotation process" during model building? I would like to know more about what annotators do to a dataset, at which stage of the model-building pipeline they work, and what exactly they annotate, before I can get a grasp on the subsequent human-annotator sampling process.
Chapter 5, "The Datome - Finding, Wrangling and Encoding Everything as Data," talks about how deep learning sees and transforms all types of data. It explains how these models can take raw data, like words or pictures, and turn it into a form that is easier to work with. This helps the model make sense of big, complex datasets.
The chapter also explores how changing the way we present data to the model can affect what it learns. It introduces an idea called transfer learning, where what a model learned on one task can help it do better on a different task. Considering the main points of Chapter 5, how do deep learning models convert sparse data into a more compact form, and why is this important for understanding complex information? And how does the way we represent data (simple raw data versus more processed features) influence what a deep learning model can learn from it?
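On the sparse-to-compact question above, the simplest case is a word embedding: a lookup table that maps a sparse one-hot vector to a short dense vector. A minimal sketch (the four-word vocabulary and random embedding matrix are made-up placeholders; in practice the table is learned during training):

```python
import numpy as np

rng = np.random.default_rng(1)

vocab = ["the", "cat", "sat", "mat"]           # tiny illustrative vocabulary
vocab_size, embed_dim = len(vocab), 3          # 4-dim sparse -> 3-dim dense

# An embedding layer is just a lookup table: one dense row per token.
E = rng.normal(size=(vocab_size, embed_dim))

token_id = vocab.index("cat")                  # sparse representation: an index
one_hot = np.eye(vocab_size)[token_id]         # equivalently, a one-hot vector

# Multiplying the one-hot vector by E selects row token_id: the dense code.
dense = one_hot @ E
print(np.allclose(dense, E[token_id]))  # True
```

With a realistic vocabulary (say 50,000 words mapped to 300 dimensions) the same lookup turns an enormous sparse space into a compact one in which similar words can end up with similar vectors.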
The chapter emphasizes how neural networks and deep learning frameworks push us to view virtually all forms of information—text, images, audio, social networks, and more—as data. This perspective raises profound interdisciplinary questions about the implications for fields beyond computer science, such as ethics, sociology, and the humanities. How does the quantification and algorithmic processing of diverse forms of human expression and social interaction affect our understanding of human experience and societal structures? Are there aspects of human culture and social life that resist being fully captured or understood through the lens of data, and what might we lose in the attempt to encode everything as data?
The document, "The Datome - Finding, Wrangling and Encoding Everything as Data", highlights how neural networks encourage us to view everything as data, encompassing a wide array of domains from text and images to audio, graphs, and beyond. This approach necessitates abstracting complex entities into data representations, which inherently involves choosing what aspects to emphasize and what to omit. Consider the balance between the utility of simplification in making data computable and the risk of losing critical nuances of the original entities. How should data scientists navigate these trade-offs to ensure responsible and meaningful analysis?
"The Datome - Finding, Wrangling and Encoding Everything as Data," suggests different approaches for inputting sparse and dense data into models. When evaluating these approaches, how should we consider their impact in terms of computational efficiency and model performance?
Chapter 5: The Datome - Finding, Wrangling and Encoding Everything as Data.
This chapter mentions many uses of dimension-reduction techniques. I wonder why autoencoders are not used here. What is the difference between classical dimension-reduction techniques and autoencoders, apart from linearity versus non-linearity?
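One way to see the connection behind this question: a linear autoencoder trained with squared-error loss learns the same subspace that PCA finds, so nonlinear autoencoders can be viewed as generalizing classical dimension reduction. A small sketch of the linear case (the toy data, which truly lives on a 2-D plane inside 5-D space, is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# 100 samples that really live on a 2-D plane inside 5-D space.
Z = rng.normal(size=(100, 2))
W = rng.normal(size=(2, 5))
X = Z @ W
Xc = X - X.mean(axis=0)

# PCA via SVD: project onto the top-k right singular vectors.
# A linear autoencoder with MSE loss recovers this same subspace;
# nonlinear autoencoders replace the two linear maps with networks.
k = 2
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
codes = Xc @ Vt[:k].T          # "encoder": 5-D -> 2-D
recon = codes @ Vt[:k]         # "decoder": 2-D -> 5-D

print(np.allclose(recon, Xc))  # True: no information lost at k = 2
```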
Chapter 7: When Big Data is Too Small - Sampling, Crowd-Sourcing and Bias - Will human labeling always be the bottleneck for supervised learning?
Chapter 5: The Datome - Finding, Wrangling and Encoding Everything as Data.
The chapter talks about a dictionary that is used, for instance, to map a word to its root lemma, or to map each word in a text to its meaning. How is such a dictionary created? I believe the quality of the dictionary is vital.
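For what it's worth, such dictionaries are typically compiled by lexicographers (e.g., WordNet) and supplemented with rule-based suffix stripping. A hypothetical, hand-built miniature of the idea (the entries here are invented for illustration, not taken from any real resource):

```python
# A tiny hand-built lemma dictionary of the kind the chapter describes.
lemma_dict = {
    "running": "run", "ran": "run", "runs": "run",
    "better": "good", "mice": "mouse",
}

def lemmatize(token):
    # Fall back to the surface form when the dictionary has no entry --
    # one reason dictionary coverage and quality matter so much.
    return lemma_dict.get(token.lower(), token.lower())

print([lemmatize(t) for t in ["Running", "mice", "table"]])
# ['run', 'mouse', 'table']
```

The fallback line shows where quality bites: any word the dictionary misses simply passes through unchanged.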
What I found most interesting after reading these chapters is that (almost) everything can serve as data for deep learning with proper care. I would really like to see some extra examples of how to transform an unusual object into a matrix and feed it into various neural architectures. (For example, if I want to study human relationships or networks: as long as we have many sample networks, they can be represented as matrices and analyzed, right?)
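Yes, that is exactly the standard move. A minimal sketch of encoding a (made-up) friendship network as an adjacency matrix, after which ordinary linear algebra, and hence neural architectures, can operate on it:

```python
import numpy as np

# A toy friendship network over four people, encoded as an undirected
# adjacency matrix: entry (i, j) = 1 when i and j are tied.
people = ["Ana", "Bo", "Cam", "Dee"]
edges = [("Ana", "Bo"), ("Bo", "Cam"), ("Cam", "Dee"), ("Ana", "Cam")]

idx = {p: i for i, p in enumerate(people)}
A = np.zeros((len(people), len(people)), dtype=int)
for u, v in edges:
    A[idx[u], idx[v]] = A[idx[v], idx[u]] = 1

# Once the network is a matrix, matrix operations carry meaning:
# row sums are degrees, and (A @ A)[i, j] counts length-2 paths.
degrees = A.sum(axis=1)
print(dict(zip(people, degrees.tolist())))
# {'Ana': 2, 'Bo': 2, 'Cam': 3, 'Dee': 1}
```

Graph neural networks essentially repeat this encoding at scale, combining the adjacency matrix with a matrix of per-node features.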
I am interested in the discussion about S-Learners, T-Learners, TARNet, and Dragonnet in the fourth part of "Deep Learning for Causal Inference", titled "Three Different Approaches to Deep Causal Estimation". S-Learners estimate the treatment effect by building a single predictive model, whereas T-Learners do so by constructing two independent predictive models: one predicts the outcome for individuals who received the treatment, and the other predicts the outcome for those who did not. TARNet extends the T-learner by incorporating shared representation layers, and Dragonnet further adds a propensity-score head to TARNet. As the complexity of these methods increases incrementally, I am curious whether the more complex models also increase the risk of overfitting. Or, considering that in causal inference we often aim at averaged quantities such as the conditional average treatment effect (CATE), does overfitting not significantly affect the conclusions drawn from these models?
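To make the T-learner logic in the question concrete, here is a linear stand-in for the neural version (the simulated data, with a made-up constant treatment effect of 2.0, and the use of least squares in place of neural networks are both illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data: covariate x, random binary treatment t, and an
# outcome with a (hypothetical) constant treatment effect of 2.0.
n = 2000
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n)
y = 1.0 + 0.5 * x + 2.0 * t + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), x])

# T-learner: fit one outcome model per treatment arm, then take the
# difference of their predictions as the (C)ATE estimate.
beta1, *_ = np.linalg.lstsq(X[t == 1], y[t == 1], rcond=None)
beta0, *_ = np.linalg.lstsq(X[t == 0], y[t == 0], rcond=None)

cate = X @ beta1 - X @ beta0    # per-unit effect estimates
print(round(cate.mean(), 1))    # 2.0
```

An S-learner would instead fit a single model on `(x, t)` jointly; TARNet and Dragonnet replace the two linear fits with neural heads on a shared representation.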
What kinds of sampling methods are used, if any, if a model can be fitted with all existing knowledge from a particular source (e.g. wiki pages)?
How does the empirical performance of Dragonnet, in terms of bias and variance in estimating ATE, compare to traditional machine learning approaches to causal inference, and what implications does this have for the choice of algorithms in practical applications?
I have questions about the graph representation section and structural analysis in Chapter 5: The Datome - Finding, Wrangling, and Encoding Everything as Data. 1) What are the trade-offs between sparse and dense graph representations regarding memory usage, computational efficiency, and effectiveness in capturing complex network structures? 2) In what ways do higher-order network motifs, such as tetrads and cliques, contribute to our understanding of network dynamics and function? How can we efficiently identify and analyze these motifs in large-scale networks?
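On question 1, the core memory trade-off can be shown in a few lines (the node and edge counts are made-up round numbers; real networks of interest are usually far sparser still):

```python
import numpy as np

# Compare dense vs sparse storage for a large, sparse network:
# n nodes but only m << n^2 edges.
n, m = 10_000, 50_000
rng = np.random.default_rng(4)
edges = rng.integers(0, n, size=(m, 2))

# Dense: an n x n adjacency matrix costs O(n^2) memory no matter
# how few edges exist (here, one byte per entry).
dense_bytes = n * n * 1

# Sparse: storing only the edge list costs O(m) memory.
sparse_bytes = edges.nbytes

print(dense_bytes // sparse_bytes)  # 125: dense costs ~125x more here
```

The flip side is computation: dense matrices support fast batched matrix products (handy for counting motifs like triads via powers of the adjacency matrix), while sparse formats pay per-edge costs that scale with m rather than n².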
I have questions regarding sampling.
In the pursuit of robustness and generalization, to what extent should models be exposed to noisy and augmented data, and could there be a point where models become too generalized, losing their ability to make precise predictions in specific contexts?
Given the discussion in Chapter 7 about various sampling strategies to address imbalances in datasets, how can these sampling techniques specifically influence the ethical outcomes of deep learning models, particularly in scenarios involving sensitive or biased data?
After reading this chapter, my question is: if we reduce everything to data, how can we be sure we preserve the meaningful relationships in the data, given that the conversion process is lossy? For example, if we convert audio data to text, we may lose information about pitch or tone, which could be important for social-science questions. I feel this is an area that requires a significant amount of pre-existing knowledge.
What are the main advantages of neural networks in transforming sparse data into dense representations, and how does this capability enhance the ability to reconstruct, generalize, and learn intrinsic structures from data? Additionally, how do dimension-reduction techniques assist when computational constraints or data scarcity are issues, and what are some practical applications across various domains?
Chapter 5: What are the comparative advantages and disadvantages of using raw low-level data versus processed high-level data in deep learning models? How do different data representation techniques impact the model's ability to learn and generalize, especially when dealing with complex, high-dimensional data?
Chapter 7: Could you elaborate on the effectiveness of different resampling techniques (e.g., undersampling, oversampling, and negative sampling) in addressing class imbalance? What are the key factors to consider when choosing a resampling strategy to ensure robust model performance and minimize potential biases?
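As a reference point for the resampling question, here is a minimal random-oversampling sketch (the 95/5 class split and toy features are invented; the comments note how the other two strategies differ):

```python
import numpy as np

rng = np.random.default_rng(5)

# Imbalanced toy labels: 95 negatives, 5 positives.
y = np.array([0] * 95 + [1] * 5)
X = rng.normal(size=(100, 2))

# Random oversampling: resample the minority class with replacement
# until the classes balance. Undersampling would instead drop majority
# examples (losing data); negative sampling keeps only a few negatives
# per positive (common in embedding training).
minority = np.flatnonzero(y == 1)
majority = np.flatnonzero(y == 0)
extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)

keep = np.concatenate([majority, minority, extra])
X_bal, y_bal = X[keep], y[keep]

print(y_bal.mean())  # 0.5 -- classes are now balanced
```

One factor worth weighing: naive oversampling duplicates minority rows exactly, which can itself encourage overfitting to those few examples.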
Deep Learning for Causal Inference: How do deep learning models handle the inherent complexity and potential confounding factors in causal inference tasks compared to traditional statistical methods? What advancements in model architecture or training techniques are necessary to improve the reliability and interpretability of causal inferences made by deep learning models?
The chapter highlights that neural networks and deep learning frameworks encourage viewing all information—text, images, audio, social networks—as data. How does this data-centric approach affect our understanding of human experience and societal structures, and what might we lose in trying to encode everything as data?
When applying deep learning to predict future events or trends, how is the model's adaptability and sensitivity to future uncertainty handled and evaluated?
Post your questions here about: “The Datome - Finding, Wrangling and Encoding Everything as Data” and “When Big Data is Too Small - Sampling, Crowd-Sourcing and Bias” (Thinking with Deep Learning, chapters 5 and 7); and “Deep learning for causal inference”, Bernard Koch, Tim Sainburg, Pablo Geraldo, Jiang Song, Yizhou Sun, and Jacob G. Foster.