Thinking-with-Deep-Learning-Spring-2022 / Readings-Responses

You can post your reading responses in this repository.

From Diverse Data to Similar Signals & Sampling - Orientation #3

Open lkcao opened 2 years ago

lkcao commented 2 years ago

Post your questions here about: “The Datome - Finding, Wrangling and Encoding Everything as Data” OR “Together? Evaluating, Comparing and Combining Representations”--Thinking with Deep Learning, chapters 5 & 6.

JadeBenson commented 2 years ago

I agree that almost anything can be made into data, but I do wonder about what's lost in the process. As you describe in the text analysis section, these approaches mainly take the words, their frequency, and their proximity to derive patterns. However, this is limited compared to how humans perform literary analysis. For example, we look at the author, the context of their work, and the larger references they are drawing from, and how these influence the words they choose and their meanings. Although words might not occur closely together, they may be linked (e.g., foreshadowing, themes, metaphors). Cleaning/preprocessing may get rid of a crucial element of the text and its meaning – I'm imagining someone trying to interpret Zora Neale Hurston's work but needing to fix the "misspellings" so that the algorithms can recognize them. I just believe there is much to literary analysis and anthropology (ethnography) that cannot be captured by only seeing it as data to be encoded. I'm curious about how we can mix these two approaches together. I think these quantitative methods are incredibly useful for processing large amounts of text that would otherwise be impossible for a human to interpret, but how do we do this in an informed way so that we can better include all these nuances?

I do think the example of the Nature article where researchers included networks of scientists into their predictions of materials is a good example of how we can take a more mixed methods approach to deep learning. However, this was just by providing another set of data that previously wasn’t included. Could all these complicated elements of textual analysis I described be manually coded in some way and then fed back in? (Although that does kind of defeat the efficiency of these approaches.) Are there important pieces of information and interpretation that cannot be data for deep learning algorithms? How do we acknowledge the strengths of these approaches, but also their limitations and augment them with our own knowledge?

thaophuongtran commented 2 years ago

Question on Chapter 5 - The Datome - Finding, Wrangling and Encoding Everything as Data: The chapter provided a helpful introduction to different channels of information and how they can be transformed into data. As I read through the different types of data, it struck me that some data sources might carry more noise than others while also taking up more storage. For example, the transcript of a meeting might have less noise than an audio file, since the transcript is not affected by background noise or sound quality. Yet if we want to determine the sentiment of that meeting (positive/negative, happy/angry), participants' pitch and tone derived from the audio file might be important features. Surely you can use both, but that comes at the cost of additional computation and storage. Would you concur? Why or why not?
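A minimal sketch of that trade-off, assuming a hypothetical meeting.wav and its transcript: the text channel is compact and robust to sound quality but drops prosody, while the audio channel keeps pitch and timbre at the cost of size and noise.

```python
import numpy as np
import librosa
from sklearn.feature_extraction.text import TfidfVectorizer

# Text channel: compact and noise-free, but no tone of voice.
transcript = ["I think the proposal is fine.", "No, this is unacceptable."]
text_features = TfidfVectorizer().fit_transform(transcript)

# Audio channel: heavier and noisier, but preserves prosody.
y, sr = librosa.load("meeting.wav")                  # hypothetical file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # pitch/timbre over time
audio_features = mfcc.mean(axis=1)                   # crude per-meeting summary

print(text_features.shape, audio_features.shape)
```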

javad-e commented 2 years ago

Chapter 6 begins the discussion by asking how we evaluate "whether [a dataset] contains the signal required to build an effective deep model". It then introduces various methods for measuring distance. It wasn’t clear to me how we can use these distance functions to answer the original question. It would be great if we could cover this further in tomorrow’s lecture.
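One way a distance function can speak to that original question, at least as I read it: if points that share a label sit closer together than points that don't, the dataset likely contains separable signal. A toy check with made-up embeddings and labels (all names illustrative):

```python
import numpy as np
from scipy.spatial.distance import euclidean

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))      # stand-in for learned embeddings
y = rng.integers(0, 2, size=100)    # stand-in labels
X[y == 1] += 1.0                    # inject class structure (the "signal")

pairs = [(i, j) for i in range(100) for j in range(i + 1, 100)]
within = np.mean([euclidean(X[i], X[j]) for i, j in pairs if y[i] == y[j]])
between = np.mean([euclidean(X[i], X[j]) for i, j in pairs if y[i] != y[j]])
print(within, between)  # within < between suggests a model has something to learn
```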

I also had a question about the manifold distance metrics introduced in chapter 6. They appear to be very useful, but I did not fully understand what it means to measure distance by "walking along the skin of the data".
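My reading of the "skin" metaphor, sketched with standard tools: instead of the straight-line Euclidean distance, you only hop between nearby points, so the path has to follow the surface the data actually lies on.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

X, _ = make_swiss_roll(n_samples=500, random_state=0)       # a curled 2-D sheet
knn = kneighbors_graph(X, n_neighbors=10, mode="distance")  # only local hops allowed
geodesic = shortest_path(knn, directed=False)               # walk along the "skin"

i, j = 0, 1
print(np.linalg.norm(X[i] - X[j]), geodesic[i, j])  # the straight line can cut
                                                    # through the roll; the geodesic cannot
```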

ValAlvernUChic commented 2 years ago

Thank you for sharing these chapters with us! Chapter 6 had an explanation of the Wasserstein distance, but I was wondering when we might use it in place of KL divergence or the Jensen-Shannon divergence? The idea of "work" for the measure is super interesting, but I wanted to clarify whether "work" is just in terms of aligning the presence of words in one document with another, or whether it also captures more latent contextual and syntactic cues?
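A sketch of one practical difference, using scipy's implementations: when two distributions barely overlap, Jensen-Shannon saturates at its maximum, while Wasserstein still reports how far apart the mass sits, because it measures the "work" of moving probability mass. On the second question: over plain word counts, that work only concerns moving word mass; capturing latent contextual cues requires defining the ground cost over embeddings instead, as in Word Mover's Distance.

```python
import numpy as np
from scipy.stats import wasserstein_distance
from scipy.spatial.distance import jensenshannon

bins = np.arange(10)
p = np.eye(10)[0]   # all mass on bin 0
q = np.eye(10)[3]   # all mass on bin 3
r = np.eye(10)[9]   # all mass on bin 9

print(jensenshannon(p, q), jensenshannon(p, r))   # identical: both simply "disjoint"
print(wasserstein_distance(bins, bins, p, q),     # 3.0: mass moves 3 bins
      wasserstein_distance(bins, bins, p, r))     # 9.0: mass moves 9 bins
```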

sabinahartnett commented 2 years ago

These two chapters definitely emphasized the flexibility of deep NNs and their application to diverse data types. A big concern of mine while reading this and doing my own implementations is the way these algorithms can limit the framework within which we understand the significance of data (similar to some of my peers' points above). I worry that we might come to consider the "most insightful" data to be the data that is most easily modeled, and that we might even limit our own (human) interpretations to what we can "prove" at scale with these models. Is this a common criticism of applying these models to traditionally manual/human analyses? How can we, as researchers, ensure that we continue to innovate and think beyond the bounds of current models? Curious to hear what's in store for the lecture!

borlasekn commented 2 years ago

Chapter Five highlights the diverse types of data we may encounter. Obviously, there is not a "right" way to deal with data, and different models respond to different types of data. If we have a dataset that contains many different types of data, is it best to build a pipeline of networks (i.e., break our data up, send photos through one network, then take the results of that and send them through another, etc.)? Or is it better (or possible) to send all of our data into the same network so it can learn from everything at the same time? I know you previously mentioned saving layers from some networks as well, so I was thinking that might be a good solution?
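A minimal sketch of the "saved layers" pipeline idea, assuming torchvision is available and using stand-in tensors: photos pass through a frozen pretrained CNN once, and those features, concatenated with the other data, feed a second downstream model. The joint alternative would instead backpropagate one loss through all branches at once.

```python
import torch
import torchvision.models as models

cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
cnn.fc = torch.nn.Identity()            # keep the 512-d penultimate "saved layer"
cnn.eval()

images = torch.randn(8, 3, 224, 224)    # stand-in batch of photos
tabular = torch.randn(8, 10)            # stand-in for another data type

with torch.no_grad():                   # frozen: a pipeline, not end-to-end
    img_feats = cnn(images)             # (8, 512)

head = torch.nn.Linear(512 + 10, 2)     # downstream model sees both sources
logits = head(torch.cat([img_feats, tabular], dim=1))
```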

zihe-yan commented 2 years ago

The idea of everything as data really excites me, but I also find myself having second thoughts about it. From text and images to time series, we are thinking in a very structured way, compiling different objects into numbers and extracting important features. It seems that we are finding general patterns in society by using computational methods such as NNs. But looking from the other side, does that mean this method can only be used for finding general patterns of things in the world? What might be the limits of its application in the social sciences? Or can we still improve the generality of this method?

egemenpamukcu commented 2 years ago

I am also curious about the more practical implications of learning from multiple data types for a single task. For instance, could we improve the performance of models processing audio or image data by providing metadata (tables) corresponding to these media items? Are there certain frameworks, pre-trained models, or rules of thumb dedicated to integrating diverse data types in a deep learning model to improve performance on a single task? Also, how does this relate to the multimodal learning that seems to be on the rise recently? For instance, to my knowledge, most advanced language models are not trained on any data besides text (natural language, code, etc.); could these models theoretically be improved with additional modalities in the training datasets?
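On the multimodal point, CLIP is one concrete pre-trained example: it is trained jointly on images and their captions, so both modalities land in a single embedding space. A minimal sketch via Hugging Face transformers, with a blank toy image standing in for a real photo:

```python
import numpy as np
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.fromarray(np.zeros((224, 224, 3), dtype=np.uint8))  # stand-in photo
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
print(outputs.logits_per_image)  # image-text similarities in the shared space
```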

isaduan commented 2 years ago

Maybe we will talk more about this in future chapters of the book, but I wonder how we should think about combining different sources of data (e.g., texts and images) so that each complements the other's limitations for a given question?

BaotongZh commented 2 years ago

Chapter 5 introduces the various representations of data, bringing us a fresh awareness of different data inputs. I was just wondering how we combine those data in an end-to-end model to improve on a model with a single data source. For example, we know that an LSTM is good for text learning (seq-to-seq) and a CNN is decent for image processing; if I combine my data using the approaches mentioned in Chapter 6, what should the NN look like if it is an end-to-end model?
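One possible shape for such a network, sketched in PyTorch with toy sizes: an LSTM branch for the text, a CNN branch for the image, features concatenated and trained under a single loss so gradients flow through both branches end-to-end.

```python
import torch
import torch.nn as nn

class TextImageNet(nn.Module):
    def __init__(self, vocab=5000, emb=64, hid=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hid, batch_first=True)  # text branch (seq-to-vec)
        self.cnn = nn.Sequential(                        # image branch
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(hid + 16, 2)               # fusion + classifier

    def forward(self, tokens, image):
        _, (h, _) = self.lstm(self.embed(tokens))
        fused = torch.cat([h[-1], self.cnn(image)], dim=1)
        return self.head(fused)

model = TextImageNet()
logits = model(torch.randint(0, 5000, (4, 20)),  # 4 token sequences
               torch.randn(4, 3, 32, 32))        # 4 paired images
```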

pranathiiyer commented 2 years ago

While chapter 6 talks about the alignment of representations of different data types, what would it mean, intuitively, to correlate, for instance, image and text embeddings?
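One intuitive reading, sketched with hypothetical paired embeddings: canonical correlation analysis finds the directions in each space that move together across paired examples, so "correlating" the embeddings amounts to asking how much structure the two spaces share.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 5))        # latent content behind each image-text pair
img_emb = shared @ rng.normal(size=(5, 64)) + 0.1 * rng.normal(size=(200, 64))
txt_emb = shared @ rng.normal(size=(5, 50)) + 0.1 * rng.normal(size=(200, 50))

cca = CCA(n_components=2).fit(img_emb, txt_emb)
img_c, txt_c = cca.transform(img_emb, txt_emb)
print(np.corrcoef(img_c[:, 0], txt_c[:, 0])[0, 1])  # near 1: the spaces co-vary
```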

min-tae1 commented 2 years ago

While I believe deep learning can do marvelous things in analyzing music and art, some sort of historical context seems necessary for performing certain tasks. For instance, if one wants to classify artworks by genre, a basic understanding of art history and individual artists would be necessary. The same would be the case for music, as the features of each genre, the time of release, and artists' relations with other artists all seem crucial in determining genre. Would there be a way for deep learning to know this without input? Or could there be a way to feed this information to deep learning models?

yujing-syj commented 2 years ago

For chapter 5, when we task a neural network with predicting word or sentence contexts, how large should the dataset be to achieve relatively good performance? Also, the chapter says that when we resize an original image to a selected standard, we do not simply discard (or add) pixels; what is the detailed method for keeping the information?
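As I understand it (the usual answer, not necessarily the book's exact method), the trick is interpolation: each new pixel is a weighted average of nearby old pixels, so information is summarized rather than dropped wholesale. A sketch with Pillow's resampling filters (Pillow >= 9.1 naming):

```python
import numpy as np
from PIL import Image

arr = np.random.randint(0, 256, (300, 400, 3), dtype=np.uint8)  # stand-in photo
img = Image.fromarray(arr)

bilinear = img.resize((224, 224), Image.Resampling.BILINEAR)  # average nearby pixels
lanczos = img.resize((224, 224), Image.Resampling.LANCZOS)    # wider weighted window
nearest = img.resize((224, 224), Image.Resampling.NEAREST)    # copy the closest pixel
                                                              # (discards the most)
```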

Hongkai040 commented 2 years ago

I am wondering whether we can fuse different kinds of data into one unified representation. The work treating words as images and color is very interesting. Can we incorporate other types of data into it, like network data? We can map words into a relation network, so I think the combination of text, image, and network data could be interesting.
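A sketch of one way to start on the words-as-network idea, with illustrative toy documents: build a co-occurrence graph, embed its nodes spectrally, and those graph coordinates could then be concatenated with ordinary text or image features.

```python
import networkx as nx
import numpy as np
from sklearn.manifold import SpectralEmbedding

docs = [["deep", "learning", "models"], ["deep", "networks"], ["learning", "networks"]]
G = nx.Graph()
for doc in docs:
    for i, w in enumerate(doc):
        for v in doc[i + 1:]:
            G.add_edge(w, v)                # words linked by co-occurrence

A = nx.to_numpy_array(G)                    # adjacency matrix as affinity
node_emb = SpectralEmbedding(
    n_components=2, affinity="precomputed").fit_transform(A)
print(dict(zip(G.nodes, node_emb.round(2))))  # a network-based vector per word
```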

yhchou0904 commented 2 years ago

In Chapter 5, we explored lots of options for dealing with data and turning raw data into different representations; in Chapter 6, the book introduced tons of metrics for comparing the representations. I have a specific question about one of the metrics, the fractional metric. The book mentioned that it is "not a metric" because it does not satisfy the triangle inequality. However, as a measure for getting the distance between two representations, there must be some limitations to its use; what are they? At the same time, could you further explain how it works better in high dimensions?
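A tiny demonstration of the triangle-inequality failure, with points and p chosen for illustration: under a fractional "distance" (here p = 0.5), a detour can be cheaper than the direct route, which breaks tools (such as many nearest-neighbor index structures) that assume a true metric.

```python
import numpy as np

def frac_dist(u, v, p=0.5):
    # Minkowski-style formula with p < 1; not a true metric.
    return np.sum(np.abs(u - v) ** p) ** (1 / p)

a, b, c = np.array([0, 0]), np.array([1, 1]), np.array([1, 0])
print(frac_dist(a, b))                    # 4.0: the direct route
print(frac_dist(a, c) + frac_dist(c, b))  # 2.0: the detour through c is cheaper
```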

Yaweili19 commented 2 years ago

This chapter is wonderful and extremely practical for gathering all sources of data. However, what I do look forward to seeing is a discussion of the downsides of, and the knowledge missed by, turning everything into data. Besides, although chapter 5 has introduced many forms of information that can become data, I'd like to see what kinds of knowledge/information we have not yet successfully turned into data (without much loss), and what the frontiers of these topics look like.

Emily-fyeh commented 2 years ago

Chap 6 demonstrates methods of aligning representations, such as overlapping/averaging a series of images or intersecting multiple word embeddings. I would like to know whether these approaches lose some of the original information in each time slice, since only the comparable parts are kept.

sudhamshow commented 2 years ago

Studying data representation using NNs raises several questions: what information does the neural network pick up to encode the data? How true a representation of the real data is it (relative to other points)? Can we force the network (or a single layer of nodes) to encode a particular attribute?

The way these neural networks come to represent data seemed unclear and backwards at first thought, at least to me. Each of the neural networks we've learned about - simpler ones like W2V and more complex ones like CNNs - was developed with some objective in mind: reducing the cost function by pulling words in the context window closer and pushing other words farther apart (W2V), or learning filter values to detect intricate patterns in images (CNN). Through forward propagation (and learning from error), it so happens that the end layers have higher activation concentrations and can more closely identify each distinct class/object, but these representations are a by-product of trying to optimize something else. Since we're not explicitly training the network to learn representations, with what confidence can we believe that all NN models encode some kind of information? And when studying the social sciences, aren't these factors (the explainability of the encoding) more important?

What about representations of multi-modal data? Even though we might build a classifier with high accuracy, to what source/parameters do we attribute the representations?
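On representations being a by-product: a common probe is to read activations off an intermediate layer of a trained network and test what they encode, e.g., by training a simple classifier on them for an attribute the network was never told about. A sketch with a forward hook on torchvision's ResNet-18:

```python
import torch
import torchvision.models as models

cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

activations = {}
def save(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

cnn.avgpool.register_forward_hook(save("penultimate"))
with torch.no_grad():
    cnn(torch.randn(1, 3, 224, 224))          # stand-in image
rep = activations["penultimate"].flatten(1)   # (1, 512) by-product representation
# If a linear probe trained on `rep` predicts an attribute above chance,
# the network encoded that attribute without being asked to.
```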

linhui1020 commented 2 years ago

I agree with the idea that everything we can observe in the real world can be data, including audio, images, and texts. What the model does is learn representations of the data we prepare for it, and usually we have to shape the data into a similar structure to fit the specific model. Is there any way the model could differentiate between data forms, rather than our aligning different data forms into a similar structure and then getting results out?

chentian418 commented 2 years ago

I am curious about the options for aligning representations. If units are common to both representations, these units can be the basis of an alignment; or, in the case of common dimensions, if dense representations are produced using the same underlying dimensions, we can compare the locations of units within the two spaces. However, I was wondering how realistic it is to have representations with common units or the same underlying dimensions? What conversion should we do if the data come from different sources and different models?
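A sketch of one standard conversion for the case where units are shared but the axes of the two spaces are not, using toy embeddings: orthogonal Procrustes fits a rotation from the shared units, which then maps everything in one space into the other.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 100))              # 50 shared units in space A (100-d)
true_R = np.linalg.qr(rng.normal(size=(100, 100)))[0]
Y = X @ true_R                              # the same units in space B

R, _ = orthogonal_procrustes(X, Y)          # rotation fit on the shared units
print(np.allclose(X @ R, Y))                # True: space A now maps into space B
```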

ShiyangLai commented 2 years ago

My question is about how we can represent different types of data (e.g., audio and text data) in the same numerical space, and how we can further interpret the integrated data representation. Data points in the hybrid space could have multiple meanings. How can we do in-depth interpretation using a hybrid data representation space?