Thinking-with-Deep-Learning-Spring-2022 / Readings-Responses

You can post your reading responses in this repository.

Sound & Image Learning - Orientation #7

Open lkcao opened 2 years ago

lkcao commented 2 years ago

Post your question here about the orienting readings: “Image Learning” & “Audio and Video Learning”, Thinking with Deep Learning, Chapters 13 & 14.

pranathiiyer commented 2 years ago

How can we make a model learn the relevance of two different types of data in the same medium? For instance, the association between an image and the text written on it, or between a video and specific audio in it? I think we will learn about this in later weeks, but since this is relevant to my group's final project, I'd love to know what others think about this.

sabinahartnett commented 2 years ago

I am interested in the relationship between image and video learning: is it a form of transfer learning? Can we apply image models (with similar success) to freeze frames to simulate a temporal analysis and ultimately analyze the result as video data? Or are the models entirely separate? This is a vague question, but I'm new to audio/video learning and curious about the general landscape.
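One way to make the freeze-frame idea concrete is to embed sampled frames with an image model and then pool the per-frame vectors into a single clip vector. The sketch below is only an illustration under stated assumptions: `toy_frame_embedding` is a hypothetical stand-in for a real pretrained image model, and `video_embedding` and `stride` are names invented here. Note that mean pooling discards frame order, so this treats a video as a bag of images rather than modeling time.

```python
from typing import Callable, List

Frame = List[List[float]]  # a grayscale frame as rows of pixel intensities


def toy_frame_embedding(frame: Frame) -> List[float]:
    """Hypothetical stand-in for an image model: [mean, min, max] intensity."""
    pixels = [p for row in frame for p in row]
    return [sum(pixels) / len(pixels), min(pixels), max(pixels)]


def video_embedding(
    frames: List[Frame],
    embed: Callable[[Frame], List[float]] = toy_frame_embedding,
    stride: int = 1,
) -> List[float]:
    """Embed every `stride`-th frame, then average the vectors over time."""
    sampled = frames[::stride]
    if not sampled:
        raise ValueError("video has no frames")
    per_frame = [embed(f) for f in sampled]
    dim = len(per_frame[0])
    # Mean pooling over the time axis: one value per embedding dimension.
    return [sum(vec[i] for vec in per_frame) / len(per_frame) for i in range(dim)]


# A three-frame toy "video" of 2x2 grayscale frames.
video = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
clip_vector = video_embedding(video)
sparse_clip_vector = video_embedding(video, stride=2)
```

Swapping the placeholder for a real image encoder gives a simple baseline; capturing motion would need a model that sees the frames in sequence.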

Yaweili19 commented 2 years ago

Is it just me, or are the orientation readings for this week not up yet? Thank you.

isaduan commented 2 years ago

Adding on to pranathi's question, it would be great if you could walk us through different ways of designing neural network architectures that combine text embeddings and image embeddings to perform a classification task. Thank you!
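One simple design for this is late fusion: compute the text and image embeddings separately, normalize each, concatenate them, and feed the joint vector to a classifier head. The sketch below is a minimal illustration, not a recommended architecture. The embedding values, the `fuse` and `classify` helpers, and the all-zero weights are assumptions made up for the example; in practice the embeddings would come from pretrained encoders and the weights would be learned.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length so neither modality dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_emb: List[float], image_emb: List[float]) -> List[float]:
    """Late fusion by concatenating the two normalized embeddings."""
    return l2_normalize(text_emb) + l2_normalize(image_emb)


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """Linear classifier head: one weight row and one bias per class."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


# Toy embeddings (hypothetical values standing in for encoder outputs).
text_emb = [3.0, 4.0]
image_emb = [1.0, 0.0, 0.0]
fused = fuse(text_emb, image_emb)

# Untrained placeholder weights for a two-class head.
weights = [[0.0] * len(fused), [0.0] * len(fused)]
biases = [0.0, 0.0]
probs = classify(fused, weights, biases)
```

Other designs fuse earlier, for example by letting one modality attend to the other before classification; concatenation is just the easiest baseline to reason about.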

BaotongZh commented 2 years ago

Post for earning the credit :-)

linhui1020 commented 2 years ago

Is it possible for a model's accuracy in illness prediction (or classification) to be higher than that of professional doctors?

chentian418 commented 2 years ago

I have a question about vector representations and image embeddings: in the sample code we used the pre-trained ResNet model and got a 512-dimensional image vector as a result. In practice, how should we choose the number of dimensions for the image embeddings, and how does this choice relate to the dimensions of the image representations?

thaophuongtran commented 2 years ago

I'm interested in learning more about style transfer within domains - image2image - and across domains - image2sound. What would an example of image2sound look like?

yujing-syj commented 2 years ago

Are there any algorithms that are commonly used for classification and object detection in satellite images? Are there any tricks we can apply for labeling and collecting satellite images?

Hongkai040 commented 2 years ago

I didn't find the link to the orientation reading on Canvas, so I am going to post a question using my imagination :). Since a video is a synthesis of many audio clips and pictures, I am wondering how we can work with videos using deep learning, and what we can do with them.

ShiyangLai commented 2 years ago

I don't have any specific questions about image and sound learning in my head now. I look forward to hearing my peers' discussions in the lecture.

ValAlvernUChic commented 2 years ago

I was wondering whether, beyond identifying particular agents in a picture, it is possible to ascertain relationships among people in an image. I imagine it would look something like detecting a microexpression on one person's face in reaction to someone else, and then using that to make inferences about the dynamics within the picture.