lkcao opened this issue 2 years ago
How can we make a model learn the relevance between two different types of data in the same medium? For instance, the association between an image and the text written on it, or between a video and specific audio within it? I think we will learn about this in later weeks, but since it is relevant to my group's final project, I'd love to know what others think.
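One common way such associations are learned is contrastive alignment (the approach behind CLIP-style models): encode each modality separately and train so that matching pairs score higher than mismatched pairs. A minimal sketch of that idea in PyTorch, where the batch size, dimensions, and random tensors are placeholders rather than anything from the readings:

```python
import torch
import torch.nn.functional as F

# Toy embeddings standing in for the outputs of an image encoder and a text
# encoder over a batch of matched (image, text) pairs; sizes are arbitrary.
batch, dim = 8, 512
image_emb = F.normalize(torch.randn(batch, dim), dim=-1)
text_emb = F.normalize(torch.randn(batch, dim), dim=-1)

# Similarity matrix: entry (i, j) scores image i against text j.
logits = image_emb @ text_emb.t() / 0.07  # temperature-scaled cosine similarity

# Contrastive (InfoNCE) loss: the true pair on the diagonal should beat all
# mismatched pairs, in both the image-to-text and text-to-image directions.
targets = torch.arange(batch)
loss = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
```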
I am interested in the relationship between image and video learning. Is it a form of transfer learning? Can we apply image models (with similar success) to individual still frames to simulate temporal analysis, and ultimately analyze the result as video data? Or are the models entirely separate? This is a vague question, but I'm new to audio/video learning and curious about the general landscape.
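A minimal sketch of the "image model on frames" idea, assuming frames have already been sampled, resized, and normalized (the random tensor below is a placeholder for real frames): run a pretrained image encoder on each frame and pool over time.

```python
import torch
from torchvision import models

# Placeholder for 16 RGB frames sampled from one video clip.
frames = torch.randn(16, 3, 224, 224)

# Reuse a pretrained image encoder per frame; drop its classification head.
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
encoder = torch.nn.Sequential(*list(resnet.children())[:-1])
encoder.eval()

with torch.no_grad():
    per_frame = encoder(frames).flatten(1)  # (16, 512): one vector per frame
clip_vector = per_frame.mean(dim=0)         # crude temporal pooling -> (512,)
```

Mean-pooling ignores frame order; dedicated video models (3D convolutions, temporal transformers) exist precisely to capture the temporal structure this sketch throws away.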
Is it just me, or are the orienting readings for this week not up yet? Thank you.
Adding on to pranathi's question, it would be great if you could walk us through different ways of designing a neural network architecture that combines text embeddings and image embeddings to perform a classification task. Thank you!
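A minimal sketch of the simplest such design, early fusion by concatenation, assuming the text and image vectors already exist; the embedding sizes, batch size, and class count below are placeholders.

```python
import torch
import torch.nn as nn

# Placeholder embeddings: e.g. a 512-d ResNet image vector and a 768-d text vector.
image_vec = torch.randn(32, 512)
text_vec = torch.randn(32, 768)

# Early fusion: concatenate the two modalities, then learn a small classifier on top.
fusion_head = nn.Sequential(
    nn.Linear(512 + 768, 256),
    nn.ReLU(),
    nn.Linear(256, 5),  # 5 = number of classes in this toy setup
)
logits = fusion_head(torch.cat([image_vec, text_vec], dim=1))
```

Other designs project both modalities into a shared space before fusing, gate one modality with the other, or use cross-attention between the two sequences instead of a single concatenated vector.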
Post for earning the credit :-)
Is there any possibility that the accuracy of illness prediction (or classification) models could be higher than that of professional doctors?
I have a question about vector representations and image embeddings: in the sample code we used the pre-trained ResNet model and got a 512-dimensional image vector as a result. Practically, how should we choose the number of dimensions in the image embedding, and how does this choice relate to the dimensions of the image representation?
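For context, the 512 in the sample code is fixed by ResNet-18's final pooling layer rather than chosen freely. If a different size is wanted, one common move is to add a learned linear projection on top of the backbone; a minimal sketch, with the target dimension as an arbitrary placeholder:

```python
import torch
import torch.nn as nn
from torchvision import models

resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone = nn.Sequential(*list(resnet.children())[:-1])  # outputs 512-d features

target_dim = 128                      # chosen for the downstream task, not fixed by ResNet
project = nn.Linear(512, target_dim)  # learned reduction (PCA is a non-learned alternative)

image = torch.randn(1, 3, 224, 224)   # placeholder for a preprocessed input image
embedding = project(backbone(image).flatten(1))  # shape (1, 128)
```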
I'm interested in learning more about style transfer within domains - image2image - and across domains - image2sound. What would an example of image2sound look like?
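Not a full answer to what image2sound looks like end to end, but one concrete bridge between the two domains is the spectrogram: audio rendered as a 2-D frequency-by-time "image" on which image-style techniques, including style transfer, can operate. A minimal sketch, with a random waveform standing in for real audio:

```python
import torch
import torchaudio

# Placeholder: one second of mono audio at 16 kHz.
waveform = torch.randn(1, 16000)

# A mel spectrogram turns the audio into a 2-D image-like array (mel bins x time frames).
to_mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)
spectrogram = to_mel(waveform)  # shape (1, 64, time_frames)
```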
Are there any algorithms that are commonly used for classification and object detection on satellite images? For labeling and collecting satellite imagery, are there any tricks we can apply?
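One common starting point for the detection side is to fine-tune a pretrained detector (e.g. Faster R-CNN from torchvision) on the target labels, since COCO classes rarely match satellite-imagery categories. A minimal sketch of running such a detector, with a random tensor as a placeholder for a real image tile:

```python
import torch
from torchvision import models

# Pretrained Faster R-CNN; in practice the classification head would be
# replaced and fine-tuned on satellite-specific labels.
model = models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.randn(3, 512, 512)  # placeholder tile; real tiles come from the raster
with torch.no_grad():
    predictions = model([image])  # list of dicts with 'boxes', 'labels', 'scores'
```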
I didn't find the link to the orienting reading on Canvas, so I am going to post a question using my imagination :). Since a video is essentially a synthesis of many audio clips and images, I am wondering what we can do with videos using deep learning, and how.
I don't have any specific questions about image and sound learning in my head right now. Looking forward to hearing my peers' discussions in the lecture.
Beyond identifying particular agents in a picture, I was wondering whether it is possible to ascertain relationships among the people in an image. I imagine it would look something like detecting a microexpression on one person's face in reaction to someone else, and then using that to make inferences about the dynamics within the picture.
Post your question here about the orienting readings: “Image Learning” & “Audio and Video Learning”, Thinking with Deep Learning, Chapters 13 & 14.