Images, Art & Video (E1) - Doersch et al 2012

HyunkuKwon commented 4 years ago

Post questions about the following exemplary reading here:

Doersch, Carl, Saurabh Singh, Abhinav Gupta, Josef Sivic, and Alexei Efros. 2012. “What makes Paris look like Paris?”

wanitchayap commented 4 years ago

From a social science perspective, it would be very interesting to link this study back to humans. That is, I wonder to what extent the way the algorithm detects geolocation from a scene is similar to how people do so. The paper touched on this a bit and actually used some human standards to validate the algorithm. However, these standards (artists, an architecture textbook, an experimental task) might not be representative of our everyday perceptions. Do we actually look for discriminative features and ignore non-discriminative ones? Do we use other information too? I think the algorithm presented here sounds very intuitive and is a good candidate for how our perception works. However, I think we still need a well-designed study that integrates this work with the goal of understanding human perception.

iarakshana commented 4 years ago

The paper seems to assume that specific cities have some intrinsic characteristics that make them different from other places. If this is true, is studying Google Street View the best way to distinguish these characteristics? Could this be expanded to cover other aspects of a city's personality, such as its common sounds?

tianyueniu commented 4 years ago

It was a little surprising to see how effective the authors' approach was, given that they ultimately chose to randomly sample image patches and train their algorithm on those instead of on more granular features. Is there a way to combine low-level features with high-level features in image recognition to make an algorithm perform better?

DSharm commented 4 years ago

The authors point out a big critique of this method - the algorithm struggled with American cities: "it was able to discover only a few geo-informative elements, and some of them turned out to be different brands of cars, road tunnels, etc. This might be explained by the relative lack of stylistic coherence and uniqueness in American cities (with its melting pot of styles and influences), as well as the supreme reign of the automobile on American streets."

However, this isn't a one-off issue. I would guess that if this method were applied to pretty much any city in the "developing" world, we would run into similar problems. In particular, the method seems limited to cities that a) are extremely old and b) have preserved their features, unlike many developing-world cities, which are a similar melting pot of styles and influences (many informed by the styles of their colonizers).

Additionally, these methods may not even continue to apply to old European cities - most of these cities now have an "old" city that is preserved, surrounded by a modern urban sprawl that looks basically like every other city in the world.

Is it then fair to think that the usefulness of this methodology would decline over time as cities continue to grow, change, and become affected by different influences? Or, would we think that with enough time and data, the models can become more sophisticated and pick up on nuanced differences that the human eye can't easily detect?

jsgenan commented 4 years ago

Identifying the style of a city differs from the unsupervised deep learning covered in the last lecture in that we want to pick "patterns that are both frequently occurring within the given locale, and geographically discriminative" and also be able to interpret the model. The iterated SVM model works surprisingly well in computer vision. However, as previous comments have pointed out, we apply this model only when we already know there is some distinctiveness between the image patches. What would be a good example of a distinction so subtle that humans cannot articulate it, yet a machine could pick up on its finest nuances?
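
To make the "frequently occurring and geographically discriminative" criterion concrete, here is a toy sketch in Python. It is not the authors' pipeline: made-up 2-D feature vectors stand in for real patch descriptors, and a plain nearest-neighbor purity score stands in for the iteratively retrained SVM detectors. The names `nearest_neighbors`, `rank_candidates`, `features`, and `in_target` are all hypothetical, introduced only for illustration.

```python
import math


def nearest_neighbors(query_idx, features, k):
    """Return indices of the k patches closest (Euclidean) to features[query_idx]."""
    query = features[query_idx]
    others = [i for i in range(len(features)) if i != query_idx]
    others.sort(key=lambda i: math.dist(query, features[i]))
    return others[:k]


def rank_candidates(features, in_target, k=3):
    """Rank target-city patches by the share of their k nearest neighbors
    that also come from the target city (higher share = more discriminative)."""
    scores = []
    for i, is_target in enumerate(in_target):
        if not is_target:
            continue  # only patches from the target city are candidates
        neighbors = nearest_neighbors(i, features, k)
        share = sum(in_target[j] for j in neighbors) / len(neighbors)
        scores.append((share, i))
    # Highest share first; ties broken by patch index for a stable order.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [(i, share) for share, i in scores]


# Made-up data (an assumption, not from the paper): patches 0-3 form a tight
# cluster seen only in the target city; patch 4 is a target-city patch that
# looks like patches 5-7 from other cities (say, generic pavement).
features = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
]
in_target = [True, True, True, True, True, False, False, False]

ranked = rank_candidates(features, in_target, k=3)
for patch, share in ranked:
    print(f"patch {patch}: {share:.2f} of nearest neighbors are from the target city")
```

In this toy data the clustered patches score high because similar-looking patches appear often and only in the target city, while patch 4 scores low because its look-alikes come from elsewhere. A score like this only separates patches whose descriptors already differ across cities, which is the limitation the earlier comments raise.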

bazirou commented 4 years ago

Interesting paper with many results.
