Conversation about duplicate images

dukechronicle / chronline

The official repository for the Duke Chronicle website

http://www.dukechronicle.com

Conversation about duplicate images #187

Closed jimpo closed 10 years ago

jimpo commented 10 years ago

There are a lot of duplicate images and I don't like it.

We could store the hash of the original and put a unique index on it. Thoughts? @grivkees @tylernisonoff @themichaellai

This is slightly complicated by the fact that images are attributed as file photo or not depending on the article.
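
A minimal sketch of the hash-plus-unique-index idea, assuming SHA-256 as the hash and SQLite as the store (neither is named in this thread). The `images` table, its columns, and the `upload` helper are hypothetical. Attribution is left out here; it would presumably have to live on the article-to-image association rather than on the image row.

```python
import hashlib
import sqlite3


def content_hash(data: bytes) -> str:
    """Hex SHA-256 digest of the uploaded file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def make_db() -> sqlite3.Connection:
    """In-memory database with a unique index on the hash column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE images ("
        "id INTEGER PRIMARY KEY, "
        "original_hash TEXT NOT NULL, "
        "filename TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE UNIQUE INDEX idx_images_original_hash "
        "ON images (original_hash)"
    )
    return conn


def upload(conn: sqlite3.Connection, filename: str, data: bytes):
    """Insert a new image, or reuse the existing row with identical bytes.

    Returns (image_id, created), where created is False for a duplicate.
    """
    digest = content_hash(data)
    try:
        cur = conn.execute(
            "INSERT INTO images (original_hash, filename) VALUES (?, ?)",
            (digest, filename),
        )
        conn.commit()
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # The unique index rejected the insert: same bytes already stored.
        row = conn.execute(
            "SELECT id FROM images WHERE original_hash = ?", (digest,)
        ).fetchone()
        return row[0], False
```

With this shape, a second upload of byte-identical content returns the first image's id instead of creating a new row.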

grivkees commented 10 years ago

I don't think that will work. The hash would likely differ, because they pre-crop before uploading, so the uploaded original would not be byte-for-byte the same.

If we gave them a way to search existing images, they wouldn't have to re-upload.

Also, I don't think it really matters: we pay mostly for viewing bandwidth, not storage.

jimpo commented 10 years ago

I disagree, I think the hash will be the same. We can always just try an experiment to see how many duplicate hashes we have.

I agree that it's not a huge issue, though. The duplicate images just bother me sometimes on staff pages and the like. Also, articles sharing an image is a very strong indication that the articles are related (valuable information), which we currently don't have access to.
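
The experiment could be as small as grouping the stored originals by digest and listing every digest that appears more than once. A sketch, assuming the originals can be read as bytes and using SHA-256 as a stand-in for whichever hash is chosen; `duplicate_groups` and its input mapping are hypothetical:

```python
import hashlib
from collections import defaultdict


def duplicate_groups(images):
    """Group image names by the SHA-256 of their bytes; keep groups of 2+.

    `images` is a hypothetical mapping of name -> raw file bytes.
    """
    by_hash = defaultdict(list)
    for name, data in images.items():
        by_hash[hashlib.sha256(data).hexdigest()].append(name)
    # Only digests shared by more than one image count as duplicates.
    return {
        digest: names
        for digest, names in by_hash.items()
        if len(names) > 1
    }
```

An empty result would support the pre-crop objection; a large one would suggest exact re-uploads are common.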

grivkees commented 10 years ago

If they change the crop by one px, or use a different JPEG compression ratio, the hash will be different. Unless you are talking about some crazy hash that can work around that.
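
To make the distinction concrete, here is a rough sketch comparing an exact byte hash with a simple "average hash", one common kind of perceptual hash. It assumes the image is already decoded into a grid of grayscale values (0 to 255) whose sides are divisible by 8; all function names are hypothetical and nothing here comes from the chronline codebase.

```python
import hashlib


def exact_hash(pixels):
    """SHA-256 over the raw pixel values; any changed value changes it."""
    raw = bytes(value for row in pixels for value in row)
    return hashlib.sha256(raw).hexdigest()


def average_hash(pixels, size=8):
    """64-bit average hash: one bit per cell of a size x size grid.

    Each cell is the mean of a block of pixels; a bit is 1 when the
    cell is brighter than the mean of all cells.
    """
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // size, width // size
    cells = []
    for by in range(size):
        for bx in range(size):
            total = 0
            for y in range(by * block_h, (by + 1) * block_h):
                for x in range(bx * block_w, (bx + 1) * block_w):
                    total += pixels[y][x]
            cells.append(total / (block_h * block_w))
    mean = sum(cells) / len(cells)
    bits = 0
    for cell in cells:
        bits = (bits << 1) | (1 if cell > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")
```

Two images whose average hashes are within a small Hamming distance could be flagged as probable duplicates. This kind of hash tolerates small pixel-level changes such as recompression, but a noticeably different crop may still move it, so the threshold would have to be found by experiment.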

jimpo commented 10 years ago

If it's a file photo, I assume they just look it up on a server and upload the same one.

grivkees commented 10 years ago

I believe they store only the originals, not the cropped and resized versions.

Also, they often crop photos for looks before uploading, so a slightly different crop would change the hash.

jimpo commented 10 years ago

Let's ask @thanhha92

thanhhahaha commented 10 years ago

We usually look up the original photos and recrop them, so they would count as different photos by your definition. But @jimpo, where are you seeing these duplicated photos? I'm under the impression that we try really hard not to reuse the same photo.

Also, search for images = wonderful idea. I think I'm the only one who knows how to get the original photo from the database when signed into admin.