chnm / serendipomatic

http://serendipomatic.org/

Relevance of images from Flickr Commons seems low #48

Closed patrickmj closed 11 years ago

patrickmj commented 11 years ago

This falls into the category of pre-user-testing, so please prioritize accordingly.

Sorry if I'm jumping the gun on user testing, but I couldn't resist giving it an early whirl.

Looks like many completely different searches produce inexplicable duplicate results coming from Flickr Commons: random text strings and different Wikipedia articles seem to return the same images from it no matter what.

Might be able to reproduce by pasting in a bunch of text from unrelated Wikipedia articles, looking for repeated images, and manually confirming that the repeats are from Flickr Commons.

At first glance, it looks like Flickr returns results generously by default, regardless of relevance, and that the default sort is by date posted.

If the steps above confirm this, we might try adding an additional param to

results = flickr.photos_search(text=' OR '.join(set(keywords)), format='json', is_commons='true')

to include sort='relevance', and only grab a limited number from the top.
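A rough sketch of that change, in case it helps. The helper names and the `max_items` default are made up for illustration and are not from the serendipomatic code; `sort` and `per_page` are standard arguments to Flickr's `flickr.photos.search` method, and the `photos_search` call style is the one already used above.

```python
def build_commons_search_params(keywords, max_items=10):
    """Build keyword arguments for a Flickr Commons photo search.

    Hypothetical helper: the name and the max_items default are assumptions.
    """
    return {
        # set() drops repeated keywords; sorted() only makes the query
        # string deterministic so it is easy to compare in logs and tests.
        'text': ' OR '.join(sorted(set(keywords))),
        'format': 'json',
        'is_commons': 'true',
        # Ask for the most relevant matches instead of the default
        # date-posted ordering.
        'sort': 'relevance',
        # Only take a limited number of results from the top.
        'per_page': max_items,
    }


def search_commons(flickr, keywords, max_items=10):
    """Run the search with an already configured flickr client object."""
    params = build_commons_search_params(keywords, max_items)
    return flickr.photos_search(**params)
```

Keeping the parameter building separate from the API call means the query can be checked without hitting Flickr, and `max_items` can be tuned so Flickr does not crowd out the other sources.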

(Tangentially related: the interestingness-desc value for sort might be an additional way to tweak results when we get to more user testing.)

Keep up the awesomeness! You are rocking the hell out of this!

mialondon commented 11 years ago

I'm wondering if it's because Flickr is returning results when other aggregators aren't? The result set should be fairly mixed. Could you try some of your search terms in the others to see if that's a likely factor? It's tricky because we're playing with

And all testing is so incredibly useful, thank you!

rlskoeser commented 11 years ago

that's my guess (flickr returning results when other apis aren't). should be easy to add logging to see the number of items found in each api to confirm
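something like this would probably do for the logging. a minimal sketch only: the logger name, the function name, and the shape of `results_by_source` (a dict mapping each api name to its list of items) are assumptions, not what the code currently looks like.

```python
import logging

# Hypothetical logger name, chosen for illustration.
logger = logging.getLogger('serendipomatic.sources')


def log_result_counts(results_by_source):
    """Log and return how many items each source returned.

    results_by_source is assumed to map a source name to a list of items.
    """
    counts = dict(
        (name, len(items)) for name, items in results_by_source.items()
    )
    for name, count in sorted(counts.items()):
        logger.info('%s returned %d items', name, count)
    return counts
```

if flickr shows a large count while the other sources show zero for the same search terms, that would confirm the guess.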

rlskoeser commented 11 years ago

relevance sort was added, and number of items limited to be closer to other sources