cc-archive / open-ledger

Prototype code and examples for work on the Creative Commons "CC Search" project
MIT License

Multi-word strings do not behave as expected in search #164

Closed: little-wow closed this issue 7 years ago

little-wow commented 7 years ago

Multi-word strings aren't always working as expected, particularly when using search delimiters.

For example, "table with food" in cultural works: https://ccsearch.creativecommons.org/?search=table+with+food&page=1&per_page=20&search_fields=title&search_fields=creator&search_fields=tags&work_types=cultural&providers=met&providers=nypl&providers=rijksmuseum

Still life: https://ccsearch.creativecommons.org/?search=still+life&page=1&per_page=20&search_fields=title&search_fields=creator&search_fields=tags&work_types=cultural&providers=met&providers=nypl&providers=rijksmuseum

Van Gogh: https://ccsearch.creativecommons.org/?search=van+gogh&page=1&per_page=20&search_fields=title&search_fields=creator&search_fields=tags&work_types=cultural&providers=met&providers=nypl&providers=rijksmuseum

"Vincent Van Gogh": https://ccsearch.creativecommons.org/?search=%22vincent+van+gogh%22&page=1&per_page=20&search_fields=title&search_fields=creator&search_fields=tags&work_types=cultural&providers=met&providers=nypl&providers=rijksmuseum

lizadaly commented 7 years ago

It will be an ongoing process to identify exactly how search should behave, as expectations vary (both across individuals and from search to search). In the initial beta I left the behavior relatively undefined, so CC will need to make some decisions going forward:

If I search for "table with food" and no results have all three words, do we return no results? It may be better to rank results that match more of the terms higher but still return results that match only some of them, or maybe not! The more a user filters their search (e.g. by provider), the less likely they are to get any results at all if the multi-term requirements are very strict.

"With" is typically treated as a "stopword" and ignored in searches. What should the stopword list be? Is "food on table" the same as "table with food"?

If John Gogh has a photo titled 'Van', is that returned for a search on "van gogh"?
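
To make these questions concrete, here is a toy sketch in plain Python. It is not the project's code: the stopword list, the sample titles, and the function names (`tokenize`, `strict_match`, `ranked_match`) are all hypothetical, and it only looks at titles. It contrasts a strict "all terms required" match with a looser match that ranks results by how many query terms they contain.

```python
# Hypothetical stopword list; the real list is one of the open questions.
STOPWORDS = {"with", "on", "the", "a", "of"}


def tokenize(text, stopwords=STOPWORDS):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in stopwords]


def strict_match(query, title):
    """True only if every non-stopword query term appears in the title."""
    return set(tokenize(query)) <= set(tokenize(title))


def ranked_match(query, titles):
    """Return titles sharing at least one term with the query,
    ordered so that titles matching more terms come first."""
    terms = set(tokenize(query))
    scored = []
    for title in titles:
        overlap = len(terms & set(tokenize(title)))
        if overlap:
            scored.append((overlap, title))
    # sort() is stable, so ties keep their input order.
    scored.sort(key=lambda pair: -pair[0])
    return [title for _, title in scored]
```

Under these assumptions, "food on table" and "table with food" reduce to the same set of terms, `strict_match("van gogh", "Van")` is false, and `ranked_match` would still return the title 'Van' for "van gogh", only below titles that contain both words.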

little-wow commented 7 years ago

Here's a post that goes into some of these issues from a user: http://www.rba.co.uk/wordpress/2017/02/10/new-creative-commons-image-search-back-to-the-drawing-board-im-afraid/

lizadaly commented 7 years ago

I have implemented the search syntax change (as supported by Elasticsearch's Query String Syntax). Along with #178, this means search behavior now:

Note that "table with food" in cultural works does not return many results now, because we do not have images with that metadata.
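
For readers unfamiliar with Elasticsearch's query string syntax, the following is a minimal sketch of what a `query_string` request body can look like. It is an illustration only, not the query this project actually sends: `build_query` is a hypothetical helper, the field names are borrowed from the `search_fields` parameters in the URLs above, and the choice of `default_operator` is an assumption (Elasticsearch itself defaults to `OR`).

```python
import json


def build_query(search, fields=("title", "creator", "tags"), default_operator="OR"):
    """Build an Elasticsearch request body using the query_string query.

    `search` is passed through unchanged, so the user can write
    query string syntax such as "vincent van gogh" (a quoted phrase)
    or table AND food (explicit boolean operator).
    """
    return {
        "query": {
            "query_string": {
                "query": search,
                "fields": list(fields),
                # "AND" requires every term; "OR" matches any term.
                "default_operator": default_operator,
            }
        }
    }


# A quoted phrase is sent as-is and matched as a phrase by Elasticsearch.
phrase_body = json.dumps(build_query('"vincent van gogh"'), indent=2)
```

With `default_operator="AND"`, a search for table food would only match documents containing both terms, which is consistent with strict multi-term searches returning few results when the metadata is sparse.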