TheScienceMuseum / collectionsonline

Science Museum Group Collection Online
https://collection.sciencemuseumgroup.org.uk
MIT License

Certain category names cause the API to eliminate all params and filter on category #1115

Open zenlan opened 6 years ago

zenlan commented 6 years ago

For instance: https://collection.sciencemuseum.org.uk/search?q=art

Categories: Materia Medica & Pharmacology Surgery Railway Posters, Notices & Handbills Photographs Photographic Technology Classical & Medieval Medicine Art Wellcome Medals Therapeutics Documents

Not all category names cause this behaviour: 'art' and 'photographs' do, while 'surgery' does not.

This is causing issues in 2 projects of mine where I expect results for the query term and cannot handle category results.

Below are log results from one project's API calls, where the URLs are generated by paginator buttons. Each page returns exactly the same result set, i.e. that of a category search:

[18-02-05 11:16:40:133 GMT] URL
[18-02-05 11:16:40:134 GMT] "https://collection.sciencemuseum.org.uk/search/objects/images?page[number]=0&page[size]=50&q=art"
[18-02-05 11:16:41:353 GMT] id: co27959
[18-02-05 11:16:41:387 GMT] id: co8023087
[18-02-05 11:16:41:416 GMT] id: co8023088
.......
[18-02-05 11:16:42:841 GMT] id: co65431
[18-02-05 11:16:42:872 GMT] id: co67231
[18-02-05 11:18:20:938 GMT] URL
[18-02-05 11:18:20:938 GMT] "https://collection.sciencemuseum.org.uk/search/objects/images?page[number]=1&page[size]=50&q=art"
[18-02-05 11:18:22:157 GMT] id: co27959
[18-02-05 11:18:22:187 GMT] id: co8023087
[18-02-05 11:18:22:219 GMT] id: co8023088
.......
[18-02-05 11:18:23:749 GMT] id: co65431
[18-02-05 11:18:23:785 GMT] id: co67231
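For anyone hitting the same symptom, a client could flag it by comparing the IDs returned for consecutive pages. The following is only a minimal sketch: the function name `find_repeated_pages` is hypothetical, and the sample data contains only the IDs visible in the log above (the elided ones are left out).

```python
def find_repeated_pages(pages):
    """Return the indices of pages whose ID list equals the previous page's."""
    repeated = []
    for index in range(1, len(pages)):
        # An empty page is not treated as a repeat; it may just be the end of the results.
        if pages[index] and pages[index] == pages[index - 1]:
            repeated.append(index)
    return repeated


# IDs as they appear in the log for page[number]=0 and page[number]=1.
page0 = ["co27959", "co8023087", "co8023088", "co65431", "co67231"]
page1 = ["co27959", "co8023087", "co8023088", "co65431", "co67231"]

print(find_repeated_pages([page0, page1]))
```

A non-empty result means the paginator is receiving the same result set again, which is what the log shows for the q=art query.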

A second project has a page that merges results from a range of museum APIs, getting 5 from each. It was my misfortune to select the word 'art' as the default search term, which leads to the Science Museum results flooding the page and outnumbering all other results. The first set of results is also repeated for every subsequent call.
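One way a merging page could defend itself against a source that returns more than requested is to cap and de-duplicate on the client side. This is a rough sketch under assumptions, not the code of the project described here: the function name `merge_capped` and the source names and IDs in the example are made up for illustration.

```python
def merge_capped(results_by_source, per_source=5):
    """Merge results from several sources, keeping at most per_source unique items from each."""
    merged = []
    for source, items in results_by_source.items():
        seen = set()
        for item in items:
            if len(seen) == per_source:
                break  # this source has reached its quota
            if item in seen:
                continue  # skip repeats within the same source
            seen.add(item)
            merged.append((source, item))
    return merged


# Hypothetical input: one source returns far more than the 5 that were asked for.
results = {
    "science-museum": [f"co{n}" for n in range(50)],
    "other-museum": ["a1", "a2", "a3"],
}
print(len(merge_capped(results)))
```

This only hides the flooding on the merged page; it does not stop the same records coming back on every subsequent call.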

jamieu commented 6 years ago

Ah, yes... this is likely an HTML-only feature creeping into the API/JSON queries/response. Will take a look.

But is there a reason you're using q=art rather than searching the art category specifically (/search/objects/categories/art), or searching for specific object types, e.g. /search/objects/object_type/oil-painting?

Using q= will return anything that matches the word art (including fuzzy matches), rather than objects that are actually categorised as art. Is that really what you want?

zenlan commented 6 years ago

The URLs in the log records I pasted show the actual queries that I use, i.e. exclusively /search/objects/images. The first query was just an ad hoc example.

jamieu commented 6 years ago

I think there may be two separate issues here:

zenlan commented 6 years ago

I am aware of the pagination issue and it does not affect this issue. I limit the queries to pages 0 - 9.

The collections search page does seem to allow pages 0 - 10, though: https://collection.sciencemuseum.org.uk/search/objects/images?page[number]=10

zenlan commented 6 years ago

Btw there is no harvesting, my projects are search apps. I don't store any data.

jamieu commented 6 years ago

Yes, as I explained, we 'hijack' the queries for category names and treat those queries differently (effectively sending you off to a different results page). I need to turn that off for the JSON/API query/response. Unlikely to get to it today, but will look at it this week.

As for the pagination issue, I've attached a list of records with images. In the short term it's probably easier for you to modify your app to use a local copy of this data. Although we do add new records/images, that doesn't happen so often that you'll be missing vast numbers of records.

smg-objs-with-images.txt

zenlan commented 6 years ago

Thanks but I just found a workaround for this, I think. If I wrap the term in quotes, either single or double, the query remains intact.

https://collection.sciencemuseum.org.uk/search/images?q=art
https://collection.sciencemuseum.org.uk/search/images?q="art"

https://www.zenlan.com/collage/science/#art
https://www.zenlan.com/collage/science/#"art"

This workaround will suffice until there is a proper resolution.
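For reference, the quoting workaround could be applied when building the query URL, roughly as sketched below. The endpoint and parameter names are taken from the logged URLs above. The helper name `build_search_url` is hypothetical, and it is an assumption that the server treats the percent-encoded quotes (%22) the same way as the literal quotes typed into the browser.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://collection.sciencemuseum.org.uk/search/objects/images"


def build_search_url(term, page=0, size=50):
    """Build a search URL with the term wrapped in double quotes."""
    already_quoted = len(term) >= 2 and term[0] == term[-1] == '"'
    quoted = term if already_quoted else f'"{term}"'
    params = {"page[number]": page, "page[size]": size, "q": quoted}
    # safe="[]" keeps the brackets literal, matching the URLs in the log.
    return f"{BASE_URL}?{urlencode(params, safe='[]')}"


url = build_search_url("art", page=1)
print(url)
# Round-trip check: the decoded q parameter still carries the quotes.
print(parse_qs(urlsplit(url).query)["q"][0])
```

Terms that already arrive wrapped in double quotes are passed through unchanged, so the quotes are not doubled.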