bbcarchdev / acropolis

A toolkit for building knowledge graphs
https://bbcarchdev.github.io/acropolis

Search for ?media=image can return resources with no images #1

Open townxelliot opened 7 years ago

townxelliot commented 7 years ago

(Opening against Acropolis as I'm not 100% sure where in the stack the issue is, or even if it's an issue.)

Searching for keywords with media=image set can sometimes return resources that have no images associated with them through mrss:player, mrss:content or similar.

To reproduce:

  1. Search Acropolis: http://acropolis.org.uk/?offset=0&limit=25&q=judi+dench&media=image

  2. Fetch one of the resources returned by the search as Turtle: http://acropolis.org.uk/a75e5495087d4db89eccc6a52cc0e3a4.ttl#id

  3. Check the RDF for mrss:content or mrss:player URIs. There aren't any.

  4. Search the whole text of the RDF for '.jpg', '.gif', '.png', '/player'. None are present.
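Steps 2 to 4 can be scripted roughly as follows. This is only a sketch using the Python standard library: it does a plain text search rather than parsing the RDF, and it assumes the Turtle output writes the predicates with a literal `mrss:` prefix. The function names `fetch_turtle` and `find_media_markers` are made up for illustration.

```python
import urllib.request

# The two predicates from step 3, then the substrings from step 4.
MRSS_PREDICATES = ("mrss:content", "mrss:player")
MEDIA_HINTS = (".jpg", ".gif", ".png", "/player")


def fetch_turtle(url):
    """Download a resource as Turtle text (performs a network request)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def find_media_markers(turtle):
    """Return each predicate or substring listed above that occurs in the text.

    Assumption: a case-insensitive text match is good enough; a predicate
    written as a full URI instead of `mrss:` would be missed.
    """
    lowered = turtle.lower()
    return [marker for marker in MRSS_PREDICATES + MEDIA_HINTS if marker in lowered]
```

Calling `find_media_markers(fetch_turtle(url))` with the URL from step 2 should give an empty list if the resource really has no media statements.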

The RDF does contain the statement

<http://shakespeare.acropolis.org.uk/images/4040698#id> foaf:depicts <http://dbpedia.org/resource/Judi_Dench>

and also

<http://dbpedia.org/resource/Judi_Dench> owl:sameAs <http://acropolis.org.uk/a75e5495087d4db89eccc6a52cc0e3a4.ttl#id>

Is this what's causing the resource to be included in the search results?
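To make the suspicion concrete, here is a small sketch of the kind of matching that would produce this result. It is a guess at the behaviour, not how Acropolis is known to work: the two triples are copied from the RDF above, and the function `depicted_via_same_as` is hypothetical.

```python
DEPICTS = "foaf:depicts"
SAME_AS = "owl:sameAs"

# The two statements quoted above, as (subject, predicate, object) tuples.
TRIPLES = [
    ("http://shakespeare.acropolis.org.uk/images/4040698#id", DEPICTS,
     "http://dbpedia.org/resource/Judi_Dench"),
    ("http://dbpedia.org/resource/Judi_Dench", SAME_AS,
     "http://acropolis.org.uk/a75e5495087d4db89eccc6a52cc0e3a4.ttl#id"),
]


def depicted_via_same_as(resource, triples):
    """Return subjects that depict `resource` or a direct owl:sameAs equivalent.

    Assumption: owl:sameAs is followed one hop, in either direction.
    """
    equivalents = {resource}
    for subject, predicate, obj in triples:
        if predicate == SAME_AS:
            if obj == resource:
                equivalents.add(subject)
            if subject == resource:
                equivalents.add(obj)
    return [s for s, p, o in triples if p == DEPICTS and o in equivalents]
```

Under that assumption, the Acropolis resource picks up the image through the DBpedia URI even though it has no media statements of its own.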

Would it be worth modifying the response to only include resources with mrss:player or mrss:content statements on them?
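As a rough illustration of the filter being suggested, assuming the search results were available as a mapping from resource URI to the predicates asserted directly on that resource (a made-up shape, not the real response format):

```python
MEDIA_PREDICATES = {"mrss:player", "mrss:content"}


def only_with_media(results):
    """Keep results that carry at least one media predicate directly.

    `results` is a hypothetical dict of resource URI -> iterable of predicates.
    """
    return {
        uri: predicates
        for uri, predicates in results.items()
        if MEDIA_PREDICATES & set(predicates)
    }
```

A resource that is only linked to media indirectly (for example through foaf:depicts on another subject) would be dropped by this filter.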

If not, could you please explain the algorithm used for choosing which results are returned for media queries?

townxelliot commented 6 years ago

I think this is related to #8: some of the resources returned for the search are creative works, rather than resources which have related creative works (which in turn have related media). Could this again be down to creative works being about themselves?