WordPress / openverse

Openverse is a search engine for openly-licensed media. This monorepo includes all application code.
https://openverse.org
MIT License

Create a guideline for selecting the creator and audit provider scripts #1491

Open obulat opened 2 years ago

obulat commented 2 years ago

Description

Some providers can list several creators for a single record (the Science Museum, where the issue was first noticed, is one example), and we need to select the correct one.

The Science Museum API can list several creators for one object. For example, a photograph of a replica of an object would have three creators: the original creator, the creator of the replica, and the photographer. For Openverse, I believe the creator should be the photographer, since they created the image that carries the specific license (or is in the public domain). We can (and, to improve search relevancy, I think we should) add the other creators as separate keys in the meta_data field.
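A minimal Python sketch of this proposal, assuming a simplified, hypothetical record in which each maker is a dict with `name` and `role` keys (the real Science Museum API nests this information differently). `select_creator` and the `other_creators` key are illustrative names, not existing Openverse code: the maker with the preferred role becomes the creator, and everyone else is grouped by role into `meta_data`.

```python
from typing import Optional

# Hypothetical, simplified maker entry: {"name": ..., "role": ...}.
# The real Science Museum API response is structured differently.
Maker = dict[str, str]


def select_creator(
    makers: list[Maker], preferred_role: str = "photographer"
) -> tuple[Optional[str], dict]:
    """Pick the first maker with `preferred_role` as the creator and
    group every other maker by role under meta_data["other_creators"]."""
    creator: Optional[str] = None
    other_creators: dict[str, list[str]] = {}
    for maker in makers:
        if creator is None and maker["role"] == preferred_role:
            creator = maker["name"]
        else:
            # Keep the remaining creators so they can still be indexed for search.
            other_creators.setdefault(maker["role"], []).append(maker["name"])
    meta_data = {"other_creators": other_creators} if other_creators else {}
    return creator, meta_data


# Usage with placeholder names for a photograph of a replica:
makers = [
    {"name": "Galileo Galilei", "role": "maker (original)"},
    {"name": "Example Replica Maker", "role": "replica maker"},
    {"name": "Example Photographer", "role": "photographer"},
]
creator, meta_data = select_creator(makers)
print(creator)  # Example Photographer
```

If no maker has the preferred role, `creator` is `None` and all makers end up in `meta_data`, which is where a fallback (such as the ones discussed below) would need to take over.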

Examples from the original discussion

Originally posted by @stacimc in https://github.com/WordPress/openverse-catalog/issues/576#issuecomment-1198611794

Here's an example where the record is a (delightful) photograph: https://collection.sciencemuseumgroup.org.uk/objects/co8210526/timmy-the-cat-guardian-of-mr-browns-prize-budgerigars-gelatin-silver-print. In the JSON, you can see that the maker field correctly attributes the photographer: https://collection.sciencemuseumgroup.org.uk/api/objects/co8210526

And for @obulat's example, here's the API response: https://collection.sciencemuseumgroup.org.uk/api/objects/co56202. This one is several layers deep: it's a photograph of a replica of an object created by Galileo Galilei, yet he is the only person listed under maker. However, in the type field, this maker is labeled first as designer and then as maker (original). Maybe we could use these labels to detect maker types that aren't the true source we want to identify as 'creator'.

The Board of Trustees appears in that API response under source.legal.rights, so perhaps we could fall back to that field if there isn't an appropriate maker type.
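The two ideas above (skip maker types that describe the original object, then fall back to the rights holder) could be sketched as an ordered list of lookup "tiers", tried until one returns a name. This assumes a simplified, hypothetical record shape (`makers` as a list of `{"name", "types"}` dicts and `source.legal.rights` as a plain string); the excluded maker types come from the co56202 example and are not a vetted list. All function names are illustrative.

```python
from typing import Callable, Optional, Sequence

# Assumed, simplified record shape (not the real API layout):
# {"makers": [{"name": str, "types": [str, ...]}],
#  "source": {"legal": {"rights": str}}}
Record = dict
Tier = Callable[[Record], Optional[str]]

# Maker types that describe the original object rather than the image.
# Taken from the co56202 example; an assumption, not a vetted list.
NON_CREATOR_MAKER_TYPES = frozenset({"designer", "maker (original)"})


def maker_tier(record: Record) -> Optional[str]:
    """Tier 1: the first named maker with none of the excluded types."""
    for maker in record.get("makers", []):
        types = {t.lower() for t in maker.get("types", [])}
        name = maker.get("name")
        if name and not types & NON_CREATOR_MAKER_TYPES:
            return name
    return None


def rights_tier(record: Record) -> Optional[str]:
    """Tier 2: fall back to the rights holder under source.legal.rights."""
    return record.get("source", {}).get("legal", {}).get("rights")


CREATOR_TIERS: tuple[Tier, ...] = (maker_tier, rights_tier)


def find_creator(record: Record, tiers: Sequence[Tier] = CREATOR_TIERS) -> Optional[str]:
    """Return the result of the first tier that finds a creator, else None."""
    for tier in tiers:
        creator = tier(record)
        if creator:
            return creator
    return None


photo_of_replica = {
    "makers": [{"name": "Galileo Galilei", "types": ["designer", "maker (original)"]}],
    "source": {"legal": {"rights": "Example Rights Holder"}},
}
print(find_creator(photo_of_replica))  # Example Rights Holder
```

Keeping the tiers in a sequence means fallbacks can be added or reordered later without changing `find_creator` itself.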

Where possible, though, we should probably keep these refactors as small as we can. But this is really interesting and definitely worth addressing. I'm glad you noticed, @obulat, good eye!

Note unrelated to this issue

It might also be a good idea to dump all text-based pieces of information, such as Maker: Galileo Galilei; Replica Maker: xxx, into a single field like description_for_search_purposes. It will probably not be readable, and we shouldn't show it on the frontend where we display description, but it will contain data that would be worth indexing to improve search relevancy.
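A minimal sketch of building such a value, assuming the labelled text pieces have already been extracted into a dict (the helper name and input shape are hypothetical; `description_for_search_purposes` is only the field name floated above, not an existing field). The output uses the same `Label: value; Label: value` form as the example and is meant for indexing, not display.

```python
def build_search_description(labelled_values: dict[str, list[str]]) -> str:
    """Flatten labelled text pieces into one "Label: value; Label: value" string.

    Intended only for search indexing, not for display on the frontend.
    """
    parts = []
    for label, values in labelled_values.items():
        # Drop empty or whitespace-only values so they don't add noise.
        cleaned = [v.strip() for v in values if v and v.strip()]
        if cleaned:
            parts.append(f"{label}: {', '.join(cleaned)}")
    return "; ".join(parts)


print(build_search_description({
    "Maker": ["Galileo Galilei"],
    "Replica Maker": ["Example Replica Maker"],
}))
# Maker: Galileo Galilei; Replica Maker: Example Replica Maker
```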

AetherUnbound commented 2 years ago

I agree, it sounds like we should have multiple "tiers" with fallbacks as we look for the creator within the Science Museum metadata! I like the indexed description idea too; that sounds like something we should also look at down the line!