cc-archive / cccatalog

[PROJECT TRANSFERRED] Mapping the commons towards an open ledger and cc search.
https://github.com/WordPress/openverse-catalog
MIT License

[Feature] Retrieving sub providers within Flickr #419

ChariniNana closed 4 years ago

ChariniNana commented 4 years ago

Problem Description

Retrieve sub providers such as NASA within Flickr, which are useful to a broader audience.

Solution Description

To address this requirement, we first need to identify which sub providers should be extracted. Next, we find all the user accounts associated with each sub provider (for example, the 'NASA HQ PHOTO' and 'NASAKennedy' user accounts both belong to the sub provider nasa). A mapping from each sub provider to one or more user IDs then records which users fall under which sub provider. At the image processing level, we can check the image's user ID against this mapping: if the user belongs to a sub provider, the source field is set to that sub provider's value; otherwise it is set to the default provider value 'Flickr'.
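The lookup described above could be sketched roughly as follows. This is only an illustration under assumptions: the names `SUB_PROVIDERS`, `build_user_index`, and `get_source` are hypothetical, and the user IDs are placeholders rather than real Flickr account IDs.

```python
# Hypothetical mapping: sub provider name -> set of Flickr user IDs.
# The IDs are placeholders; the real account IDs would have to be looked up.
SUB_PROVIDERS = {
    "nasa": {"PLACEHOLDER_ID_NASA_HQ_PHOTO", "PLACEHOLDER_ID_NASAKENNEDY"},
}

# Default provider value used when no sub provider matches.
DEFAULT_PROVIDER = "Flickr"


def build_user_index(sub_providers):
    """Invert the mapping so each user ID points to its sub provider."""
    index = {}
    for name, user_ids in sub_providers.items():
        for user_id in user_ids:
            index[user_id] = name
    return index


USER_TO_SUB_PROVIDER = build_user_index(SUB_PROVIDERS)


def get_source(user_id, default=DEFAULT_PROVIDER):
    """Return the sub provider for an image's user ID, else the default."""
    return USER_TO_SUB_PROVIDER.get(user_id, default)
```

Inverting the mapping once keeps the per-image check to a single dictionary lookup, which matters if the check runs for every image processed.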

Alternatives

The initial plan was to create a separate image store for each sub provider and for the default provider, which would eventually generate a separate tsv file for each provider value. However, the solution was later changed to the one described above.

Additional context

This is related to issue #392.