OneZoom / tree-build

Scripts for assembling the tree, metadata and downstream data products such as popularity and popular images
MIT License

Additional wiki functionality: getting a list of alternative images from commons #75

Open hyanwong opened 3 months ago

hyanwong commented 3 months ago

At the risk of feature creep, one very useful wiki API function would be to take the wikidata qID, find the wikimedia commons category from the wikidata API (this is P373), then get a list of thumbnail image URLs for all the images in that category on commons. This would allow us to make a page where you could pick alternative bespoke images to harvest.

I'm guessing that it would be useful to roll that functionality into the get_wiki_images.py file, although it isn't necessary for the CLI use of that file. I'm tending to think of get_wiki_images.py as more like a set of library routines, however.

davidebbo commented 3 months ago

Note that P373 can have multiple categories, although in many cases it's just one. There is also P910 which could be useful: "topic's main category".

Category images are probably not in the dump, or maybe it's in a different dump. But since we'd be dealing with one image at a time, we may as well just call an API.
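
As a rough illustration of the P373 lookup discussed above, here is a minimal Python sketch. It assumes the standard Wikidata `wbgetentities` JSON layout (`claims` → property → list of statements, each with a `mainsnak`); the function names `entity_claims_url` and `commons_categories` are hypothetical and not part of get_wiki_images.py. It returns a list because, as noted, P373 can hold more than one category.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def entity_claims_url(qid):
    """Build a wbgetentities URL that returns only the claims of one item."""
    params = {"action": "wbgetentities", "ids": qid, "props": "claims", "format": "json"}
    return f"{WIKIDATA_API}?{urlencode(params)}"


def commons_categories(entity):
    """Return every P373 (Commons category) value found on an entity dict.

    `entity` is expected to be response["entities"][qid] from wbgetentities.
    Statements without a concrete value (novalue/somevalue) are skipped.
    """
    names = []
    for statement in entity.get("claims", {}).get("P373", []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            names.append(snak["datavalue"]["value"])
    return names
```

P910 ("topic's main category") would need different handling, since its value is another Wikidata item rather than a plain string.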

hyanwong commented 3 months ago

> Category images are probably not in the dump, or maybe it's in a different dump. But since we'd be dealing with one image at a time, we may as well just call an API.

Yes, I think it would be an API call to wikimedia.

davidebbo commented 3 months ago

For reference, I found the call to make. e.g. for Lion: https://commons.wikimedia.org/w/api.php?action=query&prop=images&titles=Category:Panthera_leo&imlimit=500&format=json&utf8. Though it only returns 18 images, while https://commons.wikimedia.org/wiki/Category:Panthera_leo has 361. So not sure what's going on here.

It would be quite easy to have a command that lists all the category images for an ott. It could even return it formatted as HTML so you can open it locally, but I'm not sure that buys much, since you may as well go to the wikimedia category page and view them there.

Now if you want UI that is served by web2py and can handle taking a user selection and making the update, it's going to be more work as it needs a backend component.

hyanwong commented 3 months ago

Yes, I think it would be an API call to wikimedia.

For example, if we go to our classic example https://www.wikidata.org/wiki/Q140 (lion), we get https://www.wikidata.org/wiki/Property:P373 = https://commons.wikimedia.org/wiki/Category:Panthera%20leo, so we can look up:

https://commons.wikimedia.org/w/api.php?action=query&list=categorymembers&cmtype=file&cmtitle=Category:Panthera%20leo&cmlimit=max

And return all the images from that API call. To convert those image names to thumbnails for viewing, we could, I suppose, follow https://stackoverflow.com/questions/33689980/get-thumbnail-image-from-wikimedia-commons.
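
One way to turn a file title into a thumbnail URL, sketched here under assumptions, is the Commons `Special:FilePath` redirect with a `width` parameter. The helper name `thumbnail_url` and the default width of 200 are hypothetical; the alternative of asking the API for `prop=imageinfo` with `iiurlwidth` would return the thumbnail URL directly, at the cost of another request.

```python
from urllib.parse import quote

FILEPATH_BASE = "https://commons.wikimedia.org/wiki/Special:FilePath/"


def thumbnail_url(file_title, width=200):
    """Return a URL that redirects to a scaled rendering of a Commons file.

    `file_title` may be given with or without the "File:" prefix, as
    returned by list=categorymembers.
    """
    name = file_title.split(":", 1)[1] if file_title.startswith("File:") else file_title
    # Commons treats spaces and underscores as equivalent in titles.
    name = name.replace(" ", "_")
    return f"{FILEPATH_BASE}{quote(name)}?width={width}"
```

For example, `thumbnail_url("File:Lion waiting in Namibia.jpg", 300)` yields a URL ending in `Lion_waiting_in_Namibia.jpg?width=300`; the file name here is only an illustration.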

hyanwong commented 3 months ago

There is also the "commons gallery" for that taxon (https://www.wikidata.org/wiki/Property:P935), which in the lion case points to https://commons.wikimedia.org/wiki/Panthera%20leo. I believe this is a set of hand-collected images of that taxon, so it is sometimes a nicer, curated subset of pictures. We could probably specify whether to get P373 or P935, or both.
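
A possible way to support both properties, sketched as an assumption rather than a settled design: a category (P373) is listed with `list=categorymembers`, whereas a gallery (P935) is an ordinary page, so the images embedded in it can be listed with `prop=images`. The function name `commons_query_params` is hypothetical.

```python
def commons_query_params(prop, value):
    """Return Commons API query parameters for a P373 or P935 value.

    P373 values are category names (without the "Category:" prefix);
    P935 values are gallery page titles in the main namespace.
    """
    if prop == "P373":
        return {
            "action": "query",
            "list": "categorymembers",
            "cmtype": "file",
            "cmtitle": f"Category:{value}",
            "cmlimit": "max",
            "format": "json",
        }
    if prop == "P935":
        return {
            "action": "query",
            "prop": "images",
            "titles": value,
            "imlimit": "max",
            "format": "json",
        }
    raise ValueError(f"unsupported property: {prop}")
```

Requesting "both" would then just mean calling this once per property and merging the resulting file lists.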

hyanwong commented 3 months ago

> For reference, I found the call to make. e.g. for Lion: https://commons.wikimedia.org/w/api.php?action=query&prop=images&titles=Category:Panthera_leo&imlimit=500&format=json&utf8. Though it only returns 18 images, while https://commons.wikimedia.org/wiki/Category:Panthera_leo has 361. So not sure what's going on here.

Sorry - I only just saw that we were trying exactly the same thing!

I think that you want list=categorymembers&cmtype=file&cmtitle=Category:Panthera%20leo&cmlimit=max, rather than prop=images. I suspect that prop=images only includes images embedded by hand in that page, rather than those auto-generated by category membership.

davidebbo commented 3 months ago

Indeed, this gives me 361 images: https://commons.wikimedia.org/w/api.php?action=query&list=categorymembers&cmtype=file&cmtitle=Category:Panthera%20leo&cmlimit=max&format=json
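
A minimal sketch of wrapping that call in Python follows. It assumes the standard MediaWiki continuation mechanism (a `continue` object in the response whose keys are merged into the next request), which matters for categories larger than one page of results. The names `fetch_json` and `category_files` and the User-Agent string are placeholders, not existing tree-build code.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def fetch_json(params):
    """Perform one GET request against the Commons API and decode the JSON."""
    # Placeholder User-Agent; a real tool should identify itself properly.
    request = Request(
        f"{COMMONS_API}?{urlencode(params)}",
        headers={"User-Agent": "tree-build-example/0.1"},
    )
    with urlopen(request) as response:
        return json.load(response)


def category_files(category, fetch=fetch_json):
    """Return the titles of all files in a Commons category.

    `fetch` is injectable so the paging logic can be exercised offline.
    """
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtype": "file",
        "cmtitle": f"Category:{category}",
        "cmlimit": "max",
        "format": "json",
    }
    titles = []
    while True:
        data = fetch(params)
        titles.extend(member["title"] for member in data["query"]["categorymembers"])
        if "continue" not in data:
            return titles
        # Merge the continuation tokens (e.g. cmcontinue) into the next request.
        params = {**params, **data["continue"]}
```

Calling `category_files("Panthera leo")` would hit the live API; the lion category fits in a single response, so the loop only matters for larger categories.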