hyanwong opened 2 months ago
Here's another test, using `get_wiki_images clade OneZoom_latest-all.json Palaeognathae`:
For comparison, here's what we have for that clade on OneZoom:
A lot of these images don't have the artist/author information in a format we can ingest easily, e.g.
```
WARNING:get_wiki_images.py:Artist not found for 'Crypturellus_duidae.JPG': using 'Unknown artist'
WARNING:get_wiki_images.py:Artist not found for 'Crypturellus_obsoletus.jpg': using 'Unknown artist'
WARNING:get_wiki_images.py:Artist not found for 'Crypturellus_strigulosus.jpg': using 'Unknown artist'
WARNING:get_wiki_images.py:Artist not found for 'Tinamus_solitarius.jpg': using 'Unknown artist'
WARNING:get_wiki_images.py:Artist not found for 'Tinamus_guttatus.JPG': using 'Unknown artist'
WARNING:get_wiki_images.py:Artist not found for 'Crypturellus_parvirostris.JPG': using 'Unknown artist'
WARNING:get_wiki_images.py:Artist not found for 'Crypturellus_noctivagus.JPG': using 'Unknown artist'
WARNING:get_wiki_images.py:Artist not found for 'Crypturellus_undulatus.JPG': using 'Unknown artist'
WARNING:get_wiki_images.py:Artist not found for 'Nothura_minor.jpg': using 'Unknown artist'
```
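For context, Wikimedia Commons exposes structured attribution through the MediaWiki API (`action=query`, `prop=imageinfo`, `iiprop=extmetadata`), where an `Artist` entry holds the credit as an HTML string. Below is a minimal sketch of reading that entry with the same "Unknown artist" fallback, assuming the API response for one file page has already been fetched and parsed into a dict. It is an assumption that the warnings above arise from this field being absent; `artist_from_extmetadata` is a hypothetical helper, not code from `get_wiki_images.py`.

```python
import html
import re

UNKNOWN_ARTIST = "Unknown artist"

def artist_from_extmetadata(page: dict) -> str:
    """Return a plain-text artist name from one page of a MediaWiki
    imageinfo/extmetadata response, or UNKNOWN_ARTIST if none is present.
    (Hypothetical helper; the response layout is assumed, not taken from this repo.)"""
    try:
        raw = page["imageinfo"][0]["extmetadata"]["Artist"]["value"]
    except (KeyError, IndexError, TypeError):
        # No imageinfo, no extmetadata, or no Artist entry at all.
        return UNKNOWN_ARTIST
    # The Artist value is often HTML (e.g. a link to a user page):
    # strip the tags, then decode entities such as &amp;.
    text = html.unescape(re.sub(r"<[^>]+>", "", raw)).strip()
    return text or UNKNOWN_ARTIST
```

Files like the ones listed above would presumably land in the `except` branch, which is why a second, text-based fallback may be worth considering.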
Instead, these file pages usually have a free-text credit written on them, e.g. "Given to the wikipedia by the author, Renato Caniatti", or something similar. I assume that someone will eventually make this more machine-readable, and we just have to wait until that is sorted.
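As a possible stopgap in the meantime, a heuristic could pull a name out of that free text. This is a rough sketch, assuming the description is already available as a plain string. The regex only covers the "by the author, X" phrasing quoted above, so other wordings would still fall through to "Unknown artist", and any hits would want a manual check. `artist_from_free_text` is a hypothetical name.

```python
import re
from typing import Optional

# Hypothetical pattern for credits such as
# "Given to the wikipedia by the author, Renato Caniatti".
# Captures everything after the phrase up to a full stop, semicolon or newline.
_BY_AUTHOR = re.compile(r"\bby the author,?\s+([^.;\n]+)", re.IGNORECASE)

def artist_from_free_text(description: str) -> Optional[str]:
    """Guess an artist name from a free-text file description.
    Returns None when no recognisable credit phrase is found."""
    match = _BY_AUTHOR.search(description)
    if match is None:
        return None
    return match.group(1).strip()
```

Returning `None` rather than a placeholder lets the caller decide whether to fall back to "Unknown artist" or flag the file for hand curation.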
My impression is that, on average, the wiki images are of roughly the same quality as what we have (maybe very slightly better). However, because our existing images are rated, our existing image stock is probably a bit more useful: we can pick the ones we know to be high quality to percolate upwards in the tree.
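To make the rating argument concrete: if each internal node displays the best-rated images found among its descendants, then ratings decide what percolates up. Here is a minimal sketch of that selection, assuming a simple in-memory tree and numeric ratings where higher is better. The `Node` class and `best_images` function are hypothetical and are not OneZoom's actual data model or code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Image = Tuple[str, int]  # (image_id, rating); a higher rating is assumed to be better

@dataclass
class Node:
    """Hypothetical minimal tree node, not OneZoom's real data model."""
    name: str
    images: List[Image] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

def best_images(node: Node, k: int = 1) -> List[Image]:
    """Return the k highest-rated images found anywhere in node's subtree."""
    pool = list(node.images)
    for child in node.children:
        # A child's own top k is enough: nothing outside it can reach the overall top k.
        pool.extend(best_images(child, k))
    pool.sort(key=lambda img: img[1], reverse=True)
    return pool[:k]
```

With unrated images, every candidate would share one flat rating, and the choice of which image percolates up would be arbitrary.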
Finally, here are pill bugs (`get_wiki_images clade OneZoom_latest-all.json Armadillidiidae`). OneZoom only has 2 images in this taxon, so the 13 images that we can get from Wikidata are a distinct improvement, and I think the pictures are all of pretty good quality:
I experimented with the automatic wiki harvester, using armadillos as a test case:
Here are the pictures. They aren't quite as good as I would have hoped, but that might reflect how unusual the taxon is. There are some better Wikimedia images (e.g. ), but it does appear that some hand curation might be needed for some of these non-European groups. I guess the main question is whether assigning all these images a value of 35000 will displace existing, better OneZoom images on the tree:
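The displacement question can be checked per node: under a flat value, an existing image only loses its slot if it is rated below 35000 and enough new images arrive to push it out. Here is a minimal sketch, assuming that higher ratings win, that each node shows its top `k` images, and that ties go to existing images. All three are assumptions, and `displaced_images` is a hypothetical helper rather than anything in the codebase.

```python
from typing import Dict, List

DEFAULT_WIKI_RATING = 35000  # the flat value discussed above

def displaced_images(existing: Dict[str, int], n_wiki: int, k: int,
                     wiki_rating: int = DEFAULT_WIKI_RATING) -> List[str]:
    """Return existing image IDs that drop out of a node's top k once
    n_wiki new images, all rated wiki_rating, are added (hypothetical model)."""
    # Candidates are (rating, is_existing, id) tuples. Sorting in descending order
    # puts higher ratings first and, on equal ratings, existing images (True) first.
    existing_c = sorted(((r, True, img) for img, r in existing.items()), reverse=True)
    wiki_c = [(wiki_rating, False, f"wiki_{i}") for i in range(n_wiki)]
    all_c = sorted(existing_c + wiki_c, reverse=True)
    before = [img for _, _, img in existing_c[:k]]
    after = {img for _, _, img in all_c[:k]}
    return [img for img in before if img not in after]
```

For example, with existing ratings `{"a.jpg": 50000, "b.jpg": 30000, "c.jpg": 20000}`, three new images and `k=3`, this model displaces `b.jpg` and `c.jpg` but keeps `a.jpg`. Whether that is acceptable depends on how well the existing ratings track real image quality.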