Open alexwlchan opened 6 months ago
Example: https://commons.wikimedia.org/wiki/File:%22Air_Cav!%22.jpg
This file links to https://www.flickr.com/photos/35703177@N00/28006848539/, but the photo page is a 404 and the user page is a 410.
In this case the bot adds the P12120 (Flickr photo ID) and P7482 (source of file) statements, but it can't add any of the other Flickr metadata.
It might be useful to add the Flickr user ID in P170 (creator), but it's not essential.
The date on WMC is 11 January 2018, 05:27:39, which is the created date in the EXIF of the JPEG file, but that's not when the photo was actually taken – more likely when it was digitised. The Flickr photo has the actual date: September 1965.
I don't know how widespread this is, but this is the sort of thing we should fix.
Example: https://commons.wikimedia.org/wiki/File:Strawberries_time-lapse.ogv
Example: https://commons.wikimedia.org/wiki/File:Lascar_VIDEO_-_Riding_the_Budavari_Siklo_to_the_Castle_Hill_top_(4543574073).jpg
This throws an exception when we try to retrieve the image info:
Traceback (most recent call last):
  File "/Users/alexwlchan/repos/flickypedia/.venv/bin/flickypedia", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/alexwlchan/repos/flickypedia/.venv/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexwlchan/repos/flickypedia/.venv/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/alexwlchan/repos/flickypedia/.venv/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexwlchan/repos/flickypedia/.venv/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexwlchan/repos/flickypedia/.venv/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexwlchan/repos/flickypedia/.venv/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexwlchan/repos/flickypedia/src/flickypedia/backfillr/cli.py", line 312, in update_single_file
    run_with(list_of_filenames=[filename])
  File "/Users/alexwlchan/repos/flickypedia/src/flickypedia/backfillr/cli.py", line 93, in run_with
    photo = flickr_api.get_single_photo(photo_id=photo_id)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexwlchan/repos/flickypedia/.venv/lib/python3.12/site-packages/flickr_photos_api/api.py", line 371, in get_single_photo
    "width": int(s.attrib["width"]),
             ^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: ''
This is a bug, because we don't even use the width value that's being extracted.
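One possible way to make the parsing tolerant of the empty attribute is sketched below. This is not the actual flickr_photos_api code: the helper names `parse_optional_int` and `parse_sizes` are hypothetical, and the XML shape (a list of `size` elements with `label`, `width`, `height` and `source` attributes) is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET


def parse_optional_int(value):
    """Return int(value), or None if the attribute is missing or empty."""
    if value is None or value.strip() == "":
        return None
    return int(value)


def parse_sizes(sizes_xml):
    """Parse <size> elements, tolerating empty width/height attributes.

    Hypothetical helper: the element and attribute names are assumed,
    not taken from the real library.
    """
    root = ET.fromstring(sizes_xml)
    return [
        {
            "label": s.attrib["label"],
            "width": parse_optional_int(s.attrib.get("width")),
            "height": parse_optional_int(s.attrib.get("height")),
            "source": s.attrib["source"],
        }
        for s in root.iter("size")
    ]
```

With this approach a size entry with `width=""` yields `None` instead of raising `ValueError`, so callers that never look at the width are unaffected.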
Example: https://commons.wikimedia.org/wiki/File:%22A_Welcome_Visitor_to_Camels%27_Paradise%22.jpg
The date on Flickr is circa 1922, but it's been mapped to WMC as 1 January 1922. This looks like a bug in the original migration tool – Flickr returns a 1 Jan timestamp in the "date taken" field and stores the granularity separately. This is something we should be able to fix automatically.
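A rough sketch of what an automatic fix could look like, combining the timestamp with its granularity. The function `to_sdc_date` and its return shape are hypothetical, not the bot's real code. The granularity codes (0 = exact, 4 = month, 6 = year, 8 = circa) follow my reading of the Flickr API date documentation, and the precision codes (9 = year, 10 = month, 11 = day) follow the Wikibase time datatype; both are worth double-checking before relying on them.

```python
from datetime import datetime

# Flickr "date taken" granularity codes (assumed from the Flickr API docs).
FLICKR_GRANULARITY = {"0": "second", "4": "month", "6": "year", "8": "circa"}

# Wikibase time precision codes: 9 = year, 10 = month, 11 = day.
# Times finer than a day are mapped to day precision in this sketch.
WIKIBASE_PRECISION = {"second": 11, "month": 10, "year": 9, "circa": 9}


def to_sdc_date(taken, granularity):
    """Combine a Flickr timestamp and granularity into a date description.

    Hypothetical helper: returns a plain dict rather than a real SDC claim.
    """
    dt = datetime.strptime(taken, "%Y-%m-%d %H:%M:%S")
    kind = FLICKR_GRANULARITY[str(granularity)]
    precision = WIKIBASE_PRECISION[kind]

    # Drop the parts of the timestamp that the granularity says are filler.
    if precision == 9:
        dt = dt.replace(month=1, day=1)
    elif precision == 10:
        dt = dt.replace(day=1)

    return {
        "time": dt.strftime("+%Y-%m-%dT00:00:00Z"),
        "precision": precision,
        "circa": kind == "circa",
    }
```

For the example above, `to_sdc_date("1922-01-01 00:00:00", 8)` gives year precision with `circa` set, so the bot could write "circa 1922" rather than "1 January 1922". How the circa flag is expressed in SDC (for example as a qualifier) is left open here.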
This seems like a fairly obvious thing to do, but I've only added it now: I'm tracking which properties have an unknown action, and which files they appear in.
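As an illustration of that kind of tracking, here is a minimal sketch. The class `UnknownActionTracker` and its methods are hypothetical names, not the bot's actual implementation; it simply groups filenames by the property whose action was unknown.

```python
from collections import defaultdict


class UnknownActionTracker:
    """Remember which files produced an unknown action, keyed by property ID."""

    def __init__(self):
        self._files_by_property = defaultdict(set)

    def record(self, property_id, filename):
        """Note that `filename` had an unknown action for `property_id`."""
        self._files_by_property[property_id].add(filename)

    def report(self):
        """Return (property_id, sorted filenames) pairs, most affected first."""
        return sorted(
            ((prop, sorted(files)) for prop, files in self._files_by_property.items()),
            key=lambda item: (-len(item[1]), item[0]),
        )
```

Sorting by the number of affected files puts the most common sources of confusion at the top of the report, which makes it easier to pick examples for a ticket like this one.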
This is a tracking ticket to highlight files (with examples!) where the bot is getting "confused" and doesn't know how to update the SDC.