bitdruid / python-wayback-machine-downloader

Query and download archive.org as simple as possible.
MIT License
33 stars, 2 forks

Any filetype is named index.html if the directory already exists [possible solution] #7

Closed: bitdruid closed this issue 5 months ago

bitdruid commented 5 months ago

Describe: If a file already exists at example.com/subdir.jpg/picture.jpg (so subdir.jpg is a directory) and an image example.com/subdir.jpg is then downloaded, that second image is stored as example.com/subdir.jpg/index.html and is not readable as an image.
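The collision described above can be reproduced in isolation. The following is a minimal sketch, not the project's actual code: `target_path` is a hypothetical helper, and the fallback to `index.html` when the target is already a directory is an assumption based on the behavior reported here.

```python
import os
import tempfile


def target_path(root, url_path):
    """Hypothetical helper: map a URL path to a local path.

    Assumed fallback: if the local path is already a directory,
    store the download as <path>/index.html instead.
    """
    local = os.path.join(root, *url_path.split("/"))
    if os.path.isdir(local):
        return os.path.join(local, "index.html")
    return local


with tempfile.TemporaryDirectory() as root:
    # First download creates the directory example.com/subdir.jpg/
    first = target_path(root, "example.com/subdir.jpg/picture.jpg")
    os.makedirs(os.path.dirname(first), exist_ok=True)
    with open(first, "wb") as fh:
        fh.write(b"")

    # Second download wants the same name as a file, but it is a directory
    second = target_path(root, "example.com/subdir.jpg")
    collided = os.path.relpath(second, root).replace(os.sep, "/")
    print(collided)
```

A path cannot be both a file and a directory on common filesystems, so some fallback name is needed. The problem reported here is that the fallback name carries an `.html` extension regardless of the content.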

Snapshot to reproduce:
https://web.archive.org/web/20240106164401id_/https://attachment.mcbbs.net/uc_server/data/avatar/001/74/40/69_avatar_big.jpg
https://web.archive.org/web/20220124000717id_/https://attachment.mcbbs.net/uc_server/data/avatar/001/74/40/49_avatar_big.jpg/small

Command to reproduce: waybackup -u attachment.mcbbs.net/uc_server/data/avatar/001/74/40 -c --end 20240117

Possible Solution: Check the file's content with python-magic and rename it according to the detected MIME type.
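A rough sketch of the rename step, under stated assumptions: to stay dependency-free it uses a small hand-written signature table as a stand-in for python-magic, and the names `sniff_extension` and `fix_index_name`, as well as the choice of `index<ext>` as the new filename, are hypothetical. With python-magic installed, `magic.from_buffer(data, mime=True)` could supply the MIME type instead of the table.

```python
import os
import tempfile

# Hand-written magic-byte table, standing in for python-magic (assumption).
SIGNATURES = [
    (b"\xff\xd8\xff", ".jpg"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
]


def sniff_extension(data):
    """Return an extension for known image signatures, else None."""
    for signature, ext in SIGNATURES:
        if data.startswith(signature):
            return ext
    return None


def fix_index_name(path):
    """Rename .../index.html to .../index<ext> when the content is an image.

    Returns the (possibly unchanged) path.
    """
    if os.path.basename(path) != "index.html":
        return path
    with open(path, "rb") as fh:
        head = fh.read(16)
    ext = sniff_extension(head)
    if ext is None:
        return path  # unknown or genuinely HTML: leave it alone
    new_path = os.path.join(os.path.dirname(path), "index" + ext)
    os.replace(path, new_path)
    return new_path


# Usage: a JPEG that was saved under the fallback name index.html
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "index.html")
with open(demo_file, "wb") as fh:
    fh.write(b"\xff\xd8\xff\xe0" + b"\x00" * 12)
renamed = fix_index_name(demo_file)
print(os.path.basename(renamed))
```

Sniffing the content rather than trusting the URL suffix matters here because the second snapshot URL ends in `/small`, which carries no extension at all.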