bbolli / tumblr-utils

Utilities for dealing with Tumblr blogs, Tumblr backup
GNU General Public License v3.0

Archive images load from tumblr instead of using /media locally #236

Open BCD413 opened 1 year ago

BCD413 commented 1 year ago

I was running a standard full backup using [892e28d] with the standard python tumblr_backup.py [blog name] command. When I loaded the archive, I noticed that it loads more slowly [as in, line-by-line], as if it's loading from somewhere other than my local drive.

When I checked where the image was loading from, I noticed that it was loading directly from Tumblr via the 64.media.tumblr.com link, rather than from the locally stored files as it did in my past backups made with older versions [July 2021].

Mind looking into this?

cebtenzzre commented 1 year ago

This is only supposed to happen for images that couldn't be downloaded unless you use -k. Did you see any errors while you were making the backup? If you back up the same posts using my fork (you can use e.g. -n 100 to back up just a few) do you see the same issue, or any different errors?

veloskies commented 9 months ago

I have essentially zero coding experience so please be VERY explicit with instructions.

I downloaded all the relevant scripts etc. last night, and while experimenting this morning I found that inline images are being loaded from their 64.media.tumblr link rather than from the /media folder. I double-checked that they were downloaded, and they are present in the /media folder; they just don't load from there. Non-inline images load locally.

Not sure if this error is relevant: 'warning: filetype module not found, using deprecated imghdr'

I did try using your specific fork as instructed and got the same results.

EDIT: Further experimentation shows that when the post is opened via its specific post link in /posts it loads the images locally, but when it's opened from the index or the archive it loads them from 64.media.tumblr.

cebtenzzre commented 8 months ago

I haven't observed this in my backups. What does the HTML look like for the one that loads via 64.media.tumblr.com vs one that doesn't? If it's just a regular img src= and you are making a fresh backup with my fork then I don't see how there could be a difference.
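For anyone comparing the two cases without reading the raw HTML by hand, here is a minimal sketch that lists every img tag in a piece of HTML together with its src and srcset values. It is not part of tumblr_backup.py; the names ImgCollector and list_images are hypothetical, and it uses only Python's standard html.parser module.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collects (src, srcset) pairs for every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "img":
            attr_map = dict(attrs)
            self.images.append((attr_map.get("src"), attr_map.get("srcset")))


def list_images(html):
    """Return a list of (src, srcset) tuples; srcset is None when absent."""
    parser = ImgCollector()
    parser.feed(html)
    parser.close()
    return parser.images


if __name__ == "__main__":
    # Hypothetical sample markup, not taken from a real backup.
    sample = '<img src="a.jpg"><img src="b.jpg" srcset="https://example.com/b.jpg 640w">'
    for src, srcset in list_images(sample):
        print(src, "| srcset:", srcset)
```

To use it on a backup, read the text of a saved page (for example the index page or a file under /posts) and pass it to list_images. An image whose second value is not None carries a srcset attribute.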

veloskies commented 8 months ago

The backup is completely fresh, I've never done this before. I have had a look and the html is different.

This loads from backup:

[Image attachment: media1]

This does not, and was all on one line along with six more formatted the same way:

[Image attachment: media2]

Both came from the same backup, but if it helps, the first one was from a post in 2015, and the latter from a post last month.

If you need more specific info please let me know!

cebtenzzre commented 8 months ago

This does not, and was all on one line along with six more formatted the same way:

Thanks for that information. I've seen that in HTML before but didn't realize the browser was ignoring the src attribute and using srcset instead - but it makes sense that it would do that. tumblr_backup.py isn't currently aware of the srcset attribute AFAIK.
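As a possible stopgap for existing backups, the srcset attribute could be removed from img tags so that the browser falls back to src. The following is only a sketch under assumptions: it is not how tumblr_backup.py works, strip_srcset is a hypothetical helper, the sample paths and URLs are made up, and the regular expressions assume no attribute value contains a ">" character.

```python
import re

# Matches a whole <img ...> tag (assumes no '>' inside attribute values).
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)

# Matches a srcset attribute whose value is double-quoted, single-quoted,
# or unquoted. The leading whitespace keeps data-srcset from matching.
SRCSET_ATTR = re.compile(
    r"""\s+srcset\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)


def strip_srcset(html: str) -> str:
    """Return html with the srcset attribute removed from every <img> tag."""
    return IMG_TAG.sub(lambda m: SRCSET_ATTR.sub("", m.group(0)), html)


if __name__ == "__main__":
    # Hypothetical markup: a local src plus remote srcset candidates.
    sample = (
        '<p><img src="../media/a.jpg" '
        'srcset="https://64.media.tumblr.com/a/s640.jpg 640w, '
        'https://64.media.tumblr.com/a/s1280.jpg 1280w" alt="x"></p>'
    )
    print(strip_srcset(sample))
```

Only img tags are touched, so other elements and any text outside tags stay as they are. Working on a copy of the backup first would be prudent, since the change is not reversible once the file is overwritten.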