TumblThreeApp / TumblThree

A Tumblr and Twitter Blog Backup Application
https://TumblThreeApp.github.io
MIT License

.pnjs and .pngs #231

Closed nuvema572 closed 2 years ago

nuvema572 commented 2 years ago

tumblr tends to serve image previews as ".pnj" files sometimes for some reason lately, and tumblthree downloads them (and sometimes also a .png version.)

the problem is that .pnjs are often compressed and also can't be opened by a lot of image viewers. is there some way to ensure that only the original, full quality .pngs are downloaded?

also, a general feature request: if a reblog, an option to add the "date created" metadata as the original time of the post's creation (not reblog), and better ability to sort into folders (i.e., grouping by original poster name>post id)

thanks for making tumblthree the best it can be! ^^

thomas694 commented 2 years ago

Thanks for your note and feature suggestion.

PNJ is a MNG sub-format for encapsulating JPEG images. On Tumblr I've never seen normal images as pnj until now. Do you mean those avatar images? Normally TumblThree doesn't download them. Tumblr needs differently sized user images in various places on their site, e.g.:
https://64.media.tumblr.com/avatar_71e489b9e807_128.pnj
https://64.media.tumblr.com/avatar_71e489b9e807_64.pnj
https://64.media.tumblr.com/avatar_71e489b9e807_16.pnj

I can't say if Tumblr is really storing them in one file, but I guess it's only a url rewrite thing (similar to gifv). A browser gets back the information that it's a .jpg file (and also saves it like that). If normal images are affected by this, maybe we should adapt TumblThree to not rely on the file extension from the url, but just save these files with the "real" extension (jpg).
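A minimal sketch of the idea of not relying on the URL's extension, assuming the server reports the real type in the `Content-Type` response header. The function name `filename_for` and the mapping table are hypothetical and only illustrate the approach; this is not TumblThree's actual code.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Assumed mapping, limited to the two types discussed in this thread.
CONTENT_TYPE_EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
}


def filename_for(url: str, content_type: str) -> str:
    """Return a local file name whose extension follows the Content-Type."""
    name = PurePosixPath(urlsplit(url).path).name
    # Drop parameters such as "; charset=..." and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    real_ext = CONTENT_TYPE_EXTENSIONS.get(media_type)
    if not name or real_ext is None:
        # Unknown type (or no file name in the URL): keep the URL's name.
        return name
    # Replace ".pnj" (or whatever the URL claims) with the real extension.
    return str(PurePosixPath(name).with_suffix(real_ext))
```

With this sketch, a `.pnj` URL answered with `image/jpeg` would be saved as `.jpg`, and one answered with `image/png` as `.png`.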

Regarding your reblog suggestions: to get the original post's creation time, we would need to make an additional request each time to load the original post, if it still exists, just to get its creation time. I don't think that's efficient. Why do you need the original creation time? If someone re-uploads instead of reblogging, as often happens, it's even a new "origin".

Do you mean by "better ability to sort" a new filename template token for the original poster name? That would be possible as we have that information on hand.

Glad you like it, give it a star.

nuvema572 commented 2 years ago

as for the .pnj situation, it seems like it's only for some .pngs including normal images, and i'm honestly unsure if it's actually just an url rewrite or not. i only started noticing it around last month.

edit: according to changes, it says "On web, PNGs are now served as JPEGs if that path is determined to be more optimal. This will help images load more quickly without any loss in image quality 👍️" so, still unsure if this means that it's just an url rewrite. i've tried renaming downloaded .pnjs to .png and the transparency is preserved, but sometimes the size of the file (in kb, not dimensions) is different. tumblthree has downloaded files in .png, .jpg, and .pnj formats before, but i genuinely can't tell if it's being compressed or not.

as for the suggestions, a template token would be just perfect, and perhaps an ability to sort files by the token into certain folders, like how this extension does, if possible? if that's not doable that's totally OK. ignore my comment about original post time, too, then, if it's too difficult. i like to use tumblthree to help back up archivization projects, and dating files is something that i can do manually just fine. ^^ thanks!

thomas694 commented 2 years ago

Yes, I found .pnj files (/urls) myself now. Normally "pnj" only exists on the server, or rather in the url; at the time of download the browser gets back a png or jpg, but TumblThree doesn't handle that correctly yet. They seem to have been playing around with this since March. And it seems they aren't doing it only for new uploads/posts, but also for some old posts that you download now.

I had overlooked that info in the changes announcements. And "without any loss in image quality", so you have to trust them that their algorithm decides correctly. As a side note, so far every pnj jpeg I examined has a jpeg quality level of 92. And yes, there are .pnj files that actually contain png files, obviously mostly because of transparency. For easier identification, I wouldn't rename them first and look for png transparency, but open them in a text editor (e.g. Windows Notepad) and look at the first 4 characters (either something like "‰PNG" or "ÿØÿà").
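The text-editor check above can also be done programmatically. The following is a minimal sketch, assuming only the standard PNG and JPEG file signatures; the function name `sniff_image_type` is hypothetical. The characters "‰PNG" and "ÿØÿà" are what the bytes `89 50 4E 47` and `FF D8 FF E0` look like when shown in a Windows-1252 text editor.

```python
from typing import Optional

# Standard file signatures ("magic bytes").
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"   # starts with what an editor shows as "‰PNG"
JPEG_SIGNATURE = b"\xff\xd8\xff"       # "ÿØÿ", followed by e.g. "à" (0xE0) for JFIF


def sniff_image_type(data: bytes) -> Optional[str]:
    """Guess the real image type from the first bytes of a file."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpg"
    return None  # neither PNG nor JPEG


def sniff_file(path: str) -> Optional[str]:
    """Read only the first few bytes of a file and classify it."""
    with open(path, "rb") as handle:
        return sniff_image_type(handle.read(len(PNG_SIGNATURE)))
```

Calling `sniff_file` on a downloaded `.pnj` file would then tell whether it actually holds a png or a jpg, without renaming it first.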

But (since this week?) there exists another style. The extension in the url is .png (not .pnj), and after the download you have either a png or jpg image.

@all They say "On web, PNGs are now served as JPEGs". So are their mobile apps the opposite? Is someone using them and can confirm that there you can still download these images as png while in the desktop browser they are offered as jpeg? If so, a network traffic log for that image request would be even more helpful.

thomas694 commented 2 years ago

We added/fixed the 'easy' parts, the new filename template token for reblog origin's blog name and saving 'pnj' files with their real file extension (png / jpg).

Khodyn commented 2 years ago

It seems now that most of the time TumblThree downloads .pnj.jpg files instead of the original png. This is incredibly frustrating as I can get the higher-quality png manually, on desktop.

Changing .pnj to .png at the end of the URL seems to work fine.
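As a rough illustration of that workaround, here is a small sketch that swaps a trailing `.pnj` for `.png` in the URL path. The helper name `pnj_to_png` is hypothetical, and whether the server really returns the original png for the rewritten URL is only what is reported in this thread, not something the code can guarantee.

```python
from urllib.parse import urlsplit, urlunsplit


def pnj_to_png(url: str) -> str:
    """Replace a trailing ".pnj" in the URL path with ".png"."""
    parts = urlsplit(url)
    if parts.path.lower().endswith(".pnj"):
        # Only the path is touched; query string and fragment are kept.
        parts = parts._replace(path=parts.path[: -len(".pnj")] + ".png")
    return urlunsplit(parts)
```

URLs that do not end in `.pnj` are returned unchanged, so the helper can be applied to every image URL before downloading.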

thomas694 commented 2 years ago

Then it should be possible for us to download the png too. Maybe we should add an option to let the user choose either the original png or the Tumblr-converted jpg that saves storage space.

Out of curiosity, do you have any ".pnj.jpg" that's visibly not the same as the png? Can you give me the link to the post or the image link? If content is inappropriate to post here, send me an email or use the feedback dialog.

Khodyn commented 2 years ago

> Out of curiosity, do you have any ".pnj.jpg" that's visibly not the same as the png? Can you give me the link to the post or the image link? If content is inappropriate to post here, send me an email or use the feedback dialog.

Every .jpg will look worse. Lossy compression is a horrible thing for art. Look at the lines on any jpg compared to the png and you can clearly see the crust.

[image] [image]

thomas694 commented 2 years ago

My two cents on why this comparison is flawed: right now they are two pngs again, so there was a second conversion and jpg metadata/information was lost, and they are compared at a high (10x) zoom level.

Png is a lossless format and (normally) jpg is a lossy format. So by its nature, a jpg compared to png is never binary equivalent and more or less information is lost. Of course, I know how to use image software to extract/show the smallest differences between two images or look at them on a high zoom level. But that is not what is perceived.

The question was: Do you have any Tumblr image at hand that is currently offered as pnj where the png is visibly not the same as the jpg (viewed on the monitor, i.e. at zoom level 100%)? (A manual conversion to jpg for comparison here would need to be done with a quality level of 92.) I asked because I don't have one at hand and wondered if there are thousands of wrongly converted png images on Tumblr and I just cannot find any of them.

Khodyn commented 2 years ago

> Of course, I know how to use image software to extract/show the smallest differences between two images or look at them on a high zoom level. But that is not what is perceived.

It doesn't matter if you can't perceive the difference, even a single pixel of image degradation sucks and should be avoided when possible.

thomas694 commented 2 years ago

@Khodyn Thanks for your code contribution.