Issues with parsing some tokens on booru.org subdomains

Danimos552 commented 5 years ago

Bug description

I know I posted a similar issue before, but Grabber still has trouble parsing these tokens on the booru.org subdomains: %date%, %filename%, %md5%, %source%, %height% and %width%. rule34.xxx (aka rule34.booru.org) and other boorus seem to work flawlessly. The issues appear only on the booru.org subdomains.

The tokens %date%, %source%, %height% and %width% are not saved at all and the log files appear entirely empty. It’s true that many pictures on the booru network are posted without a source, but it’d still be nice to have it whenever it’s available.

The real issues are with the %filename% and %md5% tokens. The booru.org network seems to use a custom format for filenames, which I believe is not a checksum but a randomly generated string. I happened to see two pictures with different filenames but the exact same md5 sum, hence my theory. The problem is that Grabber is unable to correctly retrieve such an image’s md5 sum and mistakes the file’s name (as stored on the server) for it. When you save both the %filename% and %md5% tokens to a log file they are identical, but if you calculate the image’s checksum yourself (using third-party software) the sum is different. In other words, Grabber saves the image’s filename in place of its md5 on the booru.org subdomains. This issue does not appear on other boorus (including rule34.xxx).
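To check by hand whether a saved filename really is the file's checksum, something like the following sketch could be used. It is not part of Grabber; the function names `md5_of_file` and `name_matches_content` are made up for illustration, and it only assumes the image has already been downloaded to disk.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of the file's contents, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def name_matches_content(path: Path) -> bool:
    """True if the filename (without extension) equals the content's MD5."""
    return path.stem.lower() == md5_of_file(path)
```

If `name_matches_content` returns False for a file whose logged %md5% equals its %filename%, the logged value is the server-side name rather than a checksum of the content.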

Steps to reproduce

1. Go to Tools --> Options --> Save --> Separate log files
2. Click on 'Add a separate log file' and fill it with the desired tokens (%date%, %filename%, %md5%, %source%, %height% and %width%)
3. Save the options
4. Choose a booru.org subdomain from the sources (e.g. furry.booru.org)
5. Click 'Get this page'
6. Go to the Downloads tab and click the Download button
7. Once the batch is finished, go to the save folder and open a log file
8. The tokens %date%, %source%, %height% and %width% are empty, and %md5% is identical to %filename%.
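The symptoms in the last step can be checked automatically with a small script. This is only a sketch: it assumes the separate log file was configured to write one tab-separated line per image in the order %date%, %filename%, %md5%, %source%, %height%, %width%. That layout, the `FIELDS` tuple and the `diagnose_line` helper are all assumptions for illustration, not something Grabber prescribes.

```python
from typing import List

# Assumed column order of the (hypothetical) tab-separated log line.
FIELDS = ("date", "filename", "md5", "source", "height", "width")


def diagnose_line(line: str) -> List[str]:
    """Return a list of problems found in one log line."""
    values = line.rstrip("\n").split("\t")
    # Pad short lines so every field has a value, possibly empty.
    values += [""] * (len(FIELDS) - len(values))
    record = dict(zip(FIELDS, values))

    problems = [
        f"{name} is empty"
        for name in ("date", "source", "height", "width")
        if not record[name].strip()
    ]
    # The md5 token repeating the filename is the main symptom reported here.
    if record["md5"] and record["md5"] == record["filename"]:
        problems.append("md5 equals filename")
    return problems
```

An empty %source% is not necessarily a bug, since many posts have no source, so that particular message is only a hint.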

Expected behavior

The tokens should save correctly, and the %md5% token should save the post's actual md5 sum instead of its filename on the server.

System information

OS: Ubuntu 18 (AMD64)
Grabber version: 6.0.6

Bionus commented 5 years ago

I can't reproduce this with the XML API enabled. Did you enable it and put it above the HTML one in the source settings? (that's the default)

booru.org sources, such as the furry.booru.org you mention, use Gelbooru 0.2. When using this source type, as long as the XML API is enabled, Grabber will properly fetch all of the information you mention (%date%, %filename%, %md5%, %source%, %height% and %width%).
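For illustration, here is a rough sketch of how those fields could be read from a Gelbooru 0.2 style XML response. This is not Grabber's code. The sample document and its attribute names (`md5`, `width`, `height`, `source`, `created_at`, `file_url`) are assumptions based on what such responses typically look like, and `parse_posts` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample response; real sites may use different attributes.
SAMPLE_XML = """<posts count="1" offset="0">
  <post id="1" md5="0123456789abcdef0123456789abcdef"
        width="800" height="600" score="3"
        source="https://example.com/art"
        created_at="Mon Jan 01 00:00:00 +0000 2018"
        file_url="https://example.com/images/1/abcdef.jpg"/>
</posts>"""


def parse_posts(xml_text):
    """Extract the token-related fields from each <post> element."""
    root = ET.fromstring(xml_text)
    posts = []
    for node in root.iter("post"):
        file_url = node.get("file_url", "")
        # The filename is the last path segment without its extension.
        filename = file_url.rsplit("/", 1)[-1].rsplit(".", 1)[0]
        posts.append({
            "date": node.get("created_at", ""),
            "filename": filename,
            "md5": node.get("md5", ""),
            "source": node.get("source", ""),
            "width": int(node.get("width", 0)),
            "height": int(node.get("height", 0)),
        })
    return posts
```

In this sample the filename (`abcdef`) and the `md5` attribute differ, which is the situation the XML API makes visible and the HTML fallback may not.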

As for the MD5, that's by design: if the website provides one, Grabber does not re-calculate it.

Bionus commented 5 years ago

Could it also be that you added this source as a Gelbooru 0.1 board instead of Gelbooru 0.2?

Danimos552 commented 5 years ago

I've been experimenting with the settings you recommend on different boorus, but with varied results.

The settings do work, but only for a few specific boorus (namely rule34.xxx, furry.booru.org and realbooru.com). There, everything gets saved properly, there are no issues with the %filename% / %md5% tokens, and the actual md5 is saved.

However, this does not fully work with other boorus: neither with the most popular ones (like clop.booru.org, equi.booru.org, footfetishbooru.booru.org or meme.booru.org), nor with niche ones (like kawaii.booru.org, min.booru.org and tower-girls.booru.org, which I chose at random). When you specify these manually as Gelbooru (0.2), Grabber shows no results. If you let Grabber guess the booru type, it always chooses Gelbooru (0.1), and then the issues I mentioned above occur. I also noticed that the %score% token doesn't work properly either: "0" is always saved instead of the actual score.

So it appears that most of these boorus run on the Gelbooru (0.1) engine, whose support in Grabber is bugged. The %filename% / %md5% token issue also renders the md5s.txt file useless on these boorus, since it saves the filenames instead of the actual md5s.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

If this issue is about a bug that still happens in the latest version, or a suggestion that is still relevant, feel free to comment on it and the maintainers will have another look, they might have missed it!

Thank you!

Bionus / imgbrd-grabber

Issues with parsing some tokens on booru.org subdomains #1553