Bionus / imgbrd-grabber

Very customizable imageboard/booru downloader with powerful filenaming features.
https://www.bionus.org/imgbrd-grabber/
Apache License 2.0

Improper links fetching #246

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 9 years ago

Hello everyone!

I am writing because I have a problem with how Grabber builds the links it fetches. Different websites use different URL schemes, and I cannot set Grabber up properly for them. For example, a working link to a certain website may be "website.com/index.php?page=post&s=list", but Grabber keeps appending extra parameters, producing "website.com/index.php?page=post&s=list&limit=25&pid=0&tags=". This website (like many others) doesn't support these parameters, so the search doesn't even run. I'd like to tell Grabber not to add them, but all my efforts have failed. I tried changing the options (especially the "General" and "Sources" tabs, which I noticed greatly affect the links) as well as changing the sites' referers, but it didn't work.

Another case is that Grabber may add an extra slash to the link. The website's link may be, for example, "website.com/post/list/1", while Grabber's version is "website.com/post/list//1". Note the double slash just before the 1!

In other cases, Grabber adds extra parameters not only at the end of the link but also in the middle or at the beginning.

Grabber also guesses the site's type incorrectly. For example, it keeps recognizing a site as Danbooru (2.0) when it is in fact a Gelbooru-type site.

I think this problem is of high importance, because only a few sites currently work correctly in Grabber; the rest just fail in one way or another.

If I only knew how to change this behavior, I could easily work out any other case.

I also noticed that after the update some websites that worked in the previous version no longer work. For example, e621.net worked properly in version 3.2.7, but in version 3.3.0 it does not return any search results, and instead of grabbing images it only shows some HTML code when clicked. Please help!

I tried running both 3.2.7 and 3.3.0 versions of Grabber on 32-bit Windows 7.

Original issue reported on code.google.com by vulox007 on 18 Jun 2013 at 9:37

GoogleCodeExporter commented 9 years ago
Hello,

First, sorry for the late answer; I did not get any notification for this issue.

When you talk about Gelbooru links like "website.com/index.php?page=post&s=list", 
adding parameters such as "limit=25&pid=0&tags=", even if they are not required, 
should not change the result. For example, "google.com/" and 
"google.com/?banana=pear" should give the same result, unless "google.com" 
actually looks for "banana".
The same goes for "//" (Shimmie, I guess): the site should automatically 
redirect to "/".
I think the problem of not recognizing sites does not come from the URLs (maybe 
for some boards though, since I haven't updated some sources for a while).
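
To make the point about extra parameters concrete, here is a small Python sketch. It is not Grabber's code, and the helper names `query_keys` and `collapse_slashes` are made up for illustration. It shows that the longer URL only adds query keys on top of those in the short one, and how a doubled slash in a path could be collapsed on the client side for a site that does not redirect it.

```python
import re
from urllib.parse import urlsplit, parse_qs

def query_keys(url):
    """Return the set of query parameter names found in a URL."""
    # keep_blank_values=True so that an empty "tags=" is still counted.
    return set(parse_qs(urlsplit(url).query, keep_blank_values=True))

def collapse_slashes(path):
    """Replace any run of slashes in a path with a single slash."""
    return re.sub(r"/{2,}", "/", path)

short_url = "website.com/index.php?page=post&s=list"
long_url = "website.com/index.php?page=post&s=list&limit=25&pid=0&tags="

# Parameters present only in the URL that Grabber builds.
extra = query_keys(long_url) - query_keys(short_url)
print(sorted(extra))                      # ['limit', 'pid', 'tags']
print(collapse_slashes("/post/list//1"))  # /post/list/1
```

Whether a given board ignores the extra keys or rejects them is up to that board's server, so this only shows what differs between the two requests.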

As for all sources being recognized as Danbooru, my bad, an omission of mine in 
the last update. ><

If you want to change the links fetched by the program, they are defined in the 
"model.xml" file of each source, located in "C:/Users/USERNAME/Grabber/sites", 
then in each subdirectory. There, you can edit out the variables you don't want 
(for example removing the "&limit={limit}" if you don't want it), but I don't 
think it will fix the problem. I think the problem is that the API changed for 
most of them (Danbooru and Gelbooru), and that some regexes are out of date 
(they can also be found in the same file, in the "Regex" part). This file 
actually controls pretty much everything for each source (the source guessing, 
for example, comes from its "Guess" part).
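
As a rough illustration of that kind of edit, the Python sketch below removes one parameter from a URL template string. The template is an assumption modeled on the URL quoted in this issue, not copied from a real "model.xml", and `drop_param` is a hypothetical helper. It works on a plain string only; it does not parse the XML file or know how that file escapes characters.

```python
import re

def drop_param(template, name):
    """Remove the query parameter `name` (and its value) from a URL template."""
    # Match "name=value" only when it directly follows "?" or "&",
    # together with the "&" that separates it from the next parameter.
    pattern = r"(?<=[?&])" + re.escape(name) + r"=[^&]*&?"
    result = re.sub(pattern, "", template)
    # Tidy up a dangling "?" or "&" if the last parameter was removed.
    return result.rstrip("?&")

# Assumed template, for illustration only.
template = "/index.php?page=post&s=list&limit={limit}&pid={pid}&tags={tags}"

print(drop_param(template, "limit"))
# /index.php?page=post&s=list&pid={pid}&tags={tags}
```

In practice the same change can be made by hand in a text editor; keeping a backup copy of the file first makes it easy to undo.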

I have started working on the next update, beginning with getting all the 
sources working.
You can find a fix for Gelbooru here ("Grabber.exe" goes in "C:/Program 
Files/Grabber", "model.xml" in "C:/Users/USERNAME/Grabber/sites/Gelbooru 
(2.0)"):
http://www.mediafire.com/?0cz1li595zr1349
http://www.mediafire.com/?62af58esnrwfk98
For e621, it's working fine for me; maybe I changed the model file and forgot 
about it, so here is mine (for danbooru 1.0):
http://www.mediafire.com/?zgfcb2b18ieedx5

If you have any issue, don't hesitate to ask me :)

Original comment by bio.nus@hotmail.fr on 6 Jul 2013 at 6:10

GoogleCodeExporter commented 9 years ago
Thanks for your response, and thanks for the fixes as well. It's good to know 
that you're working on improvements.

I can also tell you about another issue. Besides the standard categories such 
as "artist", "copyright", "character" and "general", e621 also has one called 
"species". It's like Behoimi, which has its own category called "model". It 
would be really nice if these could be included in the filename. e621 has also 
separated its tags into sections, so now all the tags of one category 
("characters", etc.) are next to one another and are no longer sorted by name. 
Since that update, Grabber has ceased to grab these tags. Instead of the 
"artists", "copyright" and "characters", it just puts default values in the 
filenames, like "anonymous", "misc" and "unknown". I believe this is an 
important issue, because the feature is new, it actually helps a lot in 
navigating the site, and I guess that other sites might adopt it as well.

I think I have also figured out why Grabber doesn't grab tag tokens from 
paheal. I believe it's because on paheal (and presumably other Shimmie-based 
boorus) tags aren't divided into these categories, so all of them are 
effectively "general" tags. Because of that, this function can probably never 
work on these sites. I do believe, however, that there is a solution for 
grabbing the images' IDs and maybe the date.
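
To illustrate what I mean, here is a small Python sketch of the behavior as I understand it. It is not Grabber's actual code. The function name, the tag names and the pairing of each default with a category (taken from the order I listed them in above) are assumptions. Tags are grouped by category, and a category with no tags falls back to its default, which is what would happen on a Shimmie-type site where every tag is "general".

```python
# Assumed pairing of the default values mentioned above with categories.
DEFAULTS = {"artist": "anonymous", "copyright": "misc", "character": "unknown"}

def filename_tokens(tags, categories=("artist", "copyright", "character", "species")):
    """Group (name, category) pairs by category and apply fallback values."""
    tokens = {}
    for category in categories:
        names = [name for name, cat in tags if cat == category]
        # Categories without a known default fall back to an empty string.
        tokens[category] = " ".join(names) if names else DEFAULTS.get(category, "")
    return tokens

# Hypothetical posts: one with categorized tags, one with only "general" tags.
categorized_post = [("some_artist", "artist"), ("wolf", "species"), ("outdoors", "general")]
uncategorized_post = [("wolf", "general"), ("outdoors", "general")]

print(filename_tokens(categorized_post))
print(filename_tokens(uncategorized_post))
```

With the second post, every category-based token ends up as its default, which matches what I see in the filenames.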

If you cannot work out a solution for the filenames, don't worry - it's not 
the most important thing in the world. I would just like Grabber to be able to 
properly download images from any booru, because besides the most popular 
ones, there are still others with interesting content that are worth checking.

Cheers and good luck!

Original comment by vulox007 on 14 Jul 2013 at 6:09