Improper links fetching

Bionus / imgbrd-grabber

Very customizable imageboard/booru downloader with powerful filenaming features.

Apache License 2.0


Hello everyone!

I am writing because I have a problem with how Grabber builds the links it fetches. Different websites support different URL parameters, and I cannot set Grabber up properly for them. For example, a working link to a certain website may be "website.com/index.php?page=post&s=list", but Grabber keeps appending extra parameters, producing "website.com/index.php?page=post&s=list&limit=25&pid=0&tags=". This website (like many others) doesn't support these parameters, so the search doesn't even run. I'd like to tell Grabber not to add them, but all my efforts have failed. I tried changing the options (especially the "General" and "Sources" tabs, which I found greatly affect the links) and changing the sites' referers, but it didn't work.

Another case is that Grabber may add an extra slash to the link. The website's link may be, for example, "website.com/post/list/1", while Grabber's version is "website.com/post/list//1". Note the double slash before the 1!
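To make the double-slash case concrete, here is a minimal Python sketch of collapsing repeated slashes in a link. It is not Grabber's code; the function name `collapse_slashes` is made up for illustration, and it only touches the path part of the link, leaving any "scheme://" prefix and query string alone.

```python
import re

def collapse_slashes(url: str) -> str:
    """Collapse runs of '/' in the path of a URL (hypothetical helper)."""
    # Keep an optional "scheme://" prefix untouched.
    scheme, sep, rest = url.partition("://")
    if not sep:
        scheme, rest = "", url
    # Only normalize the path, not the query string.
    path, qsep, query = rest.partition("?")
    path = re.sub(r"/{2,}", "/", path)
    return scheme + sep + path + qsep + query

print(collapse_slashes("website.com/post/list//1"))
```

With the example link from this post, the result is "website.com/post/list/1".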

In other cases, extra parameters are added not only at the end of the link but also in the middle or at the beginning.

Grabber also guesses the site's type incorrectly. For example, it keeps recognizing a site as Danbooru (2.0) when it is in fact a Gelbooru-type site.

I think this problem is of high importance, because only a few sites currently work correctly in Grabber; the rest just somehow crash.

If I only knew how to change this behavior, I could easily work out any other case myself.

I also noticed that after the update, some websites that worked in the previous version no longer work. For example, e621.net worked properly in version 3.2.7, but in version 3.3.0 it doesn't return any search results, and instead of grabbing images it only shows some HTML code when clicked. Please help!

I tried running both 3.2.7 and 3.3.0 versions of Grabber on 32-bit Windows 7.

Original issue reported on code.google.com by vulox007 on 18 Jun 2013 at 9:37

Hello,

First, sorry for the late answer, I did not get any notification for this issue.

About the Gelbooru links such as "website.com/index.php?page=post&s=list": adding parameters such as "limit=25&pid=0&tags=", even if they are not required, should not change the result. For example, "google.com/" and "google.com/?banana=pear" should give the same result, unless "google.com" actually looks for "banana". Same thing for "//" (Shimmie, I guess): the site should automatically redirect to "/". I think the recognition problem does not come from the URLs (maybe for some boards though, since I haven't updated some sources for a while). As for all sources being recognized as Danbooru, my bad, an omission of mine in the last update. ><

If you want to change the links fetched by the program, they are defined in the "model.xml" file of each source, located in "C:/Users/USERNAME/Grabber/sites", then in each subdirectory. There, you can edit out the variables you don't want (for example removing the "&limit={limit}" if you don't want it), but I don't think it will fix the problem. I think the problem is that the API changed for most of them (Danbooru and Gelbooru), and some regexes are out of date (they can also be found in the same file, in the "Regex" part). This file actually controls pretty much everything for each source (the source guessing, for example, comes from its "Guess" part).

I started working on the next update, starting with making all the sources work. You can find a fix for Gelbooru here ("Grabber.exe" goes in "C:/Program Files/Grabber", "model.xml" in "C:/Users/USERNAME/Grabber/sites/Gelbooru (2.0)"):
http://www.mediafire.com/?0cz1li595zr1349
http://www.mediafire.com/?62af58esnrwfk98

For e621, it's working fine for me. Maybe I changed the model file and forgot about it, so here is mine (for Danbooru 1.0):
http://www.mediafire.com/?zgfcb2b18ieedx5

If you have any issue, don't hesitate to ask me :)
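As an illustration of the "edit out the variables you don't want" step from the reply above, here is a small Python sketch that drops one parameter from a link template. It only works on the template string itself and makes no assumption about the layout of "model.xml". The helper name `drop_param` and the `{pid}` and `{tags}` placeholders are assumptions for the example; only "&limit={limit}" is quoted in the reply.

```python
def drop_param(template: str, name: str) -> str:
    """Remove the `name=...` pair from a URL template's query string.

    Hypothetical helper: it treats the template as plain text and keeps
    every other parameter (and its {placeholder}) in the original order.
    """
    base, sep, query = template.partition("?")
    if not sep:
        return template  # no query string, nothing to remove
    kept = [pair for pair in query.split("&")
            if pair.split("=", 1)[0] != name]
    return base + ("?" + "&".join(kept) if kept else "")

# Assumed template, modeled on the link from the first post.
template = "website.com/index.php?page=post&s=list&limit={limit}&pid={pid}&tags={tags}"
print(drop_param(template, "limit"))
```

Applied to the assumed template, this leaves "website.com/index.php?page=post&s=list&pid={pid}&tags={tags}". The same edit can of course be made by hand in a text editor; the sketch just shows which part of the template goes away.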

Thanks for your response and for the fixes as well. It's good to know that you're working on improvements.

I can also report another issue. Besides the standard categories such as "artist", "copyright", "character" and "general", e621 also has one called "species". It's like Behoimi, which also has its own category called "model". It would be really nice if these could be included in the filename.

They have also separated tags into sections, so now all "characters", etc. are next to one another and tags are no longer sorted by name. Since that update, Grabber has stopped grabbing these tags. Instead of the "artists", "copyright" and "characters" values, it now just puts the default values in the filenames, like "anonymous", "misc" and "unknown". I believe this is an important issue, because the feature is new, it actually helps a lot in navigating the site, and I guess other sites might adopt it as well.

I think I also figured out why Grabber doesn't grab tag tokens from paheal. I believe it's because on paheal (and presumably other Shimmie-based boorus) tags aren't divided into these categories, so all of them are effectively "general" tags. Because of that, this function can probably never work on these sites. Still, I believe there is a way to grab the images' IDs and maybe the date.

If you cannot work out a solution for the filenames, don't worry - it's not the most important thing in the world. I just would like Grabber to be able to properly download images from any booru, because besides the most popular ones, there are still others with interesting content that are worth checking.

Cheers and good luck!

Improper links fetching #246