McSib / e621_downloader

E621 and E926 downloader made in the Rust programming language.
Apache License 2.0

Linux Compatibility #23

Closed lordkitsuna closed 4 years ago

lordkitsuna commented 5 years ago

The program seems to work pretty well on Linux. I was able to just clone the repo and cargo run just fine. However, I do seem to be having an issue with pools, and with it downloading seemingly at random once it grabs the things I asked for. For this test case I tried the pool The Ghost in My Attic 2 and then the general tag smolder. It says it grabs the pool, but I never see it appear in the downloads folder. It then grabbed all of smolder just fine, but after that it starts downloading what looks like the entire site. I compared with the site and it seems to just be grabbing from the main posts page in order.

[kitsuna@kitsuna-tablet e621_downloader]$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.57s
Running `target/debug/e621_downloader`
Should enter safe mode (Y/N)? n
Parsed tag file.
"The_Ghost_In_My_Attic_2" grabbed!
"smolder" grabbed!
"" grabbed!

Duplicate found: skipping... 178 / 178 [===================================================================================================] 100.00 % 1068.93/s
Downloading: 8 / 1280 [>--------------------------------------------------------------------------------------------------------------------] 0.62 % 0.56/s 38m

lordkitsuna commented 5 years ago

Played around a little more; still can't get pools to actually download. Also, it seems that general is somehow acting hard-coded. If I comment out general to try not to use it, I get Error: Error(Json(Error("missing field `created_at`", line: 1, column: 38))). This message goes away if I uncomment general.

McSib commented 5 years ago

I would recommend grabbing the source from one of the releases, since the new commits will break a lot of the program. Also, from reading the parser code (I needed to refresh myself after my break), the groups are a very integral part of the tag file's syntax. Removing a group in the code can cause a few issues. Thanks for letting me know about this bug; I will look into it and see what I can do!

Side note: it doesn't try to download the entire site. It is actually grabbing 5 pages of posts for a search with no tags. I do find it fascinating, however, that it sees a whitespace character, since those aren't allowed to be exposed to the parser (it will skip any whitespace it notices).

lordkitsuna commented 5 years ago

I don't know if it's worth a separate issue, but I have also noticed that the parser does not like some characters, such as the fancy "e" used in the Pokemon tag. It will just stall trying to process it.

McSib commented 5 years ago

This is because of how the parser works. If I remember correctly, it goes character by character, getting each char and forming tokens from them. When it grabs individual chars, it could be ignoring UTF, instead opting for ASCII. I will look into this when I have the time.

lordkitsuna commented 5 years ago

Thanks for always replying. I've noticed on some larger tags that it seems to not want to go past 1,280 downloads. I saw in the e621 thread that you mentioned huge tags with something like one million posts being an issue. Was a limit added? Or is there any way to get it to download more than 1,280? isabelle_(animal_crossing) is a good example. The site says there are a little over 4k posts, but the downloader won't do more than 1,280.

McSib commented 5 years ago

When a tag passes the limit of 1,500 posts, it is considered too large a collection for the software to download. The program will opt to download only 5 pages worth of posts to compensate for this hard limit. The pages use the highest post limit the e621/e926 servers will allow, which is 320 posts per page. In total, it will grab 1,280 posts as its maximum.

Something to keep in mind: depending on the type of tag, the program will either ignore or use this limit. This is handled at a low level by categorizing the tag into one of two groups: General and Special. General forces the program to use the 1,280 post limit. The tags that register under this flag are:

- General: basic tags, such as fur, smiling, open_mouth.
- Copyright: any form of media should always be considered too large to download in full.
- Species: since species are very close to general in terms of the number of posts they can hold, they are treated as such.
- Character, in special cases: when a character has more than 1,500 posts tied to them, it is considered a General tag to avoid longer wait times while downloading.

The tags that register under the Special flag are:

- Artist: generally, if you are grabbing an artist's work directly, you plan to grab all their work for archiving purposes, so it is always considered Special.
- Character: if the number of posts tied to the character is below 1,500, it is considered a Special tag and the program will download all posts with the character in them.

This system is more complex than what I have explained so far, but in a basic sense, this is how the downloading function works with tags directly. These checks and grabs rely on a tight-knit relationship between the parser and the downloader. The parser grabs the number of posts and categorizes the tags into their correct groups, while the downloader focuses on using these tag types to grab and download their posts correctly.
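The categorization described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the downloader's actual source: the names `Category`, `TagKind`, `classify`, and `posts_to_download` are hypothetical, the 1,280 cap is taken directly as the figure quoted in this thread rather than derived from page counts, and a character with exactly 1,500 posts is assumed to count as Special because the thread does not say.

```rust
/// Tag categories named in the explanation above (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    General,
    Copyright,
    Species,
    Character,
    Artist,
}

/// The two groups a tag can be sorted into (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TagKind {
    General,
    Special,
}

/// Post count above which a character tag is treated as General (from the thread).
pub const LARGE_TAG_THRESHOLD: u64 = 1_500;

/// Maximum number of posts grabbed for a General tag (figure quoted in the thread).
pub const GENERAL_POST_CAP: u64 = 1_280;

/// Sorts a tag into General or Special from its category and post count.
pub fn classify(category: Category, post_count: u64) -> TagKind {
    match category {
        Category::General | Category::Copyright | Category::Species => TagKind::General,
        Category::Artist => TagKind::Special,
        // Assumption: exactly 1,500 posts still counts as Special.
        Category::Character if post_count > LARGE_TAG_THRESHOLD => TagKind::General,
        Category::Character => TagKind::Special,
    }
}

/// Number of posts that would be downloaded for a single tag.
pub fn posts_to_download(category: Category, post_count: u64) -> u64 {
    match classify(category, post_count) {
        TagKind::General => post_count.min(GENERAL_POST_CAP),
        TagKind::Special => post_count,
    }
}
```

Under these assumptions, a character tag with a little over 4,000 posts is classified as General and capped at 1,280, while an artist tag of any size is downloaded in full.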

Hopefully, this explains how and why the limit is there.

McSib commented 5 years ago

> I don't know if it's worth a separate issue, but I have also noticed that the parser does not like some characters, such as the fancy "e" used in the Pokemon tag. It will just stall trying to process it.

After testing the program, I found that the issue was the tag validation system not checking for Unicode characters. It would break early and get caught in a loop because it couldn't validate the character. This is now fixed. I will still have to add support for Chinese characters, and for Hiragana, Katakana, and Kanji for Japanese.
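For illustration, here is a minimal sketch of Unicode-aware tag validation in Rust. It is not the validation code from the downloader: the function names and the set of allowed punctuation are assumptions. It relies on the standard library's `char::is_alphanumeric`, which accepts letters and digits from any script, in contrast to `char::is_ascii_alphanumeric`. Because `str::chars` advances by one whole character per step, the check always terminates instead of stalling on a multi-byte character.

```rust
/// Punctuation assumed to be allowed inside a tag (hypothetical set).
fn is_allowed_punctuation(c: char) -> bool {
    matches!(c, '_' | '(' | ')' | '-' | ':')
}

/// Unicode-aware check: letters and digits from any script are accepted.
pub fn is_valid_tag(tag: &str) -> bool {
    !tag.is_empty()
        && tag
            .chars()
            .all(|c| c.is_alphanumeric() || is_allowed_punctuation(c))
}

/// ASCII-only check, shown for contrast: it rejects accented letters.
pub fn is_valid_tag_ascii_only(tag: &str) -> bool {
    !tag.is_empty()
        && tag
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || is_allowed_punctuation(c))
}
```

With these definitions, a tag containing a precomposed "é" (U+00E9) passes `is_valid_tag` but fails `is_valid_tag_ascii_only`, and Hiragana or Kanji tags pass the Unicode-aware check as well.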

McSib commented 5 years ago

> The program seems to work pretty well on Linux. I was able to just clone the repo and cargo run just fine. However, I do seem to be having an issue with pools, and with it downloading seemingly at random once it grabs the things I asked for. For this test case I tried the pool The Ghost in My Attic 2 and then the general tag smolder. It says it grabs the pool, but I never see it appear in the downloads folder. It then grabbed all of smolder just fine, but after that it starts downloading what looks like the entire site. I compared with the site and it seems to just be grabbing from the main posts page in order.
>
> [kitsuna@kitsuna-tablet e621_downloader]$ cargo run
> Finished dev [unoptimized + debuginfo] target(s) in 0.57s
> Running `target/debug/e621_downloader`
> Should enter safe mode (Y/N)? n
> Parsed tag file.
> "The_Ghost_In_My_Attic_2" grabbed!
> "smolder" grabbed!
> "" grabbed!
>
> Duplicate found: skipping... 178 / 178 [===================================================================================================] 100.00 % 1068.93/s
> Downloading: 8 / 1280 [>--------------------------------------------------------------------------------------------------------------------] 0.62 % 0.56/s 38m

I have looked into the issue and released a new version. Grab the repo and recompile, then tell me if you have any other issues. I also looked into the images not saving, and I think this may be a problem with how the files save to Linux. Since I'm using Windows, I can't properly test it, but if you can, please roam around in your Linux directories and see if there is a directory that isn't meant to be there. The folder is always called "download" unless renamed in the config file.

McSib commented 4 years ago

Hello, I'm getting back to you in hopes of finding out whether or not this issue has been resolved. 1.5.4 is now released, and I would hope that this problem has been fixed as part of the optimizations and changes I have made to the code. If this is still a problem, get in contact with me, as I would be more than happy to send a version of my source code that can print the location of the images, in case they happen to be written somewhere they shouldn't be.

lordkitsuna commented 4 years ago

Sorry about the delay. The newest version does properly save pools as well. However, I've ultimately switched to other tools because the page limits present here make it very difficult to actually acquire any large series. For example, Silver Soul (pool 11563) is a very large series and your downloader only grabs the first page. I managed to find a pool downloader that, while clunky and honestly quite annoying to use, at least gets the whole pool. But I did try out the new version with some things to test. No issues found. It won't run on Android at the moment, but that appears to be an issue with Rust, not your problem: Termux is on 1.38 and the issue I'm getting with building seems to be solved in 1.39. If you ever find a way around the total grab limits, please do let me know, as I greatly prefer this tool to anything else I've found since it's easy to queue what I want, fast, and runs on almost anything. But that's not helpful when I can't get all of a large pool. Unless I'm just misunderstanding how to download a large pool.

McSib commented 4 years ago

> Sorry about the delay. The newest version does properly save pools as well. However, I've ultimately switched to other tools because the page limits present here make it very difficult to actually acquire any large series. For example, Silver Soul (pool 11563) is a very large series and your downloader only grabs the first page. I managed to find a pool downloader that, while clunky and honestly quite annoying to use, at least gets the whole pool. But I did try out the new version with some things to test. No issues found. It won't run on Android at the moment, but that appears to be an issue with Rust, not your problem: Termux is on 1.38 and the issue I'm getting with building seems to be solved in 1.39. If you ever find a way around the total grab limits, please do let me know, as I greatly prefer this tool to anything else I've found since it's easy to queue what I want, fast, and runs on almost anything. But that's not helpful when I can't get all of a large pool. Unless I'm just misunderstanding how to download a large pool.

I'm terribly sorry! I didn't know the request for downloading pools actually worked with multiple pages when I added it. I don't know if they changed it, or I just happened to overlook it. Pools, sets, artists, and single posts are all supposed to be downloaded in their entirety with no conditions, so I was confused that you had this issue. Then I looked into the API again, and sure enough, there was my problem!

[Image: Capture]

This image shows that one of the parameters I can use is page. On my surface check of your pool, I saw 23 posts for that first page:

[Image: Capture]

Using the ID of the first post on that page (1353324), I moved to page two and cross-checked, and sure enough, it was another page, with a different first post ID (1217396). I'm working on an update to fix this issue. So sorry about this!
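The fix being described amounts to walking the pool page by page instead of stopping after the first request. A rough sketch of that loop is below. It is not the code from the update: `collect_pool_posts` and `fetch_page` are hypothetical names, `fetch_page` stands in for the real HTTP request using the page parameter, and treating an empty page as the end of the pool is an assumption about how the API behaves.

```rust
/// Collects the post IDs from every page of a pool.
///
/// `fetch_page` is a hypothetical stand-in for the real request: it takes a
/// 1-based page number and returns the post IDs found on that page.
pub fn collect_pool_posts<F>(mut fetch_page: F) -> Vec<u64>
where
    F: FnMut(u32) -> Vec<u64>,
{
    let mut all_ids = Vec::new();
    let mut page = 1;
    loop {
        let ids = fetch_page(page);
        // Assumption: a page with no posts means the pool has been exhausted.
        if ids.is_empty() {
            break;
        }
        all_ids.extend(ids);
        page += 1;
    }
    all_ids
}
```

Passing the request in as a closure keeps the loop testable without network access; a real implementation would also want a delay between requests and some handling for failed responses.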

McSib commented 4 years ago

I could have sworn that they showed all the posts tied to a pool at one point, but I might be confusing pools with sets. Nonetheless, this issue will be fixed shortly. So sorry for the inconvenience!

lordkitsuna commented 4 years ago

Not your fault, thanks for the quick response. I'm also glad to hear it was just a simple API problem. I thought it was something similar to the 1,280 general tag limit.

McSib commented 4 years ago

> Not your fault, thanks for the quick response. I'm also glad to hear it was just a simple API problem. I thought it was something similar to the 1,280 general tag limit.

No problem. The tag limit only applies to general tags. Species, copyright, and similar tags are limited because downloading 1,410,265 posts is just impossible for my program to pull off without something going wrong, whether that be the huge amount of space needed for all those posts, the servers kicking my software off, or the servers going down for any number of reasons. Do keep in mind, though, that if a special tag is mixed with tags that are limited, the limit is removed, as it is assumed the special tag will bring the number of posts down to something much more acceptable to download.
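The mixing rule can be sketched like this, purely as an illustration: `TagKind`, `effective_cap`, and `GENERAL_POST_CAP` are hypothetical names, and the 1,280 figure is the one quoted in this thread. Treating a search with no tags at all as limited is an assumption, though it matches the 5-page grab seen earlier for the empty tag.

```rust
/// The two groups a tag can be sorted into (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TagKind {
    General,
    Special,
}

/// Maximum number of posts grabbed when the limit applies (figure from the thread).
pub const GENERAL_POST_CAP: u64 = 1_280;

/// Returns the post cap for a search made of several tags.
/// `None` means no cap: everything matching the search is downloaded.
pub fn effective_cap(kinds: &[TagKind]) -> Option<u64> {
    // A single Special tag anywhere in the search lifts the limit.
    if kinds.iter().any(|kind| *kind == TagKind::Special) {
        None
    } else {
        Some(GENERAL_POST_CAP)
    }
}
```

Returning an `Option` rather than a very large number makes the "no limit" case explicit for the caller.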

You should see a working version of my software that fits your needs shortly. This should be a quick fix. 😄

McSib commented 4 years ago

Okay, this issue has been fixed now on my side, but I'm going to do something else real quick. Since pools represent comics, I want to sort the files so that they are more organized when being downloaded. This is just something that has been nagging me for a bit now and I think it's time to add this feature.

McSib commented 4 years ago

1.5.6 is out, this should fix your issues. Hope you enjoy it!

And don't be afraid to ask me if there are any other issues. 😺

lordkitsuna commented 4 years ago

Working great! Can't wait for termux to update cargo so i can use it on Android again lol. What counts as special tags just btw?

McSib commented 4 years ago

> Working great! Can't wait for termux to update cargo so i can use it on Android again lol. What counts as special tags just btw?

This is taken from the README.md:

[Image: Capture]

General tags (where the limit applies) are these:

[Image: Capture]

Now, something the README does not mention is that pools, sets, and single posts are all special tags. This means that they will all be downloaded in their entirety. This system is much more complex and the source for it can be read here, but overall, this is the basic understanding of what makes a tag special or not.

If you want specifics on what exactly sets the tag, the sources will be listed below:

This should cover everything to do with the nature of special and general tags. I hope this explains it well for you. 😺