McSib / e621_downloader

E621 and E926 downloader made in the Rust programming language.
Apache License 2.0

Using non-number characters in ID-only groups BSODs the computer #97

Closed Dibucci closed 1 year ago

Dibucci commented 1 year ago

Describe the bug: When I try to download a large set from e621, it freezes my PC. Thinking it was just busy processing the large set, I waited, only for it to bluescreen.

System: Windows 11 Home, Alienware Ryzen R10, Nvidia RTX 2060 SUPER, 8-core CPU (can't check the model at this time), 16 GB RAM. The graphics card is not overclocked; it can be, but I won't, for lack of knowledge and of money to replace it if I mess it up XD. I'm using the current version available as of this post.

To Reproduce: Steps to reproduce the behavior:

  1. Open the downloader
  2. Go through all the setup files. I logged in, since my account is just on a throwaway email
  3. Put a large set under the [sets] tag and leave everything else blank. It first happened with the largest set; I'm trying the second-largest one as I post this from my phone
  4. Start the program and set safe mode to no (I don't know what it does). The PC freezes completely within a few seconds and crashes with a bluescreen after about 30 minutes or longer

Expected behavior: Since it's downloading posts, I thought it should just download, only very slowly.

McSib commented 1 year ago

Normally, when software results in a BSOD, the software is more of a trigger for the actual underlying problem. The GPU shouldn't be causing issues, as my program doesn't have any code that uses it.

However, the CPU and memory are used quite heavily, like with any program. I would suggest opening Event Viewer in Windows 11 and looking for critical errors within the administrative log. Ignore anything including Diagnostic, as there is a bug with it in Windows right now. If you see any errors reporting WHEA, this is a hardware issue that isn't tied to my software; my software may just be doing something that exposes the malfunction in the hardware.

Having dealt with plenty of BSODs in my time due to a hardware issue I've been dealing with for 2 years, I understand the frustration of such crashes. Reach back to me regarding anything you find,

McSib.

Dibucci commented 1 year ago

Unfortunately, I don't know much about computer software or hardware, let alone even basic coding. I don't know how to use Event Viewer because I have no experience in software programming or engineering, nor did I know it existed until this moment. Yes, I could look up how to use it, but if it turns out to be a hardware problem as you said (and knowing my luck it is, since Dell uses third-rate crap for the Alienware's internals), I don't have the money, or the mental or physical patience with my ADHD and anger management issues, to do anything about it.

I will say that it works for all the other prompts, or at least the ones I've tried. General works and, as it says on the main page, is limited to the set max. Artist works too, but looking for artists in the artists section of the website brings up tags rather than artists half the time. I haven't tested pools yet, because I'm looking for images and not comic pages if I can help it, and sets is the one I tried that made it crash.

Oh, one thing I should have pointed out, and I don't know whether this could also be a cause: the first set I tried had this many posts in it: 1121090

So it might have been too much for the program or my PC, I don't know.

I know you said that "special" tags forgo the post limit, and I'm mainly trying to get a large collection of images to try my hand at making a Stable Diffusion model as a test.

I have a major learning curve, and trying to figure out other things, whether it's this program or another, will just drive me mad. I only have the capacity to work on one thing, and the thing I'm trying to learn is making a model for Stable Diffusion.

I will try a few other things and test whether I can get a bigger amount by using an artist with a large collection, but if it crashes again, and no offense, I'll be looking for something else to save what little sanity I have left.

I've got to change my name on here as well, since I mostly go by something else online. Anyhow, I'm sorry I can't be of much help. If you want to keep this open in case anyone else has the same issue and, unlike me, knows how to do things beyond just fiddling around with a program or game, you can, but if not, that's OK too. I wish you the best on your project, in case I fall off the grid like I do with most things I try because I'm a scatterbrain, and wish you the best of luck in your future endeavors.

May The Odds Be Ever In Your Favor,
Tundra

McSib commented 1 year ago

With over 1 million posts (if I read the number right), do you have enough hard drive space for it? I know that just grabbing a few thousand can be over 22 GB in my case. Maybe my software is tripping something in Windows related to storage.

McSib commented 1 year ago

Hang on, I may know the issue! Since you are grabbing so many posts in one set, your memory is overflowing and crashing the system. The downloader grabs all posts with all their metadata before it shrinks them down into a simpler data structure. Because of this, your memory may be maxing out at startup, since there isn't enough memory for the grabbing process to finish, resulting in a blue screen.
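
To illustrate the memory concern in this comment, here is a minimal sketch of shrinking each page of results as it arrives instead of holding every full record first. This is not the downloader's actual code: `RawPost`, `SlimPost`, and `collect_slim` are hypothetical names, and the fields are assumptions chosen only for illustration.

```rust
/// Hypothetical full record, including metadata the download step never reads.
#[allow(dead_code)]
struct RawPost {
    id: u64,
    url: String,
    tags: Vec<String>,
    description: String,
}

/// Hypothetical slim record holding only what a download needs.
#[allow(dead_code)]
struct SlimPost {
    id: u64,
    url: String,
}

/// Converts pages one at a time. Each page of `RawPost` values is dropped as
/// soon as it has been converted, so only one full page is alive at once.
fn collect_slim<I>(pages: I) -> Vec<SlimPost>
where
    I: IntoIterator<Item = Vec<RawPost>>,
{
    let mut slim = Vec::new();
    for page in pages {
        slim.extend(page.into_iter().map(|p| SlimPost { id: p.id, url: p.url }));
    }
    slim
}
```

With this shape, peak memory is roughly one page of full records plus the slim list, rather than the full metadata for the whole set.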

McSib commented 1 year ago

I never included any check for system memory, so the program doesn't know how much memory your system has; it just assumes you have enough from the get-go.

Dibucci commented 1 year ago

Hi, I came back because I think it's only the [sets] setting. All other settings work fine, but the moment I use sets (I just tried one that only had around 100 posts in it), my PC started to crash. It took all my strength to stop it quickly once the audio and everything else started going.

Artists and general do fine, and I haven't tried single-post or pools yet; only sets start to crash the whole PC.

I don't know if this will have anything you can use, but here is the log: e621_downloader.log

I can still use it; it will just be slow without something like sets for a good pool of what I'm looking for xD. I would try pools, but they aren't that big and most are comics.

Dibucci commented 1 year ago

Not saying you should or have to, but adding a RAM check sometime in the future would be good for people like me who don't know how to fix the problem on their own, in case a RAM issue ever does come up.

spbmnn commented 1 year ago

Think I've sniffed out the problem here. I was getting a similar issue with the attached tags.txt, which was specifically written to grab a small number of files - the set in question has only one image. Commenting it out lets the program run as usual; leaving it in results in a memory leak.

Looks like there's something up with the way it handles sets in particular.

EDIT: After a bit more testing, it seems it's only caused if you use a set's shortname instead of the ID, which is an easy mistake to make, as the site itself brings you to set contents with the search syntax set:<shortname>.

spbmnn commented 1 year ago

Okay, after digging through the source code, I'm starting to get an idea of how this bug happens. When the parser finds what should be a set, it checks if it's a valid numeric ID, even though theoretically a set could be linked to by an ID or searched for with a shortname. This runs consume_while looking for ASCII numerals, but somehow, even though it returns false, it keeps looping, taking in empty chars and converting them into strings,* over and over, until RAM is filled up and things go kablooey.

Theoretically this could mean the same would happen if you had letters in a single-post or a pool (so it should throw an error when it can't read a number from letters), however, people are far more likely to put in the "wrong input" for a set.

* On my machine, it looks like an empty String uses 24 bytes. Considering it's making a new one each time in an infinite loop, that's how you get blue screens.
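
The loop described above can be sketched roughly as follows. This is an illustration under assumptions, not the downloader's real parser: `consume_while` here is a hypothetical stand-in with a made-up signature, `parse_ids_buggy` and `parse_ids_guarded` are invented names, and the buggy loop takes a `cap` so the demonstration terminates.

```rust
/// Hypothetical stand-in for the parser's `consume_while`: collects characters
/// from `input[*pos..]` while `pred` holds and advances `*pos` past them.
fn consume_while(input: &[char], pos: &mut usize, pred: impl Fn(char) -> bool) -> String {
    let mut out = String::new();
    while *pos < input.len() && pred(input[*pos]) {
        out.push(input[*pos]);
        *pos += 1;
    }
    out
}

/// Buggy shape: if the next character is not a digit, `consume_while` returns
/// an empty String without advancing, so `pos` never reaches the end.
/// `cap` bounds the loop for this demo; the behavior described had no limit.
fn parse_ids_buggy(line: &str, cap: usize) -> Vec<String> {
    let input: Vec<char> = line.chars().collect();
    let mut pos = 0;
    let mut ids = Vec::new();
    while pos < input.len() && ids.len() < cap {
        ids.push(consume_while(&input, &mut pos, |c| c.is_ascii_digit()));
    }
    ids
}

/// Guarded shape: if a pass makes no progress, stop and report the offending
/// character instead of looping again.
fn parse_ids_guarded(line: &str) -> Result<Vec<String>, String> {
    let input: Vec<char> = line.chars().collect();
    let mut pos = 0;
    let mut ids = Vec::new();
    while pos < input.len() {
        let start = pos;
        let id = consume_while(&input, &mut pos, |c| c.is_ascii_digit());
        if pos == start {
            return Err(format!("unexpected character {:?} at offset {}", input[pos], pos));
        }
        ids.push(id);
    }
    Ok(ids)
}
```

Calling `parse_ids_buggy("my_set", 1000)` fills the vector with 1000 empty strings, while `parse_ids_guarded("my_set")` returns an error on the first character. On a 64-bit target, `std::mem::size_of::<String>()` is 24 (pointer, capacity, length), which matches the figure in the footnote: every pass through an uncapped loop adds another 24 bytes to the vector.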

McSib commented 1 year ago

Okay, time for me to tackle this issue and see what's up with it.

McSib commented 1 year ago

I read through both of your replies and, firstly, I'm surprised you were able to locate the issue and sort through all my source code. I feel my codebase is pretty dirty right now and needs some cleaning. I remember the parser being better about it, but not by much 😅

Secondly, I'm going to set up a test environment and go through it, seeing if I can emulate this error and go from there. I think what may be happening is this:

This will basically go on forever until the program crashes. It is another logic error within my parser, among the many that probably exist but haven't been found yet. I've been considering implementing a full and properly supported parsing library like Pest. Maybe I should, since it would handle much of the parsing side of things while allowing me to handle the more logical side of it. I'd have to consider it. For now, though, I will see if I can get this issue fixed and working smoothly. This isn't a bug I normally get every day. It definitely confused me months back when it first occurred 😅

McSib commented 1 year ago

Okay, so I figured out the issue and got it fixed. It's what @spbmnn talked about. After confirming it myself, I went through and made it so that the program will panic and crash if someone uses non-numeric characters outside of a comment inside an ID-only group. This is most definitely a temporary fix for now. I'm going to look into implementing (building myself, or finding a crate for) something that lexes the file and handles the parsing. From there, I'll be able to build a much more sound system that will not allow these issues nearly as much. It will also let me finally decouple my code some and make it more modular, the way it should've been. This parser is a rather old part of the codebase, and it is time for me to properly update it. Thank you for bringing this to light, and thanks @spbmnn for helping me towards figuring this all out. 😄
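
As a rough sketch of what such a temporary fix could look like (this is not the actual patch): `parse_id_line` is a hypothetical helper, and the sketch assumes that `#` starts a comment running to the end of the line.

```rust
/// Parses one line from an ID-only group (hypothetical helper).
/// Returns `None` for blank or comment-only lines and `Some(id)` for a
/// numeric ID. Panics on anything else, mirroring the "panic and crash"
/// behavior described above.
/// Assumption: `#` starts a comment that runs to the end of the line.
fn parse_id_line(line: &str) -> Option<u64> {
    // Keep only the part before any comment, then trim surrounding whitespace.
    let content = line.split('#').next().unwrap_or("").trim();
    if content.is_empty() {
        return None;
    }
    if !content.chars().all(|c| c.is_ascii_digit()) {
        panic!("ID-only groups accept numeric IDs only, found {:?}", content);
    }
    Some(content.parse().expect("ID does not fit in u64"))
}
```

Here `parse_id_line("12345 # my set")` yields `Some(12345)`, a comment-only line yields `None`, and a set shortname such as `my_set` stops the program immediately instead of consuming memory. Returning a `Result` rather than panicking would let the caller print a friendlier message; the panic is kept here only to match the description.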