9-FS / nhentai_archivist

downloads hentai from nhentai.net and converts to CBZ
MIT License
92 stars, 4 forks

Please add functionality to skip previously downloaded files #15

Closed: Parshanttt closed this issue 5 days ago

Parshanttt commented 5 days ago

Please create functionality to skip already downloaded files. I am not talking about the duplicates posted on nhentai; I mean that if I download a manga, then delete it and run the program again, it downloads it again. That is very important because I read on Reddit that the nhentai collection is like 9 TB, and not many people have that kind of space free. You could implement this by letting the program record all the links it downloaded in a .txt file and skip a link next time if it is already present in that .txt, like they have implemented in yt-dlp.

9-FS commented 5 days ago

Hello, nHentai Archivist already checks before every download whether there is already a file at the resulting filepath and, if so, skips the download. As long as you don't move things around manually in LIBRARY_PATH, it will detect it. If you manually delete stuff from your library and then tell nHentai Archivist to download it again, it will of course redownload it... This is desired behaviour.

If you don't want to download certain entries, I suggest you remove those IDs from the downloadme.txt before downloading. Preferably use the tag exclusion feature if you can. If this is about blacklisting individual works, for the moment use a Python script or similar to filter the blacklisted IDs out.
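For illustration, a minimal sketch of such a filter script, assuming a user-maintained blacklist.txt with one nhentai ID per line (blacklist.txt is an assumption, not part of nHentai Archivist):

```python
# Hypothetical helper: strip blacklisted IDs from downloadme.txt before a run.
# "blacklist.txt" is an assumed user-maintained file, one ID per line.

def load_ids(path: str) -> set[str]:
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

blacklist = load_ids("blacklist.txt")

with open("downloadme.txt", encoding="utf-8") as f:
    wanted = [line.strip() for line in f if line.strip() and line.strip() not in blacklist]

with open("downloadme.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(wanted) + "\n")
```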

I'm not going to implement your proposed functionality, because that's just janky. The only thing I would be willing to consider is a blacklist feature.

billsargent commented 5 days ago

A reason I could see this being beneficial: if you download and then move the files to a komga server on another system, which is what I do. My NAS runs komga, but I download on my Raspberry Pi. An SQLite DB of already downloaded IDs would allow that. It could also be used to skip duplicates later if someone changes the tags. If someone decides they don't need the DB of already downloaded IDs, they can just delete it. I do think comparing against files in the download directory is important too, though.

Having said this, an SMB mount to my NAS would allow it to download right into komga. But I wasn't sure how well this nHentai Archivist thing would work. It seems to do a pretty damn good job.

9-FS commented 5 days ago

Having said this, an SMB mount to my NAS would allow it to download right into komga.

This is exactly what should be done and why I created the LIBRARY_PATH variable.

billsargent commented 5 days ago

That is probably what I will end up doing. But I did want to throw out the SQLite idea in case there was any reason it could be useful somehow; after writing that out, I can't think of one. I answered my own question, basically ;)

Parshanttt commented 5 days ago

I want to archive the whole site, but I don't have that much space on my laptop; I am sure no laptop does. Don't you think it makes more sense to record all the downloaded manga links in something like a .txt file, so that even if I move files out of LIBRARY_PATH it still skips the ones recorded in that .txt file? That way someone with 500 GB of storage on their system can also archive the whole site. This functionality is not janky; almost all scraping or archiving tools have it, like yt-dlp, cyberdrop-dl, and gallery-dl.

billsargent commented 5 days ago

I want to archive the whole site, but I don't have that much space on my laptop; I am sure no laptop does. Don't you think it makes more sense to record all the downloaded manga links in something like a .txt file, so that even if I move files out of LIBRARY_PATH it still skips the ones recorded in that .txt file? That way someone with 500 GB of storage on their system can also archive the whole site. This functionality is not janky; almost all scraping or archiving tools have it, like yt-dlp, cyberdrop-dl, and gallery-dl.

It's janky in the sense that reading through a massive text file for each download would be horrendously slow. Text files are not random access. If something like this were to be implemented, it would need to be done at the database level. But if you really want to download the entire site, I would wait, because there are people out there who are actively working on archiving the entire site and deduplicating it, and they will likely create a torrent out of it. I personally don't want everything there.
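To illustrate the database-level approach (a sketch only, not something nHentai Archivist provides; the file, table, and column names are assumptions), an SQLite index of already downloaded IDs supports fast lookups where a huge .txt file would need a full scan on every check:

```python
# Minimal sketch: an SQLite "already downloaded" index.
# "downloaded.sqlite3" and the table/column names are assumed for illustration.
import sqlite3

con = sqlite3.connect("downloaded.sqlite3")
con.execute("CREATE TABLE IF NOT EXISTS downloaded (id INTEGER PRIMARY KEY)")

def already_downloaded(gallery_id: int) -> bool:
    # Primary-key lookup is an indexed B-tree probe, not a full-file scan.
    row = con.execute("SELECT 1 FROM downloaded WHERE id = ?", (gallery_id,)).fetchone()
    return row is not None

def mark_downloaded(gallery_id: int) -> None:
    con.execute("INSERT OR IGNORE INTO downloaded (id) VALUES (?)", (gallery_id,))
    con.commit()
```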

9-FS commented 5 days ago

Why would you even want to "archive the whole site" just to delete everything again to stay within 500 GB? It would make way more sense to move your files to hardware that can in fact archive the whole site. And then, as billsargent has pointed out, why don't you just connect to that hardware via SMB and have LIBRARY_PATH point at that?

I want to archive the whole site

with 500 GB of storage on their system

Yeah dude, just no.

Parshanttt commented 5 days ago

I am going to store all this on an HDD, and running it all day by downloading directly onto it is not a great idea.

9-FS commented 5 days ago

I don't get the impression that you are reading anything we are telling you. I am not willing to entertain this idea any further.