AlphaSlayer1964 / kemono-dl

A simple kemono.party downloader using python.

(not solved) An issue where the post did not finish downloading but was not marked as failed #15

Closed CODE-LA-LI-LU-LE-LO closed 2 years ago

CODE-LA-LI-LU-LE-LO commented 2 years ago

image

service: fanbox, user_id: 49906039, post_id: 2245817

I have confirmed that the problem still occurs after the update, and I found it in multiple posts. (I added the timestamps myself to check the log.) There is also a problem with the progress bar not showing when downloading PDF files.

AlphaSlayer1964 commented 2 years ago

Several odd things seem to be happening with your version of the downloader. When I downloaded this post I got a progress bar for all the PDFs. At first I thought the files had no file size in the header, since the bar isn't printed in that case, but they all showed up for me. I also don't know why that last file has a [000] in front of it: it is the single post "file" and shouldn't get an index, only attachments do.

As for the main problem, the file not completing but not being logged as an error, I honestly have no idea. I'm not able to reproduce it even with the same post; the file downloaded fine for me. This makes it really hard for me to test what's going on. I do have exception handling around the download, so if it just stops it should be triggering that. If you try downloading from kemono.party directly through your browser, do you get any failed downloads?

I will be releasing a new version soon and you could try updating to that, but I did not change any downloading logic so it probably won't help.

This was my output:

python "kemono-dl.py" --cookies "kemono.party_cookies.txt" --links https://kemono.party/fanbox/user/49906039/post/2245817 --force-indexing
Downloading post: ドSなレズカップルJKが催眠で分からせられる(1000円以上プランver.)一括ダウンロード用      
service: [fanbox] user_id: [49906039] post_id: [2245817]
Downloading: [1]_文字あり.zip
[==================================================] 114.3/114.3 MB, 13.7 Mbps
Downloading: [2]_パイパン差分.zip
[==================================================] 114.3/114.3 MB, 17.7 Mbps
Downloading: [3]_文字なし.zip
[==================================================] 95.5/95.5 MB, 14.4 Mbpss
Downloading: [4]_文字あり.pdf
[==================================================] 126.4/126.0 MB, 13.8 Mbps
Downloading: [5]_パイパン差分.pdf
[==================================================] 126.2/125.9 MB, 16.7 Mbps
Downloading: [6]_文字なし.pdf
[==================================================] 120.3/119.8 MB, 14.5 Mbps
Downloading: [7]_001-007.zip
[==================================================] 215.4/215.4 MB, 3.9 Mbps
Downloading: [8]_008-014.zip
[==================================================] 174.4/174.4 MB, 17.2 Mbps
Downloading: [9]_015-020.zip
[==================================================] 171.9/171.9 MB, 20.1 Mbps
Downloading: 487b69c2-26f6-4284-8dfc-0b74197eecbe.jpe
[==================================================] 0.2/0.2 MB, 11.1 Mbps
Saving content to content.html
Saving comments to comments.html
Completed downloading post: ドSなレズカップルJKが催眠で分からせられる(1000円以上プランver.)一括ダウンロード用
----------------------------------------------------------------------------------------------------
Done!

CODE-LA-LI-LU-LE-LO commented 2 years ago

After downloading the latest release and checking again, the progress bar is now displayed for the PDFs, but the total data size is not shown properly. This seems to be a problem on my end, though. (When I download a PDF from kemono in Chrome, the total data size is not visible either.)

The interrupted downloads seem to happen when the transfer speed drops after the API has been used excessively. It's a rare occurrence, so it's hard for me to reproduce.

Below are the results of running with the latest release.

python kemono-dl.py --cookies "cookie.txt" --ignore-errors --archive archive.txt --force-indexing --links https://kemono.party/fanbox/user/49906039/post/2245817
Downloading post: ドSなレズカップルJKが催眠で分からせられる(1000円以上プランver.)一括ダウンロード用
service: [fanbox] user_id: [49906039] post_id: [2245817]
Downloading: [1]_文字あり.zip
[==================================================] 114.3/114.3 MB, 38.1 Mbps
Downloading: [2]_パイパン差分.zip
[==================================================] 114.3/114.3 MB, 47.7 Mbps
Downloading: [3]_文字なし.zip
[==================================================] 95.5/95.5 MB, 13.2 Mbps
Downloading: [4]_文字あり.pdf
[==================================================] 0.0/??? MB, 0.0 Mbps
Downloading: [5]_パイパン差分.pdf
[==================================================] 0.0/??? MB, 0.0 Mbps
Error downloading: https://kemono.party/data/attachments/fanbox/49906039/2245817/パイパン差分.pdf
("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
Downloading: [6]_文字なし.pdf
[==================================================] 0.0/??? MB, 0.0 Mbps
Downloading: [7]_001-007.zip
[==================================================] 215.4/215.4 MB, 5.0 Mbps
Downloading: [8]_008-014.zip
[==================================================] 174.4/174.4 MB, 17.6 Mbps
Downloading: [9]_015-020.zip
[==================================================] 171.9/171.9 MB, 2.8 Mbps
Downloading: 487b69c2-26f6-4284-8dfc-0b74197eecbe.jpe
[==================================================] 0.2/0.2 MB, 4.0 Mbps
Saving content to content.html
Saving comments to comments.html
1 Error(s) encountered downloading post: ドSなレズカップルJKが催眠で分からせられる(1000円以上プランver.)一括ダウンロード用
----------------------------------------------------------------------------------------------------
Done!

AlphaSlayer1964 commented 2 years ago

It's also very weird that you are not getting the correct header information for the PDF files. The ??? is displayed when the file size can't be found. I will keep looking into this. I have also noticed that kemono has been returning 429 errors for excessive downloads, though I have only experienced this in a browser and not with the downloader.

albertobalsalm commented 2 years ago

I also experienced this problem downloading fanbox artist user id 1549613.

Some files didn't download completely but no error was thrown, and the file size looked correct in the console log. It just skipped the file and moved on to the next one without reporting an error. I just found out about this program, so I'm not sure if that's the expected behavior, but I wish it would retry downloading a file if it fails, and download one file at a time. My connection is not the best, so it might be related to that.

I'm using the latest version.
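The retry behavior wished for here could look roughly like the sketch below. It is only an illustration, not something kemono-dl is known to do: `with_retries` is a hypothetical helper, and `action` stands for whatever function performs a single file download.

```python
import time


def with_retries(action, attempts=3, delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    Hypothetical helper: `action` is any zero-argument callable, for
    example a function that downloads one file. The last error is
    re-raised if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            # Wait 1x, 2x, 4x ... the base delay before the next attempt.
            sleep(delay * 2 ** (attempt - 1))
```

Waiting longer after each failure is a common way to avoid hammering a server that is already rate limiting, which may matter given the 429 responses mentioned earlier in this thread.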

albertobalsalm commented 2 years ago

Also, is there a way/option for the program to check if a file is already downloaded and skip it? Because when I stopped the program and resumed it, it redownloaded everything.

AlphaSlayer1964 commented 2 years ago

> Also, is there a way/option for the program to check if a file is already downloaded and skip it? Because when I stopped the program and resumed it, it redownloaded everything.

If you use --archive it will log the ones that completed with no errors.

I am still trying to figure out why the partial download issue happens. I believe that with the newest release, 2021.10.27, the script properly throws an error when this occurs. The problem is that I have no idea why it happens, and the requests module does not seem to throw an exception when it does. My current guess is that the site only keeps a small number of files in its cache, and when you request a large file that's not in the cache it takes so long that the connection times out. But that is just a guess.
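A check of the kind described above, raising an error when a download stops short even though no exception was thrown, could be sketched as follows. This is not the actual kemono-dl code: `save_stream` and `IncompleteDownload` are hypothetical names, and the chunks and expected size are assumed to come from something like `response.iter_content()` and the `Content-Length` header in requests.

```python
class IncompleteDownload(Exception):
    """Raised when fewer bytes arrive than the server announced."""


def save_stream(chunks, out, expected_size=None):
    """Write every chunk to `out` and return the number of bytes written.

    `chunks` is any iterable of bytes objects and `out` any writable
    binary file object. If `expected_size` is given and fewer bytes
    were received, IncompleteDownload is raised.
    """
    received = 0
    for chunk in chunks:
        out.write(chunk)
        received += len(chunk)
    if expected_size is not None and received < expected_size:
        raise IncompleteDownload(
            f"expected {expected_size} bytes, got {received}"
        )
    return received
```

The sketch only flags a shortfall. In the first log above the PDFs finished slightly above the announced total (126.4/126.0 MB), so an exact-equality check might reject good files.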

albertobalsalm commented 2 years ago

Thanks for the reply. I'm using the --archive FILE argument now. However, it seems it only works for logging posts, and not individual files. Any chance such an option could be implemented in the future so it will check each file downloaded?

There's this other GUI-based software called Kemono Downloader, and although downloads are prone to fail with it too, at least it skips the already downloaded files when the download fails and I have to restart it (however, I always have to manually delete the corrupted files, or else it will register them as already downloaded). I think implementing a file size check might help make sure the files have been downloaded correctly.

AlphaSlayer1964 commented 2 years ago

Logging every file should be possible, but I also see many complications with doing this. At the very least it would require the program to download the file again and compare the two copies. I also just realized that the script needs to delete a file if its download fails, so partial files aren't left sitting there.
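One common way to keep partial files from being left behind is to write to a temporary name and rename it only once the download finishes. The sketch below illustrates that idea under stated assumptions; `download_to` and the `.part` suffix are hypothetical and not taken from kemono-dl.

```python
import os


def download_to(path, chunks):
    """Write `chunks` to `path`, leaving nothing behind on failure.

    Data goes to `path + ".part"` first. The file is renamed to `path`
    only after every chunk has been written; if anything goes wrong
    the partial file is deleted and the error is re-raised.
    """
    part_path = path + ".part"
    try:
        with open(part_path, "wb") as out:
            for chunk in chunks:
                out.write(chunk)
    except BaseException:
        # Remove the incomplete file so it cannot be mistaken
        # for a finished download later.
        if os.path.exists(part_path):
            os.remove(part_path)
        raise
    os.replace(part_path, path)
```

Catching `BaseException` means the cleanup also runs when the user stops the program with Ctrl+C.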

AlphaSlayer1964 commented 2 years ago

Apparently the base URL has the SHA-256 hash of the file as the file name, but with some testing I noticed that some of those hashes seem to be wrong. I'm asking on their forum about it. If that can be figured out, it will be an easy way of checking whether a file was already downloaded and whether it was downloaded correctly. You will need to be consistent about using --force-indexing or not, though. Another problem could arise if a post is updated and the indexing order changes.

albertobalsalm commented 2 years ago

Ok. I've been downloading a creator, and when the download fails, it will retry the last file it attempted to download. But if the download stops completely and I have to restart, it will re-download the whole post again. But this works for me. Thanks a lot for your continuous effort!

AlphaSlayer1964 commented 2 years ago

So I'm working on using the SHA-256 hash in the file name to stop duplicate downloads and to make sure each file was downloaded correctly, but I have noticed that some of their hashes are wrong (1 in 10,000, maybe). I have my script being tested with 25 accounts and have only noticed about 3 files with incorrect hashes.
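A hash check along these lines could be sketched as below. It assumes, as described above, that the last path component of the file URL is the SHA-256 hex digest followed by an extension; the function names are hypothetical and this is not the script's actual implementation.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlparse


def expected_hash_from_url(url):
    """Return the file-name stem if it looks like a SHA-256 hex digest."""
    stem = PurePosixPath(urlparse(url).path).stem.lower()
    is_hex = len(stem) == 64 and all(c in "0123456789abcdef" for c in stem)
    return stem if is_hex else None


def sha256_of(path, block_size=1 << 20):
    """Hash a file in blocks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_url_hash(path, url):
    """True only if the URL carries a hash and the local file matches it."""
    expected = expected_hash_from_url(url)
    return expected is not None and sha256_of(path) == expected
```

Since some of the site's hashes appear to be wrong, a mismatch does not prove the local file is corrupt; it only means the file cannot be confirmed this way.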