AlexAplin / nndownload

Download and process links from Niconico (nicovideo.jp)
MIT License
213 stars 28 forks source link

Multi-thread downloading often stuck at the end #79

Closed fireattack closed 3 years ago

fireattack commented 3 years ago
import nndownload

nn_args =  ['https://www.nicovideo.jp/watch/so28966396']
nn_args.extend([
    '-g',
    '-o', '{id}.{ext}',
    '-r', '3'
])
nndownload.execute(*nn_args)
Proceeding with no login. Some videos may not be available for download or may only be available in a lower quality. For access to all videos, please provide a login with --username/--password or --netrc.
Performing initial API request...
Performed initial API request.
Downloading so28966396 to "so28966396.mp4"...
Multithreading is experimental and will overwrite any existing files.
|######################## | 99/100 @ 550.99KB /s

And it was stuck here for at least 5 min (until I shut it down manually).

I tried it with logging in (with a premium account if it matters), same result.

fireattack commented 3 years ago

OK, just tested a few more times; sometimes it's only stuck for a few seconds. I honestly have no idea what's the factor here.

And I tested the file when it's stuck: the file is already complete (correct hash).

AlexAplin commented 3 years ago

I'll stress test it to see if I can reproduce it.

AlexAplin commented 3 years ago

All download threads were being joined on the main thread, which causes the hanging on cleanup. The download is finished in show_multithread_progress so joining only that thread fixes the issue.

fireattack commented 3 years ago

Thanks!

fireattack commented 3 years ago

For whatever reason, this still happens from time to time with 1.8.2 :/

It happens more often when your download speed is fast (in my case, 2MB/s ).

AlexAplin commented 3 years ago

I can't reproduce the issue controlling for things like video type, number of threads, SSD/HDD, but I'll keep looking. What platform and kind of setup do you use?

fireattack commented 3 years ago

I use Windows, py 3.8.7, with my premium account, r=5.

I wrote a script similar to the above and call nndownload as module to batch download videos.

I'm currently downloading all these videos (one at a time) from https://www.nicovideo.jp/search/%E3%81%8D%E3%82%89%E3%81%8D%E3%83%A9%E3%82%B8%E3%82%AA?page=1&sort=f&order=a and it has been stuck thrice at the end of

https://www.nicovideo.jp/watch/sm22018297 https://www.nicovideo.jp/watch/sm22824676 https://www.nicovideo.jp/watch/sm23871857

(I don't think the actual video URL matters too much though, since when I re-download they just work.)

Not sure if it's relevant, but when I use VS Code debugger to run it (just to clarify, the bug can happen with or without debugging), it sometimes has more than 6 threads (sometimes even more than 10), not sure if it's normal or not.

image

fireattack commented 3 years ago

Ok, it's stuck again; it has a main and a constant thread-35 alive, while the last two threads keep getting re-creating:

image image

Not sure if this would give you any clues (VS Code debugger is pretty weak with multi-threading so I can't actually pause them and examine.)

fireattack commented 3 years ago

Ok. I added some info to output:

It looks like there is off-by-one issue somewhere:

image

As you can see, all the 5 downloading threads are finished (0 to 4), but the len is off by one, which is why it's stuck.

fireattack commented 3 years ago

I checked the math of the calculation of the progress. There seems to be some inherent off-by-one issue in calculation. Details below.

Let's say we have a file with 101 bytes (video_len=101), bytes[0] to bytes[100].

part = ceil(101/5)=21 T0: start = 0, end = 21 T1: start =21, end =41 T2: start =42, end =63 T3: start =63, end = 84 T4: start =84, end = video_len = 101.

So for T0, header has "Range": "bytes=0-21". Since Range header is inclusive [1], it will try to download bytes[0] to bytes[21], 22 in total. The code adds len(block) into the total, and then minus 1, so it ends up adding 22-1 = 21 to the total tally, despite downloaded 22 bytes.

For T1, same story, we download bytes[21] to bytes[41], 22 in total. bytes[21] is downloaded twice, but that's fine. The total tally again is increased by 22 then -1, so +21.

T2, +21, T3, +21.

The problem is the last thread T4. It would have the Range bytes[84] to bytes[101] (18 bytes), but there is no bytes[101]. So the server can only give you bytes[84] to bytes[100], which is 17 bytes. And since our tally is increased by the actual block size we get, we will increase it by 17 (not 18!) and then -1, which is 16.

Now... 21+21+21+21+16=100. But our video_len = 101! So it will never finish.

It was actually funny that it can sometimes finish. The reason likely is, since the -1 part is a separate call, the number would be correct for a split of second, which is enough for the progress thread to catch progress >= video_len.

And we can test it pretty easily: add a time.sleep(1) in the while loop of show_multithread_progress(). And it confirms my theory: now it's stuck at 99% every time.

AlexAplin commented 3 years ago

That's great work, thank you for hunting that down! I had incorrectly assumed null bytes but it was really because of 0-indexing all along, d'oh.