Choonster / twitch_vod_fetch

[DISCONTINUED - Use Choonster/fgtk instead] A Python script to download Twtich VoDs
Do What The F*ck You Want To Public License
1 stars 0 forks source link

Downloads not working in Windows #1

Closed Choonster closed 7 years ago

Choonster commented 8 years ago

Something in e726ac65a765ecf846e9082f99891663d8d4759d or 1a7712c12fd0743a8bc1317b98aae6b47c31ee45 broke downloads in Windows. Linux (Ubuntu) still works without issue.

aria is downloading the chunks successfully and writing their gids to the gids file, but twitch_vod_fetch thinks they failed and keeps trying to download them until it reaches the maximum number of retries.

mk-fg commented 7 years ago

Hey,

As you seem to be maintaining script version here, wanted to mention that I completely rewritten it in mk-fg/fgtk repo... which probably sounds bad, I know.

But it's much faster and simplier now (even though longer because of boilerplate like AsyncExitStack implementation), especially wrt how it works. Instead of crazy nested polling loops, million of clutter chunk-files and .sh/.bat hook-scripts, listens for onDownloadComplete (and -Error) events on websocket and concatenates downloaded chuks to dst file in sequence, removing them right after that and only tracking last concatenated gid (instead of all downloaded gids along with the files). I.e. whole "main loop" is in these ~50 lines: https://github.com/mk-fg/fgtk/blob/master/desktop/media/twitch_vod_fetch#L684-L743 It's also python-3.6+, using aiohttp module for http instead of requests.

As a bonus, pretty sure it might "just work" on windows, as there's no .sh hooks anymore or any subprocesses except aria2c, and that's started without any special params (close_fds is handled implicitly by python-3.x), so maybe don't need any extra patches now.

No idea if it's relevant to this fork though, as older script probably works just as well, and only big feature here is that there's less tempfiles around (and few that are left are kept in directory by default), plus maybe py3 (if keeping py2 around is a burden by now).

Apologies for hijacking the bug, but it might actually address it as well, if it's still there.

Choonster commented 7 years ago

Thanks for letting me know. I'll take a look at the new version later.

If it "just works" on Windows, that would be a lot easier for me.

Choonster commented 7 years ago

@mk-fg I've made a few small changes to the new version to allow it to run on Windows, but I'm running into an issue where aria2c is being stopped after 3 seconds and deleting the chunk files fails because they're in use by another process.

Do you know why this would happen or how to fix it?

You can see the code I'm running here and the output here.

I'm running the script with this command:

py C:\Users\USER\Documents\GitHub\fgtk\desktop\media\twitch_vod_fetch --debug https://www.twitch.tv/videos/148136907 "../download_temp/Dave" > log.txt 2>&1

Let me know if you want me to open a proper issue for this.

mk-fg commented 7 years ago

I'm running into an issue where aria2c is being stopped after 3 seconds and deleting the chunk files fails because they're in use by another process.

This is one issue, as that's how exit from aria2c works for some reason, i.e. this is what I think happens there:

Not sure why exactly aria2c needs to wait these 3 seconds or how to disable it, just how it works, and I've tried "aria2.forceShutdown" to same effect ¯\_(ツ)_/¯ It doesn't wait when receiving SIGTERM, but then it also warns that it's an emergency exit, so looks like a bad alternative to that relatively harmless delay.

Note on fsync/fdatasync - not actually sure why I've even put it there, shouldn't be needed, will remove. (using any previous "pos + seq" mark is fine if latest one will get lost - thing will just re-download few more chunks)

And the actual issue - os.unlink() failing on windows for opened files, I think is totally windows-specific, as you iirc can't remove opened files there, while on linux it's totally fine and is used as a "good practice" for tmp files and such (file is always accessible through fd, name on VFS is just a "link" to it).

Will need to google wrt how to fix it - iirc there are tools on windows like "unlocker", maybe it's just some trivial os.release_this_file() syscall wrapper...

Also, EX_OK is always 0 so no reason to bother with constant (kinda weird that py doesn't provide fallback value on win32 though) and easy to put an if around signal handlers indeed. Didn't know about ProactorEventLoop either, wonder why it's not picked on windows by default, though guess there's gotta be some good reason, will copy a check for that as well.

mk-fg commented 7 years ago

Thanks for testing btw! Surprised by some of unixy quirks in asyncio, but seem to be quite few of them, all things considered.

mk-fg commented 7 years ago

Hm, nope, looks like it's more complicated on windows - opened file can't be removed, and these "unlocker" tools apparently look for procs that have the handle(s) and close it there via something like ptrace()... oh well, guess I'll add some asyncio coroutine that'll try removing these later, up to and including when aria2 itself exits- will be removable for sure then.

Choonster commented 7 years ago

Thanks for the response. I'll be glad to get this working on Windows properly.

mk-fg commented 7 years ago

I think c28dccd should fix it (and other win-compat things are in 6547827 before that).

Now there's coro that basically retries unlinking any chunks it failed to remove instantly every 10s (conf.file_unlink_retry_delay) and when cancelled (i.e. on exit). Should work if aria2 eventually releases these things, as it should, and does with other chunks, I think - just a race between dl-complete event and close() on the file.

Maybe try and see how many chunks you have around while download is in progress? Should ideally be around 20 (conf.file_append_batch + any non-sequential ones), and definitely shouldn't swell into hundreds, as that'd indicate aria2 releasing files very poorly, and maybe need for some other fix.

mk-fg commented 7 years ago

Oh, right, gh doesn't linkify commits in other repos:

Choonster commented 7 years ago

The video now downloads, there are around 10-55 chunk files while the download is in progress.

When the download completes, there's one chunk file left that couldn't be deleted. New output is here.

mk-fg commented 7 years ago

Oh shoot, indeed that final rmtree runs before aria2c exits, easy to fix...

mk-fg commented 7 years ago

Should be fixed in https://github.com/mk-fg/fgtk/commit/277c0ed

mk-fg commented 7 years ago

there are around 10-55 chunk files while the download is in progress

Doesn't sound too bad, and can probably be reduced by lowering those batch/retry_delay opts above, if necessary.

mk-fg commented 7 years ago

Ugh, randomly spotted that script wasn't closing chunk files it opened itself! And that probably explains all the issues here - it was python GC runs that closed them at random times, and not aria2c's fault in any way whatsoever.

Can you try this version of the script: https://gist.github.com/mk-fg/396745b36f77a9fe84c0b27d92349fe7

I've reverted removal back to simple os.unlink there, but with proper closing of chunks, I strongly suspect it should work just fine, no need for that extra cleanup thing at all.

mk-fg commented 7 years ago

Also pushed that version to github now, as it really shouldn't be aria2's fault, just me not checking other possible (and more obvious) causes for that issue from the start...

Choonster commented 7 years ago

It looks like it's working now, the video is downloaded and the temporary directory is completely removed. Thanks for fixing it.

Choonster commented 7 years ago

@mk-fg If I created a PR to add the --output-format argument from the previous Windows version of the script to the new version, would that be welcome?

The MP4 videos produced by twitch_vod_fetch have a malformed AAC stream, which prevents Windows from showing their length and thumbnail (even though they can still be played in an MP4 player). I run a Lua script to detect and fix this after downloading.

The Lua script runs this ffmpeg command and scans the output for this line and the line after it. If it finds this message, it runs this ffmpeg command to fix the malformed stream and then overwrites the original file with the fixed one.

Would you consider adding a feature like this to twitch_vod_fetch, or should I just keep it in the Lua script?

mk-fg commented 7 years ago

If I created a PR to add the --output-format argument from the previous Windows version of the script to the new version, would that be welcome?

I think I've dropped it because it's redundant with --ytdl-opts (which apparently I forgot to use too), as it should be simple enough to specify -y ' -f 480p' or something, but if not, should definitely not be a problem to have that option back.

Will copy it from there in a moment.

MP4 videos produced by twitch_vod_fetch have a malformed AAC stream, which prevents Windows from showing their length and thumbnail (even though they can still be played in an MP4 player)

I've tried concatenating stuff with ffmpeg like youtube-dl does for twitch in that script initially, but found that either didn't work (produced broken files with "-f concat") or was terribly slow, which is why ended up using simple concatenation instead (fast and works).

Would you consider adding a feature like this to twitch_vod_fetch, or should I just keep it in the Lua script?

Should be easy to add ffmpeg to file_dst running alongside aria2 and just feed it instead of appending chunks to file, but I don't know how to handle interrupt/resume thing with it - can ffmpeg pick up half-finished file and start appending more stuff to it? It probably can, just need to find options-magic for it.

That might still have all kinds of issues with sudden interrupts though, and probably add a fragile code-path which I'll never use (and hence might break randomly and never notice, similar to windows things), so unless there's need to speed it up like that, you'd probably be much better off using custom post-processing script that's fully under your control and much simplier than something coded into that downloader thing.

Would it be useful to have ffmpeg assembling video on-the-fly like that?

mk-fg commented 7 years ago

I think I've dropped it because it's redundant with --ytdl-opts (which apparently I forgot to use too), as it should be simple enough to specify -y ' -f 480p' or something

Oh, no, apparently I was thinking about some other (maybe non-existent) option here entirely, and you mean very different thing wrt output filename templating. Will add the latter, shouldn't be difficult as such rename-after-download thing is already done for .part.mp4 file with --use-part-suffix.

Choonster commented 7 years ago

Would it be useful to have ffmpeg assembling video on-the-fly like that?

I only care about the final video being fixed, whether that's done on-the-fly or after the download has completed doesn't matter to me.

I'll probably just keep the feature in my post-processing script.

Will add the latter, shouldn't be difficult as such rename-after-download thing is already done for .part.mp4 file with --use-part-suffix.

Thanks, that would be useful to have.

mk-fg commented 7 years ago

Added in https://github.com/mk-fg/fgtk/commit/49c70b0

It's -o/--ytdl-output-format option, and it can have optional argument (format), but still need to be specified otherwise to use format from youtube-dl defaults/configuration, e.g. twitch_vod_fetch -o -- vod1-url vod1-tmp vod2-url vod2-tmp.

That standalone "--" is how argparse (and most unixy tools) allows to separate explicit arguments from anything else, so that "vod1-url" won't be used as youtube-dl --output template (arg to -o option instead of arg to whole command). Not needed if there's no ambiguity, e.g. twitch_vod_fetch -o name-tpl vod-url vod-tmp or twitch_vod_fetch -o --debug -a console-log-level=warn ....

Again, thanks for testing, was able to fix quite a few bugs in there as a result.

Also, might still be nice to keep a separate repo for "known to be working" windows version, which someone else can use without risk of stumbling into some random unixism that I added later on and didn't test. I.e. something that can be linked at https://github.com/mk-fg/fgtk/#twitch-vod-fetch as "tested and working windows version" in case anyone else was using it.

Choonster commented 7 years ago

Added in mk-fg/fgtk@49c70b0

Thanks.

Again, thanks for testing, was able to fix quite a few bugs in there as a result.

No problem.

Also, might still be nice to keep a separate repo for "known to be working" windows version, which someone else can use without risk of stumbling into some random unixism that I added later on and didn't test. I.e. something that can be linked at https://github.com/mk-fg/fgtk/#twitch-vod-fetch as "tested and working windows version" in case anyone else was using it.

That's a good idea.

I've decided to discontinue this repository now that I'm not maintaining a completely separate version of the script. You can link to Choonster/fgtk instead.