Kethsar / ytarchive

Garbage Youtube livestream downloader
MIT License
1.15k stars, 93 forks

Unterminated String in JSON crashes download #93

Closed fren-archivist closed 2 years ago

fren-archivist commented 2 years ago

Since roughly a day ago, I have been sporadically getting errors from both the Python and Go versions. Either the download stops entirely, or one of the audio/video threads dies while the other continues. Sometimes it also happens right at the end of the download. The issue is frequent enough that I cannot leave the program running and trust that it will finish. Others have reported the same issue, so it isn't just on my end.

Here's what the error message looks like for the Python version (the location of the unterminated string changes):

  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.2800.0_x64__qbz5n2kfra8p0\lib\threading.py", line 932, in _bootstrap_inner
run
    self._target(*self._args, **self._kwargs)
  File "D:\ts\ytarchive.py", line 1216, in download_frags
    get_video_info(info)
  File "D:\ts\ytarchive.py", line 850, in get_video_info
    vals = get_playable_player_response(info)
  File "D:\ts\ytarchive.py", line 619, in get_playable_player_response
    player_response = get_player_response(info)
  File "D:\ts\ytarchive.py", line 497, in get_player_response
    player_response = json.loads(watch_parser.player_response_text)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.2800.0_x64__qbz5n2kfra8p0\lib\json\__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.2800.0_x64__qbz5n2kfra8p0\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.2800.0_x64__qbz5n2kfra8p0\lib\json\decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 54939 (char 54938)

The Go version gives a similar but much terser message: "ERROR: Error retrieving player response: unexpected end of JSON input".

Kethsar commented 2 years ago

Can you give me an idea of just how often this happens? To figure out a potential cause I'll need to reproduce it on my end, so I'll start downloading a lot of streams, but it would be good to know roughly how many out of 10 streams (or whatever) it has happened on. Also, for the cases where it starts happening mid-download, did you notice whether the stream had already finished?

jiatern commented 2 years ago

[screenshot]

I have been facing the same issue since today. I am using the pre-release version on Ubuntu. It happened to more than half of my scheduled archives today, but only when starting the download. Once it is able to start downloading, it doesn't stop.

Kethsar commented 2 years ago

Interesting. I decided to omit cookies after seeing your screenshot and got it within a couple tries. Thanks. @fren-archivist were you using cookies any of the times it happened to you?

jiatern commented 2 years ago

@Kethsar using cookies seems to prevent this issue for now. Thank you.

Kethsar commented 2 years ago

@jiatern I hope so, but I wouldn't be surprised if it's just a coincidence. The error comes from my naive way of grabbing a JS variable from the watch page HTML. Occasionally a field in an object called botguardData contains a string full of JS, which trips up that extraction. They may or may not evaluate that string, or it may be meant to catch bots trying to evaluate it, since the field is called privateDoNotAccessOrElseSafeScriptWrappedValue.

Regardless, I know the overall issue and what I need to figure out to fix it. I have a 3-day weekend after work today, so it might finally be time to work on yta again.
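To illustrate the failure mode described in this comment, here is a minimal Python sketch. The thread does not show the actual extraction code, so everything here is an assumption for illustration: the `naive_extract` helper, the choice of `};` as the end marker, the `var ytInitialPlayerResponse = ` prefix, and the shape of the sample payload are all hypothetical. It only shows how a string containing JS can cut the JSON short and produce the same kind of "Unterminated string" error as in the traceback.

```python
import json

# Hypothetical prefix and payload; the real watch page is far larger and
# its exact layout is not shown in this thread.
PREFIX = "var ytInitialPlayerResponse = "
payload = {
    "videoDetails": {"videoId": "abc123"},
    "botguardData": {
        # A string full of JS, which happens to contain "};".
        "privateDoNotAccessOrElseSafeScriptWrappedValue": "function(){var a={};return a;}"
    },
}
script_text = PREFIX + json.dumps(payload) + ";"


def naive_extract(text):
    """Cut from the prefix to the first '};', assuming that ends the object."""
    start = text.index(PREFIX) + len(PREFIX)
    end = text.index("};", start) + 1  # keep the closing brace, drop the ';'
    return text[start:end]


try:
    json.loads(naive_extract(script_text))
    error_message = None
except json.JSONDecodeError as err:
    # The cut lands inside the JS string, so the JSON ends mid-string.
    error_message = err.msg

print(error_message)
```

Because the first `};` sits inside the JS string rather than at the end of the object, the extracted text stops partway through a string literal, which is why the reported column would move around with the page contents.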

fren-archivist commented 2 years ago

It happens a lot at the start of the download, but it also happens later, presumably when it tries to refresh the data (which I think is once per hour normally).

I haven't been using cookies. I'll try that for now and report back if it keeps happening.

Update: Using cookies, this issue happens for me 100% of the time. I confirmed the cookies work fine with yt-dlp. That suggests to me that this presumed bot check may be tied to the account. If your account is not flagged, maybe you won't be affected by this when using cookies, but if it is, cookies will make the problem worse.

fireattack commented 2 years ago

Might be a stupid question, but are you putting --cookies in front of your URL? I'm asking since I was tricked by that. If you put any switch after the URL it won't be recognized, at least in the Go version. And there is no error to let you know about it either.

This is different from what I expected, since it doesn't happen with any of the argparse-powered Python CLI tools I'm familiar with. And I saw that the author didn't use argparse even in the Python version.

Anyway, using --cookies fixed my problem here for now.
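For readers unfamiliar with the difference, here is a small Python sketch of the two parsing styles. Go's standard flag package stops parsing at the first non-flag argument, and Python's `getopt.getopt` behaves the same way, so it is used here as a stand-in. The thread does not say which parser ytarchive actually uses, and the URL and file name below are placeholders.

```python
import argparse
import getopt

# Placeholder command line with the switch placed AFTER the URL.
argv = ["https://example.com/watch?v=abc123", "--cookies", "cookies.txt"]

# getopt.getopt stops scanning at the first non-option argument, so the
# switch is silently left among the positional arguments.
strict_opts, strict_rest = getopt.getopt(argv, "", ["cookies="])

# argparse accepts options interleaved with positional arguments.
parser = argparse.ArgumentParser()
parser.add_argument("url")
parser.add_argument("--cookies")
namespace = parser.parse_args(argv)

print(strict_opts, strict_rest)
print(namespace.url, namespace.cookies)
```

With the stop-at-first-positional style, no option is recognized and no error is raised, which matches the silent behavior described above.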

Kethsar commented 2 years ago

Fixed in 7bcd012a985ea166ef7fb97196baf7f0528ba7be. Simple as the fix is, I had to look for a case I had encountered before where the player response is not the only thing in its script tag, and it took me a while to remember which cases those were. Thankfully, even in those cases this should work.
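The contents of that commit are not shown in this thread, so the following is not the author's fix. It is one possible way, sketched in Python, to pull a JSON object out of a script tag that contains other code after it: `json.JSONDecoder.raw_decode` parses a single JSON value starting at a given index and ignores whatever follows. The prefix, the sample script text, and the `extract_player_response` helper are hypothetical.

```python
import json

# Hypothetical variable prefix; assumed to be followed directly by the object.
PREFIX = "var ytInitialPlayerResponse = "


def extract_player_response(script_text):
    """Parse the JSON object that starts right after PREFIX."""
    start = script_text.index(PREFIX) + len(PREFIX)
    # raw_decode returns (object, end_index) and tolerates trailing text,
    # so no end marker has to be guessed.
    obj, _end = json.JSONDecoder().raw_decode(script_text, start)
    return obj


# Sample script text with a tricky string and extra code after the object.
script_text = (
    PREFIX
    + json.dumps({"videoDetails": {"videoId": "abc123"}, "note": "a={};b"})
    + ";var somethingElse = {};"
)
response = extract_player_response(script_text)
print(response["videoDetails"]["videoId"])
```

Letting the JSON parser decide where the value ends avoids searching for a terminator that may also appear inside a string.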