egbertbouman / youtube-comment-downloader

Simple script for downloading Youtube comments without using the Youtube API
MIT License

ytcfg regex output failing to be parsed: "Error: Expecting value: line 1 column 1 (char 0)" #116

Closed by Xandreashia 2 years ago

Xandreashia commented 2 years ago

It appears that the json module is having trouble with the returned ytcfg string; my traceback shows the following for every video ID I have tested so far:

File "youtube_comment_downloader/downloader.py", line 53, in get_comments_from_url
    ytcfg = json.loads(self.regex_search(html, YT_CFG_RE, default=''))

I just noticed it when my own project called the comment downloader on a video list. I'm wondering if YouTube has changed something in the config string. If I get some free time later I'll look into the source HTML and test the regex.
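For anyone reproducing this: the error message itself points at an empty string reaching json.loads. A minimal sketch of the failing path (the YT_CFG_RE pattern and regex_search helper below are assumptions modeled on downloader.py; the real constants may differ):

```python
import json
import re

# Hypothetical pattern resembling YT_CFG_RE in downloader.py;
# the actual constant in the script may differ.
YT_CFG_RE = r'ytcfg\.set\s*\(\s*({.+?})\s*\)\s*;'

def regex_search(text, pattern, group=1, default=None):
    """Return the first capture group of the match, or default if no match."""
    match = re.search(pattern, text)
    return match.group(group) if match else default

# Normal page: the regex finds the ytcfg JSON and json.loads succeeds.
html = '<script>ytcfg.set({"INNERTUBE_API_KEY": "abc123"});</script>'
ytcfg = json.loads(regex_search(html, YT_CFG_RE, default=''))

# Bot-warning or otherwise unexpected page: the regex finds nothing,
# default='' is passed to json.loads, and json.loads('') raises
# json.JSONDecodeError: "Expecting value: line 1 column 1 (char 0)".
```

So the traceback is consistent with the page simply not containing a ytcfg block at all.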

d0tN3t commented 2 years ago

Hey,

I have a heavily modified version of this script but it still uses the regex portion and I'm not running into the same issue you are.

"Error: Expecting value" errors tend to be JSON parsing errors, in my experience with this script, so you're on the right track. That said, since you mentioned you're pulling from a list, YouTube may have returned a page that doesn't contain the ytcfg portion: not the expected HTML page, but a bot warning triggered by excessive scraping.

To get an idea of what is going on, I would need you to print out the value of html on line 52 (html = response.text) to see exactly what is being returned and why it can't be parsed as JSON. Also, please provide a video URL that you're trying to pull from.

If my suspicion is correct, you would need to get in the habit of using proxy rotation, slow down your scraping rate, or both.
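If it turns out to be bot detection, one way to make the script degrade gracefully is to retry with backoff when the ytcfg block is missing instead of crashing on json.loads. A rough sketch (fetch_ytcfg, its parameters, and the regex are all hypothetical, not part of the actual script):

```python
import json
import random
import re
import time

# Assumed pattern modeled on YT_CFG_RE in downloader.py.
YT_CFG_RE = r'ytcfg\.set\s*\(\s*({.+?})\s*\)\s*;'

def fetch_ytcfg(session, url, retries=3, base_delay=1.0):
    """Fetch a page and extract its ytcfg JSON, retrying with exponential
    backoff if the page lacks it (e.g. a bot-check interstitial)."""
    for attempt in range(retries):
        html = session.get(url).text
        match = re.search(YT_CFG_RE, html)
        if match:
            return json.loads(match.group(1))
        # No ytcfg found: wait before retrying, with jitter so repeated
        # requests don't land at identical intervals.
        time.sleep(base_delay * (2 ** attempt) + random.random() * base_delay)
    raise RuntimeError('ytcfg not found after retries; possible bot-detection page')
```

This at least turns the cryptic "Expecting value" crash into an explicit error, and gives a throttled run a chance to recover on its own.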

I look forward to hearing from you. ☺

Xandreashia commented 2 years ago

I think you might have been right about the excessive-scraping warning; I just took another look and things are working fine again.