Jessime / youtube_history

A quick analysis of all Youtube videos in a user's history.
MIT License
83 stars 4 forks

400 error and csv file for the first run #10

Closed. daehwankim112 closed this issue 3 years ago

daehwankim112 commented 3 years ago

Hello. Thanks for the interesting project.

I am on Windows 10.

I am receiving "Unable to look up account info: HTTP Error 400: Bad Request" after I type in my credentials.

Since this is a 2-year-old project, I thought the way Google handles things might have changed, and it may think I am a bot. $ python youtube_history.py -d 1 didn't work either.

Therefore, I decided to get the metadata of my YouTube history myself. I went to https://takeout.google.com/ and downloaded a JSON of my YouTube history.

Then I got this error:

Welcome!
Creating dataframe...
Traceback (most recent call last):
  File "youtube_history.py", line 313, in <module>
    analysis.run()
  File "youtube_history.py", line 300, in run
    self.start_analysis()
  File "youtube_history.py", line 286, in start_analysis
    self.check_df()
  File "youtube_history.py", line 198, in check_df
    self.df_from_files()
  File "youtube_history.py", line 160, in df_from_files
    data = [json.load(open(files.format(i))) for i in range(1, num + 1)]
  File "youtube_history.py", line 160, in <listcomp>
    data = [json.load(open(files.format(i))) for i in range(1, num + 1)]
  File "C:\Users\daehwan\Anaconda3\lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
UnicodeDecodeError: 'cp949' codec can't decode byte 0xed in position 46: illegal multibyte sequence

I read through other issues and found that on a second run, the program looks for a csv file. I think it assumes this is not a first run because the metadata already exists. So I used $ python youtube_history.py -o /path/to/empty/data/directory/ to specify the location of the metadata, but it still looks for the csv.

Those are the things I think could be causing the error.

I am very interested in this project. Thank you

Jessime commented 3 years ago

Hey @daehwankim112, thanks for the thoughtful write up and for giving the project a whirl. Sorry you're having so much trouble.

With regards to the "Unable to look up account info: HTTP Error 400: Bad Request" error, using youtube-dl to log in to YouTube has always been hit or miss. You can see lots of other people running into similar problems this year:

https://github.com/ytdl-org/youtube-dl/issues/23860


Interestingly, I've never downloaded data with takeout. I just tried it now and I'll see what happens when I run it with my data.

One thing I noticed while downloading is that the file type I downloaded was a .zip file. Presumably you unzipped your download first (I assume you did, but always worth double checking)?

It looks like the UnicodeDecodeError can be fixed pretty easily, according to stackoverflow. I'll give it a shot when my data downloads.

More importantly though, I assume that the data format from takeout is going to be different than the one my program expects. So even if we fix the current issue, it might require an update to get the json read in correctly.

I'll report back ASAP!

daehwankim112 commented 3 years ago

Hello Jessime. Sorry for the late response, and thank you so much for taking the time to look into this.

I unzipped my file before running it.

More importantly though, I assume that the data format from takeout is going to be different than the one my program expects. So even if we fix the current issue, it might require an update to get the json read in correctly.

I agree.

I will wait. If there is anything you want me to do, let me know. Thank you.

Jessime commented 3 years ago

Hey! One thing led to another and I ended up making a bunch of changes to this program over the last couple of days. The most important one is that you can (and should) now specify the --takeout parameter:

python youtube_history.py --takeout /path/to/Takeout

Note that the data you've downloaded in Takeout is just a list of videos you've watched, but none of the information about the videos. So youtube_history.py will still take a while to run the first time while it downloads the metadata for each video. This shouldn't be an issue, just make sure you stay connected to the internet.

Also, sometime in the last 4 years, Google stopped saving the likes/dislikes for each video, and just stores an "average rating". So, I redid a bunch of stats to account for that. I think the end results are even better than they were before!

One small caveat is that I haven't done a ton of testing with this new code yet, just tried it on a couple of people.

Give it a whirl and let me know how it goes!

daehwankim112 commented 3 years ago

Hello Jessime. Thank you for the update. I am so glad that you are working on it!

I pip installed it, downloaded my takeout, and ran it. It says:

Welcome!
usage: youtube_history.py [-h] [-o OUT] [-d DELAY]
youtube_history.py: error: unrecognized arguments: --takeout ../../Takeout

Then I found out there is a -h parameter, and it says:

Welcome!
usage: youtube_history.py [-h] [-o OUT] [-d DELAY]

optional arguments:
  -h, --help            show this help message and exit
  -o OUT, --out OUT     Path to empty directory for data storage.
  -d DELAY, --delay DELAY
                        Time to wait between requests. May help avoid 2FA.

It does not recognize the --takeout parameter. I looked into your commit, and it seems the argument-parsing part was not added. There is no change in youtube_history.py.

Maybe you forgot to commit and push?

Thank you

Jessime commented 3 years ago

The commit was pushed to GitHub 11 hours ago now.

So, all you need to do is git pull the update on the master branch of your local checkout.

But, I'm pretty curious about what you mean when you say "I pip installed it". There's no pip package (that I've made at least). So just wondering what you mean by this?

daehwankim112 commented 3 years ago

I saw

Copy or clone this package from Github.

Open the Terminal/Command Line and navigate to where you copied the package:

$ cd path/to/copied/directory

Then, just run:

$ pip install -r requirements.txt

to install the dependencies.

in README.md. Two things were added to requirements.txt, so I thought I might need to install them again. I got a bunch of messages saying "Requirement already satisfied", so I think it worked?

I cloned it and ran it again. I am seeing the same result.

C:\Users\daehwan\Desktop\youtube history\youtube_history>python youtube_history.py --takeout /data
Welcome!
usage: youtube_history.py [-h] [-o OUT] [-d DELAY]
youtube_history.py: error: unrecognized arguments: --takeout /data

This is what I think might be the cause, but I could be wrong.

if __name__ == '__main__':
    print('Welcome!'); stdout.flush()
    parser = argparse.ArgumentParser()
    parser.add_argument("-o", '--out', default='data',
                        help="Path to empty directory for data storage.")
    parser.add_argument('-d', '--delay', default=0,
                        help='Time to wait between requests. May help avoid 2FA.')
    args = parser.parse_args()
    analysis = Analysis(args.out, float(args.delay))
    analysis.run()
    launch_web()

These are lines 304-314 of youtube_history.py at the moment. There is no parser.add_argument for --takeout.
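For what it's worth, the fix presumably amounts to registering one more argument. A rough sketch (the -t short flag and help text are my guesses; the actual commit may differ):

```python
import argparse

def build_parser():
    # The two existing arguments from youtube_history.py, plus a
    # hypothetical --takeout registration.
    parser = argparse.ArgumentParser()
    parser.add_argument("-o", "--out", default="data",
                        help="Path to empty directory for data storage.")
    parser.add_argument("-d", "--delay", default=0,
                        help="Time to wait between requests. May help avoid 2FA.")
    parser.add_argument("-t", "--takeout", default=None,
                        help="Path to an unzipped Google Takeout directory.")
    return parser

args = build_parser().parse_args(["--takeout", "../../Takeout"])
```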

Thank you

Jessime commented 3 years ago

🤦 that's because I pushed everything except the main file. Pretty dumb mistake on my part. Try pulling again!

daehwankim112 commented 3 years ago

Great! I ran it and it worked.

There were some errors that I had to fix myself: the codec did not recognize my language, and I had to ignore errors for videos that are now only available to channel members.

For anyone who has the cp949 error (and I don't quite recommend this method, since you are changing the lib directly; maybe change it back to what it was after you get through it):

You've watched Korean videos, and the codec can't decode them because Korean Windows defaults to cp949 instead of utf-8.

Go to C:\Users\your_name\Anaconda3\Lib, open pathlib.py, and change the read_text part (line 1195) to this:

def read_text(self, encoding=None, errors=None):
    """
    Open the file in text mode, read it, and close the file.
    """
    # Force UTF-8 so the locale default (cp949) is never used.
    with self.open(mode='r', encoding='UTF-8', errors=errors) as f:
        return f.read()

Done

For those who receive this error.

Creating dataframe...
Traceback (most recent call last):
  File "youtube_history.py", line 369, in <module>
    analysis.run()
  File "youtube_history.py", line 353, in run
    self.start_analysis()
  File "youtube_history.py", line 336, in start_analysis
    self.check_df()
  File "youtube_history.py", line 247, in check_df
    self.df_from_files()
  File "youtube_history.py", line 210, in df_from_files
    data = [json.load(open(files.format(i))) for i in range(1, num + 1)]
  File "youtube_history.py", line 210, in <listcomp>
    data = [json.load(open(files.format(i))) for i in range(1, num + 1)]
FileNotFoundError: [Errno 2] No such file or directory: 'data\\raw\\17724.info.json'

The video you have watched is now only available for people who joined the channel.

Edit youtube_history.py so that line 166 and line 192 both read:

line = p.stdout.readline().decode("utf-8", 'ignore').strip()

Done

For those who receive this error.

Welcome!
Creating dataframe...
Traceback (most recent call last):
  File "youtube_history.py", line 369, in <module>
    analysis.run()
  File "youtube_history.py", line 353, in run
    self.start_analysis()
  File "youtube_history.py", line 336, in start_analysis
    self.check_df()
  File "youtube_history.py", line 247, in check_df
    self.df_from_files()
  File "youtube_history.py", line 210, in df_from_files
    data = [json.load(open(files.format(i))) for i in range(1, num + 1)]
  File "youtube_history.py", line 210, in <listcomp>
    data = [json.load(open(files.format(i))) for i in range(1, num + 1)]
FileNotFoundError: [Errno 2] No such file or directory: 'data\\raw\\17724.info.json'

I don't know what happened. Maybe it crashed before it finished downloading the metadata.

Edit youtube_history.py so that line 207 sets num to the number of the missing json file minus 1. Done

For those who see rectangular boxes in the wordcloud:

You are missing a font for whatever language you are using.

Download or locate the font you want to use, then edit youtube_history.py so that line 232 reads:

wordcloud = WordCloud(font_path='path/to/the/font',
                      width=1920,
                      height=1080,
                      relative_scaling=.5)

Then the world is beautiful and everything works fine.

Thank you so much for this. It helped me understand who I am. I surely watched a lot of memes... I also watched 18,906 videos, and it shocked me how much time I've spent on YouTube. If I had spent my time more wisely, I might be at Harvard right now. This is an awesome project. Keep up the good work.