Jessime / youtube_history

A quick analysis of all YouTube videos in a user's history.
MIT License

Re-open data (?) #2

Open Motuzkov opened 7 years ago

Motuzkov commented 7 years ago

Hello Jessime, how do I run "youtube_history.py" a second time after the raw metadata has been downloaded? The first run succeeded and redirected me to the 'YouTube History Analysis' page, but every later launch fails. So is there any way to re-open a completed YouTube analysis? And is there a way to combine 2 or more data folders into a single database?

Error message at second launch of "youtube_history.py":

(C:\Users\David\Anaconda3\envs\youtube) C:\Users\David\Desktop\youtube_history-master2>python youtube_history.py
Welcome!
Traceback (most recent call last):
  File "pandas\_libs\parsers.pyx", line 1162, in pandas._libs.parsers.TextReader._convert_tokens (pandas\_libs\parsers.c:14858)
  File "pandas\_libs\parsers.pyx", line 1273, in pandas._libs.parsers.TextReader._convert_with_dtype (pandas\_libs\parsers.c:17119)
  File "pandas\_libs\parsers.pyx", line 1289, in pandas._libs.parsers.TextReader._string_convert (pandas\_libs\parsers.c:17347)
  File "pandas\_libs\parsers.pyx", line 1524, in pandas._libs.parsers._string_box_utf8 (pandas\_libs\parsers.c:23041)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 50: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "youtube_history.py", line 297, in <module>
    analysis.run()
  File "youtube_history.py", line 287, in run
    self.start_analysis()
  File "youtube_history.py", line 273, in start_analysis
    self.check_df()
  File "youtube_history.py", line 184, in check_df
    self.df = pd.read_csv(df_file, index_col=0, parse_dates=[-11])
  File "C:\Users\David\Anaconda3\envs\youtube\lib\site-packages\pandas\io\parsers.py", line 655, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\David\Anaconda3\envs\youtube\lib\site-packages\pandas\io\parsers.py", line 411, in _read
    data = parser.read(nrows)
  File "C:\Users\David\Anaconda3\envs\youtube\lib\site-packages\pandas\io\parsers.py", line 982, in read
    ret = self._engine.read(nrows)
  File "C:\Users\David\Anaconda3\envs\youtube\lib\site-packages\pandas\io\parsers.py", line 1719, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 890, in pandas._libs.parsers.TextReader.read (pandas\_libs\parsers.c:10862)
  File "pandas\_libs\parsers.pyx", line 912, in pandas._libs.parsers.TextReader._read_low_memory (pandas\_libs\parsers.c:11138)
  File "pandas\_libs\parsers.pyx", line 989, in pandas._libs.parsers.TextReader._read_rows (pandas\_libs\parsers.c:12175)
  File "pandas\_libs\parsers.pyx", line 1117, in pandas._libs.parsers.TextReader._convert_column_data (pandas\_libs\parsers.c:14136)
  File "pandas\_libs\parsers.pyx", line 1169, in pandas._libs.parsers.TextReader._convert_tokens (pandas\_libs\parsers.c:14972)
  File "pandas\_libs\parsers.pyx", line 1273, in pandas._libs.parsers.TextReader._convert_with_dtype (pandas\_libs\parsers.c:17119)
  File "pandas\_libs\parsers.pyx", line 1289, in pandas._libs.parsers.TextReader._string_convert (pandas\_libs\parsers.c:17347)
  File "pandas\_libs\parsers.pyx", line 1524, in pandas._libs.parsers._string_box_utf8 (pandas\_libs\parsers.c:23041)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 50: invalid continuation byte

Thank you!

Jessime commented 7 years ago

> So is there any way to re-open a completed YouTube analysis?

Yes, there is, and that's what the script is trying to do. For some reason though, it's unable to open the .csv file that pandas saved on the first run. It's hard to know what's wrong without seeing the file. Would you be okay with putting the youtube_history-master2\src\data\ran\df.csv file on pastebin?
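
In the meantime, the file can be inspected locally. The traceback shows that byte 0xf1 could not be decoded as UTF-8, so one possible explanation (an assumption, not confirmed in this thread) is that df.csv was written in a different encoding such as cp1252. The sketch below uses only the standard library; the helper names `find_bad_byte` and `guess_encoding` are hypothetical and not part of youtube_history.py.

```python
from pathlib import Path

# Candidate encodings to try, in order. latin-1 maps every byte to a
# character, so it never fails, but it may show the wrong characters.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def find_bad_byte(raw: bytes, encoding: str = "utf-8"):
    """Return (offset, byte value) of the first undecodable byte, or None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as err:
        return err.start, raw[err.start]
    return None


def guess_encoding(raw: bytes, candidates=CANDIDATE_ENCODINGS):
    """Return the first candidate encoding that decodes raw without error."""
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None


if __name__ == "__main__":
    # Path taken from the thread; adjust it to your own checkout.
    csv_path = Path("src") / "data" / "ran" / "df.csv"
    if csv_path.exists():
        raw = csv_path.read_bytes()
        print("first bad UTF-8 byte:", find_bad_byte(raw))
        print("first encoding that decodes cleanly:", guess_encoding(raw))
```

If one of the candidates decodes the file cleanly, it can be passed to pandas through the `encoding` argument of `pd.read_csv` to check whether the saved analysis then loads.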

> is there a way to combine 2 or more data folders into a single database?

This would be a little tricky, I think. You would have to rename all of the files in the second data folder (numbering them starting where the files in the first folder left off), then move them into the first.
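
The renaming step described above could be sketched roughly as follows. The thread does not show how the data files are actually named, so this assumes (loudly) that each file's stem is a plain integer, for example `0.json`, `1.json`, and so on; the function names are hypothetical. Try it on copies of the folders first.

```python
import shutil
from pathlib import Path


def numbered_files(folder: Path):
    """Return sorted (number, path) pairs for files whose stem is an integer."""
    pairs = [
        (int(p.stem), p)
        for p in folder.iterdir()
        if p.is_file() and p.stem.isdecimal()
    ]
    return sorted(pairs)


def merge_numbered_folders(first: Path, second: Path) -> int:
    """Move numbered files from `second` into `first`, continuing the numbering.

    ASSUMPTION: file stems are integers; files with other names are left alone.
    Returns the number of files moved.
    """
    existing = numbered_files(first)
    # Continue after the highest number in `first`, or start at 0 if it is empty.
    next_number = existing[-1][0] + 1 if existing else 0
    moved = 0
    for _, path in numbered_files(second):
        target = first / f"{next_number}{path.suffix}"
        shutil.move(str(path), str(target))
        next_number += 1
        moved += 1
    return moved
```

For example, `merge_numbered_folders(Path("data1/raw"), Path("data2/raw"))` (both paths are placeholders) would move the second folder's files into the first. Whether the script then picks up the merged files without further changes is not confirmed in this thread.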