Serene-Arc / bulk-downloader-for-reddit

Downloads and archives content from reddit
https://pypi.org/project/bdfr
GNU General Public License v3.0

[BUG] Error 403 crashes the downloader #677

Closed: slormo closed this issue 1 year ago

slormo commented 1 year ago

Description

Downloading from an ID file causes a 403 error that crashes the downloader. This might be caused by a deleted post or subreddit in the ID file, but I'm not sure.

Command

python3 -m bdfr clone --include-id-file post_votes.csv  --user me --authenticate --file-scheme "{DATE}_{TITLE} - ({REDDITOR}) - ({SUBREDDIT}) [{POSTID}]" --folder-scheme "" --no-dupes archive

Environment (please complete the following information):

Logs

[2022-10-03 14:49:10,938 - bdfr.connector - DEBUG] - Setting maximum download wait time to 120 seconds
[2022-10-03 14:49:10,938 - bdfr.connector - DEBUG] - Setting datetime format string to ISO
[2022-10-03 14:49:10,939 - bdfr.connector - DEBUG] - Disabling the following modules: 
[2022-10-03 14:49:10,939 - bdfr.connector - Level 9] - Created download filter
[2022-10-03 14:49:10,939 - bdfr.connector - Level 9] - Created time filter
[2022-10-03 14:49:10,940 - bdfr.connector - Level 9] - Created sort filter
[2022-10-03 14:49:10,940 - bdfr.connector - Level 9] - Create file name formatter
[2022-10-03 14:49:10,940 - bdfr.connector - DEBUG] - Using authenticated Reddit instance
[2022-10-03 14:49:10,941 - bdfr.oauth2 - Level 9] - Loaded OAuth2 token for authoriser
[2022-10-03 14:49:11,217 - bdfr.oauth2 - Level 9] - Written OAuth2 token from authoriser to C:\Users\****\AppData\Local\BDFR\bdfr\default_config.cfg
[2022-10-03 14:49:11,519 - bdfr.connector - Level 9] - Resolved user to ****
[2022-10-03 14:49:11,522 - bdfr.connector - Level 9] - Created site authenticator
[2022-10-03 14:49:11,523 - bdfr.connector - Level 9] - Retrieved subreddits
[2022-10-03 14:49:11,523 - bdfr.connector - Level 9] - Retrieved multireddits
[2022-10-03 14:49:11,523 - bdfr.connector - Level 9] - Retrieved user data
[2022-10-03 14:49:11,595 - bdfr.connector - Level 9] - Retrieved submissions for given links
[2022-10-03 14:49:12,297 - bdfr.downloader - DEBUG] - Attempting to download submission 9udfcy
[2022-10-03 14:49:12,647 - bdfr.site_downloaders.youtube - ERROR] - ERROR: [generic] '' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:" ) to search YouTube
Traceback (most recent call last):
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\extractor\common.py", line 670, in extract
    ie_result = self._real_extract(url)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\extractor\generic.py", line 2615, in _real_extract
    raise ExtractorError(
yt_dlp.utils.ExtractorError: '' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:" ) to search YouTube

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 1459, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 1535, in __extract_info
    ie_result = ie.extract(url)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\extractor\common.py", line 696, in extract
    raise type(e)(e.orig_msg, **kwargs)
yt_dlp.utils.ExtractorError: [generic] '' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:" ) to search YouTube

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\site_downloaders\youtube.py", line 66, in get_video_data
    result = ydl.extract_info(url, download=False)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 1448, in extract_info
    return self.__extract_info(url, self.get_info_extractor(key), download, extra_info, process)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 1477, in wrapper
    self.report_error(str(e), e.format_traceback())
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 994, in report_error
    self.trouble(f'{self._format_err("ERROR:", self.Styles.ERROR)} {message}', *args, **kwargs)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 934, in trouble
    raise DownloadError(message, exc_info)
yt_dlp.utils.DownloadError: ERROR: [generic] '' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:" ) to search YouTube
[2022-10-03 14:49:12,649 - bdfr.downloader - ERROR] - Could not download submission 9udfcy: No downloader module exists for url 
[2022-10-03 14:49:12,649 - bdfr.archive_entry.submission_archive_entry - DEBUG] - Retrieving full comment tree for submission 9udfcy
[2022-10-03 14:49:12,661 - bdfr.archiver - DEBUG] - Writing entry 9udfcy to file in JSON format at ****
[2022-10-03 14:49:12,661 - bdfr.archiver - INFO] - Record for entry item 9udfcy written to disk
[2022-10-03 14:49:13,302 - bdfr.downloader - DEBUG] - Attempting to download submission nrvnre
[2022-10-03 14:49:13,302 - bdfr.downloader - DEBUG] - Using Redgifs with url ****
[2022-10-03 14:49:13,539 - bdfr.downloader - ERROR] - Site Redgifs failed to download submission nrvnre: Server responded with 404 to ****
[2022-10-03 14:49:13,539 - bdfr.archive_entry.submission_archive_entry - DEBUG] - Retrieving full comment tree for submission nrvnre
[2022-10-03 14:49:13,544 - bdfr.archiver - DEBUG] - Writing entry nrvnre to file in JSON format at ****
[2022-10-03 14:49:13,544 - bdfr.archiver - INFO] - Record for entry item nrvnre written to disk
[2022-10-03 14:49:13,715 - root - ERROR] - Scraper exited unexpectedly
Traceback (most recent call last):
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\__main__.py", line 126, in cli_clone
    reddit_scraper.download()
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\cloner.py", line 20, in download
    self._download_submission(submission)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\downloader.py", line 51, in _download_submission
    elif submission.subreddit.display_name.lower() in self.args.skip_subreddit:
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\models\reddit\base.py", line 34, in __getattr__
    self._fetch()
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\models\reddit\submission.py", line 581, in _fetch
    data = self._fetch_data()
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\models\reddit\submission.py", line 578, in _fetch_data
    return self._reddit.request("GET", path, params)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\reddit.py", line 848, in request
    return self._core.request(
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\prawcore\sessions.py", line 324, in request
    return self._request_with_retries(
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\prawcore\sessions.py", line 260, in _request_with_retries
    raise self.STATUS_EXCEPTIONS[response.status_code](response)
prawcore.exceptions.Forbidden: received 403 HTTP response
Serene-Arc commented 1 year ago

Firstly, why are you passing in a CSV file to the BDFR? What is the format of that file?

slormo commented 1 year ago

The CSV is from the Reddit data request and contains a link on each line. It is formatted in UTF-8. [Screenshot 2022-10-04 163323] *The full Reddit links are in the CSV file; the ones in the screenshot have been edited for privacy.

Serene-Arc commented 1 year ago

That isn't the format. The format is one ID per line, as stated in the documentation.
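
For reference, a valid ID file contains nothing but bare submission IDs, one per line. Using IDs that appear elsewhere in this thread, the file would look something like:

9udfcy
nrvnre
lvirs1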

slormo commented 1 year ago

[Screenshot 2022-10-04 223613] So like this?

If I run it like that, it throws this error:

[2022-10-04 22:34:24,078 - bdfr.connector - DEBUG] - Setting maximum download wait time to 120 seconds
[2022-10-04 22:34:24,078 - bdfr.connector - DEBUG] - Setting datetime format string to ISO
[2022-10-04 22:34:24,079 - bdfr.connector - DEBUG] - Disabling the following modules: 
[2022-10-04 22:34:24,079 - bdfr.connector - Level 9] - Created download filter
[2022-10-04 22:34:24,080 - bdfr.connector - Level 9] - Created time filter
[2022-10-04 22:34:24,080 - bdfr.connector - Level 9] - Created sort filter
[2022-10-04 22:34:24,080 - bdfr.connector - Level 9] - Create file name formatter
[2022-10-04 22:34:24,080 - bdfr.connector - DEBUG] - Using authenticated Reddit instance
[2022-10-04 22:34:24,081 - bdfr.oauth2 - Level 9] - Loaded OAuth2 token for authoriser
[2022-10-04 22:34:24,345 - bdfr.oauth2 - Level 9] - Written OAuth2 token from authoriser to C:\Users\****\AppData\Local\BDFR\bdfr\default_config.cfg
[2022-10-04 22:34:24,681 - bdfr.connector - Level 9] - Resolved user to ****
[2022-10-04 22:34:24,683 - bdfr.connector - Level 9] - Created site authenticator
[2022-10-04 22:34:24,683 - bdfr.connector - Level 9] - Retrieved subreddits
[2022-10-04 22:34:24,684 - bdfr.connector - Level 9] - Retrieved multireddits
[2022-10-04 22:34:24,684 - bdfr.connector - Level 9] - Retrieved user data
[2022-10-04 22:34:24,709 - root - ERROR] - Scraper exited unexpectedly
Traceback (most recent call last):
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\__main__.py", line 125, in cli_clone
    reddit_scraper = RedditCloner(config)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\cloner.py", line 15, in __init__
    super(RedditCloner, self).__init__(args)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\downloader.py", line 38, in __init__
    super(RedditDownloader, self).__init__(args)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\archiver.py", line 26, in __init__
    super(Archiver, self).__init__(args)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\connector.py", line 59, in __init__
    self.reddit_lists = self.retrieve_reddit_lists()
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\connector.py", line 162, in retrieve_reddit_lists
    master_list.extend(self.get_submissions_from_link())
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\archiver.py", line 51, in get_submissions_from_link
    supplied_submissions.append(self.reddit_instance.submission(url=sub_id))
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\util\deprecate_args.py", line 43, in wrapped
    return func(**dict(zip(_old_args, args)), **kwargs)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\reddit.py", line 981, in submission
    return models.Submission(self, id=id, url=url)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\models\reddit\submission.py", line 584, in __init__
    self.id = self.id_from_url(url)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\models\reddit\submission.py", line 456, in id_from_url
    parts = RedditBase._url_parts(url)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\models\reddit\base.py", line 19, in _url_parts
    raise InvalidURL(url)
praw.exceptions.InvalidURL: Invalid URL: 9.75E+03
Serene-Arc commented 1 year ago

That means that somewhere in that file there is an entry Excel is treating as a number; by the look of it, an ID made up entirely of digits. Excel can rewrite such a value in scientific notation when saving the file, which is how something like 9.75E+03 ends up in the CSV. This is why text files are recommended. You'll have to mark that column as text in Excel, or paste the IDs into a plain text file, so nothing gets changed automatically.

slormo commented 1 year ago

Okay, I've found an invalid entry that's just digits and deleted it. I've also marked the column as text and converted the file to a text file. I get a different error this time:

[2022-10-05 18:53:17,231 - bdfr.connector - DEBUG] - Setting maximum download wait time to 120 seconds
[2022-10-05 18:53:17,231 - bdfr.connector - DEBUG] - Setting datetime format string to ISO
[2022-10-05 18:53:17,232 - bdfr.connector - DEBUG] - Disabling the following modules: 
[2022-10-05 18:53:17,232 - bdfr.connector - Level 9] - Created download filter
[2022-10-05 18:53:17,232 - bdfr.connector - Level 9] - Created time filter
[2022-10-05 18:53:17,232 - bdfr.connector - Level 9] - Created sort filter
[2022-10-05 18:53:17,233 - bdfr.connector - Level 9] - Create file name formatter
[2022-10-05 18:53:17,234 - bdfr.connector - DEBUG] - Using authenticated Reddit instance
[2022-10-05 18:53:17,235 - bdfr.oauth2 - Level 9] - Loaded OAuth2 token for authoriser
[2022-10-05 18:53:17,510 - bdfr.oauth2 - Level 9] - Written OAuth2 token from authoriser to C:\Users\****\AppData\Local\BDFR\bdfr\default_config.cfg
[2022-10-05 18:53:17,807 - bdfr.connector - Level 9] - Resolved user to ****
[2022-10-05 18:53:17,809 - bdfr.connector - Level 9] - Created site authenticator
[2022-10-05 18:53:17,809 - bdfr.connector - Level 9] - Retrieved subreddits
[2022-10-05 18:53:17,810 - bdfr.connector - Level 9] - Retrieved multireddits
[2022-10-05 18:53:17,810 - bdfr.connector - Level 9] - Retrieved user data
[2022-10-05 18:53:17,919 - bdfr.connector - Level 9] - Retrieved submissions for given links
[2022-10-05 18:53:18,407 - bdfr.downloader - DEBUG] - Attempting to download submission lvirs1
[2022-10-05 18:53:18,408 - bdfr.downloader - DEBUG] - Using Direct with url ****
[2022-10-05 18:53:18,779 - bdfr.downloader - DEBUG] - Written file to ****
[2022-10-05 18:53:18,779 - bdfr.downloader - DEBUG] - Hash added to master list: 45b9cef75fcafdc0bd2019ba8f4819bc
[2022-10-05 18:53:18,780 - bdfr.downloader - INFO] - Downloaded submission lvirs1 from ****
[2022-10-05 18:53:18,780 - bdfr.archive_entry.submission_archive_entry - DEBUG] - Retrieving full comment tree for submission lvirs1
[2022-10-05 18:53:18,786 - bdfr.archiver - DEBUG] - Writing entry lvirs1 to file in JSON format at ****
[2022-10-05 18:53:18,786 - bdfr.archiver - INFO] - Record for entry item lvirs1 written to disk
[2022-10-05 18:53:19,069 - bdfr.downloader - DEBUG] - Attempting to download submission 98nov0
[2022-10-05 18:53:19,070 - bdfr.downloader - DEBUG] - Using Imgur with url ****
[2022-10-05 18:53:19,580 - bdfr.downloader - DEBUG] - Written file to ****
[2022-10-05 18:53:19,581 - bdfr.downloader - DEBUG] - Hash added to master list: 7933c8ef8fb11d93ea497c58a4c8cdb9
[2022-10-05 18:53:19,581 - bdfr.downloader - INFO] - Downloaded submission 98nov0 from ****
[2022-10-05 18:53:19,582 - bdfr.archive_entry.submission_archive_entry - DEBUG] - Retrieving full comment tree for submission 98nov0
[2022-10-05 18:53:19,586 - bdfr.archiver - DEBUG] - Writing entry 98nov0 to file in JSON format at ****
[2022-10-05 18:53:19,587 - bdfr.archiver - INFO] - Record for entry item 98nov0 written to disk
[2022-10-05 18:53:20,063 - bdfr.downloader - DEBUG] - Attempting to download submission 90npuf
[2022-10-05 18:53:20,388 - bdfr.site_downloaders.youtube - ERROR] - ERROR: [generic] '' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:" ) to search YouTube
Traceback (most recent call last):
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\extractor\common.py", line 670, in extract
    ie_result = self._real_extract(url)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\extractor\generic.py", line 2615, in _real_extract
    raise ExtractorError(
yt_dlp.utils.ExtractorError: '' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:" ) to search YouTube

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 1459, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 1535, in __extract_info
    ie_result = ie.extract(url)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\extractor\common.py", line 696, in extract
    raise type(e)(e.orig_msg, **kwargs)
yt_dlp.utils.ExtractorError: [generic] '' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:" ) to search YouTube

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\site_downloaders\youtube.py", line 66, in get_video_data
    result = ydl.extract_info(url, download=False)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 1448, in extract_info
    return self.__extract_info(url, self.get_info_extractor(key), download, extra_info, process)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 1477, in wrapper
    self.report_error(str(e), e.format_traceback())
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 994, in report_error
    self.trouble(f'{self._format_err("ERROR:", self.Styles.ERROR)} {message}', *args, **kwargs)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 934, in trouble
    raise DownloadError(message, exc_info)
yt_dlp.utils.DownloadError: ERROR: [generic] '' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:" ) to search YouTube
[2022-10-05 18:53:20,389 - bdfr.downloader - ERROR] - Could not download submission 90npuf: No downloader module exists for url 
[2022-10-05 18:53:20,389 - bdfr.archive_entry.submission_archive_entry - DEBUG] - Retrieving full comment tree for submission 90npuf
[2022-10-05 18:53:20,395 - bdfr.archiver - DEBUG] - Writing entry 90npuf to file in JSON format at ****
[2022-10-05 18:53:20,395 - bdfr.archiver - INFO] - Record for entry item 90npuf written to disk
[2022-10-05 18:53:20,560 - root - ERROR] - Scraper exited unexpectedly
Traceback (most recent call last):
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\__main__.py", line 126, in cli_clone
    reddit_scraper.download()
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\cloner.py", line 20, in download
    self._download_submission(submission)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\downloader.py", line 51, in _download_submission
    elif submission.subreddit.display_name.lower() in self.args.skip_subreddit:
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\models\reddit\base.py", line 34, in __getattr__
    self._fetch()
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\models\reddit\submission.py", line 632, in _fetch
    data = self._fetch_data()
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\models\reddit\submission.py", line 629, in _fetch_data
    return self._reddit.request(method="GET", params=params, path=path)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\util\deprecate_args.py", line 43, in wrapped
    return func(**dict(zip(_old_args, args)), **kwargs)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\reddit.py", line 941, in request
    return self._core.request(
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\prawcore\sessions.py", line 330, in request
    return self._request_with_retries(
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\prawcore\sessions.py", line 266, in _request_with_retries
    raise self.STATUS_EXCEPTIONS[response.status_code](response)
prawcore.exceptions.Forbidden: received 403 HTTP response

I've also included the text file containing the IDs, but there shouldn't be anything wrong with it. post_votes1.txt

Serene-Arc commented 1 year ago

Okay, thank you. This I can investigate.

gageirwin commented 1 year ago

Some helpful info about the PRAW 403. I have a script that uses the pushshift.io API to get submission links from suspended/deleted users. From this I noticed that the 403 error in PRAW only happens when you try to get submission information from subreddits you can't access: banned subreddits, locked ones (I assume those work if you are logged in and have access), and suspended/deleted users (the /r/u_User page). So essentially, if a submission 403s, you can skip all future submission links from that subreddit and save API calls; a rough sketch of that idea follows.
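
Here is a minimal, hypothetical sketch of the skip-on-403 idea written directly against PRAW; the function name and structure are illustrative only and are not taken from the BDFR codebase:

import praw
import prawcore

def download_submissions(reddit: praw.Reddit, submission_ids: list[str]) -> None:
    """Skip submissions that return a 403 instead of letting the whole run crash."""
    for sub_id in submission_ids:
        submission = reddit.submission(id=sub_id)
        try:
            # Accessing a lazy attribute triggers the API fetch, which raises
            # Forbidden for banned subreddits and suspended/deleted user pages.
            _ = submission.subreddit.display_name
        except prawcore.exceptions.Forbidden:
            print(f"Skipping {sub_id}: Reddit responded with 403")
            continue
        # ... hand the submission off to the normal download/archive path here ...

Since the 403 prevents fetching the subreddit name from Reddit itself, skipping a whole subreddit after its first 403 only works when the subreddit is already known from another source, such as the pushshift data mentioned above.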

rewrib commented 1 year ago

Yes, I also encounter this issue when I attempt to download deleted posts from deleted users.

rewrib commented 1 year ago

Downloading the files one by one with a batch script is a workaround.

example:

for /F "tokens=*" %%A in (C:...\post_votes.csv) do python -m bdfr download C:...\downloadFolder --user me --authenticate --link %%A
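
Note that this launches a separate BDFR process, with its own OAuth exchange, for every link, so it should be much slower than a single run; the upside is that a submission that 403s only aborts its own invocation instead of the whole batch.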

Serene-Arc commented 1 year ago

Does anyone have a submission they know for certain causes this error? I need it for testing purposes. I have a fix written but the tests need to be added.

rewrib commented 1 year ago

This is an example of a user-deleted post: https://www.reddit.com/user/Schneewittchen666/comments/ijy4ch/deleted_by_user/

This is an example of a post from a banned subreddit: https://www.reddit.com/r/VIP142/comments/kw4wjm/chanel_uzi/

Both give the same error, but maybe there is a difference in how they behave in the backend.

slormo commented 1 year ago

I'm using the latest development branch and still getting the 403 error. Could it be because the dead link is inside an ID file?

[2022-11-23 23:13:25,608 - bdfr.connector - DEBUG] - Setting datetime format string to ISO
[2022-11-23 23:13:25,609 - bdfr.connector - DEBUG] - Disabling the following modules: 
[2022-11-23 23:13:25,609 - bdfr.connector - Level 9] - Created download filter
[2022-11-23 23:13:25,609 - bdfr.connector - Level 9] - Created time filter
[2022-11-23 23:13:25,609 - bdfr.connector - Level 9] - Created sort filter
[2022-11-23 23:13:25,609 - bdfr.connector - Level 9] - Create file name formatter
[2022-11-23 23:13:25,610 - bdfr.connector - DEBUG] - Using authenticated Reddit instance
[2022-11-23 23:13:25,611 - bdfr.oauth2 - Level 9] - Loaded OAuth2 token for authoriser
[2022-11-23 23:13:25,925 - bdfr.oauth2 - Level 9] - Written OAuth2 token from authoriser to C:\Users\*****\AppData\Local\BDFR\bdfr\default_config.cfg
[2022-11-23 23:13:26,276 - bdfr.connector - Level 9] - Resolved user to ****
[2022-11-23 23:13:26,279 - bdfr.connector - Level 9] - Created site authenticator
[2022-11-23 23:13:26,279 - bdfr.connector - Level 9] - Retrieved subreddits
[2022-11-23 23:13:26,279 - bdfr.connector - Level 9] - Retrieved multireddits
[2022-11-23 23:13:26,279 - bdfr.connector - Level 9] - Retrieved user data
[2022-11-23 23:13:26,394 - bdfr.connector - Level 9] - Retrieved submissions for given links
[2022-11-23 23:13:26,769 - bdfr.downloader - DEBUG] - Attempting to download submission ****
[2022-11-23 23:13:26,769 - bdfr.downloader - DEBUG] - Using Imgur with url ****
[2022-11-23 23:13:27,448 - bdfr.downloader - DEBUG] - Written file to ****
[2022-11-23 23:13:27,448 - bdfr.downloader - DEBUG] - Hash added to master list: d75c0538763e92c5dac74523f8310365
[2022-11-23 23:13:27,448 - bdfr.downloader - INFO] - Downloaded submission **** from ****
[2022-11-23 23:13:27,449 - bdfr.archive_entry.submission_archive_entry - DEBUG] - Retrieving full comment tree for submission ****
[2022-11-23 23:13:27,452 - bdfr.archiver - DEBUG] - Writing entry **** to file in JSON format at ****
[2022-11-23 23:13:27,452 - bdfr.archiver - INFO] - Record for entry item **** written to disk
[2022-11-23 23:13:27,793 - bdfr.downloader - DEBUG] - Attempting to download submission ****
[2022-11-23 23:13:27,794 - bdfr.downloader - DEBUG] - Using Redgifs with url ****
[2022-11-23 23:13:35,944 - bdfr.downloader - DEBUG] - Written file to ****
[2022-11-23 23:13:35,945 - bdfr.downloader - DEBUG] - Hash added to master list: c82139da49f92f6738689c5378d99ee8
[2022-11-23 23:13:35,945 - bdfr.downloader - INFO] - Downloaded submission **** from ****
[2022-11-23 23:13:35,946 - bdfr.archive_entry.submission_archive_entry - DEBUG] - Retrieving full comment tree for submission ****
[2022-11-23 23:13:35,950 - bdfr.archiver - DEBUG] - Writing entry **** to file in JSON format at ****
[2022-11-23 23:13:35,950 - bdfr.archiver - INFO] - Record for entry item tinay1 written to disk
[2022-11-23 23:13:36,323 - bdfr.downloader - DEBUG] - Attempting to download submission ****
[2022-11-23 23:13:36,324 - bdfr.downloader - DEBUG] - Using Direct with url ****
[2022-11-23 23:13:36,708 - bdfr.downloader - DEBUG] - Written file to ****
[2022-11-23 23:13:36,708 - bdfr.downloader - DEBUG] - Hash added to master list: 4be329ae1635914609a96e243f962503
[2022-11-23 23:13:36,708 - bdfr.downloader - INFO] - Downloaded submission **** from ****
[2022-11-23 23:13:36,709 - bdfr.archive_entry.submission_archive_entry - DEBUG] - Retrieving full comment tree for submission ****
[2022-11-23 23:13:36,712 - bdfr.archiver - DEBUG] - Writing entry **** to file in JSON format at ****
[2022-11-23 23:13:36,712 - bdfr.archiver - INFO] - Record for entry item **** written to disk
[2022-11-23 23:13:37,067 - bdfr.downloader - DEBUG] - Attempting to download submission ****
[2022-11-23 23:13:37,067 - bdfr.downloader - DEBUG] - Using Direct with url ****
[2022-11-23 23:13:37,434 - bdfr.downloader - DEBUG] - Written file to ****
[2022-11-23 23:13:37,434 - bdfr.downloader - DEBUG] - Hash added to master list: e2ee5453c7b95f05076f4824a0647a01
[2022-11-23 23:13:37,434 - bdfr.downloader - INFO] - Downloaded submission **** from ****
[2022-11-23 23:13:37,435 - bdfr.archive_entry.submission_archive_entry - DEBUG] - Retrieving full comment tree for submission ****
[2022-11-23 23:13:37,438 - bdfr.archiver - DEBUG] - Writing entry **** to file in JSON format at ****
[2022-11-23 23:13:37,438 - bdfr.archiver - INFO] - Record for entry item **** written to disk
[2022-11-23 23:13:37,870 - bdfr.downloader - DEBUG] - Attempting to download submission ****
[2022-11-23 23:13:37,870 - bdfr.downloader - DEBUG] - Using Redgifs with url ****
[2022-11-23 23:13:40,639 - bdfr.downloader - DEBUG] - Written file to ****
[2022-11-23 23:13:40,639 - bdfr.downloader - DEBUG] - Hash added to master list: 5516c2c2581511d8ce2ffd8419c183a0
[2022-11-23 23:13:40,639 - bdfr.downloader - INFO] - Downloaded submission **** from ****
[2022-11-23 23:13:40,640 - bdfr.archive_entry.submission_archive_entry - DEBUG] - Retrieving full comment tree for submission ****
[2022-11-23 23:13:40,643 - bdfr.archiver - DEBUG] - Writing entry **** to file in JSON format at ****
[2022-11-23 23:13:40,643 - bdfr.archiver - INFO] - Record for entry item **** written to disk
[2022-11-23 23:13:41,216 - bdfr.downloader - DEBUG] - Attempting to download submission ****
[2022-11-23 23:13:41,550 - bdfr.site_downloaders.youtube - ERROR] - ERROR: [generic] '' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:" ) to search YouTube
Traceback (most recent call last):
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\extractor\common.py", line 670, in extract
    ie_result = self._real_extract(url)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\extractor\generic.py", line 2615, in _real_extract
    raise ExtractorError(
yt_dlp.utils.ExtractorError: '' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:" ) to search YouTube

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 1459, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 1535, in __extract_info
    ie_result = ie.extract(url)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\extractor\common.py", line 696, in extract
    raise type(e)(e.orig_msg, **kwargs)
yt_dlp.utils.ExtractorError: [generic] '' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:" ) to search YouTube

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\site_downloaders\youtube.py", line 66, in get_video_data
    result = ydl.extract_info(url, download=False)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 1448, in extract_info
    return self.__extract_info(url, self.get_info_extractor(key), download, extra_info, process)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 1477, in wrapper
    self.report_error(str(e), e.format_traceback())
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 994, in report_error
    self.trouble(f'{self._format_err("ERROR:", self.Styles.ERROR)} {message}', *args, **kwargs)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 934, in trouble
    raise DownloadError(message, exc_info)
yt_dlp.utils.DownloadError: ERROR: [generic] '' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:" ) to search YouTube
[2022-11-23 23:13:41,551 - bdfr.downloader - ERROR] - Could not download submission ****: No downloader module exists for url 
[2022-11-23 23:13:41,552 - bdfr.archive_entry.submission_archive_entry - DEBUG] - Retrieving full comment tree for submission ****
[2022-11-23 23:13:41,556 - bdfr.archiver - DEBUG] - Writing entry **** to file in JSON format at ****
[2022-11-23 23:13:41,556 - bdfr.archiver - INFO] - Record for entry item **** written to disk
[2022-11-23 23:13:41,898 - bdfr.downloader - DEBUG] - Attempting to download submission ****
[2022-11-23 23:13:41,943 - bdfr.site_downloaders.youtube - ERROR] - ERROR: [generic] '' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:" ) to search YouTube
Traceback (most recent call last):
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\extractor\common.py", line 670, in extract
    ie_result = self._real_extract(url)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\extractor\generic.py", line 2615, in _real_extract
    raise ExtractorError(
yt_dlp.utils.ExtractorError: '' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:" ) to search YouTube

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 1459, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 1535, in __extract_info
    ie_result = ie.extract(url)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\extractor\common.py", line 696, in extract
    raise type(e)(e.orig_msg, **kwargs)
yt_dlp.utils.ExtractorError: [generic] '' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:" ) to search YouTube

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\site_downloaders\youtube.py", line 66, in get_video_data
    result = ydl.extract_info(url, download=False)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 1448, in extract_info
    return self.__extract_info(url, self.get_info_extractor(key), download, extra_info, process)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 1477, in wrapper
    self.report_error(str(e), e.format_traceback())
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 994, in report_error
    self.trouble(f'{self._format_err("ERROR:", self.Styles.ERROR)} {message}', *args, **kwargs)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 934, in trouble
    raise DownloadError(message, exc_info)
yt_dlp.utils.DownloadError: ERROR: [generic] '' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:" ) to search YouTube
[2022-11-23 23:13:41,943 - bdfr.downloader - ERROR] - Could not download submission ****: No downloader module exists for url 
[2022-11-23 23:13:41,944 - bdfr.archive_entry.submission_archive_entry - DEBUG] - Retrieving full comment tree for submission ****
[2022-11-23 23:13:41,947 - bdfr.archiver - DEBUG] - Writing entry **** to file in JSON format at ****
[2022-11-23 23:13:41,947 - bdfr.archiver - INFO] - Record for entry item **** written to disk
[2022-11-23 23:13:42,442 - bdfr.downloader - DEBUG] - Attempting to download submission ****
[2022-11-23 23:13:42,442 - bdfr.downloader - DEBUG] - Using Gfycat with url ****
[2022-11-23 23:13:45,309 - bdfr.downloader - DEBUG] - Written file to ****
[2022-11-23 23:13:45,309 - bdfr.downloader - DEBUG] - Hash added to master list: 1362e01656fec7f12d631f4b363ae62e
[2022-11-23 23:13:45,310 - bdfr.downloader - INFO] - Downloaded submission **** from ****
[2022-11-23 23:13:45,310 - bdfr.archive_entry.submission_archive_entry - DEBUG] - Retrieving full comment tree for submission ****
[2022-11-23 23:13:45,314 - bdfr.archiver - DEBUG] - Writing entry **** to file in JSON format at ****
[2022-11-23 23:13:45,314 - bdfr.archiver - INFO] - Record for entry item **** written to disk
[2022-11-23 23:13:45,657 - bdfr.downloader - DEBUG] - Attempting to download submission ****
[2022-11-23 23:13:45,658 - bdfr.downloader - DEBUG] - Using Direct with url ****
[2022-11-23 23:13:46,111 - bdfr.downloader - DEBUG] - Written file to ****
[2022-11-23 23:13:46,112 - bdfr.downloader - DEBUG] - Hash added to master list: 53e20f19d33c577745ff5d6827bdd2d7
[2022-11-23 23:13:46,112 - bdfr.downloader - INFO] - Downloaded submission **** from ****
[2022-11-23 23:13:46,113 - bdfr.archive_entry.submission_archive_entry - DEBUG] - Retrieving full comment tree for submission ****
[2022-11-23 23:13:46,116 - bdfr.archiver - DEBUG] - Writing entry **** to file in JSON format at ****
[2022-11-23 23:13:46,116 - bdfr.archiver - INFO] - Record for entry item **** written to disk
[2022-11-23 23:13:46,862 - bdfr.downloader - DEBUG] - Attempting to download submission ****
[2022-11-23 23:13:46,907 - bdfr.site_downloaders.youtube - ERROR] - ERROR: [generic] '' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:" ) to search YouTube
Traceback (most recent call last):
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\extractor\common.py", line 670, in extract
    ie_result = self._real_extract(url)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\extractor\generic.py", line 2615, in _real_extract
    raise ExtractorError(
yt_dlp.utils.ExtractorError: '' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:" ) to search YouTube

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 1459, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 1535, in __extract_info
    ie_result = ie.extract(url)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\extractor\common.py", line 696, in extract
    raise type(e)(e.orig_msg, **kwargs)
yt_dlp.utils.ExtractorError: [generic] '' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:" ) to search YouTube

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\site_downloaders\youtube.py", line 66, in get_video_data
    result = ydl.extract_info(url, download=False)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 1448, in extract_info
    return self.__extract_info(url, self.get_info_extractor(key), download, extra_info, process)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 1477, in wrapper
    self.report_error(str(e), e.format_traceback())
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 994, in report_error
    self.trouble(f'{self._format_err("ERROR:", self.Styles.ERROR)} {message}', *args, **kwargs)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\yt_dlp\YoutubeDL.py", line 934, in trouble
    raise DownloadError(message, exc_info)
yt_dlp.utils.DownloadError: ERROR: [generic] '' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:" ) to search YouTube
[2022-11-23 23:13:46,907 - bdfr.downloader - ERROR] - Could not download submission ****: No downloader module exists for url 
[2022-11-23 23:13:46,908 - bdfr.archive_entry.submission_archive_entry - DEBUG] - Retrieving full comment tree for submission ****
[2022-11-23 23:13:46,911 - bdfr.archiver - DEBUG] - Writing entry **** to file in JSON format at ****
[2022-11-23 23:13:46,912 - bdfr.archiver - INFO] - Record for entry item **** written to disk
[2022-11-23 23:13:47,214 - bdfr.downloader - DEBUG] - Attempting to download submission ****
[2022-11-23 23:13:47,214 - bdfr.downloader - DEBUG] - Using Imgur with url ****
[2022-11-23 23:13:48,075 - bdfr.downloader - DEBUG] - Written file to ****
[2022-11-23 23:13:48,076 - bdfr.downloader - DEBUG] - Hash added to master list: 280512dce2d480cc0b2019a7e7983785
[2022-11-23 23:13:48,076 - bdfr.downloader - INFO] - Downloaded submission **** from ****
[2022-11-23 23:13:48,076 - bdfr.archive_entry.submission_archive_entry - DEBUG] - Retrieving full comment tree for submission ****
[2022-11-23 23:13:48,080 - bdfr.archiver - DEBUG] - Writing entry **** to file in JSON format at ****
[2022-11-23 23:13:48,080 - bdfr.archiver - INFO] - Record for entry item **** written to disk
[2022-11-23 23:13:50,716 - bdfr.downloader - DEBUG] - Attempting to download submission ****
[2022-11-23 23:13:50,717 - bdfr.downloader - DEBUG] - Using Direct with url ****
[2022-11-23 23:13:50,720 - bdfr.downloader - DEBUG] - File **** from submission **** already exists, continuing
[2022-11-23 23:13:50,720 - bdfr.downloader - INFO] - Downloaded submission **** from ****
[2022-11-23 23:13:50,720 - bdfr.archive_entry.submission_archive_entry - DEBUG] - Retrieving full comment tree for submission ****
[2022-11-23 23:13:51,146 - bdfr.archiver - DEBUG] - Writing entry **** to ****
[2022-11-23 23:13:51,146 - bdfr.archiver - INFO] - Record for entry item **** written to disk
[2022-11-23 23:13:51,369 - root - ERROR] - Scraper exited unexpectedly
Traceback (most recent call last):
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\__main__.py", line 126, in cli_clone
    reddit_scraper.download()
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\cloner.py", line 20, in download
    self._download_submission(submission)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\downloader.py", line 55, in _download_submission
    elif submission.subreddit.display_name.lower() in self.args.skip_subreddit:
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\models\reddit\base.py", line 34, in __getattr__
    self._fetch()
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\models\reddit\submission.py", line 634, in _fetch
    data = self._fetch_data()
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\models\reddit\submission.py", line 631, in _fetch_data
    return self._reddit.request(method="GET", params=params, path=path)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\util\deprecate_args.py", line 43, in wrapped
    return func(**dict(zip(_old_args, args)), **kwargs)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\reddit.py", line 941, in request
    return self._core.request(
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\prawcore\sessions.py", line 330, in request
    return self._request_with_retries(
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\prawcore\sessions.py", line 266, in _request_with_retries
    raise self.STATUS_EXCEPTIONS[response.status_code](response)
prawcore.exceptions.Forbidden: received 403 HTTP response
Serene-Arc commented 1 year ago

Removing the IDs and URLs from the logs defeats the entire purpose of providing them. Please don't do that. Tell me the submission ID so I can add it to the tests and investigate.

slormo commented 1 year ago

The IDs and URLs that appear in the log might be irrelevant anyway, since those submissions were scraped successfully.

[2022-11-24 00:50:02,655 - bdfr.connector - DEBUG] - Setting datetime format string to ISO
[2022-11-24 00:50:02,656 - bdfr.connector - DEBUG] - Disabling the following modules: 
[2022-11-24 00:50:02,656 - bdfr.connector - Level 9] - Created download filter
[2022-11-24 00:50:02,657 - bdfr.connector - Level 9] - Created time filter
[2022-11-24 00:50:02,657 - bdfr.connector - Level 9] - Created sort filter
[2022-11-24 00:50:02,657 - bdfr.connector - Level 9] - Create file name formatter
[2022-11-24 00:50:02,659 - bdfr.connector - DEBUG] - Using authenticated Reddit instance
[2022-11-24 00:50:02,660 - bdfr.oauth2 - Level 9] - Loaded OAuth2 token for authoriser
[2022-11-24 00:50:02,945 - bdfr.oauth2 - Level 9] - Written OAuth2 token from authoriser to C:\Users\*****\AppData\Local\BDFR\bdfr\default_config.cfg
[2022-11-24 00:50:03,258 - bdfr.connector - Level 9] - Resolved user to ****
[2022-11-24 00:50:03,261 - bdfr.connector - Level 9] - Created site authenticator
[2022-11-24 00:50:03,261 - bdfr.connector - Level 9] - Retrieved subreddits
[2022-11-24 00:50:03,261 - bdfr.connector - Level 9] - Retrieved multireddits
[2022-11-24 00:50:03,261 - bdfr.connector - Level 9] - Retrieved user data
[2022-11-24 00:50:03,380 - bdfr.connector - Level 9] - Retrieved submissions for given links
[2022-11-24 00:50:03,583 - root - ERROR] - Scraper exited unexpectedly
Traceback (most recent call last):
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\__main__.py", line 126, in cli_clone
    reddit_scraper.download()
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\cloner.py", line 20, in download
    self._download_submission(submission)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\downloader.py", line 55, in _download_submission
    elif submission.subreddit.display_name.lower() in self.args.skip_subreddit:
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\models\reddit\base.py", line 34, in __getattr__
    self._fetch()
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\models\reddit\submission.py", line 634, in _fetch
    data = self._fetch_data()
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\models\reddit\submission.py", line 631, in _fetch_data
    return self._reddit.request(method="GET", params=params, path=path)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\util\deprecate_args.py", line 43, in wrapped
    return func(**dict(zip(_old_args, args)), **kwargs)
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\reddit.py", line 941, in request
    return self._core.request(
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\prawcore\sessions.py", line 330, in request
    return self._request_with_retries(
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\prawcore\sessions.py", line 266, in _request_with_retries
    raise self.STATUS_EXCEPTIONS[response.status_code](response)
prawcore.exceptions.Forbidden: received 403 HTTP response

This log did not even provide any IDs (which I assume is because it immediately tried to scrape a dead link), so I don't know if any IDs that appear in the log are relevant. However, next time I'll include the IDs and links anyway.

Serene-Arc commented 1 year ago

Should be fixed with #87104e7, which I just pushed. An oversight on my part; apologies.