Serene-Arc / bulk-downloader-for-reddit

Downloads and archives content from reddit
https://pypi.org/project/bdfr
GNU General Public License v3.0

[BUG] Cannot use the archive command together with --include-id-file and Reddit IDs #839

Open Fakeaccount12312 opened 1 year ago

Fakeaccount12312 commented 1 year ago

Description

When the archive or clone command is used together with --include-id-file and the ID file contains bare Reddit IDs, bdfr crashes immediately. This does not happen with the download command, or when full Reddit URLs are used instead of IDs.

Command

python -m bdfr archive "" --include-id-file test.txt

test.txt contains a single line with an example ID: 12toup0
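
Since the description notes that the crash does not occur when full Reddit URLs are used, one way to sidestep it might be to rewrite the ID file so that every bare ID becomes a URL. The following is only a sketch: the `https://www.reddit.com/comments/<id>` URL form and the helper names `ids_to_urls` and `convert_id_file` are assumptions made for illustration, not something taken from bdfr or confirmed in this issue.

```python
from pathlib import Path

# Assumed URL form for a submission; not confirmed by the issue or by bdfr.
URL_TEMPLATE = "https://www.reddit.com/comments/{id}"


def ids_to_urls(lines):
    """Convert bare IDs to submission URLs; keep existing URLs, drop blank lines."""
    urls = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip empty lines
        if entry.startswith(("http://", "https://")):
            urls.append(entry)  # already a URL, keep unchanged
        else:
            urls.append(URL_TEMPLATE.format(id=entry))
    return urls


def convert_id_file(src, dst):
    """Hypothetical helper: read an ID file and write a URL-only copy."""
    lines = Path(src).read_text(encoding="utf-8").splitlines()
    Path(dst).write_text("\n".join(ids_to_urls(lines)) + "\n", encoding="utf-8")
```

Calling `convert_id_file("test.txt", "test_urls.txt")` and then passing `test_urls.txt` to --include-id-file would be the intended usage. Note that this treats every bare ID as a submission, so it is not suitable if the file mixes in comment IDs.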

Environment

Logs

[2023-04-21 14:07:28,834 - bdfr.connector - DEBUG] - Disabling the following modules: 
[2023-04-21 14:07:28,834 - bdfr.connector - Level 9] - Created download filter
[2023-04-21 14:07:28,834 - bdfr.connector - Level 9] - Created time filter
[2023-04-21 14:07:28,834 - bdfr.connector - Level 9] - Created sort filter
[2023-04-21 14:07:28,856 - bdfr.connector - Level 9] - Create file name formatter
[2023-04-21 14:07:28,856 - bdfr.connector - DEBUG] - Using unauthenticated Reddit instance
[2023-04-21 14:07:28,859 - bdfr.connector - Level 9] - Created site authenticator
[2023-04-21 14:07:28,859 - bdfr.connector - Level 9] - Retrieved subreddits
[2023-04-21 14:07:28,859 - bdfr.connector - Level 9] - Retrieved multireddits
[2023-04-21 14:07:28,859 - bdfr.connector - Level 9] - Retrieved user data
[2023-04-21 14:07:29,198 - bdfr.connector - Level 9] - Retrieved submissions for given links
[2023-04-21 14:07:29,689 - root - ERROR] - Archiver exited unexpectedly
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\bdfr\__main__.py", line 139, in cli_archive
    reddit_archiver.download()
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\bdfr\archiver.py", line 37, in download
    if (submission.author and submission.author.name in self.args.ignore_user) or (
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\praw\models\reddit\base.py", line 34, in __getattr__
    self._fetch()
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\praw\models\reddit\comment.py", line 192, in _fetch
    raise ClientException(f"No data returned for comment {self.fullname}")
praw.exceptions.ClientException: No data returned for comment t1_12toup0

Fakeaccount12312 commented 1 year ago

I was also able to reproduce the same bug with several different IDs, and over a VPN on my Android phone.

kvangork commented 1 year ago

This is caused by the archiver's support for downloading comments without context, which forces every 7-character ID to be treated as a comment. If you're okay with archiving only full submissions, here's a workaround: https://github.com/aliparlakci/bulk-downloader-for-reddit/issues/851#issuecomment-1592057680
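
To illustrate why a length-based rule goes wrong here, below is a minimal sketch of such a dispatch. It is a hypothetical reconstruction for illustration only, not bdfr's actual code: the function name `classify_by_length` and the 6-character submission branch are assumptions, and only the "7 characters means comment" rule comes from the explanation above.

```python
import re


def classify_by_length(entry):
    """Hypothetical length-based dispatch; NOT bdfr's real implementation."""
    if re.fullmatch(r"\w{7}", entry):
        return "comment"  # every 7-character ID is forced to be a comment
    if re.fullmatch(r"\w{6}", entry):
        return "submission"  # assumption: shorter IDs are taken as submissions
    return "url"  # anything else is handled as a full URL


# 12toup0 is the submission ID from the report, but the rule labels it a comment.
print(classify_by_length("12toup0"))
```

Under this rule the ID from the report is looked up as a comment, which matches the `t1_` prefix (Reddit's fullname prefix for comments) in the traceback: `No data returned for comment t1_12toup0`. A full URL falls through to the last branch, which is consistent with URLs not triggering the crash.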