Serene-Arc / bulk-downloader-for-reddit

Downloads and archives content from reddit
https://pypi.org/project/bdfr
GNU General Public License v3.0

[BUG] DuplicateReplaceException: A duplicate comment has been detected. #901

Open devonandchris opened 1 year ago

devonandchris commented 1 year ago

Description

When using --all-comments I get

praw.exceptions.DuplicateReplaceException: A duplicate comment has been detected. Are you attempting to call 'replace_more_comments' more than once?

after a few comments are downloaded.

Command

 bdfr archive DIRECTORY --user me  --submitted --all-comments --authenticate --file-scheme "{REDDITOR}_{POSTID}_{DATE}"

Environment (please complete the following information)

Logs

on console:

    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/devon/.local/pipx/venvs/bdfrx/lib/python3.11/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/devon/.local/pipx/venvs/bdfrx/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/devon/.local/pipx/venvs/bdfrx/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/devon/.local/pipx/venvs/bdfrx/lib/python3.11/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/devon/.local/pipx/venvs/bdfrx/lib/python3.11/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/devon/.local/pipx/venvs/bdfrx/lib/python3.11/site-packages/bdfrx/__main__.py", line 117, in cli_download
    reddit_downloader = RedditDownloader(config, [stream])
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/devon/.local/pipx/venvs/bdfrx/lib/python3.11/site-packages/bdfrx/downloader.py", line 40, in __init__
    super().__init__(args, logging_handlers)
  File "/Users/devon/.local/pipx/venvs/bdfrx/lib/python3.11/site-packages/bdfrx/connector.py", line 63, in __init__
    self._setup_internal_objects()
  File "/Users/devon/.local/pipx/venvs/bdfrx/lib/python3.11/site-packages/bdfrx/connector.py", line 80, in _setup_internal_objects
    self.create_reddit_instance()
  File "/Users/devon/.local/pipx/venvs/bdfrx/lib/python3.11/site-packages/bdfrx/connector.py", line 156, in create_reddit_instance
    token = oauth2_authenticator.retrieve_new_token()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/devon/.local/pipx/venvs/bdfrx/lib/python3.11/site-packages/bdfrx/oauth2.py", line 73, in retrieve_new_token
    refresh_token = reddit.auth.authorize(params["code"])
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/devon/.local/pipx/venvs/bdfrx/lib/python3.11/site-packages/praw/models/auth.py", line 54, in authorize
    authorizer.authorize(code)
  File "/Users/devon/.local/pipx/venvs/bdfrx/lib/python3.11/site-packages/prawcore/auth.py", line 242, in authorize
    self._request_token(
  File "/Users/devon/.local/pipx/venvs/bdfrx/lib/python3.11/site-packages/prawcore/auth.py", line 155, in _request_token
    response = self._authenticator._post(url, **data)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/devon/.local/pipx/venvs/bdfrx/lib/python3.11/site-packages/prawcore/auth.py", line 38, in _post
    raise ResponseException(response)
prawcore.exceptions.ResponseException: received 401 HTTP response

in log file:

[2023-06-22 16:58:43,763 - bdfr.connector - DEBUG] - Disabling the following modules: 
[2023-06-22 16:58:43,763 - bdfr.connector - Level 9] - Created download filter
[2023-06-22 16:58:43,763 - bdfr.connector - Level 9] - Created time filter
[2023-06-22 16:58:43,763 - bdfr.connector - Level 9] - Created sort filter
[2023-06-22 16:58:43,768 - bdfr.connector - Level 9] - Create file name formatter
[2023-06-22 16:58:43,768 - bdfr.connector - DEBUG] - Using authenticated Reddit instance
[2023-06-22 16:58:43,963 - bdfr.oauth2 - Level 9] - Loaded OAuth2 token for authoriser
[2023-06-22 16:58:44,138 - bdfr.oauth2 - Level 9] - Written OAuth2 token from authoriser to /Users/devon/Library/Application Support/bdfr/default_config.cfg
[2023-06-22 16:58:44,354 - bdfr.connector - Level 9] - Resolved user to DevonAndChris
[2023-06-22 16:58:44,354 - bdfr.connector - Level 9] - Created site authenticator
[2023-06-22 16:58:44,354 - bdfr.connector - Level 9] - Retrieved subreddits
[2023-06-22 16:58:44,354 - bdfr.connector - Level 9] - Retrieved multireddits
[2023-06-22 16:58:44,354 - bdfr.archiver - DEBUG] - Retrieving comments of user DevonAndChris
[2023-06-22 16:58:44,355 - bdfr.connector - Level 9] - Retrieved user data
[2023-06-22 16:58:44,355 - bdfr.connector - Level 9] - Retrieved submissions for given links
[2023-06-22 16:58:46,095 - bdfr.archiver - DEBUG] - Attempting to archive submission jp1rmrr
[2023-06-22 16:58:52,610 - bdfr.archiver - DEBUG] - Writing entry jp1rmrr to file in JSON format at /Users/devon/Documents/archives/bdfr-auth9/BlockedAndReported/DevonAndChris_jp1rmrr_2023-06-21T23:25:13.json
[2023-06-22 16:58:52,610 - bdfr.archiver - INFO] - Record for entry item jp1rmrr written to disk
[2023-06-22 16:58:52,610 - bdfr.archiver - DEBUG] - Attempting to archive submission jp1t56r
[2023-06-22 16:58:58,440 - bdfr.archiver - DEBUG] - Writing entry jp1t56r to file in JSON format at /Users/devon/Documents/archives/bdfr-auth9/BlockedAndReported/DevonAndChris_jp1t56r_2023-06-21T23:39:07.json
[2023-06-22 16:58:58,440 - bdfr.archiver - INFO] - Record for entry item jp1t56r written to disk
[2023-06-22 16:58:58,440 - bdfr.archiver - DEBUG] - Attempting to archive submission jozxusg
[2023-06-22 16:59:01,372 - root - ERROR] - Archiver exited unexpectedly
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/bdfr/__main__.py", line 139, in cli_archive
    reddit_archiver.download()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/bdfr/archiver.py", line 49, in download
    self.write_entry(submission)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/bdfr/archiver.py", line 92, in write_entry
    self._write_entry_json(archive_entry)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/bdfr/archiver.py", line 103, in _write_entry_json
    content = json.dumps(entry.compile())
                         ^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/bdfr/archive_entry/comment_archive_entry.py", line 19, in compile
    self.post_details = self._convert_comment_to_dict(self.source)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/bdfr/archive_entry/base_archive_entry.py", line 36, in _convert_comment_to_dict
    in_comment.replies.replace_more(limit=None)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/praw/util/deprecate_args.py", line 43, in wrapped
    return func(**dict(zip(_old_args, args)), **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/praw/models/comment_forest.py", line 195, in replace_more
    self._insert_comment(comment)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/praw/models/comment_forest.py", line 80, in _insert_comment
    raise DuplicateReplaceException
praw.exceptions.DuplicateReplaceException: A duplicate comment has been detected. Are you attempting to call 'replace_more_comments' more than once?

klueman commented 1 year ago

Same issue, though I guess with only 7 hours left, it's pointless to hope for a fix.

Serene-Arc commented 1 year ago

There will be fixes; the BDFR will be maintained going forward. We're not stopping.

Serene-Arc commented 1 year ago

Do you have any other submission IDs for which this error occurs? The one in the logs does not exist.

xenon-difluoride commented 1 year ago

I'm having the same issue, and I found another ID that causes it, but it's not a submission, it's a comment.

The ID causing the issue for me is jns7s0a, i.e. this comment: https://www.reddit.com/r/OutOfTheLoop/comments/146m5y0/whats_the_deal_with_so_many_people_mourning_the/jns7s0a/

After trimming down the command I was using, I got to this, which should be usable to reproduce it:

 bdfr clone --link jns7s0a [directory] --file-scheme '{POSTID}'
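
For anyone who needs a stopgap while this is open: the traceback in the first post shows the exception escaping from in_comment.replies.replace_more(limit=None) while a comment entry is being compiled. Below is a minimal sketch of a guard around that call. It is not the maintainers' fix and not BDFR code; expand_replies is a hypothetical helper name, and the fallback exception class exists only so the sketch runs without PRAW installed. Note that when the exception is caught, the reply tree may be only partially expanded, so the archived entry could be missing replies.

```python
import logging

try:
    # The real exception class, as named in the traceback above.
    from praw.exceptions import DuplicateReplaceException
except ImportError:
    # Assumption: stand-in so this sketch can run without PRAW installed.
    class DuplicateReplaceException(Exception):
        """Placeholder with the same name as PRAW's exception."""


logger = logging.getLogger(__name__)


def expand_replies(comment, limit=None):
    """Try to expand all "load more" stubs under a comment.

    Hypothetical helper (not part of BDFR or PRAW). Returns True if
    replace_more() finished, False if PRAW reported a duplicate comment.
    """
    try:
        comment.replies.replace_more(limit=limit)
    except DuplicateReplaceException:
        # The forest may be only partially expanded at this point.
        logger.warning(
            "Could not fully expand replies for comment %s",
            getattr(comment, "id", "?"),
        )
        return False
    return True
```

With something like this, one bad comment would be logged and skipped instead of ending the whole run with "Archiver exited unexpectedly". Whether a partial reply tree is acceptable for an archive is a separate judgment call.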