Serene-Arc / bulk-downloader-for-reddit

Downloads and archives content from reddit
https://pypi.org/project/bdfr
GNU General Public License v3.0
2.28k stars 211 forks

[BUG] Special characters are not filtered properly on Windows 10 #834

Open TomArrow opened 1 year ago

TomArrow commented 1 year ago

Description

When a post title contains a special character (here a backspace control character, shown as \x08 in the log below), saving the file can fail and abort the entire archiving process. Example URL: https://old.reddit.com/r/StableDiffusion/comments/12hcxvr/i_foundnew_anime_style/
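
To check which characters in a title are likely to cause this, here is a small sketch using only the standard library. The helper name find_control_chars is hypothetical and not part of bdfr; the title string is reconstructed from the path in the log, so treat it as an assumption.

```python
import unicodedata


def find_control_chars(title: str) -> list[tuple[int, str]]:
    """Hypothetical helper: list (index, code point) for each control character."""
    return [
        (index, f"U+{ord(char):04X}")
        for index, char in enumerate(title)
        # "Cc" is the Unicode general category for control characters.
        if unicodedata.category(char) == "Cc"
    ]


# Title as it appears in the error message (assumed, not fetched from reddit).
print(find_control_chars("I found\x08New Anime style"))
```

An empty list would mean the title contains no control characters, so any write failure would have another cause.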

Command

bdfr clone ./stablediffusion --subreddit stablediffusion --log ./stablediffusion/log.log --sort new

Environment (please complete the following information)

Logs

[2023-04-12 10:16:26,215 - bdfr.downloader - ERROR] - Failed to write file in submission 12hcxvr to C:\BDFR\StableDiffusion\StableDiffusion\alexio_0410_I founNew Anime style_12hcxvr.txt: [Errno 22] Invalid argument: 'C:\\BDFR\\StableDiffusion\\StableDiffusion\\alexio_0410_I found\x08New Anime style_12hcxvr.txt'
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\ProgramData\Anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\[Username]\AppData\Roaming\Python\Python39\Scripts\bdfr.exe\__main__.py", line 7, in <module>
  File "C:\ProgramData\Anaconda3\lib\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\click\core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "C:\ProgramData\Anaconda3\lib\site-packages\click\core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\ProgramData\Anaconda3\lib\site-packages\click\core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\ProgramData\Anaconda3\lib\site-packages\click\core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\click\decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "C:\Users\[Username]\AppData\Roaming\Python\Python39\site-packages\bdfr\__main__.py", line 161, in cli_clone
    reddit_scraper.download()
  File "C:\Users\[Username]\AppData\Roaming\Python\Python39\site-packages\bdfr\cloner.py", line 27, in download
    self.write_entry(submission)
  File "C:\Users\[Username]\AppData\Roaming\Python\Python39\site-packages\bdfr\archiver.py", line 92, in write_entry
    self._write_entry_json(archive_entry)
  File "C:\Users\[Username]\AppData\Roaming\Python\Python39\site-packages\bdfr\archiver.py", line 104, in _write_entry_json
    self._write_content_to_disk(resource, content)
  File "C:\Users\[Username]\AppData\Roaming\Python\Python39\site-packages\bdfr\archiver.py", line 119, in _write_content_to_disk
    with Path(file_path).open(mode="w", encoding="utf-8") as file:
  File "C:\ProgramData\Anaconda3\lib\pathlib.py", line 1252, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "C:\ProgramData\Anaconda3\lib\pathlib.py", line 1120, in _opener
    return self._accessor.open(self, flags, mode)
OSError: [Errno 22] Invalid argument: 'C:\\BDFR\\StableDiffusion\\StableDiffusion\\alexio_0410_I found\x08New Anime style_12hcxvr.json'

TomArrow commented 1 year ago

I tried adding --exclude-id 12hcxvr to skip the offending post, but sadly it does not work.

This time it does not say "Failed to write file", but it gives me a similar, if not identical, stack trace:

[2023-04-13 03:15:41,514 - bdfr.archiver - INFO] - Record for entry item 12hd1tq written to disk
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\ProgramData\Anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\[Username]\AppData\Roaming\Python\Python39\Scripts\bdfr.exe\__main__.py", line 7, in <module>
  File "C:\ProgramData\Anaconda3\lib\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\click\core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "C:\ProgramData\Anaconda3\lib\site-packages\click\core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\ProgramData\Anaconda3\lib\site-packages\click\core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\ProgramData\Anaconda3\lib\site-packages\click\core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\click\decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "C:\Users\[Username]\AppData\Roaming\Python\Python39\site-packages\bdfr\__main__.py", line 161, in cli_clone
    reddit_scraper.download()
  File "C:\Users\[Username]\AppData\Roaming\Python\Python39\site-packages\bdfr\cloner.py", line 27, in download
    self.write_entry(submission)
  File "C:\Users\[Username]\AppData\Roaming\Python\Python39\site-packages\bdfr\archiver.py", line 92, in write_entry
    self._write_entry_json(archive_entry)
  File "C:\Users\[Username]\AppData\Roaming\Python\Python39\site-packages\bdfr\archiver.py", line 104, in _write_entry_json
    self._write_content_to_disk(resource, content)
  File "C:\Users\[Username]\AppData\Roaming\Python\Python39\site-packages\bdfr\archiver.py", line 119, in _write_content_to_disk
    with Path(file_path).open(mode="w", encoding="utf-8") as file:
  File "C:\ProgramData\Anaconda3\lib\pathlib.py", line 1252, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "C:\ProgramData\Anaconda3\lib\pathlib.py", line 1120, in _opener
    return self._accessor.open(self, flags, mode)
OSError: [Errno 22] Invalid argument: 'C:\\BDFR\\StableDiffusion\\StableDiffusion\\alexio_0410_I found\x08New Anime style_12hcxvr.json'
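
For illustration, a minimal sketch of the kind of filtering that would avoid the OSError above. It assumes the fix is to strip characters that Windows does not permit in file names: ASCII control characters 0x00 to 0x1F and < > : " / \ | ? *. The helper sanitize_filename is hypothetical and is not bdfr's actual implementation.

```python
import re

# Characters Windows rejects in a file name: control characters 0x00-0x1F
# plus the reserved punctuation < > : " / \ | ? *
_WINDOWS_INVALID = re.compile(r'[\x00-\x1f<>:"/\\|?*]')


def sanitize_filename(name: str, replacement: str = "") -> str:
    """Hypothetical helper: replace characters Windows does not allow.

    Apply this to the file name only, not to a full path, because
    ':' and '\\' are legitimate in a path such as C:\\BDFR\\...
    """
    return _WINDOWS_INVALID.sub(replacement, name)


# File name taken from the traceback above.
print(sanitize_filename("alexio_0410_I found\x08New Anime style_12hcxvr.json"))
```

Dropping the character rather than replacing it keeps the rest of the title readable. Passing replacement="_" would keep the name the same length instead.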