Serene-Arc / bulk-downloader-for-reddit

Downloads and archives content from reddit
https://pypi.org/project/bdfr
GNU General Public License v3.0

[BUG] Downloader appears to hang with tiktok url #724

Closed fergie4000 closed 1 year ago

fergie4000 commented 1 year ago

Description

The downloader appears to hang consistently on this submission ID (NSFW). It ran for 3+ hours the first time before I noticed, and it hangs every time I run it.

Command

python3 -m bdfr clone /mnt/r/reddit/ --folder-scheme {REDDITOR}/{SUBREDDIT} --link wttmgs -v -v

Logs

[2022-12-19 21:10:42,425 - bdfr.connector - DEBUG] - Setting maximum download wait time to 120 seconds
[2022-12-19 21:10:42,425 - bdfr.connector - DEBUG] - Setting datetime format string to ISO
[2022-12-19 21:10:42,427 - bdfr.connector - DEBUG] - Disabling the following modules:
[2022-12-19 21:10:42,427 - bdfr.connector - Level 9] - Created download filter
[2022-12-19 21:10:42,427 - bdfr.connector - Level 9] - Created time filter
[2022-12-19 21:10:42,427 - bdfr.connector - Level 9] - Created sort filter
[2022-12-19 21:10:42,427 - bdfr.connector - Level 9] - Create file name formatter
[2022-12-19 21:10:42,427 - bdfr.connector - DEBUG] - Using unauthenticated Reddit instance
[2022-12-19 21:10:42,429 - bdfr.connector - Level 9] - Created site authenticator
[2022-12-19 21:10:42,429 - bdfr.connector - Level 9] - Retrieved subreddits
[2022-12-19 21:10:42,429 - bdfr.connector - Level 9] - Retrieved multireddits
[2022-12-19 21:10:42,429 - bdfr.connector - Level 9] - Retrieved user data
[2022-12-19 21:10:42,430 - bdfr.connector - Level 9] - Retrieved submissions for given links
[2022-12-19 21:10:43,067 - bdfr.downloader - DEBUG] - Attempting to download submission wttmgs
[2022-12-19 21:10:43,068 - bdfr.downloader - DEBUG] - Using Direct with url https://www.tiktok.com/@keriberry.420?_t=8V0q4wrW0Gw&_r=1

OMEGARAZER commented 1 year ago

Just as an FYI, it looks like that's a link to a full profile rather than a specific video, so I'm unsure how best to handle something like that.

The main problem, though, is that it seems to get picked up by the Direct downloader because of the .420 in the username. I think I know what will fix it, but I will need to test it.
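
To illustrate the suspected cause, here is a minimal sketch of how a trailing ".420" could be mistaken for a file extension, and one way to guard against it. This is not bdfr's actual code: the function names, the regular expression, and the extension allowlist are all assumptions made up for this example.

```python
import re
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration only; not taken from bdfr.
KNOWN_MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".mp4", ".webm"}


def naive_has_extension(url: str) -> bool:
    """Treat any trailing '.xxx' or '.xxxx' in the URL path as a file extension."""
    path = urlsplit(url).path  # drops the query string, e.g. '?_t=...'
    return re.search(r"\.\w{3,4}$", path) is not None


def looks_like_direct_media(url: str) -> bool:
    """Only accept the trailing suffix if it is a known media extension."""
    path = urlsplit(url).path
    match = re.search(r"(\.\w{3,4})$", path)
    return match is not None and match.group(1).lower() in KNOWN_MEDIA_EXTENSIONS


profile_url = "https://www.tiktok.com/@keriberry.420?_t=8V0q4wrW0Gw&_r=1"
print(naive_has_extension(profile_url))      # the '.420' suffix matches the naive check
print(looks_like_direct_media(profile_url))  # '.420' is not in the allowlist
```

With a check like the second one, a profile URL of this shape would fall through to the "no downloader module exists" path instead of being treated as a direct file link.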

Serene-Arc commented 1 year ago

Yeah, there's no way to download that, but it definitely shouldn't hang.

fergie4000 commented 1 year ago

Doesn't appear to be fixed on my end.

(venv) user@DESKTOP:~$python3 -m pip install git+https://github.com/aliparlakci/bulk-downloader-for-reddit.git@development
Collecting git+https://github.com/aliparlakci/bulk-downloader-for-reddit.git@development
  Cloning https://github.com/aliparlakci/bulk-downloader-for-reddit.git (to revision development) to /tmp/pip-req-build-mrh_7dq2
  Running command git clone --filter=blob:none --quiet https://github.com/aliparlakci/bulk-downloader-for-reddit.git /tmp/pip-req-build-mrh_7dq2
  Resolved https://github.com/aliparlakci/bulk-downloader-for-reddit.git to commit c63a8842d9ab5fd474645c62593c4460837a7f15
  Running command git submodule update --init --recursive -q
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: yt-dlp>=2022.11.11 in ./python/reddit/venv/lib/python3.10/site-packages (from bdfr==2.6.2) (2022.11.11)
Requirement already satisfied: pyyaml>=5.4.1 in ./python/reddit/venv/lib/python3.10/site-packages (from bdfr==2.6.2) (6.0)
Requirement already satisfied: appdirs>=1.4.4 in ./python/reddit/venv/lib/python3.10/site-packages (from bdfr==2.6.2) (1.4.4)
Requirement already satisfied: requests>=2.25.1 in ./python/reddit/venv/lib/python3.10/site-packages (from bdfr==2.6.2) (2.27.1)
Requirement already satisfied: dict2xml>=1.7.0 in ./python/reddit/venv/lib/python3.10/site-packages (from bdfr==2.6.2) (1.7.0)
Requirement already satisfied: praw>=7.2.0 in ./python/reddit/venv/lib/python3.10/site-packages (from bdfr==2.6.2) (7.6.1)
Requirement already satisfied: beautifulsoup4>=4.10.0 in ./python/reddit/venv/lib/python3.10/site-packages (from bdfr==2.6.2) (4.10.0)
Requirement already satisfied: click>=8.0.0 in ./python/reddit/venv/lib/python3.10/site-packages (from bdfr==2.6.2) (8.0.3)
Requirement already satisfied: soupsieve>1.2 in ./python/reddit/venv/lib/python3.10/site-packages (from beautifulsoup4>=4.10.0->bdfr==2.6.2) (2.3.1)
Requirement already satisfied: prawcore<3,>=2.1 in ./python/reddit/venv/lib/python3.10/site-packages (from praw>=7.2.0->bdfr==2.6.2) (2.3.0)
Requirement already satisfied: websocket-client>=0.54.0 in ./python/reddit/venv/lib/python3.10/site-packages (from praw>=7.2.0->bdfr==2.6.2) (1.2.3)
Requirement already satisfied: update-checker>=0.18 in ./python/reddit/venv/lib/python3.10/site-packages (from praw>=7.2.0->bdfr==2.6.2) (0.18.0)
Requirement already satisfied: idna<4,>=2.5 in ./python/reddit/venv/lib/python3.10/site-packages (from requests>=2.25.1->bdfr==2.6.2) (3.3)
Requirement already satisfied: certifi>=2017.4.17 in ./python/reddit/venv/lib/python3.10/site-packages (from requests>=2.25.1->bdfr==2.6.2) (2021.10.8)
Requirement already satisfied: charset-normalizer~=2.0.0 in ./python/reddit/venv/lib/python3.10/site-packages (from requests>=2.25.1->bdfr==2.6.2) (2.0.10)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in ./python/reddit/venv/lib/python3.10/site-packages (from requests>=2.25.1->bdfr==2.6.2) (1.26.8)
Requirement already satisfied: mutagen in ./python/reddit/venv/lib/python3.10/site-packages (from yt-dlp>=2022.11.11->bdfr==2.6.2) (1.45.1)
Requirement already satisfied: pycryptodomex in ./python/reddit/venv/lib/python3.10/site-packages (from yt-dlp>=2022.11.11->bdfr==2.6.2) (3.13.0)
Requirement already satisfied: brotli in ./python/reddit/venv/lib/python3.10/site-packages (from yt-dlp>=2022.11.11->bdfr==2.6.2) (1.0.9)
Requirement already satisfied: websockets in ./python/reddit/venv/lib/python3.10/site-packages (from yt-dlp>=2022.11.11->bdfr==2.6.2) (10.1)
(venv) user@DESKTOP:~$
(venv) user@DESKTOP:~$python3 -m bdfr clone /mnt/r/reddit/ --folder-scheme {REDDITOR}/{SUBREDDIT} --link wttmgs -v -v
[2022-12-22 21:06:49,258 - bdfr.connector - DEBUG] - Loading configuration from /home/user/.config/bdfr/default_config.cfg
[2022-12-22 21:06:49,260 - bdfr.connector - DEBUG] - Setting maximum download wait time to 120 seconds
[2022-12-22 21:06:49,260 - bdfr.connector - DEBUG] - Setting datetime format string to ISO
[2022-12-22 21:06:49,263 - bdfr.connector - DEBUG] - Disabling the following modules:
[2022-12-22 21:06:49,263 - bdfr.connector - Level 9] - Created download filter
[2022-12-22 21:06:49,263 - bdfr.connector - Level 9] - Created time filter
[2022-12-22 21:06:49,264 - bdfr.connector - Level 9] - Created sort filter
[2022-12-22 21:06:49,264 - bdfr.connector - Level 9] - Create file name formatter
[2022-12-22 21:06:49,265 - bdfr.connector - DEBUG] - Using unauthenticated Reddit instance
[2022-12-22 21:06:49,266 - bdfr.connector - Level 9] - Created site authenticator
[2022-12-22 21:06:49,266 - bdfr.connector - Level 9] - Retrieved subreddits
[2022-12-22 21:06:49,266 - bdfr.connector - Level 9] - Retrieved multireddits
[2022-12-22 21:06:49,267 - bdfr.connector - Level 9] - Retrieved user data
[2022-12-22 21:06:49,267 - bdfr.connector - Level 9] - Retrieved submissions for given links
[2022-12-22 21:06:50,330 - bdfr.downloader - DEBUG] - Attempting to download submission wttmgs
[2022-12-22 21:06:50,331 - bdfr.downloader - DEBUG] - Using Direct with url https://www.tiktok.com/@keriberry.420?_t=8V0q4wrW0Gw&_r=1
^C
Aborted!

OMEGARAZER commented 1 year ago

$ bdfr clone test/ --folder-scheme {REDDITOR}/{SUBREDDIT} --link wttmgs -vvv
[2022-12-22 20:48:16,941 - bdfr.connector - DEBUG] - Loading configuration from /home/omegarazer/.config/bdfr/config.cfg
[2022-12-22 20:48:16,941 - bdfr.connector - DEBUG] - Setting maximum download wait time to 120 seconds
[2022-12-22 20:48:16,941 - bdfr.connector - DEBUG] - Setting datetime format string to ISO
[2022-12-22 20:48:16,942 - bdfr.connector - DEBUG] - Disabling the following modules:
[2022-12-22 20:48:16,942 - bdfr.connector - Level 9] - Created download filter
[2022-12-22 20:48:16,942 - bdfr.connector - Level 9] - Created time filter
[2022-12-22 20:48:16,942 - bdfr.connector - Level 9] - Created sort filter
[2022-12-22 20:48:16,942 - bdfr.connector - Level 9] - Create file name formatter
[2022-12-22 20:48:16,944 - bdfr.connector - DEBUG] - Using unauthenticated Reddit instance
[2022-12-22 20:48:16,945 - bdfr.connector - Level 9] - Created site authenticator
[2022-12-22 20:48:16,945 - bdfr.connector - Level 9] - Retrieved subreddits
[2022-12-22 20:48:16,945 - bdfr.connector - Level 9] - Retrieved multireddits
[2022-12-22 20:48:16,945 - bdfr.connector - Level 9] - Retrieved user data
[2022-12-22 20:48:16,945 - bdfr.connector - Level 9] - Retrieved submissions for given links
[2022-12-22 20:48:17,319 - bdfr.downloader - DEBUG] - Attempting to download submission wttmgs
[2022-12-22 20:48:38,051 - bdfr.downloader - ERROR] - Could not download submission wttmgs: No downloader module exists for url https://www.tiktok.com/@keriberry.420?_t=8V0q4wrW0Gw&_r=1
[2022-12-22 20:48:38,051 - bdfr.archive_entry.submission_archive_entry - DEBUG] - Retrieving full comment tree for submission wttmgs
[2022-12-22 20:48:38,132 - bdfr.archiver - DEBUG] - Writing entry wttmgs to file in JSON format at /home/omegarazer/Reddit/keriberry_420/keriberry_420/keriberry_420_At 1k I can go live 🖤_wttmgs.json
[2022-12-22 20:48:38,133 - bdfr.archiver - INFO] - Record for entry item wttmgs written to disk
[2022-12-22 20:48:38,133 - root - INFO] - Program complete

Can you double check that the download_factory you have has the updates? It appears to be working as expected for me. Maybe a bytecompiled cache of the old version?

fergie4000 commented 1 year ago

Maybe a bytecompiled cache of the old version?

That seems to be what it was. download_factory had the updates. Cleared __pycache__ and ran it again and it works fine. Sorry about that.
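
For anyone hitting the same stale-cache symptom, a minimal sketch of clearing the byte-compiled caches under a directory is shown below. The helper name clear_pycache and the example path are assumptions for illustration; point it at wherever the package is actually installed in your environment.

```python
import shutil
from pathlib import Path


def clear_pycache(root: Path) -> int:
    """Delete every __pycache__ directory under root and return how many were removed."""
    removed = 0
    # Materialize the matches first so deleting does not disturb the directory walk.
    for cache_dir in list(root.rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed += 1
    return removed


if __name__ == "__main__":
    # Hypothetical location; replace with your own venv's site-packages/bdfr path.
    target = Path("venv/lib/python3.10/site-packages/bdfr")
    if target.exists():
        print(f"Removed {clear_pycache(target)} __pycache__ directories")
```

Python regenerates these caches on the next import, so removing them is safe.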