drawrowfly / tiktok-scraper

TikTok Scraper. Download video posts, collect user/trend/hashtag/music feed metadata, sign URL and etc.

Scraping users from batch-file, empty videos are occasionally downloaded #220

Closed: blasphemite closed this issue 4 years ago

blasphemite commented 4 years ago

Describe the bug

When scraping a list of users, videos are occasionally being saved with no data (0 bytes). I've done testing to make sure it isn't my proxy or any other settings, and they seem to have no effect. I installed via npm on a fresh VM and the same issue occurs.

Example: Running the following commands interchangeably:

tiktok-scraper from-file list.txt 5 -n 5 -d -hd -w -p socks5://127.0.0.1:5667 --filepath ~/output

tiktok-scraper from-file list.txt 5 -n 5 -d -hd -w --filepath ~/output

list.txt is a modified list of public, highly popular usernames.

This results in the entire list of users being scraped successfully, except that a few random files end up as 0 bytes. When I run the same command every few minutes, the affected files are different every time. It's always 1-4 files affected with a list of 50 users (250 videos).

The files that are affected are overwritten by successfully-downloaded versions if I run the command again, and other random files that were previously downloaded are overwritten with 0-byte versions.

Other videos that are being downloaded at the same time via async tasks are created just fine. If it were an issue with my proxy or network I figure I would see some pattern or failures with the other downloads that are happening at the same time.
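To see which files were affected after a run, the output directory can be scanned for 0-byte files. This is only a minimal sketch: the helper name `find_empty_files` is made up for illustration, and the `*.mp4` pattern assumes the downloaded videos are saved with an `.mp4` extension.

```python
from pathlib import Path


def find_empty_files(output_dir, pattern="*.mp4"):
    """Return a sorted list of files under output_dir that are 0 bytes.

    The "*.mp4" default is an assumption about how the videos are named.
    """
    root = Path(output_dir).expanduser()
    return sorted(
        p for p in root.rglob(pattern)
        if p.is_file() and p.stat().st_size == 0
    )
```

Calling `find_empty_files("~/output")` after each run and comparing the results would show whether the affected files really change from run to run.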

Sorry if I repeated myself. Trying to be thorough...

rebelanomaly commented 4 years ago

This is happening for me every time.

drawrowfly commented 4 years ago

Can't reproduce

Scraper Version: 1.2.5

list.txt content:

adamw
addisonre
alexwaarren
antonielokhorst
anygabriellyofficial
arii
ashleynewman
avani
avarxseee
brandonrowland
daisykeech
destormpower
gymshark
hunterrowland
imgriffinjohnson
jacksonfelt
jadenhossler
jameswrightt
jeremyhutchins
jiffpom
joeybirlem
joshrichards
k0uvr
kiocyrrr
larrayeeee
luvanthony
madi
marinaleigh
mattsteffanina
montyjlopez
nessaabarrett
nickaustinn
nowunited
official_janina
officialsaarx
petroutv
qgriggs
rhia.official
rylandstormss
samhurley
sarati
sherinicolee
thehypehouse
theswayla
tiktok
tiktok_australia
tiktok_kr
tiktokglobal
zachclayton
zachking

Command: tiktok-scraper from-file list.txt 5 -n 5 -d -hd -w --filepath /path

My guess is that it is related to the IP/proxy. If you scrape a lot, you are probably getting "rate limited" from time to time, or connections are breaking off, as this is a fairly heavy task (batch downloading).

Also, there is nothing in the code that can cause such behavior.
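If rate limiting or dropped connections are the cause, one common mitigation is to retry a download with exponential backoff whenever the response body comes back empty. The sketch below is generic and not part of tiktok-scraper: `fetch` is a hypothetical callable that returns the downloaded bytes, and the attempt count and delays are arbitrary defaults.

```python
import time


def fetch_with_retry(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns non-empty bytes or attempts run out.

    `fetch` is a hypothetical zero-argument callable returning bytes.
    The wait doubles after each empty response: base_delay, 2x, 4x, ...
    """
    for attempt in range(attempts):
        data = fetch()
        if data:
            return data
        # Only wait if another attempt is still coming.
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return b""
```

The `sleep` parameter exists only so the delay can be swapped out in tests; in normal use the default `time.sleep` applies.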

blasphemite commented 4 years ago

You must be right about rate limiting.

What does the scraper typically do when it encounters rate limiting? In the past I experienced it a bit because of how frequently I was scraping, but it never resulted in blank files being created, just the scraper completing without downloading anything at all.

drawrowfly commented 4 years ago

Empty content (TikTok side) = empty file
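Since an empty response ends up as an empty file, a wrapper around the save step could refuse empty payloads so that a good copy from an earlier run is not overwritten. This is a rough sketch under stated assumptions, not how tiktok-scraper writes files: `save_if_nonempty` is a hypothetical helper, and it assumes the whole video is already in memory as bytes.

```python
import os
import tempfile
from pathlib import Path


def save_if_nonempty(data: bytes, dest) -> bool:
    """Write data to dest only if it is non-empty.

    Returns False and leaves any existing file at dest untouched when
    data is empty. Otherwise writes to a temporary file in the same
    directory and renames it over dest.
    """
    dest = Path(dest)
    if not data:
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # Replace the destination in one step once the write succeeded.
        os.replace(tmp_name, dest)
    except BaseException:
        # Clean up the partial file if the write or rename failed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return True
```

Writing to a temporary file first means an interrupted download never leaves a truncated file under the final name.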

blasphemite commented 4 years ago

Thank you