Closed: emacollins closed this issue 1 year ago.
What version of the library are you using? Also, can you share the snippet of code you're using that doesn't seem to be working in Docker?
Hi, I am no longer going the Docker route, but I ran into the same problem on my local computer: only 30 videos are scraped even with a scroll time set. It was working fine, but now, no matter how high I set scroll_time, it only gets the first 30 videos. I am doing the JSON dump, and the extra videos are usually in the "extras" field. That field is now blank.
I am using version 0.1.11.
I had been using scroll times between 10 and 300 seconds, and it always seemed to return the extras pages with the full list of videos. Now it doesn't. Hmm.
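from tiktokapipy.api import TikTokAPI
# scroll_time, filename, and user are defined elsewhere in my script
with TikTokAPI(scroll_down_time=scroll_time, navigation_retries=5, navigation_timeout=0,
               data_dump_file=filename) as api:
    try:
        user_object = api.user(user, video_limit=0)
    except Exception as e:
        # Report the failure instead of silently discarding it
        print(f"Failed to load {user}: {e}")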
I currently have the same issue; I can't load more than 30 videos no matter how I set scroll_down_time. I'm also using version 0.1.11.
Previously, this problem arose due to what seemed to be a bug in Playwright. The fix at that time was to switch the web driver to Firefox, but if you're both having issues, it might mean the issue is now presenting itself in Firefox too. I don't have a whole lot of time to address this, being a full-time master's student, but I'll try to take a look soon.
Hi! I have the same issue. I tried:
All of these combinations. Nothing works out of the box :(
What can I use to scrape user video stats? I used LightVideo from the user model. Can I get them another way?
If you have a User object and want to grab data on that user's videos, use the user.videos iterator. Iterating through it loads each video on demand, giving accurate statistics, video info, etc. If all you need are loose statistics, the LightVideos are faster.
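To make the light-versus-full distinction concrete, here is a minimal sketch of the on-demand pattern in plain Python. It does not use TikTokPy: LightVideo, FullVideo, UserVideos, and the loader callback are hypothetical stand-ins, assuming the light models arrive with the user page while each full video costs one extra request.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class LightVideo:
    """Hypothetical summary entry that arrives with the user page."""
    video_id: int
    play_count: int  # may be slightly stale ("loose" statistics)


@dataclass
class FullVideo:
    """Hypothetical fully loaded video; each one costs a separate request."""
    video_id: int
    play_count: int
    description: str


class UserVideos:
    """Hypothetical iterator that loads each full video only when reached."""

    def __init__(self, light_models: List[LightVideo],
                 loader: Callable[[int], FullVideo]) -> None:
        self.light_models = light_models  # already in memory, no requests
        self._loader = loader             # one request per call

    def __iter__(self) -> Iterator[FullVideo]:
        for light in self.light_models:
            yield self._loader(light.video_id)  # fetched lazily


# Usage with a fake loader that records which videos were requested.
requested: List[int] = []


def fake_loader(video_id: int) -> FullVideo:
    requested.append(video_id)
    return FullVideo(video_id, 100 * video_id, f"video {video_id}")


videos = UserVideos([LightVideo(i, 90 * i) for i in range(1, 4)], fake_loader)
light_total = sum(v.play_count for v in videos.light_models)  # no requests made
first_full = next(iter(videos))                                # exactly one request
```

Summing over light_models never calls the loader, while each step of the iterator triggers one load. That trade-off is why the light models are faster and the full models are more accurate.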
@emacollins @CarlCochet @vladisalv Please try again with version 0.1.12. I've added new parameters to the API constructors that you can try messing with:
- scroll_down_delay sets the time (in seconds) before scrolling down starts. This is useful if your network is slow (e.g., you're running TikTokPy in a Docker container).
- scroll_down_iter_delay sets the time (in seconds) between scrolls. This can also be useful to tinker with if your network is slow.
I also suggest updating all dependencies.
Use:
scroll_down_delay now defaults to 1 second instead of an implicit 0 seconds. If this does not immediately fix your problems, my suggestions are as follows:
- Increase scroll_down_iter_delay to 0.5 from the default 0.2. This will slow down the scrolling, which could help load the msToken cookie (see Explanation).
- Increase scroll_down_delay to 3. This should also help load the msToken cookie.
Explanation:
Notably, TikTok provides browsers with an msToken cookie, and scrolling down doesn't work until this cookie is provided. If you scroll down too fast, you'll deadlock TikTok: scrolling down further won't make any more API calls. The only way to clear this deadlock is to scroll back up and then back down. TikTokPy scrolls up a bit every other scroll-down, but if the iterative scroll-downs happen too fast, the deadlock might not let up. These two new parameters can alleviate these issues.
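The deadlock described above can be illustrated with a toy model. This is a hypothetical simulation, not TikTok's or TikTokPy's actual logic: it assumes the msToken cookie arrives at a fixed time, that a scroll-down before then stalls the feed, that only a scroll-up followed by a scroll-down clears the stall, and that every successful scroll-down loads one more page. The delay_ms and iter_delay_ms arguments loosely play the roles of scroll_down_delay and scroll_down_iter_delay; times are integer milliseconds to avoid floating-point drift.

```python
from dataclasses import dataclass


@dataclass
class FeedModel:
    """Toy model of the post feed (hypothetical, for illustration only)."""
    token_ready_ms: int       # when the msToken cookie becomes available
    pages_loaded: int = 1     # the first page comes with the profile itself
    deadlocked: bool = False
    scrolled_up: bool = False

    def scroll_down(self, t_ms: int) -> None:
        if self.deadlocked:
            if not self.scrolled_up:
                return                   # stalled: no new API call is made
            self.deadlocked = False      # up-then-down clears the stall
        self.scrolled_up = False
        if t_ms < self.token_ready_ms:
            self.deadlocked = True       # requested before the token existed
        else:
            self.pages_loaded += 1       # one more page of posts

    def scroll_up(self) -> None:
        self.scrolled_up = True


def simulate(token_ready_ms: int, delay_ms: int, iter_delay_ms: int,
             duration_ms: int, scroll_up_every_other: bool = True) -> int:
    """Return pages loaded when scrolling from delay_ms for duration_ms."""
    feed = FeedModel(token_ready_ms)
    times = range(delay_ms, delay_ms + duration_ms, iter_delay_ms)
    for n, t_ms in enumerate(times, start=1):
        feed.scroll_down(t_ms)
        if scroll_up_every_other and n % 2 == 0:
            feed.scroll_up()  # mirrors "scrolls up a bit every other scroll-down"
    return feed.pages_loaded


stuck = simulate(1500, delay_ms=0, iter_delay_ms=200, duration_ms=3000,
                 scroll_up_every_other=False)
recovered = simulate(1500, delay_ms=0, iter_delay_ms=200, duration_ms=3000)
delayed = simulate(1500, delay_ms=1500, iter_delay_ms=500, duration_ms=3000)
```

In this model, scrolling without ever scrolling up leaves the feed at its first page, the periodic scroll-up lets it recover once the token exists, and a start delay longer than the token's arrival avoids the stall entirely. Real page behavior is timing-dependent, so the numbers are illustrative only.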
Hi @Russell-Newton! I tested it outside Docker on a fast connection, but it doesn't work: it scraped only 30 of 300 videos.
I looked at the code, and you use evaluate. Maybe use the mouse wheel instead?
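A minimal sketch of the mouse-wheel suggestion, assuming a Playwright sync-API Page object. page.mouse.wheel(delta_x, delta_y) and page.wait_for_timeout(ms) are real Playwright methods; the helper name scroll_with_wheel and its step count, distance, and pause are hypothetical choices, and this is not how TikTokPy itself scrolls. A small recording stand-in lets the helper be checked without launching a browser.

```python
from typing import List, Tuple


def scroll_with_wheel(page, steps: int, delta_y: int = 1000,
                      pause_ms: int = 500) -> None:
    """Scroll by dispatching wheel events instead of calling page.evaluate.

    Wheel events go to the element under the mouse, so the pointer should be
    over the feed (use page.mouse.move(x, y) first if needed).
    """
    for _ in range(steps):
        page.mouse.wheel(0, delta_y)      # (delta_x, delta_y) in pixels
        page.wait_for_timeout(pause_ms)   # let the feed request the next page


# Offline check with a stand-in that records calls instead of driving a browser.
class _RecordingMouse:
    def __init__(self) -> None:
        self.wheel_calls: List[Tuple[int, int]] = []

    def wheel(self, delta_x: int, delta_y: int) -> None:
        self.wheel_calls.append((delta_x, delta_y))


class _RecordingPage:
    def __init__(self) -> None:
        self.mouse = _RecordingMouse()
        self.waits: List[int] = []

    def wait_for_timeout(self, timeout: int) -> None:
        self.waits.append(timeout)


fake_page = _RecordingPage()
scroll_with_wheel(fake_page, steps=3, delta_y=800, pause_ms=250)
# With a real browser: scroll_with_wheel(page, steps=20), page being a sync-API Page.
```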
@vladisalv, what values have you set for scroll_down_time, scroll_down_delay, and scroll_down_iter_delay?
As you suggested above, I started with:
I increased it step by step and finished with these values:
But I still get only 30 of more than 300 videos.
As I understand it, the page did scroll down, because by default I get just 27 videos. So it scrolls the page, but stops after the first pagination request.
@vladisalv Please try again on version 0.1.13, if you aren't already using it. I made some changes that should hopefully fix an issue with collecting extra videos.
@Russell-Newton It still doesn't work.
To clarify, here is how I use the code:
from tiktokapipy.api import TikTokAPI
with TikTokAPI(scroll_down_time=20, scroll_down_delay=5, scroll_down_iter_delay=5) as api:
    # First request: read the total number of posts from the profile stats
    user_stat = api.user(self.username, video_limit=1)
    video_count = user_stat.stats.video_count
    videos = []
    scroll_time = 20
    while True:
        print("Scroll time:", scroll_time)
        user = api.user(self.username,
                        scroll_down_time=scroll_time,
                        scroll_down_delay=5,
                        scroll_down_iter_delay=5,
                        )
        scroll_time *= 2
        # Rebuild the list from the light models returned on this attempt
        videos = list(user.videos.light_models)
        print("len of videos:", len(videos))
        print("we are waiting for", video_count)
        # >= rather than == so the loop also stops if extra posts come back
        if len(videos) >= video_count:
            break
Output:
Scroll time: 20
len of videos: 30
we are waiting for 302
...
Scroll time: 160
len of videos: 30
we are waiting for 302
Also, with the new 0.1.13 version I got this exception:
  File ".../.venv/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 96, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.Error: Protocol error (Network.getResponseBody): No data found for resource with given identifier
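The doubling loop in the snippet above can be factored into a small helper. This is a hypothetical sketch, not part of TikTokPy: fetch_until_complete takes any callable that maps a scroll time to a list of items, doubles the scroll time until the expected count is reached, and stops at a cap (an arbitrary 320 s here) so it cannot run forever. The stand-in fetch always returns 30 items, mimicking the output above.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def fetch_until_complete(fetch: Callable[[int], Sequence[T]], expected: int,
                         start_s: int = 20, max_s: int = 320) -> List[T]:
    """Retry fetch with a doubling scroll time; return the largest result seen."""
    best: List[T] = []
    scroll_s = start_s
    while scroll_s <= max_s:
        items = list(fetch(scroll_s))
        if len(items) > len(best):
            best = items
        if len(best) >= expected:
            break
        scroll_s *= 2
    return best


# Stand-in fetch that plateaus at 30 items regardless of scroll time.
attempts: List[int] = []


def stuck_fetch(scroll_s: int) -> List[int]:
    attempts.append(scroll_s)
    return list(range(30))


result = fetch_until_complete(stuck_fetch, expected=302)
# With TikTokPy, fetch might wrap api.user(..., scroll_down_time=scroll_s) as in the
# snippet above and return list(user.videos.light_models); that wiring is an assumption.
```

If the count stays flat across every doubling, as in the output above, the bottleneck is probably not the scroll time itself.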
@vladisalv could you try again on your system with the 38-post-list-scroll-failure branch's code? Just to help with my debugging.
pip install -U git+https://github.com/Russell-Newton/TikTokPy.git@38-post-list-scroll-failure
And then you can try something simple like:
from tiktokapipy.api import TikTokAPI
with TikTokAPI(scroll_down_time=120) as api:
    api.user("tiktok")
If my hunch is correct, the message "Something went wrong" should be printed if you only collect 30 or so videos. If so, that'll give me more information about what's going wrong so that I may be able to fix it. My hunch is that it's related to this Reddit post: https://www.reddit.com/r/Tiktokhelp/comments/wybfcg/something_went_wrong_error_on_tiktok_web_via/.
Looking at the network logs, it seems like the API requests that attempt to grab the user posts sometimes return with a completely empty body. I'm able to recreate this locally, but it's inconsistent. I suspect I may have to do an overhaul like I suggest in #21 in order to completely fix this issue.
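One way a client could tolerate intermittently empty responses is to retry a page before giving up. The sketch below is hypothetical and not TikTokPy's implementation: collect_posts follows a cursor-paginated list, and the response shape ({"itemList": [...], "cursor": ..., "hasMore": ...}) and the retry count are assumptions for illustration. The fake fetcher returns an empty body on the first request for the second page; without a retry, only the first page of 30 survives.

```python
import json
from typing import Callable, List, Tuple


def collect_posts(fetch_page: Callable[[int], str], max_retries: int = 3) -> List[dict]:
    """Follow a cursor-paginated post list, retrying pages whose body is empty."""
    posts: List[dict] = []
    cursor = 0
    while True:
        body = ""
        for _ in range(max_retries):
            body = fetch_page(cursor)
            if body.strip():
                break                    # got a non-empty response
        if not body.strip():
            break                        # still empty after all retries: give up
        data = json.loads(body)
        posts.extend(data.get("itemList", []))
        if not data.get("hasMore"):
            break
        cursor = data["cursor"]
    return posts


def make_fake_fetch(total: int = 75,
                    page_size: int = 30) -> Tuple[Callable[[int], str], List[int]]:
    """Fake endpoint whose first request for the second page comes back empty."""
    calls: List[int] = []

    def fetch(cursor: int) -> str:
        calls.append(cursor)
        if cursor == page_size and calls.count(page_size) == 1:
            return ""
        items = [{"id": i} for i in range(cursor, min(cursor + page_size, total))]
        nxt = cursor + page_size
        return json.dumps({"itemList": items, "cursor": nxt, "hasMore": nxt < total})

    return fetch, calls


fetch_a, calls_a = make_fake_fetch()
with_retry = collect_posts(fetch_a)
fetch_b, calls_b = make_fake_fetch()
without_retry = collect_posts(fetch_b, max_retries=1)
```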
I think the changes I've been working on with v0.2 might fix this issue. It could be worth checking out:
pip install -U git+https://github.com/Russell-Newton/TikTokPy.git@v0.2-overhaul
I removed the scrolling parameters, but it should (fingers crossed) work without any API constructor parameters. You should be able to get away with:
from tiktokapipy.api import TikTokAPI
with TikTokAPI() as api:
    user = api.user("tiktok")
    for video in user.videos:
        # do something with each video
        pass
This should iterate over all of a user's videos. You can limit this using the video_limit parameter in api.user or using the limit method attached to user.videos (for video in user.videos.limit(30)).
@emacollins @CarlCochet @vladisalv If one or all of you could try with the WIP changes, that would be very helpful. It works for me, but it's worth verifying that it works for you.
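The limit pattern described above can be sketched in plain Python with itertools.islice. LazyVideos is a hypothetical stand-in for user.videos, not TikTokPy's class; it assumes each video is produced lazily by a generator, so capping the iteration also caps the number of fetches.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Optional


class LazyVideos:
    """Hypothetical lazy collection with a chainable .limit() method."""

    def __init__(self, source: Callable[[], Iterable[int]],
                 cap: Optional[int] = None) -> None:
        self._source = source  # called anew for each iteration
        self._cap = cap

    def limit(self, n: int) -> "LazyVideos":
        # Returns a new capped view; nothing is fetched until iteration starts.
        new_cap = n if self._cap is None else min(n, self._cap)
        return LazyVideos(self._source, new_cap)

    def __iter__(self) -> Iterator[int]:
        # islice with stop=None yields everything; otherwise stops after cap items.
        return islice(self._source(), self._cap)


fetched: List[int] = []


def fake_source() -> Iterator[int]:
    for video_id in range(300):
        fetched.append(video_id)  # pretend each video costs one request
        yield video_id


videos = LazyVideos(fake_source)
first_30 = list(videos.limit(30))
```

Because islice stops before pulling the 31st item, only 30 videos are ever fetched, even though the source could produce 300.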
I tried containerizing my script that uses this package in Docker (Dockerfile below). When it runs, I get user information back, but the scroll time doesn't seem to be taken into account. When I set a high scroll time running locally on my host, it returns all of a user's videos, even if they have a lot. When running the same code in my container, it only returns a fraction of the data (the first 30 videos). I am using data_dump_file, and I can see the data file is much smaller when running through Docker. Any ideas?