davidteather / TikTok-Api

The Unofficial TikTok API Wrapper In Python
https://davidteather.github.io/TikTok-Api
MIT License

[FEATURE_REQUEST] - Automatic Captcha Solver #347

Open davidteather opened 3 years ago

davidteather commented 3 years ago

The current solution isn't great, since the verifyFp cookie probably expires after some amount of time. We need to find a way to solve the captcha automatically for a given verifyFp cookie.

MinghaoXu16 commented 3 years ago

Hi David, has the issue been solved, at least for a certain amount of time? How am I supposed to use the verifyFp? byHashtag is not working for me. Will you release a new version soon? Thank you so much for your help!

davidteather commented 3 years ago

See this

MinghaoXu16 commented 3 years ago

I used exactly the code you wrote there regarding the captcha. However, it still returned the same error as before. Did I not implement it correctly?

MinghaoXu16 commented 3 years ago

I downloaded the new version 3.7.8 and included the cookie, but it still did not return anything. Is the issue still happening for you?

davidteather commented 3 years ago

Did you end up solving a captcha before including that cookie?

MinghaoXu16 commented 3 years ago

Yes! I did.

MinghaoXu16 commented 3 years ago

It returned the same error, and I was not asked to solve another captcha, so the cookie was the same.

MinghaoXu16 commented 3 years ago

May I ask what your working code looks like? Sorry to keep asking questions. I just really need help with this.

davidteather commented 3 years ago

The following works for me, and it also worked for the test suite:

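from TikTokApi import TikTokApi
import logging

# Get the API instance with INFO-level logging
api = TikTokApi.get_instance(logging_level=logging.INFO)
# Pass the verifyFp cookie value copied from the browser after solving a captcha
tiktoks = api.byHashtag('funny', custom_verifyFp="verify_khgp75kb_wubFGH9K_YApB_4evl_8Tw3_FVz6N3YxEhxI")

MinghaoXu16 commented 3 years ago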

Oh it worked now!! Thank you so much!! That's really helpful!!

davidteather commented 3 years ago

This article might be useful

lucasoares commented 3 years ago

@davidteather I tested here using a single cookie value for verifyFp.

I created an application that makes one user post request every 5 seconds (0.2 req/s). Basically, I'm crawling TikTok user posts.

TikTok is blocking randomly. I didn't find any pattern in their blocks.

Every time I see that they blocked my application, I manually open TikTok in the browser with the same cookie my application is using and solve the captcha. After that, my application starts collecting data again. In some cases my application collected for only 1 minute after I solved the captcha, and in others it collected for 13 hours straight. Apparently, the more time passed, the more often I was blocked:

[image] Green marks: captcha solved manually using the same cookie.

The verifyFp itself changed in my browser, so I think they have some mechanism that creates a new cookie after some time. I also tried to use the same verifyFp I was using in the browser (I manually changed the cookie), and after that the discover page does not work for me, even after solving the captcha:

[image]

I noticed that other API methods still work (for example, user recommendation). Either they are rolling this out endpoint by endpoint and not all endpoints have the protection yet, or they will protect only some of TikTok's endpoints.

Given that, I think the only solution that will definitely solve this problem is an automatic captcha solver, with the signer generating a new valid verifyFp when necessary (of course, people can generate it once and reuse it while it's valid).

I don't know if any of this helps. If I have some time, I will try to create a PoC for captcha solving, but it's not my area of expertise.
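
The crawl-and-recover loop described in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions: `fetch`, `get_verify_fp`, and the `Blocked` exception are hypothetical names introduced here, not part of TikTok-Api, and how a block is actually detected is left to the caller.

```python
import time
from typing import Callable, Iterable, List


class Blocked(Exception):
    """Raised by the (hypothetical) fetch function when a captcha block is detected."""


def crawl(
    usernames: Iterable[str],
    fetch: Callable[[str, str], dict],
    get_verify_fp: Callable[[], str],
    delay: float = 5.0,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Fetch one user's posts every `delay` seconds, refreshing verifyFp on a block."""
    verify_fp = get_verify_fp()
    results = []
    for name in usernames:
        for _ in range(max_retries):
            try:
                results.append(fetch(name, verify_fp))
                break
            except Blocked:
                # Ask for a (possibly new) cookie, e.g. one copied from the
                # browser after the captcha was solved by hand, then retry.
                verify_fp = get_verify_fp()
        # A user that is still blocked after max_retries is skipped.
        sleep(delay)
    return results
```

With the default `delay=5.0` this matches the 0.2 req/s rate mentioned above; `get_verify_fp` is the place where an automatic solver would plug in if one existed.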

JartanFTW commented 3 years ago

Although I haven't checked which captcha provider TikTok uses, this tip may still be valid. Check whether the captcha has an audio mode, in which case you can use a speech recognition module to solve it automatically (speech recognition is a lot better than most people realize, even with the 'white noise'). I did this a few weeks ago for another, unrelated website that uses FunCaptcha, and it works smoothly. Definitely an avenue worth pursuing if it offers an audio mode.

WestKostMaven commented 3 years ago

Would it be possible to write a script to solve the captcha? Something that takes a picture of the captcha, selects a block of pixels, and compares it to a series of same-sized blocks shifted across the image.
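
The block-comparison idea in this comment can be sketched in plain Python as a sliding sum of absolute differences. This is a minimal illustration, assuming the captcha has already been loaded as a grayscale grid of integers and that the puzzle piece's row `y` is known. The function names are hypothetical, and a real slider captcha would likely need edge detection or similar preprocessing, because the gap in the background does not look identical to the piece.

```python
from typing import Sequence

Image = Sequence[Sequence[int]]  # grayscale rows, pixel values 0-255


def block_difference(image: Image, block: Image, x: int, y: int) -> int:
    """Sum of absolute pixel differences between `block` and the image patch at (x, y)."""
    return sum(
        abs(image[y + r][x + c] - block[r][c])
        for r in range(len(block))
        for c in range(len(block[0]))
    )


def find_best_offset(image: Image, block: Image, y: int) -> int:
    """Slide `block` along row `y` and return the x offset with the smallest difference."""
    positions = len(image[0]) - len(block[0]) + 1
    scores = [block_difference(image, block, x, y) for x in range(positions)]
    return scores.index(min(scores))
```

The returned offset is the horizontal distance the slider would have to travel; ties are resolved in favor of the leftmost position.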

davidteather commented 3 years ago

Would it be possible to write a script to solve the captcha?

Yes, I linked an article above that might be helpful. I might have time over Thanksgiving break to experiment with this further.

Here are some websites that might be useful, but they are outdated and may require changes

deus-developer commented 3 years ago

I have a solution for solving the captcha (getting the X and Y coordinates of the puzzle piece). But TikTok records mouse moves and button clicks, and the problem is that the payload is generated from this data. I don't know how to emulate it; the source code of the captcha is too confusing.

davidteather commented 3 years ago

I don't know how to emulate it

I think you'll need to launch a browser using pyppeteer/playwright/selenium, solve the captcha by feigning a mouse move, and then grab that cookie after you solve it.
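
As a rough illustration of "feigning a mouse move", the sketch below generates a list of drag coordinates that start fast, slow down near the target, and wobble slightly in the vertical direction. It is plain Python with no browser involved; the function name and parameters are hypothetical, and whether a path like this would satisfy TikTok's mouse tracking is untested here.

```python
import random
from typing import List, Optional, Tuple


def drag_path(
    start_x: float,
    end_x: float,
    y: float,
    steps: int = 30,
    jitter: float = 1.5,
    seed: Optional[int] = None,
) -> List[Tuple[float, float]]:
    """Return steps + 1 points moving from start_x to end_x with an ease-out curve."""
    rng = random.Random(seed)
    points = []
    for i in range(steps + 1):
        t = i / steps
        eased = 1 - (1 - t) ** 3  # cubic ease-out: fast start, slow finish
        px = start_x + (end_x - start_x) * eased
        # Small vertical wobble on intermediate points only; endpoints stay exact.
        py = y + (rng.uniform(-jitter, jitter) if 0 < i < steps else 0.0)
        points.append((px, py))
    return points
```

In a browser automation tool, each point could then be fed to the tool's mouse-move call between a mouse-down and a mouse-up, for example Playwright's `page.mouse.down()`, `page.mouse.move(x, y)`, and `page.mouse.up()`.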

deus-developer commented 3 years ago

I don't know how to emulate it

I think you'll need to launch a browser using pyppeteer / playwright / selenium to solve the captcha by simulating a mouse move and then grabbing that cookie after it is solved.

The thing is, I want to build a solution that doesn't use a browser (for better performance). Even if you do use one, you need to understand how the captcha logic works in order to at least run it.

davidteather commented 3 years ago

Started digging into this more and found the captcha HTML page. It's weird, though, because not every s_v_web_id cookie from there is valid :\

reccardt commented 3 years ago

When I solve the captcha from https://sf16-scmcdn-va.ibytedtos.com/goofy/secsdk-captcha/va/2.15.21/index.html and then use the s_v_web_id cookie from the sf16-scmcdn-va.ibytedtos.com domain (as opposed to the one from www.tiktok.com), calls to trending work. But when I try byUsername, getUser fails with an IndexError exception when it tries to scrape the html.

davidteather commented 3 years ago

Uhm, you might want to check out the nightly branch. They're not doing much prevention on the "t.tiktok.com" URL with cookies, so either install that code or change BASE_URL in your code to "t.tiktok" instead of "m.tiktok". It's temporary for sure, but eh, scrape while you can.
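
For code that builds request URLs itself, the host swap suggested here could look like the sketch below. It uses only the standard library; the function name and the example path are hypothetical, and the exact value of BASE_URL inside TikTok-Api is not shown in this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def use_t_subdomain(url: str) -> str:
    """Rewrite an m.tiktok.com URL to t.tiktok.com, leaving everything else intact."""
    parts = urlsplit(url)
    if parts.netloc == "m.tiktok.com":
        parts = parts._replace(netloc="t.tiktok.com")
    return urlunsplit(parts)


# Hypothetical endpoint path, used only to show the rewrite.
print(use_t_subdomain("https://m.tiktok.com/api/item_list/?count=30"))
```

URLs on any other host, such as www.tiktok.com, pass through unchanged.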

ntodzy commented 3 years ago

Seems like t.tiktok.com works for most of the data scraping functions, although it does not work for downloading videos.

nuqz commented 3 years ago

As I mentioned in #397, I have a working solution for solving captchas (puzzle sliders). I will be able to share the tool (Python and PyTorch based, something like this) if someone tells me the exact request limits, i.e. I need a solution for carcabot/tiktok-signature issue #105. I did some research, and I already know where to start.

rnyPlanet commented 3 years ago

https://github.com/tolgatasci/musically-tiktok-api-python Maybe someone will understand this code. @davidteather @nuqz maybe it will help you.

kilua626 commented 3 years ago

I've written a register app that registers TikTok accounts and exports cookies, a login app that logs back in and exports cookies with an automatic slide-captcha solver, a signer server that signs every API endpoint that requires a signature, and a comment app that works by sending HTTP requests.

kilua626 commented 3 years ago

Preview for signer app (several sensitive arguments hidden here):

curl --header "Content-Type: application/json" \
  --request POST \
  --data '{"url":"https://www.tiktok.com/comment/publish", "referer":"https://www.tiktok.com/", "ua":"ua"}' \
  http://120.55.164.73:8888/api/sign

rnyPlanet commented 3 years ago

I've written a register app to register tiktok accounts, exports cookies, another login app to relogin and export cookies with auto slide captcha resolver, a signer server to sign every api endpoints that requires signature, and a comment app via sending http requests.

I wonder how you wrote the solver

kilua626 commented 3 years ago

I wrote it using the opencv-python library (pip install opencv-python).

rnyPlanet commented 3 years ago

I wrote it using opencv-python lib(pip install opencv-python)

Oh, cool. Can you show how you did it? I want to study the code to learn how to solve the captcha. I tried to implement it but failed.

kilua626 commented 3 years ago

Like this: https://pythonmana.com/2021/02/20210223164623700z.html

rnyPlanet commented 3 years ago

Like this:https://pythonmana.com/2021/02/20210223164623700z.html

Thank you

kilua626 commented 3 years ago

U r welcome.

Huertas97 commented 1 year ago

Hi!

I'd like to bring up the solution made by @carcabot, which obtains access to the webpage by generating signatures dynamically, without having to solve any captcha: https://github.com/carcabot/tiktok-signature.

I have run small tests with the package, and I can parse different TikTok URLs automatically without any captcha error. It might help solve the problem of scraping TikTok authors' information.

Hope it helps!

mohamedfullstackdevloper commented 3 weeks ago

Can you please explain what the TikTok signature has to do with the captcha? I noticed that in Chrome there is no captcha, but in Burp Suite there is always a captcha, and I don't know why. Any help, please?