ad-m / python-anticaptcha

Client library for solve captchas with Anticaptcha.com support.
http://python-anticaptcha.readthedocs.io/en/latest/
MIT License

Passing Token #76

Open davidwozabal opened 4 years ago

davidwozabal commented 4 years ago

I am trying to use the library on captchas that I get from Google Scholar when fetching the papers that cite a given source. A typical URL looks like

https://scholar.google.com/scholar?cites=12685256029779217548&as_sdt=2005&sciodt=0,5&hl=en

which, when fetched with Python, sometimes produces a captcha. The HTML of the captcha page contains the following tags, which seem relevant to using the anticaptcha library:

<script> 
    function gs_captcha_cb(){grecaptcha.render("gs_captcha_c",{"sitekey":"6LfFDwUTAAAAAIyC8IeC3aGLqVpvrB6ZpkfmAibj","callback":function(){document.getElementById("gs_captcha_f").submit()}});};
</script>
<form method="get" id="gs_captcha_f">
    <h1>Please show you&#39;re not a robot</h1>
    <div id="gs_captcha_c"></div>
    <script src="//www.google.com/recaptcha/api.js?onload=gs_captcha_cb&render=explicit&hl=en" async defer></script>
    <input type=hidden name="hl" value="en">
    <input type=hidden name="as_sdt" value="0,5">
    <input type=hidden name="sciodt" value="0,5">
    <input type=hidden name="cites" value="12685256029779217548">
    <input type=hidden name="scipsc" value="">
</form>
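
The values a solver task needs can be read straight out of markup like the above. The following is a minimal sketch, assuming the page keeps the same shape (a `"sitekey"` entry inside the `grecaptcha.render` call and hidden `<input>` fields in the form). The helper names `extract_sitekey` and `extract_hidden_fields` are made up for illustration and are not part of python-anticaptcha.

```python
import re
from html.parser import HTMLParser

# Matches the "sitekey":"..." pair inside the grecaptcha.render() options.
SITEKEY_RE = re.compile(r'"sitekey"\s*:\s*"([^"]+)"')


def extract_sitekey(html):
    """Return the first sitekey found in the page source, or None."""
    match = SITEKEY_RE.search(html)
    return match.group(1) if match else None


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type=hidden> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("type") == "hidden" and "name" in attrs:
            # A missing value attribute is treated as an empty string.
            self.fields[attrs["name"]] = attrs.get("value") or ""


def extract_hidden_fields(html):
    """Return the hidden form fields that the form would submit."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields
```

With Selenium, `driver.page_source` could be passed to both helpers, so the site key does not have to be copied by hand each time the page variant changes.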

I had a look at recaptcha_selenium.py. However, the HTML above does not contain the function onSuccess(), and my attempts to construct a different call, such as

driver.execute_script("document.getElementById('gs_captcha_f').submit({})';".format(token))

did not yield anything.

Is there a way to deal with the situation above using the anticaptcha library?

ad-m commented 4 years ago

I cannot reproduce the captcha challenge. Could you check the result when you adapt the callback sniffer (see https://github.com/ad-m/python-anticaptcha/blob/master/examples/recaptcha_selenium_callback.py )?

davidwozabal commented 4 years ago

> I cannot reproduce the captcha challenge.

The problem is that the captcha only appears after several requests of the above type. Hence, it is hard to reproduce.

> Could you check the result when you adapt the callback sniffer (see https://github.com/ad-m/python-anticaptcha/blob/master/examples/recaptcha_selenium_callback.py )?

I am not sure how to adapt the example. If I interpret the code correctly, you are passing the token twice. The first time by setting the content of g-recaptcha-response in

driver.execute_script("document.getElementById('g-recaptcha-response').innerHTML='{}';".format(token))

and the second time by calling

driver.execute_script("grecaptcha.recaptchaCallback[0]('{}')".format(token))

The problem is that the page that I am getting has no element g-recaptcha-response and when I execute the second line I get the error

selenium.common.exceptions.JavascriptException: Message: javascript error: Cannot read property '0' of undefined

I guess the object grecaptcha is named differently in my case?
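
The error text says that `grecaptcha.recaptchaCallback` is `undefined`. That points to `grecaptcha` itself being present while that property was never populated on this page. A small probe can confirm what exists before any callback is called. This is only a sketch: `PROBE_SCRIPT`, `probe_page`, and `describe` are hypothetical names, and the only Selenium call used is `driver.execute_script`.

```python
# JavaScript that reports which reCAPTCHA-related objects exist on the page.
PROBE_SCRIPT = """
return {
    hasGrecaptcha: typeof grecaptcha !== 'undefined',
    hasCallbackList: typeof grecaptcha !== 'undefined'
        && Array.isArray(grecaptcha.recaptchaCallback),
    hasResponseField: document.getElementById('g-recaptcha-response') !== null
};
"""


def probe_page(driver):
    """Run the probe in the browser; Selenium returns the object as a dict."""
    return driver.execute_script(PROBE_SCRIPT)


def describe(probe):
    """Turn the probe result into a short human-readable hint."""
    if not probe.get("hasGrecaptcha"):
        return "grecaptcha is not loaded on this page"
    if not probe.get("hasCallbackList"):
        return "grecaptcha is loaded, but grecaptcha.recaptchaCallback is missing"
    return "grecaptcha.recaptchaCallback is available"
```

Calling `describe(probe_page(driver))` right after `driver.get(...)` shows which of the two injection steps can work on a given page variant.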

If I just execute the first command (setting the response) and then submit the form by calling

driver.execute_script("document.getElementById('gs_captcha_f').submit()';")

I get the error

selenium.common.exceptions.JavascriptException: Message: javascript error: Invalid or unexpected token
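
The "Invalid or unexpected token" message is consistent with the stray single quote before the semicolon in the `.submit()';` call, which makes the JavaScript itself invalid. One way to avoid quoting mistakes of this kind is to let `json.dumps` produce the JavaScript string literals. The sketch below only builds the script text. The function names are hypothetical, and writing to `.value` instead of `.innerHTML` is a choice made here, not something taken from the library's example.

```python
import json


def set_response_script(token, element_id="g-recaptcha-response"):
    """Build JS that writes the token into the response field, if present.

    The script returns true when the element was found, false otherwise.
    """
    return (
        "var el = document.getElementById({id});"
        "if (el) {{ el.value = {token}; }}"
        "return el !== null;"
    ).format(id=json.dumps(element_id), token=json.dumps(token))


def submit_form_script(form_id):
    """Build JS that submits the form with the given id."""
    return "document.getElementById({}).submit();".format(json.dumps(form_id))
```

The resulting strings can be passed to `driver.execute_script(...)`. For example, `submit_form_script("gs_captcha_f")` yields `document.getElementById("gs_captcha_f").submit();`, with no unmatched quote.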

davidwozabal commented 4 years ago

I tried to get a reproducible captcha and came up with the following request

https://scholar.google.com/scholar?cites=12685256029779217548&as_sdt=2005&sciodt=0,5&hl=en&num=20

The argument num=20 produces a captcha on every call. It is embedded in a page whose code differs slightly from the captchas I was facing before. However, if I could solve this one, it might be a start.

I tried adapting the code from recaptcha_selenium_callback.py and ended up with the following code

from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options
from python_anticaptcha import AnticaptchaClient, NoCaptchaTaskProxylessTask

# URL that triggers the captcha page
request = 'https://scholar.google.com/scholar?cites=12685256029779217548&as_sdt=2005&sciodt=0,5&hl=en&num=20'
options = Options()
driver = Chrome(options=options)  # `chrome_options=` is the deprecated spelling
driver.get(request)

# Ask Anticaptcha to solve the reCAPTCHA for this URL and site key
api_key = '...'
site_key = '6LfwuyUTAAAAAOAmoS0fdqijC2PbbdH4kjq62Y1b'
client = AnticaptchaClient(api_key)
task = NoCaptchaTaskProxylessTask(request, site_key)
job = client.createTask(task)
job.join()  # wait for the job to finish
token = job.get_solution_response()

# Write the token into the response field, then call the page's callback
driver.execute_script(
    "document.getElementById('g-recaptcha-response').innerHTML='{}';".format(token)
)
driver.execute_script("submitCallback('{}')".format(token))
result = driver.page_source

The code runs without any errors. However, the browser window does not change, and the variable result still contains the captcha page.

Where did I go wrong?
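
One thing that may be worth ruling out is timing: `driver.page_source` is read immediately after the callback is invoked, so it could still reflect the captcha page even if a navigation is about to happen. The sketch below polls until a marker string disappears from the page source. The name `wait_for_captcha_to_clear` and the choice of marker (for example a form id such as `gs_captcha_f`) are assumptions for illustration. If the marker never disappears, the token or the callback is the more likely culprit.

```python
import time


def wait_for_captcha_to_clear(driver, marker, timeout=30.0, interval=0.5):
    """Poll driver.page_source until `marker` is gone.

    Returns True once the marker has disappeared, or False if it is
    still present when `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if marker not in driver.page_source:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A call such as `wait_for_captcha_to_clear(driver, "gs_captcha_f")` would go between the `execute_script` calls and the final read of `driver.page_source`.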

ad-m commented 4 years ago

@davidwozabal, could you provide code that reproduces the captcha challenge? I do not receive a captcha challenge at the address you provided. With such code, I will be able to analyze the problem more effectively.

davidwozabal commented 4 years ago

The link

https://scholar.google.com/scholar?cites=12685256029779217548&as_sdt=2005&sciodt=0,5&hl=en&num=20

above produces a captcha challenge for me (even if I open it from a normal browser from different computers).

fashan7 commented 2 years ago

Please help me with this reCAPTCHA-related issue:

https://stackoverflow.com/questions/68877761/recaptcha-wasnt-solving-by-anticaptcha-plugin-in-selenium-python

fashan7 commented 2 years ago

Found the solution to the problem; see #92.