Closed luizkc closed 5 years ago
Hi @luizkc,
I'm sure we can figure this out. Would you mind sharing the debug output?
Redirect the stdout and stderr to a file or xclip and pastebin it:
node index.js > out.txt 2>&1
node index.js |& xclip -i -sel clipboard
The weirdest part about this is that my code works using the Python version of the module recreating the exact same requests.
Which Python module? The Python module of the same name is tainted by plagiarism and license issues, has known vulnerabilities, spits in the face of FOSS, and is not to be trusted. It's a rip-off of the original cfscrape. Use cfscrape instead; it's maintained.
The challenge-solving code in all of these libraries was written by me (including the plagiarized one) and they all generally work the same way if you're using equivalent options. The only exception is the redirect behavior. The Python modules handle redirects in a non-standard way by always reusing the original request method instead of switching over to the GET method.
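To make the difference concrete, here is a minimal sketch of the two redirect policies for a POST request. It is a simplified model, not code taken from either library: `methodAfterRedirect` and its `keepOriginalMethod` parameter are hypothetical names, and the status-code handling follows common browser behavior (a POST becomes a GET after 301, 302 and 303, while 307 and 308 preserve the method).

```javascript
// Simplified model of redirect handling; not taken from request or cfscrape.
// It treats every non-HEAD method the way browsers treat POST.
function methodAfterRedirect (method, statusCode, keepOriginalMethod = false) {
  // 307 and 308 always preserve the request method.
  if (statusCode === 307 || statusCode === 308) return method
  // HEAD stays HEAD.
  if (method === 'HEAD') return method
  // The non-standard policy: reuse the original method on every redirect.
  if (keepOriginalMethod) return method
  // Common browser behavior for 301, 302 and 303: follow up with GET.
  return 'GET'
}

console.log(methodAfterRedirect('POST', 302))       // GET
console.log(methodAfterRedirect('POST', 302, true)) // POST
console.log(methodAfterRedirect('POST', 307))       // POST
```

The `followOriginalHttpMethod: true` option used in the snippets in this thread asks the request library for the second policy.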
A very similar and recently solved issue: https://github.com/codemanki/cloudscraper/issues/255
When sending the post request, the console sometimes prints:
request received invalid json
when debug is on.
When the json option is used, the request library's onRequestResponse handler attempts to parse the response as JSON. If you get e.g. HTML instead, it will intentionally fail silently unless, as you mentioned, debugging is enabled. The user should validate the response.body anyway, since valid JSON, once parsed, could be a number, string, empty string, boolean, null, object, or an array. If you're expecting an array, e.g.:
const valid = Array.isArray(response.body) && response.body.length > 0;
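The same idea can be sketched without any network access. The helper names (`parseJsonOrUndefined`, `isNonEmptyArray`) and the sample bodies below are made up for illustration; the point is only that a body which fails to parse, or parses to something other than the expected shape, should be rejected.

```javascript
// Hypothetical helper: returns the parsed value, or undefined when the text is not JSON.
function parseJsonOrUndefined (text) {
  try {
    return JSON.parse(text)
  } catch (err) {
    return undefined
  }
}

// Valid JSON is not necessarily the shape you expect, so check the shape too.
function isNonEmptyArray (value) {
  return Array.isArray(value) && value.length > 0
}

const htmlBody = '<html><body>Access denied</body></html>' // made-up response
const jsonBody = '[{"id": 1}]' // made-up response

console.log(isNonEmptyArray(parseJsonOrUndefined(htmlBody))) // false
console.log(isNonEmptyArray(parseJsonOrUndefined(jsonBody))) // true
console.log(isNonEmptyArray(parseJsonOrUndefined('"ok"'))) // false
```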
If you're trying to post JSON, try the json
option instead of formData
:
const gen = await cloudscraper.post({
  url: "https://nakedcph.com/auth/submit",
  resolveWithFullResponse: true,
  followOriginalHttpMethod: true,
  simple: false,
  headers: headers,
  json: {
    _AntiCsrfToken: Csrf,
    firstName: "CLOUDSCRAPER TEST",
    email: "cloudscraper+123456@gmail.com",
    password: "MyPassword123",
    "g-recaptcha-response": String(captchaRes),
    action: "register"
  }
})
Cheers
Hi @pro-src. Thanks for the quick reply.
I tried sending the request as you said and it still did not work.
I used this module in Python, is this the one that is dangerous to use?
I have read issue 255 and tried everything in there, but it still didn't resolve the issue for me. I believe issue 255 had a similar problem but not the same one, although it does try sending a GET when we are specifying POST at some point during the redirects. In other words, the 2nd request, which is a POST, gets redirected and GET requests get sent to the same endpoint. Is this where the issue lies?
Here is my out.txt file as you requested! I hope I'm doing something really stupid and that the solution is simple. I do apologize in advance if that is the case.
Thanks again for the help!
Edit: it was sending a GET request actually. Just read the out.txt again.
Thanks again for the help!
Yw :smile:
I used this module in Python, is this the one that is dangerous to use?
Yes:exclamation: I used to own that pypi.org project. Unfortunately, it is owned by a very cunning individual now. You've been warned.
I've noticed that you're attempting to send pseudo HTTP/2 headers.
The underlying request library doesn't support HTTP/2 and adds the host header automatically. The same applies to Python's requests library. The :authority and host headers are mutually exclusive. The :authority header should not be sent when using HTTP/1. The host header should not be sent when using HTTP/2.
All of the above headers should be omitted when imitating a browser's HTTP/1 request.
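A minimal sketch of that cleanup, assuming the headers were copied from a browser's HTTP/2 request: `toHttp1Headers` is a hypothetical helper (not part of cloudscraper or request) and the header values are illustrative. It drops every pseudo-header (names starting with `:`) as well as any hand-written Host header, since the HTTP client adds Host itself.

```javascript
// Hypothetical helper: drops HTTP/2 pseudo-headers (names starting with ":")
// and any hand-written Host header before an HTTP/1 request is sent.
function toHttp1Headers (headers) {
  const result = {}
  for (const [name, value] of Object.entries(headers)) {
    if (name.startsWith(':')) continue
    if (name.toLowerCase() === 'host') continue
    result[name] = value
  }
  return result
}

// Illustrative headers as they might be copied from browser dev tools.
const copied = {
  ':authority': 'www.nakedcph.com',
  ':method': 'POST',
  ':path': '/auth/submit',
  ':scheme': 'https',
  'accept-language': 'en-US,en;q=0.5'
}

console.log(toHttp1Headers(copied)) // { 'accept-language': 'en-US,en;q=0.5' }
```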
I'm using https://httpbin.org, which responds with the request info that you sent, to demonstrate the difference between the json and formData options:
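As an offline illustration of how these options differ on the wire, the sketch below serializes one made-up payload two ways. It does not call httpbin or cloudscraper. In the request library, `json` sends an `application/json` body, `form` sends `application/x-www-form-urlencoded`, and `formData` sends `multipart/form-data`. `URLSearchParams` is used here as a stand-in encoder, so details such as how spaces are escaped may differ from what the library emits.

```javascript
// Made-up payload for illustration only.
const payload = { firstName: 'CLOUDSCRAPER TEST', action: 'register' }

// Roughly what the `json` option sends: a JSON document.
const asJson = {
  contentType: 'application/json',
  body: JSON.stringify(payload)
}

// Roughly what the `form` option sends: URL-encoded key/value pairs.
const asForm = {
  contentType: 'application/x-www-form-urlencoded',
  body: new URLSearchParams(payload).toString()
}

console.log(asJson.body) // {"firstName":"CLOUDSCRAPER TEST","action":"register"}
console.log(asForm.body) // firstName=CLOUDSCRAPER+TEST&action=register
```

A server that only understands one of these formats will typically reject or misread the others, which is why the option has to match what the endpoint expects.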
Perform the same tests with your python code to ensure that everything lines up. If I had the debug output from python maybe I could pinpoint the issue.
Prepend the following to your python code to generate similar debug output:
With fresh eyes, the server expects the body to be application/x-www-form-urlencoded: https://github.com/request/request#forms
const cloudscraper = require('cloudscraper')
const { headers: defaultHeaders } = cloudscraper.defaultParams
const uri = new URL('https://www.nakedcph.com/auth/view?op=register')

// Wrapped in an async function because await is not valid at the top level
// of a CommonJS module. csrf, username, password and gRes come from the caller.
async function register (csrf, username, password, gRes) {
  const response = await cloudscraper.post({
    uri: new URL('/auth/submit', uri.href),
    resolveWithFullResponse: true,
    followOriginalHttpMethod: true,
    json: true, // parse the response body as JSON
    simple: false,
    headers: {
      ...defaultHeaders,
      Origin: uri.origin,
      Referer: uri.href,
      'Sec-Fetch-Mode': 'cors',
      'Sec-Fetch-Site': 'same-origin',
      'X-AntiCsrfToken': csrf,
      'X-Requested-With': 'XMLHttpRequest'
    },
    // form sends the body as application/x-www-form-urlencoded
    form: {
      _AntiCsrfToken: csrf,
      firstName: username,
      email: username + '@gmail.com',
      password: password,
      'g-recaptcha-response': gRes,
      action: 'register'
    }
  })
  return response
}
@pro-src I love you. Thanks.
Following the above changes you made to the request, I was able to get a 500 response instead of a 403. The 500 response included a captchaError. So instead of letting Cloudscraper set the sitekey and URL to solve the captcha for, I simply hard-set them in my captcha solving function, sent the post request, and the 500 then became a 200, and now everything works perfectly.
Takeaways:
nakedcph.com
as opposed to having a URL with a declared protocol and path.
Hope this helps anyone else having this issue and thank you so much @pro-src for all of the help.
My issue is resolved :blush:
There hasn't been a whole lot of feedback concerning the reCaptcha related API.
Per your feedback, I've sent a PR (#260) to soft-deprecate captcha.url in favor of captcha.uri, which is an instance of the built-in URL class. This new property conveniently allows captcha.uri.origin, captcha.uri.host, captcha.uri.hostname, etc. to be used in place of captcha.url. The old property will still be available, with deprecation warnings, for some time and is equivalent to captcha.uri.href, aka response.request.uri.href.
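For reference, this is what those properties look like on a plain instance of Node's built-in URL class. The URL below is only an example; inside an onCaptcha handler the instance would be supplied by the library rather than constructed by hand.

```javascript
// Illustrative URL; URL is a global in current Node versions.
const uri = new URL('https://www.nakedcph.com/auth/view?op=register')

console.log(uri.href) // https://www.nakedcph.com/auth/view?op=register
console.log(uri.origin) // https://www.nakedcph.com
console.log(uri.host) // www.nakedcph.com
console.log(uri.hostname) // www.nakedcph.com
console.log(uri.pathname) // /auth/view
```

`host` and `hostname` only differ when the URL carries an explicit non-default port, in which case `host` includes it.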
Secondly, I fixed a few bugs:
Finally, the regular expressions have been greatly improved. The siteKey can be found 4 times within Cloudflare's reCaptcha (v2) response, e.g. https://captcha.website, and this update is aware of all of them. Previously, only 2 would be matched: the data-sitekey attribute and the fallback.
Could I get you to test this and report back whether pinning the URL and/or siteKey is still necessary?
npm install "git://github.com/pro-src/cloudscraper.git#recaptcha"
OR
git clone --single-branch --branch recaptcha https://github.com/pro-src/cloudscraper
cd cloudscraper
# Feel free to replace npm with yarn in any of these commands
npm install # Optionally add --production, if skipping test
npm test # Optional but recommended
# If you're going to manually update your require calls, you're done
# Otherwise register cloudscraper with NPM globally
npm link
# Proceed to create a symlink to cloudscraper in your project's node_modules/
cd ../my-project
npm link cloudscraper
node index.js
sorry for missing this! I was about to post another issue when I saw this. Will be performing these tests today and reporting back.
@pro-src Did some more testing. I'm able to bypass the initial Cloudflare captcha page here. However, when sending the post request we revised above, I now keep getting captchaError or some unexpected responses from the server instead of a successful account creation, like in my Python script.
Any ideas on the fix? This is my code:
var cloudscraper = require("cloudscraper")
const captchaAPI = require("imagetyperz-api")
const { headers: defaultHeaders } = cloudscraper.defaultParams
//cloudscraper.debug = true
let captchaRes

async function onCaptcha(options, response, body) {
  const captcha = response.captcha
  // solveCaptcha (defined below) takes the page href and the sitekey and returns a response token
  const token = await solveCaptcha(response.request.uri.href, captcha.siteKey)
  captcha.form["g-recaptcha-response"] = token
  captcha.submit()
}

// python sitekey = '6LeNqBUUAAAAAFbhC-CS22rwzkZjr_g4vMmqD_qo'
async function solveCaptcha(uri, sitekey) {
  captchaAPI.set_access_key("MY CAPTCHA SOLVING API KEY")
  const params = {
    page_url: uri,
    sitekey: sitekey
  }
  console.log("Solving captcha...")
  const id = await captchaAPI.submit_recaptcha(params)
  const token = await captchaAPI.retrieve_recaptcha(id)
  captchaRes = token
  return token
}

async function run() {
  const req = await cloudscraper.get({
    uri: "https://nakedcph.com",
    onCaptcha: onCaptcha,
    resolveWithFullResponse: true,
    simple: false,
    headers: {
      "user-agent":
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36",
      Accept: "application/json, text/javascript, */*; q=0.01",
      "Accept-Language": "en-US,en;q=0.5",
      "Accept-Encoding": "gzip, deflate, br"
    }
  })
  if (req.statusCode === 200) {
    const AntiCsrfToken = req.body.match(
      /setRequestHeader\('X-AntiCsrfToken', '(.+)'/
    )[1]
    const post = await cloudscraper.post({
      uri: "https://nakedcph.com/auth/submit",
      onCaptcha: onCaptcha,
      resolveWithFullResponse: true,
      followOriginalHttpMethod: true,
      json: true,
      simple: false,
      headers: {
        ...defaultHeaders,
        "user-agent":
          "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36",
        Accept: "application/json, text/javascript, */*; q=0.01",
        "Accept-Language": "en-US,en;q=0.5",
        "Accept-Encoding": "gzip, deflate, br",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "x-AntiCsrfToken": AntiCsrfToken,
        "x-Requested-With": "XMLHttpRequest",
        origin: "https://www.nakedcph.com",
        referer: "https://www.nakedcph.com/auth/view?op=register"
      },
      form: {
        _AntiCsrfToken: AntiCsrfToken,
        firstName: "My Name",
        email: "myemail@gmail.com",
        password: "MyPassword",
        "g-recaptcha-response": captchaRes,
        action: "register"
      }
    })
    console.log("ACC RESULT:")
    console.log(post.statusCode)
    console.log(post.body)
    return
  }
}

run()
Running the following code with debug on generates the attached logs.
Something to note: The sitekey I use to solve captchas in Python is different from the one scraped by Cloudscraper in the example above. Very weird that the sitekey commented out works in the Python version but yields captchaErrors in the Node.js version.
Thanks for helping out, hopefully we can get this working and the module bugs sorted out ASAP!
@pro-src any updates?
Not as of yet.
was the issue resolved? Just wondering why this was closed with no reply.
I'm still having issues with the example I sent on the updated version! Any help is appreciated. Thanks.
@luizkc
was the issue resolved? Just wondering why this was closed with no reply.
You said:
My issue is resolved :blush:
The issue was reopened to address the bugs that I discovered and subsequently closed automatically by Github once the (fix)PR was merged.
I understand there's a new issue that's related to the old one, but would you mind opening a new issue for that and just referencing this one? I have been meaning to attend to your issue.
Very weird that the sitekey commented out works in the Python version but yields captchaErrors in the node js version.
That's very weird indeed considering how the python regex is merely:
'data-sitekey="(.+?)"'
Whereas Cloudscraper's primary siteKey regex is robust (not mentioning the fallbacks):
/\sdata-sitekey=["']?([^\s"'<>&]+)/
So if anything, the python code would be failing you. I would just create a simple (working) snippet to show the difference if there was one but I don't see it. Feel free to prove me wrong.
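A small sketch of where the two patterns quoted above can diverge. The markup and the `EXAMPLE_KEY` value are made up, and `extract` is a hypothetical helper; the Python pattern is transcribed as an equivalent JavaScript regular expression.

```javascript
// The two patterns quoted above, as JavaScript regular expressions.
const simplePattern = /data-sitekey="(.+?)"/
const robustPattern = /\sdata-sitekey=["']?([^\s"'<>&]+)/

// Hypothetical helper: returns the first capture group, or null when nothing matches.
function extract (pattern, html) {
  const match = pattern.exec(html)
  return match ? match[1] : null
}

// Made-up markup with a made-up key.
const doubleQuoted = '<div class="g-recaptcha" data-sitekey="EXAMPLE_KEY"></div>'
const singleQuoted = "<div class='g-recaptcha' data-sitekey='EXAMPLE_KEY'></div>"

console.log(extract(simplePattern, doubleQuoted)) // EXAMPLE_KEY
console.log(extract(robustPattern, doubleQuoted)) // EXAMPLE_KEY
console.log(extract(simplePattern, singleQuoted)) // null
console.log(extract(robustPattern, singleQuoted)) // EXAMPLE_KEY
```

On double-quoted attributes both patterns agree, so a differing sitekey would more likely come from matching a different element or page than from the regex itself.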
Somebody will eventually get around to your issue. If you would like me personally to expedite your issue, consider becoming a patron.
Thanks for your understanding.
@pro-src I do not mind becoming a patron to get assisted ASAP!
Just tell me how to go about doing that and how much I should pledge to be worth your time :)
Would love to work on my issue specifically with you if possible. If becoming a patron is what it takes to get some 1-on-1 assistance from you I will definitely do it.
@luizkc There's a couple tiers at this link and there's an option to make a custom pledge. :)
@pro-src awesome! Just pledged. See you in Discord! Can't wait :)
Sorry to comment on a closed issue, but was there a resolution to this? I am also experiencing the 500 issue @pro-src
Hi!
I'm using Cloudscraper version 4.14 on Node version 12.10.0.
I'm attempting to access this website, which has a Cloudflare protection page with a captcha.
I can bypass the cloudflare and access the site's homepage/any page, however, after bypassing, I am unable to successfully send a post request. The weirdest part about this is that my code works using the Python version of the module recreating the exact same requests.
When sending the post request, the console sometimes prints:
request received invalid json
when debug is on.
In my second request I need a csrf token that gets returned with the first request's (the bypass request) response. Essentially I am trying to create an account on this website by first retrieving the csrf after the initial bypass (which I can do successfully) and then sending a post request with the account information.
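The token-retrieval step of that flow can be sketched on its own. `extractCsrfToken` is a hypothetical helper and the page snippet is made up; the pattern is modeled on the `setRequestHeader('X-AntiCsrfToken', ...)` call that the code in this thread matches against, but it stops at the first closing quote and returns null instead of throwing when the page does not contain a token (for example, when a challenge page came back instead).

```javascript
// Hypothetical helper: pulls the anti-CSRF token out of the first response body.
function extractCsrfToken (body) {
  const match = /setRequestHeader\('X-AntiCsrfToken', '([^']+)'/.exec(body)
  return match ? match[1] : null
}

// Made-up page fragment for illustration.
const page = "xhr.setRequestHeader('X-AntiCsrfToken', 'abc123'); init('x')"

console.log(extractCsrfToken(page)) // abc123
console.log(extractCsrfToken('<html></html>')) // null
```

Checking for null before sending the second request keeps a missing token from surfacing later as a confusing server-side error.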
Like I said, I can do this successfully in Python which leads me to believe the issue is related to the module's way of handling post requests, but of course I'm probably wrong. This is my code when sending both requests.
I get a 200 on the first request, and a 403 on the 2nd. This is what the server returns on the 403:
{ Response: null, StatusCode: 500, Status: '' }
Hopefully I'm being really stupid and there is a super simple solution to this. And like I said, my Python version is 100% functional doing the exact same request with the same headers and everything.
Thanks and sorry for any confusion and the long code. I've been looking at this for way longer than I should have and haven't been able to find the solution. Any help is much appreciated.