Closed Revadike closed 5 years ago
@Revadike, we haven't forgotten about this. My priorities are functional bugs before feature bugs, but I might have a PR ready by Monday. Feel free to beat me to it.
@codemanki @Revadike I've thought about this and I'm not sure what is desired in terms of API.
Perhaps introduce cloudscraper.solveCaptcha? How do you imagine that working? @codemanki is already working on separating the parsing from the main library. Would simply adding a method to extract the links to the captcha's images be enough? Something along the lines of cloudscraper.extractCaptcha?
You'll have to wait for your service to solve the captcha anyway so there has to be some sort of callback. What I'm saying is, no matter what, it's going to look something similar to the following:
var cloudscraper = require('cloudscraper');
var CaptchaError = require('cloudscraper/errors').CaptchaError;
cloudscraper.get('http://example-site.com')
  .then(doSomethingWithBody)
  .catch(function handleCaptcha(error) {
    if (error instanceof CaptchaError) {
      var captcha = cloudscraper.extractCaptcha(error.response);
      return getSolutionFromService(captcha).then(function (solution) {
        return cloudscraper.post({
          // Fall back to the URI of the request that triggered the captcha
          uri: captcha.submitURI || error.response.request.uri,
          formData: solution
        }).then(doSomethingWithBody).catch(handleCaptcha);
      }, handleError);
    }
    else {
      handleError(error);
    }
  });
So I don't think there would be much sense in adding anything beyond extracting the captcha and submitting the solution. To be clear, that's -1 for cloudscraper.solveCaptcha and +1 for a method to extract captcha information. I'm neutral on a method to submit the solution. What are your thoughts on this?
All we need is the url of the site (which we should have already) and the site-key value of the recaptcha. So you need to provide us with the site key. What I suggest is that you also provide us with a function/callback to which we pass the recaptcha solution (g-recaptcha-response in the form) as a parameter.
Or what about this: add an option for a captcha handler, like this:
const url = 'https://domain.com';
CloudScraper({
url: url,
method: 'GET',
captchaHandler: (sitekey, callback) => {
solver(url, sitekey).then(solution => callback(solution))
}
});
I'd recommend hiding the logic of submitting the captcha form for cloudflare protection. I think this is something the library should provide for us.
@Revadike Another option is to emit a captcha event and, if nobody is listening, throw an error. This would work the same way as error events. If somebody is listening, expect them to handle the captcha; otherwise throw a CaptchaError. If a handler is present (listening), the instance (cloudscraper) would remain idle until a method is used to continue processing the request.
var request = cloudscraper.get(uri);
request.on('captcha', function(response) {
// var captcha = { siteKey: '...' };
var captcha = response.captcha;
myService.solve(captcha).then(function (solution) {
captcha.submit(solution);
});
});
Unless there are new suggestions, this is what I think the API should be:
var request = cloudscraper.get(uri);
request.on('captcha', (response, callback) => {
// response.captcha = { siteKey: '...' };
myService.solve(response.captcha)
.then(solution => { callback(null, solution); })
.catch(error => { callback(error); });
});
request.then(doSomethingWithBody, handleError);
Or equivalently:
var request = cloudscraper.get(uri);
request.on('captcha', (response, callback) => {
// response.captcha = { siteKey: '...' };
myService.solve(response.captcha, callback);
});
request.then(doSomethingWithBody, handleError);
If nobody is listening for the captcha event, cloudscraper will throw a CaptchaError.
The captcha event listener, if it exists, will be called with response and callback as its arguments.
The response.captcha property will not be created unless somebody is listening for the captcha event.
If the callback is called with an error, cloudscraper will throw a CaptchaError with that error as the cause. This way the original request callback always gets called.
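The rules above could be sketched roughly as follows. This is only an illustration built on Node's `events` module: the `CaptchaError` class, the `dispatchCaptcha` helper and the `siteKey` placeholder on the response are hypothetical stand-ins, not cloudscraper's actual implementation.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for cloudscraper's CaptchaError.
class CaptchaError extends Error {
  constructor(cause) {
    super(cause instanceof Error ? cause.message : String(cause));
    this.name = 'CaptchaError';
    this.cause = cause;
  }
}

// Hypothetical helper: decides what happens when a captcha page is detected.
// `done(error, solution)` plays the role of the original request callback.
function dispatchCaptcha(request, response, done) {
  if (request.listenerCount('captcha') === 0) {
    // Nobody is listening: fail with a CaptchaError.
    return done(new CaptchaError('captcha'));
  }
  // response.captcha is only created when somebody is listening.
  response.captcha = { siteKey: response.siteKey };
  request.emit('captcha', response, (error, solution) => {
    // An error from the listener is wrapped, so `done` is always called.
    if (error) return done(new CaptchaError(error));
    done(null, solution);
  });
}

// Usage with a fake request and a listener that "solves" immediately.
const request = new EventEmitter();
request.on('captcha', (response, callback) => {
  callback(null, 'solved:' + response.captcha.siteKey);
});
dispatchCaptcha(request, { siteKey: 'placeholder-key' }, (error, solution) => {
  console.log(error, solution);
});
```

Checking `listenerCount` before emitting is what lets the no-listener case fall back to an error instead of silently hanging.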
@pro-src so we won't support recaptcha in callbacks (if cloudscraper is used with callbacks and not with promises)?
@codemanki The request instance that we return from cloudscraper is always an instanceof EventEmitter, and we could emit that event if somebody is listening in either of those cases. I don't like the way that looks at all though. :disappointed:
var request = cloudscraper.get(uri, (error, response, body) => {
if (error) return handleError(error);
doSomethingWithBody(body);
});
request.on('captcha', (response, callback) => {
// response.captcha = { siteKey: '...' };
myService.solve(response.captcha, callback);
});
@pro-src then it is fine. After all, this is a great feature to have, and even if using it would require a developer to have callbacks and an event listener, it is still an awesome feature :D
BTW, found this in anti-captcha api docs: https://anticaptcha.atlassian.net/wiki/spaces/API/pages/6029327/Forms+with+Recaptcha.+Submit+automation+scheme. https://anticaptcha.atlassian.net/wiki/spaces/API/pages/9666575/Reproducing+Recaptcha+validation+without+digging+the+HTML+source
Something to consider
Hmm, but this is per request. Wouldn't it be nice to have 1 listener for all captcha requests and handle them? I think that would be a bit better than adding 1 line to every request you want to solve captchas for.
@Revadike does have a point and I'm leaning towards his suggestion of passing a captcha handler to cloudscraper as an option. This way the defaults function may be used to avoid having to create wrappers. This would also support usage of cloudscraper in frameworks.
var cloudscraper = require('cloudscraper').defaults({ onCaptcha });
Does it still make sense if we're emitting other events?
The response and stream related events are already being emitted from initial requests and I've had it in mind to make those work over all requests made in a single cloudscraper call to support streams. Simply adding captcha / cloudflare-captcha to the list seems to make the most sense.
Maybe we could implement a very basic plugin system.
var cloudscraper = require('cloudscraper');
cloudscraper.use(myPlugin);
function myPlugin(cloudscraper) {
cloudscraper.on('captcha', captchaHandler);
}
Edit: Nah, I think it's better to follow request/request's design. If #139 is resolved in the way I suggested, you could extend cloudscraper.
var cloudscraper = require('cloudscraper');
class CaptchaSupport extends cloudscraper.Cloudscraper {
constructor(options) { super(options); }
}
cloudscraper.Cloudscraper = CaptchaSupport;
Cloudflare is doing something extra?
This is the parsed querystring that is sent to /cdn-cgi/l/chk_captcha
[
{
"name": "s",
"value": "0ac7c0ae138677a8c0806c3119efb6f56108a100-1552751638-1800-AXVMM9TstwfvK3fuagOdfKoaTn88I31GX%2Bd%2F6zgEtaL3TdWP2EYAuIrrJlFZF5L5MqwuWQFFPmTMdn93KPAQbhfbqUqz%2BFLePKZSHNMwLFdxT216EFLUO6ztDVL9r3VU2Q%3D%3D"
},
{
"name": "id",
"value": "4b87e72f9afac1e0"
},
{
"name": "g-recaptcha-response",
"value": "03AOLTBLR2PLOHTsduP5OpQE2rhS-5i_ThyR5nRtSBwtl959TPJV65Fh2nQqIxRMhOtKHhKil_vHVf8iBeLO5bbAg_MwpO7SEkuCgmEV42oIc4sndLIeAhZk2hJvnC6eJ94P7T4ZUQ_jwii8QPm1re1MFUNdZNsNh93RUJloRggH8B3nzSnk74NxAfnM4wxwlZNapedOe24ngYQ8_rebOPX8YdwzkVI-p6NIjkSKzHgQFHJndW4zyN6kQt6orjehGYocLnKqjjLlHYNFYkl3oQjdGt8yrXe0C6x55zm4RCPO8MkGoGxQl2m4zMnsHQzYhNd1Uh7NktH0Gw"
},
{
"name": "bf_challenge_id",
"value": "10808"
},
{
"name": "bf_execution_time",
"value": "34"
},
{
"name": "bf_result_hash",
"value": "375207141"
}
]
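For reference, a query string of this shape can be assembled with Node's built-in `URLSearchParams`. This is a minimal sketch: `buildCaptchaQuery` is a hypothetical helper (not part of cloudscraper) and the field values are placeholders, not real tokens.

```javascript
// Hypothetical helper: serialize challenge-form fields into the
// /cdn-cgi/l/chk_captcha query string, percent-encoding the values.
function buildCaptchaQuery(fields) {
  const params = new URLSearchParams();
  for (const [name, value] of Object.entries(fields)) {
    params.append(name, value);
  }
  return '/cdn-cgi/l/chk_captcha?' + params.toString();
}

const uri = buildCaptchaQuery({
  s: 'placeholder-s-value',
  id: 'placeholder-id',
  'g-recaptcha-response': 'placeholder-token'
});
console.log(uri);
```

Characters such as `/` and `=` in the `s` value are percent-encoded by `URLSearchParams`, which matches the `%2F` and `%3D` sequences visible in the captured value above.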
I don't think services like anti-captcha will provide the bf_ values. AFAIK, all of that information can be easily extracted or calculated except for the bf_result_hash. Does anybody know anything about those extra fields?
Edit: I'm comparing the request/response to those from https://recaptcha-demo.appspot.com
Yikes, Cloudflare has a bot-filter script that runs in an iframe, and only if the runtime passes inspection is the results event emitted on the window. The results event contains the required bf_ values. The only way to bypass these recaptchas (v2) is if you first bypass the bot-filter...
I'm guessing that you can't bypass this even if you use something like selenium but maybe you could get by with a headless version of chromium? I don't think this is a Cloudflare only solution either because google may have something similar according to uncaptcha2's project markdown: https://github.com/ecthros/uncaptcha2 Or maybe this bot-filter is from google... I haven't checked that*
@codemanki While we could bypass this bot filter, I don't like the idea of maintaining it. I think this issue can be closed.
Here's a gist of the bot-filter: https://gist.github.com/pro-src/8e060254d6281a7d93a5d9d02e369574
Maybe, but a bot-filter sounds an awful lot like anti-scraper, hence it's something you should handle.
@Revadike It could be something that we'd like to handle, but that doesn't make it our responsibility to do so. Personally, if I were to provide a solution for this version of the bot-filter, it'd be without any promise of maintenance. I think the discussion needs to be about whether or not we're willing to maintain this feature before we introduce it.
@Revadike I need to research this a bit more but it might be possible for the user to provide the bf_result_hash. This is assuming that there aren't multiple versions of the bot-filter challenge active at the same time, meaning you always get the same bot-filter unless it's been updated. If that's the case, you could simply copy the value from your browser and provide that value to cloudscraper along with your browser's UA. This would put the burden on the user to keep up with the bot-filter.
IMHO, this is not a maintainable feature outside of a browser-like runtime environment.
Edit: Debundled and deobfuscated. See the latest revision of the bot-filter gist.
@Revadike I'm performing a more thorough investigation of the bot-filter. The security through obscurity does have me deterred from proposing/maintaining a solution as a feature of cloudscraper. This is mainly due to the fact that if they update it, I'll have to go through this entire process again. As you can see, it takes time. Are you interested in a solution to the bot-filter anyway?
I've been busy but I'll get back to this eventually.
How would one go about getting the HTML string when encountering a captcha error? I'm gonna look into this myself.
@Revadike
const cloudscraper = require('cloudscraper');
const { CaptchaError, ParserError } = require('cloudscraper/errors');
cloudscraper.get(uri).catch(error => {
  if (error instanceof CaptchaError) {
    // Feel free to open an issue about response.body always being a buffer here
    console.log(error.response.body.toString('utf8'));
  }
  else if (error.response && error.response.challenge) {
    // Also note that in the latest version that if challenge evaluation failed
    // error instanceof ParserError === true
    console.log(error.response.challenge);
  }
});
This is back on my TODO list: https://github.com/VeNoMouS/cloudflare-scrape-js2py/issues/20#issuecomment-481199292
Cloudflare simply ignores the BF values so we don't have to bypass the BF after all. Edit: Beware that they could start checking them at any time... I don't have any idea why they're not already doing so.
I don't know what issue you're having, but this works just fine...
const cloudscraper = require('cloudscraper');
const { CaptchaError, ParserError } = require('cloudscraper/errors');
const cheerio = require('cheerio');
const SomeCaptchaService = require(`some-captcha-service`);
const url = "https://revadike.ga";
cloudscraper.get({ uri: url, headers: { cookie: "captcha=1" } }).catch(error => {
if (error instanceof CaptchaError) {
// Feel free to open an issue about response.body always being a buffer here
const body = error.response.body.toString('utf8');
//console.log(body);
const $ = cheerio.load(body);
const sitekey = $("[data-sitekey]").data("sitekey");
const form = {};
$("#challenge-form").serializeArray().forEach(input => form[input.name] = input.value);
console.log(sitekey, form);
solveReCAPTCHA(url, sitekey, (error, captcharesponse) => {
if (error) return console.log(error);
form["g-recaptcha-response"] = captcharesponse;
console.log("Submitting captcha response...");
cloudscraper.post({ uri: url, form: form, headers: { cookie: "captcha=1" } }).then(console.log).catch(console.log);
})
}
else if (error.response.challenge) {
// Also note that in the latest version that if challenge evaluation failed
// error instanceof ParserError === true
console.log(error.response.challenge);
}
});
function solveReCAPTCHA(url, key, callback) {
// ...
}
Also, cheerio is overkill for this purpose...
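As a rough illustration of doing this without cheerio, the site key and hidden inputs can be pulled out with regular expressions. This sketch assumes the markup resembles the challenge pages seen in this thread (a `data-sitekey` attribute, and hidden inputs written as `type`, `name`, `value` in that order); `extractCaptcha` is a hypothetical name and the HTML is a made-up sample, so treat it as a starting point rather than a robust parser.

```javascript
// Hypothetical helper: extract the reCAPTCHA site key and hidden form fields.
function extractCaptcha(html) {
  const siteKeyMatch = /data-sitekey="([^"]+)"/.exec(html);
  const form = {};
  // Matches <input type="hidden" name="..." value="..."> in that attribute order.
  const inputRe = /<input\s+type="hidden"\s+name="([^"]+)"\s+value="([^"]*)"/g;
  let match;
  while ((match = inputRe.exec(html)) !== null) {
    form[match[1]] = match[2];
  }
  return { siteKey: siteKeyMatch ? siteKeyMatch[1] : null, form };
}

// Made-up sample page, loosely modelled on a Cloudflare captcha form.
const sample = [
  '<form id="challenge-form" action="/cdn-cgi/l/chk_captcha" method="get">',
  '  <input type="hidden" name="s" value="example-token">',
  '  <script data-sitekey="example-site-key"></script>',
  '</form>'
].join('\n');

console.log(extractCaptcha(sample));
```

The trade-off is that a regex like this breaks as soon as the attribute order or quoting changes, which an HTML parser would tolerate.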
For anybody following this issue, if you attempt to solve these CAPTCHA, please be aware that you need to actually execute the bot-filter to get all of the form values.
addInput(f, 'bf_challenge_id', '2335');
addInput(f, 'bf_execution_time', event.data.executionTimeMs);
addInput(f, 'bf_result_hash', event.data.resultHash);
As of right now, Cloudflare ignores the fact that you're not providing these values for some reason. I wouldn't expect this to always be the case. To be clear, you don't actually have to add the bf_* values when submitting the form.
Some domains which have configured custom challenge pages don't have the bot-filter present in their responses at all. If you're targeting such a domain, feel free to disregard this comment.
For OGUsers I actually had to use custom URL:
$("#challenge-form").serializeArray().forEach(input => query[input.name] = input.value);
solveReCAPTCHA(options.uri, sitekey, (error, captcharesponse) => {
if (error) {
gotError(error);
return;
}
query["g-recaptcha-response"] = captcharesponse;
cf({
uri: "https://ogusers.com/cdn-cgi/l/chk_captcha",
qs: query,
followAllRedirects: true,
method: "GET"
}).then(() => cf(options).then(options.cb).catch(gotError)).catch(gotError);
})
Where is the custom URL? Oh, you thought that URL was custom? No, it's the default for CF CAPTCHA submission but I'm pretty sure that shows that your other code never worked. :rofl:
The other code worked too. You can try yourself.
No thanks, make your code work for other sites as well.
var url = require('url');
var form = $('#challenge-form');
// It's always /cdn-cgi/l/chk_captcha but to make the code more dynamic...
var actionURI = form.attr('action') || '/cdn-cgi/l/chk_captcha';
// This should always be get but for the same reason as above...
var method = form.attr('method') || 'GET';
actionURI = url.resolve(error.response.request.uri.href, actionURI);
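In newer Node.js versions the same resolution can be done with the WHATWG `URL` class instead of the legacy `url.resolve`. A small sketch, where `resolveAction` is a hypothetical helper and `example.com` is a placeholder host:

```javascript
// Resolve a form action against the URL of the page that served the form,
// falling back to the default captcha endpoint when no action is present.
function resolveAction(pageHref, action) {
  return new URL(action || '/cdn-cgi/l/chk_captcha', pageHref).href;
}

console.log(resolveAction('https://example.com/some/page?x=1', '/cdn-cgi/l/chk_captcha'));
// https://example.com/cdn-cgi/l/chk_captcha
```

Because the action is resolved against the page URL, relative and absolute actions are both handled without string concatenation.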
@Revadike I've added support for solving CAPTCHA and I need a tester.
cd your-project
npm i --save "git://github.com/pro-src/cloudscraper.git#v4.0.0"
# The following command will remove node_modules and do a fresh install
npm ci
function onCaptcha (options, response, body) {
const captcha = response.captcha;
// We'll probably need to change the uri here (I see a potential bug)
// Maybe we'll provide captcha.uri instead
solveReCAPTCHA(response.request.uri, captcha.siteKey, (error, gRes) => {
if (error) return void captcha.submit(error);
captcha.form['g-captcha-response'] = gRes;
captcha.submit();
});
}
const cloudscraper = require('cloudscraper').defaults({ onCaptcha });
If your reCAPTCHA solving API returns a promise:
function onCaptcha(options, response, body) {
const { captcha } = response;
return solveReCAPTCHA(response.request.uri, captcha.siteKey).then(gRes => {
captcha.form['g-recaptcha-response'] = gRes;
});
}
The sooner this gets tested and fine tuned, the sooner we can convince @codemanki to release v4 :D
Also, the API is completely open to discussion. It's always easier to add API than to remove it later on. Once you release the API, you're obligated to support it... So let's definitely make sure that this is the desired API.
I'm getting this:
{ ParserError:
### Cloudflare may have changed their technique, or there may be a bug.
### Bug Reports: https://github.com/codemanki/cloudscraper/issues
### Check the detailed exception message that follows for the cause.
Challenge form is missing inputs
at onCaptcha (cloudscraper\index.js:380:21)
at onCloudflareResponse (cloudscraper\index.js:196:14)
at onRequestResponse (cloudscraper\index.js:171:5)
at Request.<anonymous> (cloudscraper\index.js:132:7)
at Object.onceWrapper (events.js:285:13)
at Request.emit (events.js:197:13)
at Request.<anonymous> (request\request.js:1161:10)
at Request.emit (events.js:197:13)
at IncomingMessage.<anonymous> (request\request.js:1083:12)
at Object.onceWrapper (events.js:285:13)
at IncomingMessage.emit (events.js:202:15)
at endReadableNT (_stream_readable.js:1129:12)
at processTicksAndRejections (internal/process/next_tick.js:76:17)
name: 'ParserError',
message:
'\r\n### Cloudflare may have changed their technique, or there may be a bug.\r\n### Bug Reports: https://github.com/codemanki/cloudscraper/issues\r\n### Check the detailed exception message that follows for the cause.\r\n\r\nChallenge form is missing inputs',
cause: 'Challenge form is missing inputs',
error: 'Challenge form is missing inputs',
options:
{ requester:
{ [Function: request]
get: [Function],
head: [Function],
options: [Function],
post: [Function],
put: [Function],
patch: [Function],
del: [Function],
delete: [Function],
jar: [Function],
cookie: [Function],
defaults: [Function],
forever: [Function],
Request: [Function],
initParams: [Function: initParams],
debug: [Getter/Setter],
bindCLS: [Function: RP$bindCLS] },
jar: RequestJar { _jar: [CookieJar] },
headers:
{ Host: Symbol(host),
Connection: 'keep-alive',
'Upgrade-Insecure-Requests': '1',
'User-Agent':
'Mozilla/5.0 (Linux; Android 8.0.0; SM-G960F Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.85 Mobile Safari/537.36',
Accept:
'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8',
cookie: 'captcha=1' },
cloudflareMaxTimeout: 30000,
followAllRedirects: true,
challengesToSolve: 3,
decodeEmails: false,
gzip: true,
onCaptcha: [Function: onCaptcha],
uri: 'https://revadike.ga',
pool: undefined,
method: 'GET',
realEncoding: 'utf8',
encoding: null,
callback: [Function] },
response:
IncomingMessage {
_readableState:
ReadableState {
objectMode: false,
highWaterMark: 16384,
buffer: BufferList { head: null, tail: null, length: 0 },
length: 0,
pipes: null,
pipesCount: 0,
flowing: true,
ended: true,
endEmitted: true,
reading: false,
sync: false,
needReadable: false,
emittedReadable: false,
readableListening: false,
resumeScheduled: false,
paused: false,
emitClose: true,
autoDestroy: false,
destroyed: false,
defaultEncoding: 'utf8',
awaitDrain: 0,
readingMore: false,
decoder: null,
encoding: null },
readable: false,
_events:
[Object: null prototype] {
end: [Array],
close: [Array],
data: [Function],
error: [Function] },
_eventsCount: 4,
_maxListeners: undefined,
socket:
TLSSocket {
_tlsOptions: [Object],
_secureEstablished: true,
_securePending: false,
_newSessionPending: false,
_controlReleased: true,
_SNICallback: null,
servername: 'revadike.ga',
alpnProtocol: false,
authorized: true,
authorizationError: null,
encrypted: true,
_events: [Object],
_eventsCount: 8,
connecting: false,
_hadError: false,
_handle: [TLSWrap],
_parent: null,
_host: 'revadike.ga',
_readableState: [ReadableState],
readable: true,
_maxListeners: undefined,
_writableState: [WritableState],
writable: false,
allowHalfOpen: false,
_sockname: null,
_pendingData: null,
_pendingEncoding: '',
server: undefined,
_server: null,
ssl: [TLSWrap],
_requestCert: true,
_rejectUnauthorized: true,
parser: null,
_httpMessage: [ClientRequest],
[Symbol(res)]: [TLSWrap],
[Symbol(asyncId)]: 6,
[Symbol(lastWriteQueueSize)]: 0,
[Symbol(timeout)]: null,
[Symbol(kBytesRead)]: 0,
[Symbol(kBytesWritten)]: 0,
[Symbol(connect-options)]: [Object] },
connection:
TLSSocket {
_tlsOptions: [Object],
_secureEstablished: true,
_securePending: false,
_newSessionPending: false,
_controlReleased: true,
_SNICallback: null,
servername: 'revadike.ga',
alpnProtocol: false,
authorized: true,
authorizationError: null,
encrypted: true,
_events: [Object],
_eventsCount: 8,
connecting: false,
_hadError: false,
_handle: [TLSWrap],
_parent: null,
_host: 'revadike.ga',
_readableState: [ReadableState],
readable: true,
_maxListeners: undefined,
_writableState: [WritableState],
writable: false,
allowHalfOpen: false,
_sockname: null,
_pendingData: null,
_pendingEncoding: '',
server: undefined,
_server: null,
ssl: [TLSWrap],
_requestCert: true,
_rejectUnauthorized: true,
parser: null,
_httpMessage: [ClientRequest],
[Symbol(res)]: [TLSWrap],
[Symbol(asyncId)]: 6,
[Symbol(lastWriteQueueSize)]: 0,
[Symbol(timeout)]: null,
[Symbol(kBytesRead)]: 0,
[Symbol(kBytesWritten)]: 0,
[Symbol(connect-options)]: [Object] },
httpVersionMajor: 1,
httpVersionMinor: 1,
httpVersion: '1.1',
complete: true,
headers:
{ date: 'Mon, 15 Apr 2019 20:19:24 GMT',
'content-type': 'text/html; charset=UTF-8',
'transfer-encoding': 'chunked',
connection: 'close',
'set-cookie': [Array],
'cf-chl-bypass': '1',
'cache-control': 'max-age=2',
expires: 'Mon, 15 Apr 2019 20:19:26 GMT',
'x-frame-options': 'SAMEORIGIN',
'expect-ct':
'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"',
vary: 'Accept-Encoding',
server: 'cloudflare',
'cf-ray': '4c809d3e6f5b2bdc-AMS',
'content-encoding': 'br' },
rawHeaders:
[ 'Date',
'Mon, 15 Apr 2019 20:19:24 GMT',
'Content-Type',
'text/html; charset=UTF-8',
'Transfer-Encoding',
'chunked',
'Connection',
'close',
'Set-Cookie',
'__cfduid=dbb615291d299aa1e3639ab57d3fdd2a81555359564; expires=Tue, 14-Apr-20 20:19:24 GMT; path=/; domain=.revadike.ga; HttpOnly',
'CF-Chl-Bypass',
'1',
'Cache-Control',
'max-age=2',
'Expires',
'Mon, 15 Apr 2019 20:19:26 GMT',
'X-Frame-Options',
'SAMEORIGIN',
'Expect-CT',
'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"',
'Vary',
'Accept-Encoding',
'Server',
'cloudflare',
'CF-RAY',
'4c809d3e6f5b2bdc-AMS',
'Content-Encoding',
'br' ],
trailers: {},
rawTrailers: [],
aborted: false,
upgrade: false,
url: '',
method: null,
statusCode: 403,
statusMessage: 'Forbidden',
client:
TLSSocket {
_tlsOptions: [Object],
_secureEstablished: true,
_securePending: false,
_newSessionPending: false,
_controlReleased: true,
_SNICallback: null,
servername: 'revadike.ga',
alpnProtocol: false,
authorized: true,
authorizationError: null,
encrypted: true,
_events: [Object],
_eventsCount: 8,
connecting: false,
_hadError: false,
_handle: [TLSWrap],
_parent: null,
_host: 'revadike.ga',
_readableState: [ReadableState],
readable: true,
_maxListeners: undefined,
_writableState: [WritableState],
writable: false,
allowHalfOpen: false,
_sockname: null,
_pendingData: null,
_pendingEncoding: '',
server: undefined,
_server: null,
ssl: [TLSWrap],
_requestCert: true,
_rejectUnauthorized: true,
parser: null,
_httpMessage: [ClientRequest],
[Symbol(res)]: [TLSWrap],
[Symbol(asyncId)]: 6,
[Symbol(lastWriteQueueSize)]: 0,
[Symbol(timeout)]: null,
[Symbol(kBytesRead)]: 0,
[Symbol(kBytesWritten)]: 0,
[Symbol(connect-options)]: [Object] },
_consuming: true,
_dumped: false,
req:
ClientRequest {
_events: [Object],
_eventsCount: 5,
_maxListeners: undefined,
output: [],
outputEncodings: [],
outputCallbacks: [],
outputSize: 0,
writable: true,
_last: true,
chunkedEncoding: false,
shouldKeepAlive: false,
useChunkedEncodingByDefault: false,
sendDate: false,
_removedConnection: false,
_removedContLen: false,
_removedTE: false,
_contentLength: 0,
_hasBody: true,
_trailer: '',
finished: true,
_headerSent: true,
socket: [TLSSocket],
connection: [TLSSocket],
_header:
'GET / HTTP/1.1\r\nHost: revadike.ga\r\nConnection: keep-alive\r\nUpgrade-Insecure-Requests: 1\r\nUser-Agent: Mozilla/5.0 (Linux; Android 8.0.0; SM-G960F Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.85 Mobile Safari/537.36\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8\r\nAccept-Encoding: gzip, deflate, br\r\nAccept-Language: en-GB,en-US;q=0.9,en;q=0.8\r\ncookie: captcha=1\r\n\r\n',
_onPendingData: [Function: noopPendingOutput],
agent: [Agent],
socketPath: undefined,
timeout: undefined,
method: 'GET',
path: '/',
_ended: true,
res: [Circular],
aborted: false,
timeoutCb: null,
upgradeOrConnect: false,
parser: null,
maxHeadersCount: null,
[Symbol(isCorked)]: false,
[Symbol(outHeadersKey)]: [Object] },
request:
Request {
_events: [Object],
_eventsCount: 4,
_maxListeners: undefined,
requester: [Function],
headers: [Object],
cloudflareMaxTimeout: 30000,
followAllRedirects: true,
challengesToSolve: 3,
decodeEmails: false,
gzip: true,
onCaptcha: [Function: onCaptcha],
uri: [Url],
method: 'GET',
realEncoding: 'utf8',
encoding: null,
readable: true,
writable: true,
explicitMethod: true,
_qs: [Querystring],
_auth: [Auth],
_oauth: [OAuth],
_multipart: [Multipart],
_redirect: [Redirect],
_tunnel: [Tunnel],
_rp_resolve: [Function],
_rp_reject: [Function],
_rp_promise: [Promise],
_rp_callbackOrig: undefined,
callback: [Function],
_rp_options: [Object],
setHeader: [Function],
hasHeader: [Function],
getHeader: [Function],
removeHeader: [Function],
localAddress: undefined,
pool: {},
dests: [],
__isRequestRequest: true,
_callback: [Function: RP$callback],
proxy: null,
tunnel: true,
setHost: false,
originalCookieHeader: 'captcha=1',
_jar: [RequestJar],
port: 443,
host: 'revadike.ga',
path: '/',
httpModule: [Object],
agentClass: [Function: Agent],
agent: [Agent],
cloudscraper: true,
_started: true,
href: 'https://revadike.ga/',
req: [ClientRequest],
ntick: true,
response: [Circular],
originalHost: 'revadike.ga',
originalHostHeaderName: 'Host',
responseContent: [Circular],
_destdata: true,
_ended: true,
_callbackCalled: true },
toJSON: [Function: responseToJSON],
caseless: Caseless { dict: [Object] },
body:
<Buffer 3c 21 44 4f 43 54 59 50 45 20 68 74 6d 6c 3e 0a 3c 21 2d 2d 5b 69 66 20 6c 74 20 49 45 20 37 5d 3e 20 3c 68 74 6d 6c 20 63 6c 61 73 73 3d 22 6e 6f 2d ... 10951 more bytes>,
responseStartTime: 1555359571424,
isCloudflare: true,
isHTML: true,
isCaptcha: true,
challengeForm:
'\n <input type="hidden" name="s" value="d4476e9db638e3ac6423c3b230f5012f6b0f85a1-1555359564-1800-AQilGMdSzDeRH4slsuz5SY9TCvBkuEs7vHGbDdfAM+1AE7RaQ9DjbHMXFgAXbmLC1JPpqKDQnakhLNT+xta5W4e4hz33CDLCuld4t842s7nFyhRb+GCIBNpwk5hJ3CHMWgx8+y5x2ESElJOLmqHTrcI="></input>\n <script type="text/javascript" src="/cdn-cgi/scripts/cf.challenge.js" data-type="normal" data-ray="4c809d3e6f5b2bdc" async data-sitekey="6LfBixYUAAAAABhdHynFUIMA_sa4s-XsJvnjtgB0"></script>\n <div class="g-recaptcha"></div>\n <noscript id="cf-captcha-bookmark" class="cf-captcha-info">\n <div><div style="width: 302px">\n <div>\n <iframe src="https://www.google.com/recaptcha/api/fallback?k=6LfBixYUAAAAABhdHynFUIMA_sa4s-XsJvnjtgB0" frameborder="0" scrolling="no" style="width: 302px; height:422px; border-style: none;"></iframe>\n </div>\n <div style="width: 300px; border-style: none; bottom: 12px; left: 25px; margin: 0px; padding: 0px; right: 25px; background: #f9f9f9; border: 1px solid #c1c1c1; border-radius: 3px;">\n <textarea id="g-recaptcha-response" name="g-recaptcha-response" class="g-recaptcha-response" style="width: 250px; height: 40px; border: 1px solid #c1c1c1; margin: 10px 25px; padding: 0px; resize: none;"></textarea>\n <input type="submit" value="Submit"></input>\n </div>\n </div></div>\n </noscript>\n',
captcha:
{ siteKey: '6LfBixYUAAAAABhdHynFUIMA_sa4s-XsJvnjtgB0',
form: {} } } }
undefined
using this code:
const cloudscraper = require('cloudscraper').defaults({ onCaptcha });
const url = "https://revadike.ga";
cloudscraper.get({ uri: url, headers: { cookie: "captcha=1" } }).catch(console.warn).then(console.log)
function onCaptcha(options, response, body) {
const captcha = response.captcha;
// We'll probably need to change the uri here (I see a potential bug)
// Maybe we'll provide captcha.uri instead
solveReCAPTCHA(response.request.uri, captcha.siteKey, (error, gRes) => {
if (error) return void captcha.submit(error);
captcha.form['g-captcha-response'] = gRes;
captcha.submit();
});
}
And yes, I am using your branch. I double checked.
@Revadike It's the regex. Thanks for testing it. I'm going to write tests for it. The code coverage is currently 75% on that branch. Once it's 100% I'll update.
@Revadike I have tests in place now. Everything LGTM. Care to try again?
The tests are solid. The code is good. API... umm... I'm okay with it. If you've been waiting on this, get it while it's hot: #198 (v4.0.0 still being reviewed but gtg) :rocket:
Your example code had some mistakes:
response.request.uri should be response.request.uri.href
captcha.form['g-captcha-response'] should be captcha.form['g-recaptcha-response']
After fixing that, your latest branch worked!
How does if (error) return void captcha.submit(error); work? Why submit an error and why use void?
Awesome, thanks for letting us know.
How does if (error) return void captcha.submit(error); work? Why submit an error and why use void?
So if you have a default handler, the error propagates back to the caller. void has been in JS since the first version of the language. I'm using it there for readability (it doesn't return a value) and it can be used as a safeguard: it ensures that the return value is undefined.
An example:
const cloudscraper = require('cloudscraper').defaults({ onCaptcha });
cloudscraper.get(uri).catch(error => {
  // If the anti-captcha service had an error, I can handle it here.
  console.log(error);
});
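To illustrate the point about `void` with plain JavaScript (nothing here is cloudscraper-specific; `submit` is just a stand-in function):

```javascript
const calls = [];

// Stand-in for captcha.submit: records its argument and returns a value.
function submit(error) {
  calls.push(error);
  return 'some return value';
}

function withVoid(error) {
  // submit() still runs, but its return value is discarded.
  if (error) return void submit(error);
  return 'no error';
}

console.log(withVoid(new Error('service failed'))); // undefined
console.log(calls.length); // 1
```

The call still happens, so the error is still submitted; only the value handed back to the caller is forced to `undefined`.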
@Revadike did that clear it up?
I don't remember if I mentioned it or not but you can use promises instead of calling captcha.submit. You would just return a promise in onCaptcha. If it resolves, it's the equivalent of calling captcha.submit. If it rejects, it's the equivalent of calling captcha.submit(error).
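One way such an equivalence could be wired up is sketched below. This is a guess at the general shape, not cloudscraper's actual code; `runCaptchaHandler` is a hypothetical name, and the `submit` argument stands in for whatever the library does next.

```javascript
// Hypothetical: call the user's handler and route both styles to `submit`.
function runCaptchaHandler(onCaptcha, options, response, body, submit) {
  let settled = false;
  // Guard so that submit runs once even if both styles are mixed.
  const once = (error) => {
    if (settled) return;
    settled = true;
    submit(error);
  };
  response.captcha.submit = once;
  const result = onCaptcha(options, response, body);
  if (result && typeof result.then === 'function') {
    // Resolve behaves like captcha.submit(), reject like captcha.submit(error).
    result.then(() => once(), (error) => once(error));
  }
}

// Usage: a promise-returning handler that fills in the form field.
const response = { captcha: { form: {} } };
runCaptchaHandler(
  (options, res) => Promise.resolve().then(() => {
    res.captcha.form['g-recaptcha-response'] = 'placeholder-token';
  }),
  {}, response, '',
  (error) => console.log(error || response.captcha.form)
);
```

Treating any object with a `then` method as a promise keeps the sketch compatible with non-native promise libraries.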
Allow us to solve and submit captchas for pages that require one. For example, you could configure https://anti-captcha.com to solve them.