Add support for recaptcha

Revadike commented 6 years ago

Allow us to solve and submit captchas for pages that require one. For example, you could configure https://anti-captcha.com to solve them.

ghost commented 5 years ago

https://github.com/Anorov/cloudflare-scrape/issues/167

ghost commented 5 years ago

@Revadike, we haven't forgotten about this. My priorities are functional bugs before feature bugs, but I might have a PR ready by Monday. Feel free to beat me to it.

ghost commented 5 years ago

@codemanki @Revadike I've thought about this and I'm not sure what is desired in terms of API.

Perhaps introduce cloudscraper.solveCaptcha? How do you imagine that working? @codemanki is already working on separating the parsing from the main library. Would simply adding a method to extract the links to the captcha's images be enough? Something along the lines of cloudscraper.extractCaptcha?

You'll have to wait for your service to solve the captcha anyway so there has to be some sort of callback. What I'm saying is, no matter what, it's going to look something similar to the following:

var cloudscraper = require('cloudscraper');
var CaptchaError = require('cloudscraper/errors').CaptchaError;

cloudscraper.get('http://example-site.com')
  .then(doSomethingWithBody)
  .catch(function handleCaptcha(error) {
    if (error instanceof CaptchaError) {
      // extractCaptcha is the method being proposed here; it does not exist yet
      var captcha = cloudscraper.extractCaptcha(error.response);
      return getSolutionFromService(captcha).then(function (solution) {
        return cloudscraper.post({
          uri: captcha.submitURI || error.response.request.uri,
          formData: solution
        }).then(doSomethingWithBody).catch(handleCaptcha);
      }, handleError);
    }
    else {
      handleError(error);
    }
  });

So I don't think there would be much sense in adding anything beyond extracting the captcha and submitting the solution. To be clear, that's -1 for cloudscraper.solveCaptcha and +1 for a method to extract captcha information. I'm neutral on a method to submit the solution. What are your thoughts on this?

Revadike commented 5 years ago

All we need is the URL of the site (which we should have already) and the site-key value of the recaptcha, so you need to provide us with the site key. I suggest you also provide us with a function/callback to which we pass the recaptcha solution (g-recaptcha-response in the form) as a parameter.

Or what about this: add an option for a captcha handler, like this:

const url = 'https://domain.com';
CloudScraper({
    url: url,
    method: 'GET',
    captchaHandler: (sitekey, callback) => {
        solver(url, sitekey).then(solution => callback(solution))
    }
});

I'd recommend hiding the logic of submitting the captcha form for cloudflare protection. I think this is something the library should provide for us.

ghost commented 5 years ago

@Revadike Another option is to emit a captcha event and, if nobody is listening, throw an error. This would work the same way as error events: if somebody is listening, expect them to handle the captcha; otherwise throw a CaptchaError. If a handler is present (listening), the instance (cloudscraper) would remain idle until a method is used to continue processing the request.

var request = cloudscraper.get(uri);
request.on('captcha', function(response) {
  // var captcha = { siteKey: '...' };
  var captcha = response.captcha;
  myService.solve(captcha).then(function (solution) {
    captcha.submit(solution);
  });
});

ghost commented 5 years ago

Unless there are new suggestions, this is what I think the API should be:

var request = cloudscraper.get(uri);
request.on('captcha', (response, callback) => {
  // response.captcha = { siteKey: '...' };
  myService.solve(response.captcha)
    .then(solution => { callback(null, solution); })
    .catch(error => { callback(error); });
});
request.then(doSomethingWithBody, handleError);

Or equivalently:

var request = cloudscraper.get(uri);
request.on('captcha', (response, callback) => {
  // response.captcha = { siteKey: '...' };
  myService.solve(response.captcha, callback);
});
request.then(doSomethingWithBody, handleError);

If nobody is listening for the captcha event, cloudscraper will throw a CaptchaError. The captcha event listener, if it exists, will be called with response and callback as its arguments. The response.captcha property will not be created unless somebody is listening for the captcha event. If the callback is called with an error, cloudscraper will throw a CaptchaError with that error as the cause. This way the original request callback always gets called.
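A minimal sketch of the semantics described above, using only Node's built-in EventEmitter. The `CaptchaError` class and the `handleCaptcha` helper here are hypothetical stand-ins written for illustration; they are not cloudscraper's actual implementation.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for cloudscraper's CaptchaError.
class CaptchaError extends Error {
  constructor(message, cause) {
    super(message);
    this.name = 'CaptchaError';
    this.cause = cause;
  }
}

// Hypothetical helper: called when a captcha page is detected.
// `done` plays the role of the original request callback.
function handleCaptcha(request, response, done) {
  if (request.listenerCount('captcha') === 0) {
    // Nobody is listening: fail with a CaptchaError.
    return void done(new CaptchaError('captcha'));
  }
  // response.captcha is only created when somebody is listening.
  response.captcha = { siteKey: response.siteKey };
  request.emit('captcha', response, (error, solution) => {
    if (error) return void done(new CaptchaError('captcha', error));
    done(null, solution);
  });
}

const results = [];

// Case 1: no listener, so the callback receives a CaptchaError.
const ignored = { siteKey: 'abc' };
handleCaptcha(new EventEmitter(), ignored, (error) => results.push(error.name));

// Case 2: a listener solves the captcha and hands back a solution.
const heard = new EventEmitter();
heard.on('captcha', (response, callback) => {
  callback(null, 'solved:' + response.captcha.siteKey);
});
handleCaptcha(heard, { siteKey: 'abc' }, (error, solution) => results.push(solution));
```

In both cases the `done` callback is invoked exactly once, which is the property the paragraph above is after.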

codemanki commented 5 years ago

@pro-src so we won't support recaptcha in callbacks (if cloudscraper is used with callbacks and not with promises)?

ghost commented 5 years ago

@codemanki The request instance that we return from cloudscraper is always an instanceof EventEmitter and we could emit that event if somebody is listening in either of those cases. I don't like the way that looks at all though. :disappointed:

var request = cloudscraper.get(uri, (error, response, body) => {
  if (error) return handleError(error);
  doSomethingWithBody(body);
});

request.on('captcha', (response, callback) => {
  // response.captcha = { siteKey: '...' };
  myService.solve(response.captcha, callback);
});

codemanki commented 5 years ago

@pro-src then it is fine. After all, this is a great feature to have, and even if using it requires a developer to have callbacks and an event listener, it is still an awesome feature :D

codemanki commented 5 years ago

BTW, found this in anti-captcha api docs: https://anticaptcha.atlassian.net/wiki/spaces/API/pages/6029327/Forms+with+Recaptcha.+Submit+automation+scheme. https://anticaptcha.atlassian.net/wiki/spaces/API/pages/9666575/Reproducing+Recaptcha+validation+without+digging+the+HTML+source

Something to consider

Revadike commented 5 years ago

Hmm, but this is per request. Wouldn't it be nice to have 1 listener for all captcha requests and handle them? I think that would be a bit better than adding 1 line to every request you want to solve captchas for.

ghost commented 5 years ago

@Revadike does have a point and I'm leaning towards his suggestion of passing a captcha handler to cloudscraper as an option. This way the defaults function may be used to avoid having to create wrappers. This also would support usage of cloudscraper in frameworks.

var cloudscraper = require('cloudscraper').defaults({ onCaptcha });

Does it still make sense if we're emitting other events?

error
response
redirect
cloudflare-response
cloudflare-challenge
cloudflare-redirect
cloudflare-whatever
data
end
pause
resume

The response and stream related events are already being emitted from initial requests and I've had it in mind to make those work over all requests made in a single cloudscraper call to support streams. Simply adding captcha / cloudflare-captcha to the list seems to make the most sense.

Maybe we could implement a very basic plugin system.

var cloudscraper = require('cloudscraper');
cloudscraper.use(myPlugin);

function myPlugin(cloudscraper) {
  cloudscraper.on('captcha', captchaHandler);
}

Edit: Nah, I think it's better to follow request/request's design. If #139 is resolved in the way I suggested, you could extend cloudscraper.


var cloudscraper = require('cloudscraper');
class CaptchaSupport extends cloudscraper.Cloudscraper {
  constructor(options) { super(options); }
}
cloudscraper.Cloudscraper = CaptchaSupport;

ghost commented 5 years ago

Cloudflare is doing something extra? This is the parsed querystring that is sent to /cdn-cgi/l/chk_captcha

[
  {
    "name": "s",
    "value": "0ac7c0ae138677a8c0806c3119efb6f56108a100-1552751638-1800-AXVMM9TstwfvK3fuagOdfKoaTn88I31GX%2Bd%2F6zgEtaL3TdWP2EYAuIrrJlFZF5L5MqwuWQFFPmTMdn93KPAQbhfbqUqz%2BFLePKZSHNMwLFdxT216EFLUO6ztDVL9r3VU2Q%3D%3D"
  },
  {
    "name": "id",
    "value": "4b87e72f9afac1e0"
  },
  {
    "name": "g-recaptcha-response",
    "value": "03AOLTBLR2PLOHTsduP5OpQE2rhS-5i_ThyR5nRtSBwtl959TPJV65Fh2nQqIxRMhOtKHhKil_vHVf8iBeLO5bbAg_MwpO7SEkuCgmEV42oIc4sndLIeAhZk2hJvnC6eJ94P7T4ZUQ_jwii8QPm1re1MFUNdZNsNh93RUJloRggH8B3nzSnk74NxAfnM4wxwlZNapedOe24ngYQ8_rebOPX8YdwzkVI-p6NIjkSKzHgQFHJndW4zyN6kQt6orjehGYocLnKqjjLlHYNFYkl3oQjdGt8yrXe0C6x55zm4RCPO8MkGoGxQl2m4zMnsHQzYhNd1Uh7NktH0Gw"
  },
  {
    "name": "bf_challenge_id",
    "value": "10808"
  },
  {
    "name": "bf_execution_time",
    "value": "34"
  },
  {
    "name": "bf_result_hash",
    "value": "375207141"
  }
]

I don't think services like anti-captcha will provide the bf_ values. AFAIK, all of that information can be easily extracted or calculated except for the bf_result_hash. Does anybody know anything about those extra fields?

Edit: I'm comparing the request/response to those from https://recaptcha-demo.appspot.com

ghost commented 5 years ago

Yikes, Cloudflare has a bot-filter script that runs in an iframe and only if the runtime passes inspection is the results event emitted on the window. The results event contains the required bf_ values. The only way to bypass these recaptchas(v2) is if you first bypass the bot-filter...

I'm guessing that you can't bypass this even if you use something like selenium, but maybe you could get by with a headless version of chromium? I don't think this is a Cloudflare-only solution either, because Google may have something similar according to uncaptcha2's project markdown: https://github.com/ecthros/uncaptcha2 Or maybe this bot-filter is from Google... I haven't checked that.

ghost commented 5 years ago

@codemanki While we could bypass this bot filter, I don't like the idea of maintaining it. I think this issue can be closed.

Here's a gist of the bot-filter: https://gist.github.com/pro-src/8e060254d6281a7d93a5d9d02e369574

Revadike commented 5 years ago

Maybe, but a bot-filter sounds an awful lot like an anti-scraper measure, hence it's something you should handle.

ghost commented 5 years ago

@Revadike It could be something that we'd like to handle, but that doesn't make it our responsibility to do so. Personally, if I were to provide a solution for this version of the bot-filter, it'd be without any promise of maintenance. I think the discussion needs to be about whether or not we're willing to maintain this feature before we introduce it.

ghost commented 5 years ago

@Revadike I need to research this a bit more, but it might be possible for the user to provide the bf_result_hash. This is assuming that there aren't multiple versions of the bot-filter challenge active at the same time, meaning you always get the same bot-filter unless it's been updated. If that's the case, you could simply copy the value from your browser and provide that value to cloudscraper along with your browser's UA. This would put the burden on the user to keep up with the bot-filter.

IMHO, this is not a maintainable feature outside of a browser-like runtime environment.

Edit: Debundled and deobfuscated. See the latest revision of the bot-filter gist.

ghost commented 5 years ago

@Revadike I'm performing a more thorough investigation of the bot-filter. The security through obscurity does have me deterred from proposing/maintaining a solution as a feature of cloudscraper. This is mainly due to the fact that if they update it, I'll have to go through this entire process again. As you can see, it takes time. Are you interested in a solution to the bot-filter anyway?

ghost commented 5 years ago

I've been busy but I'll get back to this eventually.

Revadike commented 5 years ago

How would one go about getting the HTML string when encountering a captcha error? I'm gonna look into this myself.

ghost commented 5 years ago

@Revadike

const cloudscraper = require('cloudscraper');
const { CaptchaError, ParserError } = require('cloudscraper/errors');

cloudscraper.get(uri).catch(error => {
  if (error instanceof CaptchaError) {
    // Feel free to open an issue about response.body always being a buffer here
    console.log(error.response.body.toString('utf8'));
  }
  else if (error.response.challenge) {
    // Also note that in the latest version that if challenge evaluation failed
    // error instanceof ParserError === true
    console.log(error.response.challenge);
  }
});

ghost commented 5 years ago

Examples: https://github.com/codemanki/cloudscraper/pull/182/files

ghost commented 5 years ago

This is back on my TODO list: https://github.com/VeNoMouS/cloudflare-scrape-js2py/issues/20#issuecomment-481199292

Cloudflare simply ignores the BF values so we don't have to bypass the BF after all. Edit: Beware that they could start checking them at any time... I don't have any idea why they're not already doing so.

Revadike commented 5 years ago

I don't know what issue you're having, but this works just fine...


const cloudscraper = require('cloudscraper');
const { CaptchaError, ParserError } = require('cloudscraper/errors');
const cheerio = require('cheerio');
const SomeCaptchaService = require(`some-captcha-service`);
const url = "https://revadike.ga";

cloudscraper.get({ uri: url, headers: { cookie: "captcha=1" } }).catch(error => {
  if (error instanceof CaptchaError) {
    // Feel free to open an issue about response.body always being a buffer here
    const body = error.response.body.toString('utf8');
    //console.log(body);
    const $ = cheerio.load(body);
    const sitekey = $("[data-sitekey]").data("sitekey");
    const form = {};
    $("#challenge-form").serializeArray().forEach(input => form[input.name] = input.value);
    console.log(sitekey, form);
    solveReCAPTCHA(url, sitekey, (error, captcharesponse) => {
      if (error) return console.log(error);
      form["g-recaptcha-response"] = captcharesponse;
      console.log("Submitting captcha response...");
      cloudscraper.post({ uri: url, form: form, headers: { cookie: "captcha=1" } }).then(console.log).catch(console.log);
    })
  }
  else if (error.response.challenge) {
    // Also note that in the latest version that if challenge evaluation failed
    // error instanceof ParserError === true
    console.log(error.response.challenge);
  }
});

function solveReCAPTCHA(url, key, callback) {
  // ...
}

ghost commented 5 years ago

Also, cheerio is overkill for this purpose...
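As a rough illustration of a lighter-weight alternative, the site key and hidden inputs can be pulled out with plain regular expressions. This is only a sketch: `extractSiteKey`, `extractHiddenInputs`, and the sample markup are hypothetical, and regexes like these are brittle because they assume a fixed attribute order and quoting style.

```javascript
// Hypothetical helper: returns the data-sitekey value or null.
function extractSiteKey(html) {
  const match = /data-sitekey="([^"]+)"/.exec(html);
  return match ? match[1] : null;
}

// Hypothetical helper: collects hidden inputs written as
// <input type="hidden" name="..." value="...">, in that attribute order.
function extractHiddenInputs(html) {
  const inputs = {};
  const pattern = /<input type="hidden" name="([^"]+)" value="([^"]*)"/g;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    inputs[match[1]] = match[2];
  }
  return inputs;
}

// Made-up markup for demonstration only.
const sample =
  '<form id="challenge-form" action="/cdn-cgi/l/chk_captcha" method="get">' +
  '<input type="hidden" name="s" value="abc123"></input>' +
  '<script data-sitekey="SITE_KEY_PLACEHOLDER"></script>' +
  '</form>';

const siteKey = extractSiteKey(sample);
const hidden = extractHiddenInputs(sample);
```

A real HTML parser is more forgiving of markup changes; the trade-off is the extra dependency.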

ghost commented 5 years ago

For anybody following this issue, if you attempt to solve these CAPTCHAs, please be aware that you need to actually execute the bot-filter to get all of the form values.

            addInput(f, 'bf_challenge_id', '2335');
            addInput(f, 'bf_execution_time', event.data.executionTimeMs);
            addInput(f, 'bf_result_hash', event.data.resultHash);

As of right now, Cloudflare ignores the fact that you're not providing these values for some reason. I wouldn't expect this to always be the case. To be clear, you don't actually have to add the bf_* values when submitting the form.

Some domains which have configured custom challenge pages don't have the bot-filter present in their responses at all. If you're targeting such a domain, feel free to disregard this comment.
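To make the point about omitting the bf_* values concrete, here is a small sketch that builds the submission query string from a list of name/value pairs while leaving those fields out. `buildCaptchaQuery` is a hypothetical helper and the field values are placeholders; as noted above, Cloudflare could start checking the bf_* values at any time.

```javascript
// Hypothetical helper: serialize form fields, skipping bot-filter values.
function buildCaptchaQuery(fields) {
  const params = new URLSearchParams();
  for (const { name, value } of fields) {
    if (name.startsWith('bf_')) continue; // currently ignored by Cloudflare
    params.append(name, value);
  }
  return params.toString();
}

// Placeholder values, not real tokens.
const query = buildCaptchaQuery([
  { name: 's', value: 'token+with/special==' },
  { name: 'id', value: '4b87e72f9afac1e0' },
  { name: 'g-recaptcha-response', value: 'solution' },
  { name: 'bf_challenge_id', value: '10808' }
]);

// Parsing the string again shows which fields survived.
const parsed = new URLSearchParams(query);
```

URLSearchParams takes care of percent-encoding, so raw (decoded) values should be passed in.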

Revadike commented 5 years ago

For OGUsers I actually had to use a custom URL:

$("#challenge-form").serializeArray().forEach(input => query[input.name] = input.value);
solveReCAPTCHA(options.uri, sitekey, (error, captcharesponse) => {
    if (error) {
        gotError(error);
        return;
    }

    query["g-recaptcha-response"] = captcharesponse;
    cf({
        uri: "https://ogusers.com/cdn-cgi/l/chk_captcha",
        qs: query,
        followAllRedirects: true,
        method: "GET"
    }).then(() => cf(options).then(options.cb).catch(gotError)).catch(gotError);
});

ghost commented 5 years ago

Where is the custom URL? Oh, you thought that URL was custom? No, it's the default for CF CAPTCHA submission but I'm pretty sure that shows that your other code never worked. :rofl:

Revadike commented 5 years ago

> Where is the custom URL? Oh, you thought that URL was custom? No, it's the default for CF CAPTCHA submission but I'm pretty sure that shows that your other code never worked. 🤣

The other code worked too. You can try yourself.

ghost commented 5 years ago

No thanks, make your code work for other sites as well.

var url = require('url');
var form = $('#challenge-form');
// It's always /cdn-cgi/l/chk_captcha but to make the code more dynamic...
var actionURI = form.attr('action') || '/cdn-cgi/l/chk_captcha';
// This should always be GET but for the same reason as above...
var method = form.attr('method') || 'GET';

actionURI = url.resolve(error.response.request.uri.href, actionURI);
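The same resolution step can be sketched with the WHATWG `URL` class that Node exposes globally, instead of `url.resolve`. The `resolveAction` helper and the example.com URLs are assumptions made for illustration.

```javascript
// Hypothetical helper: resolve a form's action attribute against the page URL,
// falling back to the default path mentioned in this thread.
function resolveAction(pageUrl, action) {
  return new URL(action || '/cdn-cgi/l/chk_captcha', pageUrl).href;
}

// No action attribute: the default path is used on the page's origin.
const fromDefault = resolveAction('https://example.com/some/page?x=1', undefined);

// A relative action is resolved against the page's directory.
const fromRelative = resolveAction('https://example.com/some/page', 'submit');
```

Resolving against the response's final URL rather than a hard-coded host is what lets the same code work across sites.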

ghost commented 5 years ago

@Revadike I've added support for solving CAPTCHA and I need a tester.

cd your-project
npm i --save "git://github.com/pro-src/cloudscraper.git#v4.0.0"
# The following command will remove node_modules and do a fresh install 
npm ci

function onCaptcha (options, response, body) {
  const captcha = response.captcha;
  // We'll probably need to change the uri here (I see a potential bug)
  // Maybe we'll provide captcha.uri instead
  solveReCAPTCHA(response.request.uri, captcha.siteKey, (error, gRes) => {
    if (error) return void captcha.submit(error);
    captcha.form['g-captcha-response'] = gRes;
    captcha.submit();
  });
}

const cloudscraper = require('cloudscraper').defaults({ onCaptcha });

If your reCAPTCHA solving API returns a promise:

function onCaptcha(options, response, body) {
  const { captcha } = response;
  return solveReCAPTCHA(response.request.uri, captcha.siteKey).then(gRes => {
    captcha.form['g-recaptcha-response'] = gRes;
  });
}

The sooner this gets tested and fine tuned, the sooner we can convince @codemanki to release v4 :D

ghost commented 5 years ago

Also, the API is completely open to discussion. It's always easier to add API than to remove it later on. Once you release the API, you're obligated to support it... So let's definitely make sure that this is the desired API.

Revadike commented 5 years ago

I'm getting this:

{ ParserError:
### Cloudflare may have changed their technique, or there may be a bug.
### Bug Reports: https://github.com/codemanki/cloudscraper/issues
### Check the detailed exception message that follows for the cause.

Challenge form is missing inputs
    at onCaptcha (cloudscraper\index.js:380:21)
    at onCloudflareResponse (cloudscraper\index.js:196:14)
    at onRequestResponse (cloudscraper\index.js:171:5)
    at Request.<anonymous> (cloudscraper\index.js:132:7)
    at Object.onceWrapper (events.js:285:13)
    at Request.emit (events.js:197:13)
    at Request.<anonymous> (request\request.js:1161:10)
    at Request.emit (events.js:197:13)
    at IncomingMessage.<anonymous> (request\request.js:1083:12)
    at Object.onceWrapper (events.js:285:13)
    at IncomingMessage.emit (events.js:202:15)
    at endReadableNT (_stream_readable.js:1129:12)
    at processTicksAndRejections (internal/process/next_tick.js:76:17)
  name: 'ParserError',
  message:
   '\r\n### Cloudflare may have changed their technique, or there may be a bug.\r\n### Bug Reports: https://github.com/codemanki/cloudscraper/issues\r\n### Check the detailed exception message that follows for the cause.\r\n\r\nChallenge form is missing inputs',
  cause: 'Challenge form is missing inputs',
  error: 'Challenge form is missing inputs',
  options:
   { requester:
      { [Function: request]
        get: [Function],
        head: [Function],
        options: [Function],
        post: [Function],
        put: [Function],
        patch: [Function],
        del: [Function],
        delete: [Function],
        jar: [Function],
        cookie: [Function],
        defaults: [Function],
        forever: [Function],
        Request: [Function],
        initParams: [Function: initParams],
        debug: [Getter/Setter],
        bindCLS: [Function: RP$bindCLS] },
     jar: RequestJar { _jar: [CookieJar] },
     headers:
      { Host: Symbol(host),
        Connection: 'keep-alive',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent':
         'Mozilla/5.0 (Linux; Android 8.0.0; SM-G960F Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.85 Mobile Safari/537.36',
        Accept:
         'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8',
        cookie: 'captcha=1' },
     cloudflareMaxTimeout: 30000,
     followAllRedirects: true,
     challengesToSolve: 3,
     decodeEmails: false,
     gzip: true,
     onCaptcha: [Function: onCaptcha],
     uri: 'https://revadike.ga',
     pool: undefined,
     method: 'GET',
     realEncoding: 'utf8',
     encoding: null,
     callback: [Function] },
  response:
   IncomingMessage {
     _readableState:
      ReadableState {
        objectMode: false,
        highWaterMark: 16384,
        buffer: BufferList { head: null, tail: null, length: 0 },
        length: 0,
        pipes: null,
        pipesCount: 0,
        flowing: true,
        ended: true,
        endEmitted: true,
        reading: false,
        sync: false,
        needReadable: false,
        emittedReadable: false,
        readableListening: false,
        resumeScheduled: false,
        paused: false,
        emitClose: true,
        autoDestroy: false,
        destroyed: false,
        defaultEncoding: 'utf8',
        awaitDrain: 0,
        readingMore: false,
        decoder: null,
        encoding: null },
     readable: false,
     _events:
      [Object: null prototype] {
        end: [Array],
        close: [Array],
        data: [Function],
        error: [Function] },
     _eventsCount: 4,
     _maxListeners: undefined,
     socket:
      TLSSocket {
        _tlsOptions: [Object],
        _secureEstablished: true,
        _securePending: false,
        _newSessionPending: false,
        _controlReleased: true,
        _SNICallback: null,
        servername: 'revadike.ga',
        alpnProtocol: false,
        authorized: true,
        authorizationError: null,
        encrypted: true,
        _events: [Object],
        _eventsCount: 8,
        connecting: false,
        _hadError: false,
        _handle: [TLSWrap],
        _parent: null,
        _host: 'revadike.ga',
        _readableState: [ReadableState],
        readable: true,
        _maxListeners: undefined,
        _writableState: [WritableState],
        writable: false,
        allowHalfOpen: false,
        _sockname: null,
        _pendingData: null,
        _pendingEncoding: '',
        server: undefined,
        _server: null,
        ssl: [TLSWrap],
        _requestCert: true,
        _rejectUnauthorized: true,
        parser: null,
        _httpMessage: [ClientRequest],
        [Symbol(res)]: [TLSWrap],
        [Symbol(asyncId)]: 6,
        [Symbol(lastWriteQueueSize)]: 0,
        [Symbol(timeout)]: null,
        [Symbol(kBytesRead)]: 0,
        [Symbol(kBytesWritten)]: 0,
        [Symbol(connect-options)]: [Object] },
     connection:
      TLSSocket {
        _tlsOptions: [Object],
        _secureEstablished: true,
        _securePending: false,
        _newSessionPending: false,
        _controlReleased: true,
        _SNICallback: null,
        servername: 'revadike.ga',
        alpnProtocol: false,
        authorized: true,
        authorizationError: null,
        encrypted: true,
        _events: [Object],
        _eventsCount: 8,
        connecting: false,
        _hadError: false,
        _handle: [TLSWrap],
        _parent: null,
        _host: 'revadike.ga',
        _readableState: [ReadableState],
        readable: true,
        _maxListeners: undefined,
        _writableState: [WritableState],
        writable: false,
        allowHalfOpen: false,
        _sockname: null,
        _pendingData: null,
        _pendingEncoding: '',
        server: undefined,
        _server: null,
        ssl: [TLSWrap],
        _requestCert: true,
        _rejectUnauthorized: true,
        parser: null,
        _httpMessage: [ClientRequest],
        [Symbol(res)]: [TLSWrap],
        [Symbol(asyncId)]: 6,
        [Symbol(lastWriteQueueSize)]: 0,
        [Symbol(timeout)]: null,
        [Symbol(kBytesRead)]: 0,
        [Symbol(kBytesWritten)]: 0,
        [Symbol(connect-options)]: [Object] },
     httpVersionMajor: 1,
     httpVersionMinor: 1,
     httpVersion: '1.1',
     complete: true,
     headers:
      { date: 'Mon, 15 Apr 2019 20:19:24 GMT',
        'content-type': 'text/html; charset=UTF-8',
        'transfer-encoding': 'chunked',
        connection: 'close',
        'set-cookie': [Array],
        'cf-chl-bypass': '1',
        'cache-control': 'max-age=2',
        expires: 'Mon, 15 Apr 2019 20:19:26 GMT',
        'x-frame-options': 'SAMEORIGIN',
        'expect-ct':
         'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"',
        vary: 'Accept-Encoding',
        server: 'cloudflare',
        'cf-ray': '4c809d3e6f5b2bdc-AMS',
        'content-encoding': 'br' },
     rawHeaders:
      [ 'Date',
        'Mon, 15 Apr 2019 20:19:24 GMT',
        'Content-Type',
        'text/html; charset=UTF-8',
        'Transfer-Encoding',
        'chunked',
        'Connection',
        'close',
        'Set-Cookie',
        '__cfduid=dbb615291d299aa1e3639ab57d3fdd2a81555359564; expires=Tue, 14-Apr-20 20:19:24 GMT; path=/; domain=.revadike.ga; HttpOnly',
        'CF-Chl-Bypass',
        '1',
        'Cache-Control',
        'max-age=2',
        'Expires',
        'Mon, 15 Apr 2019 20:19:26 GMT',
        'X-Frame-Options',
        'SAMEORIGIN',
        'Expect-CT',
        'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"',
        'Vary',
        'Accept-Encoding',
        'Server',
        'cloudflare',
        'CF-RAY',
        '4c809d3e6f5b2bdc-AMS',
        'Content-Encoding',
        'br' ],
     trailers: {},
     rawTrailers: [],
     aborted: false,
     upgrade: false,
     url: '',
     method: null,
     statusCode: 403,
     statusMessage: 'Forbidden',
     client:
      TLSSocket {
        _tlsOptions: [Object],
        _secureEstablished: true,
        _securePending: false,
        _newSessionPending: false,
        _controlReleased: true,
        _SNICallback: null,
        servername: 'revadike.ga',
        alpnProtocol: false,
        authorized: true,
        authorizationError: null,
        encrypted: true,
        _events: [Object],
        _eventsCount: 8,
        connecting: false,
        _hadError: false,
        _handle: [TLSWrap],
        _parent: null,
        _host: 'revadike.ga',
        _readableState: [ReadableState],
        readable: true,
        _maxListeners: undefined,
        _writableState: [WritableState],
        writable: false,
        allowHalfOpen: false,
        _sockname: null,
        _pendingData: null,
        _pendingEncoding: '',
        server: undefined,
        _server: null,
        ssl: [TLSWrap],
        _requestCert: true,
        _rejectUnauthorized: true,
        parser: null,
        _httpMessage: [ClientRequest],
        [Symbol(res)]: [TLSWrap],
        [Symbol(asyncId)]: 6,
        [Symbol(lastWriteQueueSize)]: 0,
        [Symbol(timeout)]: null,
        [Symbol(kBytesRead)]: 0,
        [Symbol(kBytesWritten)]: 0,
        [Symbol(connect-options)]: [Object] },
     _consuming: true,
     _dumped: false,
     req:
      ClientRequest {
        _events: [Object],
        _eventsCount: 5,
        _maxListeners: undefined,
        output: [],
        outputEncodings: [],
        outputCallbacks: [],
        outputSize: 0,
        writable: true,
        _last: true,
        chunkedEncoding: false,
        shouldKeepAlive: false,
        useChunkedEncodingByDefault: false,
        sendDate: false,
        _removedConnection: false,
        _removedContLen: false,
        _removedTE: false,
        _contentLength: 0,
        _hasBody: true,
        _trailer: '',
        finished: true,
        _headerSent: true,
        socket: [TLSSocket],
        connection: [TLSSocket],
        _header:
         'GET / HTTP/1.1\r\nHost: revadike.ga\r\nConnection: keep-alive\r\nUpgrade-Insecure-Requests: 1\r\nUser-Agent: Mozilla/5.0 (Linux; Android 8.0.0; SM-G960F Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.85 Mobile Safari/537.36\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8\r\nAccept-Encoding: gzip, deflate, br\r\nAccept-Language: en-GB,en-US;q=0.9,en;q=0.8\r\ncookie: captcha=1\r\n\r\n',
        _onPendingData: [Function: noopPendingOutput],
        agent: [Agent],
        socketPath: undefined,
        timeout: undefined,
        method: 'GET',
        path: '/',
        _ended: true,
        res: [Circular],
        aborted: false,
        timeoutCb: null,
        upgradeOrConnect: false,
        parser: null,
        maxHeadersCount: null,
        [Symbol(isCorked)]: false,
        [Symbol(outHeadersKey)]: [Object] },
     request:
      Request {
        _events: [Object],
        _eventsCount: 4,
        _maxListeners: undefined,
        requester: [Function],
        headers: [Object],
        cloudflareMaxTimeout: 30000,
        followAllRedirects: true,
        challengesToSolve: 3,
        decodeEmails: false,
        gzip: true,
        onCaptcha: [Function: onCaptcha],
        uri: [Url],
        method: 'GET',
        realEncoding: 'utf8',
        encoding: null,
        readable: true,
        writable: true,
        explicitMethod: true,
        _qs: [Querystring],
        _auth: [Auth],
        _oauth: [OAuth],
        _multipart: [Multipart],
        _redirect: [Redirect],
        _tunnel: [Tunnel],
        _rp_resolve: [Function],
        _rp_reject: [Function],
        _rp_promise: [Promise],
        _rp_callbackOrig: undefined,
        callback: [Function],
        _rp_options: [Object],
        setHeader: [Function],
        hasHeader: [Function],
        getHeader: [Function],
        removeHeader: [Function],
        localAddress: undefined,
        pool: {},
        dests: [],
        __isRequestRequest: true,
        _callback: [Function: RP$callback],
        proxy: null,
        tunnel: true,
        setHost: false,
        originalCookieHeader: 'captcha=1',
        _jar: [RequestJar],
        port: 443,
        host: 'revadike.ga',
        path: '/',
        httpModule: [Object],
        agentClass: [Function: Agent],
        agent: [Agent],
        cloudscraper: true,
        _started: true,
        href: 'https://revadike.ga/',
        req: [ClientRequest],
        ntick: true,
        response: [Circular],
        originalHost: 'revadike.ga',
        originalHostHeaderName: 'Host',
        responseContent: [Circular],
        _destdata: true,
        _ended: true,
        _callbackCalled: true },
     toJSON: [Function: responseToJSON],
     caseless: Caseless { dict: [Object] },
     body:
      <Buffer 3c 21 44 4f 43 54 59 50 45 20 68 74 6d 6c 3e 0a 3c 21 2d 2d 5b 69 66 20 6c 74 20 49 45 20 37 5d 3e 20 3c 68 74 6d 6c 20 63 6c 61 73 73 3d 22 6e 6f 2d ... 10951 more bytes>,
     responseStartTime: 1555359571424,
     isCloudflare: true,
     isHTML: true,
     isCaptcha: true,
     challengeForm:
      '\n  <input type="hidden" name="s" value="d4476e9db638e3ac6423c3b230f5012f6b0f85a1-1555359564-1800-AQilGMdSzDeRH4slsuz5SY9TCvBkuEs7vHGbDdfAM+1AE7RaQ9DjbHMXFgAXbmLC1JPpqKDQnakhLNT+xta5W4e4hz33CDLCuld4t842s7nFyhRb+GCIBNpwk5hJ3CHMWgx8+y5x2ESElJOLmqHTrcI="></input>\n  <script type="text/javascript" src="/cdn-cgi/scripts/cf.challenge.js" data-type="normal"  data-ray="4c809d3e6f5b2bdc" async data-sitekey="6LfBixYUAAAAABhdHynFUIMA_sa4s-XsJvnjtgB0"></script>\n  <div class="g-recaptcha"></div>\n  <noscript id="cf-captcha-bookmark" class="cf-captcha-info">\n    <div><div style="width: 302px">\n      <div>\n        <iframe src="https://www.google.com/recaptcha/api/fallback?k=6LfBixYUAAAAABhdHynFUIMA_sa4s-XsJvnjtgB0" frameborder="0" scrolling="no" style="width: 302px; height:422px; border-style: none;"></iframe>\n      </div>\n      <div style="width: 300px; border-style: none; bottom: 12px; left: 25px; margin: 0px; padding: 0px; right: 25px; background: #f9f9f9; border: 1px solid #c1c1c1; border-radius: 3px;">\n        <textarea id="g-recaptcha-response" name="g-recaptcha-response" class="g-recaptcha-response" style="width: 250px; height: 40px; border: 1px solid #c1c1c1; margin: 10px 25px; padding: 0px; resize: none;"></textarea>\n        <input type="submit" value="Submit"></input>\n      </div>\n    </div></div>\n  </noscript>\n',
     captcha:
      { siteKey: '6LfBixYUAAAAABhdHynFUIMA_sa4s-XsJvnjtgB0',
        form: {} } } }
undefined

using this code:

const cloudscraper = require('cloudscraper').defaults({ onCaptcha });
const url = "https://revadike.ga";

cloudscraper.get({ uri: url, headers: { cookie: "captcha=1" } }).catch(console.warn).then(console.log)

function onCaptcha(options, response, body) {
  const captcha = response.captcha;
  // We'll probably need to change the uri here (I see a potential bug)
  // Maybe we'll provide captcha.uri instead
  solveReCAPTCHA(response.request.uri, captcha.siteKey, (error, gRes) => {
    if (error) return void captcha.submit(error);
    captcha.form['g-captcha-response'] = gRes;
    captcha.submit();
  });
}

And yes, I am using your branch. I double checked.

ghost commented 5 years ago

@Revadike It's the regex. Thanks for testing it. I'm going to write tests for it. The code coverage is currently 75% on that branch. Once it's 100% I'll update.

ghost commented 5 years ago

@Revadike I have tests in place now. Everything LGTM. Care to try again?

ghost commented 5 years ago

The tests are solid. The code is good. API... umm... I'm okay with it. If you've been waiting on this, get it while it's hot: #198 (v4.0.0 still being reviewed but gtg) :rocket:

Revadike commented 5 years ago

Your example code had some mistakes:

response.request.uri should be response.request.uri.href
captcha.form['g-captcha-response'] should be captcha.form['g-recaptcha-response']

After fixing that, your latest branch worked!

Revadike commented 5 years ago

How does if (error) return void captcha.submit(error); work? Why submit an error and why use void?

ghost commented 5 years ago

Awesome, thanks for letting us know.

> How does if (error) return void captcha.submit(error); work? Why submit an error and why use void?

So if you have a default handler, the error propagates back to the caller. void has been in JS since the first version of the language. I'm using it there for readability (It doesn't return a value) and it can be used as a safeguard. It ensures that the return value is undefined.
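A tiny sketch of what `void` does in that position. The `submit` function below is a made-up stand-in that deliberately returns a value, so the difference is visible.

```javascript
// Stand-in for captcha.submit that happens to return something.
function submit(error) {
  return { submitted: true, error };
}

// With void: the call still runs, but the handler returns undefined.
function withVoid(error) {
  if (error) return void submit(error);
  return 'continued';
}

// Without void: whatever submit returns leaks out of the handler.
function withoutVoid(error) {
  if (error) return submit(error);
  return 'continued';
}

const a = withVoid(new Error('x'));
const b = withoutVoid(new Error('x'));
const c = withVoid(null);
```

Here `a` is `undefined`, `b` is the object returned by `submit`, and `c` is `'continued'` because the early return is not taken.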

ghost commented 5 years ago

An example:


const cloudscraper = require('cloudscraper').defaults({ onCaptcha });
cloudscraper.get(uri).catch(error => {
  // If the anti-captcha service had an error, I can handle it here.
  console.log(error);
});

ghost commented 5 years ago

@Revadike did that clear it up?

I don't remember if I mentioned it or not but you can use promises instead of calling captcha.submit. You would just return a promise in onCaptcha. If it resolves, it's the equivalent of calling captcha.submit. If it rejects, it's the equivalent of calling captcha.submit(error).
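The equivalence between the two styles can be sketched as follows. `runCaptchaHandler` is a hypothetical dispatcher written for this example, not cloudscraper's code; it only shows one way a library could treat a returned promise the same as a call to `captcha.submit`.

```javascript
// Hypothetical dispatcher, not cloudscraper's implementation.
function runCaptchaHandler(onCaptcha, captcha, done) {
  let settled = false;
  captcha.submit = (error) => {
    if (settled) return; // ignore repeated calls
    settled = true;
    done(error || null);
  };
  const result = onCaptcha(captcha);
  if (result && typeof result.then === 'function') {
    // Resolve is treated as submit(), reject as submit(error).
    result.then(() => captcha.submit(), (error) => captcha.submit(error));
  }
}

const outcomes = [];

// Callback style: the handler calls captcha.submit itself.
runCaptchaHandler((captcha) => { captcha.submit(); }, {}, (error) => {
  outcomes.push(['callback', error]);
});

// Promise style: the handler returns a promise instead.
runCaptchaHandler(() => Promise.reject(new Error('service down')), {}, (error) => {
  outcomes.push(['promise', error && error.message]);
});
```

The callback-style outcome is recorded synchronously, while the promise-style outcome arrives once the promise settles.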

codemanki / cloudscraper

Add support for recaptcha #62