Athlon1600 / SerpScraper

:mag_right: Google/Bing search results scraping using PHP. Tested and working / May 13, 2020

curl folder not found #32

Open nsetiono opened 4 years ago

nsetiono commented 4 years ago

Fatal error: Uncaught Error: Class 'Curl\BrowserClient' not found in D:\htdocs\SerpScraper\Browser.php:7 Stack trace:

Athlon1600 commented 4 years ago

are you sure you're using the latest version of serp-scraper? php-curl-client dependency was added in version 3. Do 'composer update'

nsetiono commented 4 years ago

I already ran composer update, and the error has changed to this one ...

PS D:\htdocs> php D:\htdocs\gscraper.php
PHP Warning: require(D:\htdocs\vendor\composer/../athlon1600/serpscraper/src/dbc/deathbycaptcha.php): failed to open stream: No such file or directory in D:\htdocs\vendor\composer\autoload_real.php on line 69

Warning: require(D:\htdocs\vendor\composer/../athlon1600/serpscraper/src/dbc/deathbycaptcha.php): failed to open stream: No such file or directory in D:\htdocs\vendor\composer\autoload_real.php on line 69
PHP Fatal error: require(): Failed opening required 'D:\htdocs\vendor\composer/../athlon1600/serpscraper/src/dbc/deathbycaptcha.php' (include_path='C:\xampp\php\PEAR') in D:\htdocs\vendor\composer\autoload_real.php on line 69

Fatal error: require(): Failed opening required 'D:\htdocs\vendor\composer/../athlon1600/serpscraper/src/dbc/deathbycaptcha.php' (include_path='C:\xampp\php\PEAR') in D:\htdocs\vendor\composer\autoload_real.php on line 69

Athlon1600 commented 4 years ago

okay, plan B: Remove serpscraper and reinstall it again.

go to D:\htdocs\vendor\athlon1600
delete serpscraper directory.

Then redo either composer require athlon1600/serpscraper "^3.0" or just composer install

nsetiono commented 4 years ago

Warning: require(D:\htdocs\vendor\composer/../athlon1600/serpscraper/src/dbc/deathbycaptcha.php): failed to open stream: No such file or directory in D:\htdocs\vendor\composer\autoload_real.php on line 69

The error is still the same. It seems the required file is not being downloaded by composer.

Athlon1600 commented 4 years ago

Try:
composer dump-autoload

if that does not work, then take the nuclear option and just delete entire vendor folder and reinstall everything all over with composer install

nsetiono commented 4 years ago

Hello,

I have tried all of your suggestions, but the error message is still the same. It seems the required file is not downloaded by composer.

Any suggestion on how to get the required file?

Athlon1600 commented 4 years ago

I mean you can get the file manually from v2.0 branch here: https://github.com/Athlon1600/SerpScraper/blob/2.x/src/dbc/deathbycaptcha.php

but the latest version of serp scraper (v3.0) does not depend on that file in any way, so it's very weird that composer is still complaining about this. Double check for sure that you are using the LATEST version.
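One way to double check the installed version is to read Composer's own record of what is in the vendor folder. The sketch below is a hypothetical helper, not part of SerpScraper: `findInstalledVersion` is a made-up name, and the `vendor/composer/installed.json` path assumes the script sits next to the vendor folder, as in the `D:\htdocs` layout in this thread. Running `composer show athlon1600/serpscraper` should report the same information from the command line.

```php
<?php
// Hypothetical helper (not part of SerpScraper): look up a package's installed
// version in the decoded contents of vendor/composer/installed.json.
function findInstalledVersion(array $installed, string $package): ?string
{
    // Composer 2 wraps the package list in a "packages" key;
    // Composer 1 stores a bare list of packages.
    $packages = $installed['packages'] ?? $installed;

    foreach ($packages as $entry) {
        if (is_array($entry) && ($entry['name'] ?? null) === $package) {
            return $entry['version'] ?? null;
        }
    }

    return null; // package is not recorded as installed
}

// Assumed location: this script lives next to the vendor folder.
$file = __DIR__ . '/vendor/composer/installed.json';

if (is_readable($file)) {
    $installed = json_decode(file_get_contents($file), true) ?: [];
    $version = findInstalledVersion($installed, 'athlon1600/serpscraper');
    echo 'athlon1600/serpscraper: ' . ($version ?? 'not installed') . PHP_EOL;
}
```

If the reported version is still 3.0.0 or a 2.x release, the autoloader may keep asking for `deathbycaptcha.php` no matter how often `composer dump-autoload` is run.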

Any chance of you zipping that whole htdocs directory with vendor folder and sending it to me?

nsetiono commented 4 years ago

How do I send it to you? I just followed the instructions on the main page, but it's still requiring the deathbycaptcha.php file lol

Athlon1600 commented 4 years ago

https://wetransfer.com/

and send it to: info@proxynova.org

nsetiono commented 4 years ago

done bro, file sent

Athlon1600 commented 4 years ago

where is your composer.json and composer.lock files?

image

nsetiono commented 4 years ago

oops, I forgot to add those files. Wait, I'll re-send them.

anyway, when I delete the vendor folder and reinstall, it shows me this ...

PS D:\htdocs> composer require athlon1600/serpscraper "^3.0"
./composer.json has been updated
Loading composer repositories with package information
Updating dependencies (including require-dev)
Package operations: 14 installs, 0 updates, 0 removals

'git' is not recognized as an internal or external command, operable program or batch file.

Now trying to download from dist

Athlon1600 commented 4 years ago

oh wow so apparently version 3.0.0 is broken, because if you tell your composer to install the latest from version 3 branch (v3.0.1 for example), it works just fine... :sleeping:

image

so just add a ^ in front of 3.0 and then do composer update:

image

it works and you don't get any errors about missing deathbycaptcha.php files... I will fix that version 3.0.0 when I can, but hopefully everyone is installing latest version.

nsetiono commented 4 years ago

the error is changed into this one ...

PS D:\htdocs> php .\gscraper.php
PHP Fatal error: Uncaught Error: Class 'SerpScraper\GoogleCaptchaSolver' not found in D:\htdocs\gscraper.php:29
Stack trace:
#0 {main}
  thrown in D:\htdocs\gscraper.php on line 29

Fatal error: Uncaught Error: Class 'SerpScraper\GoogleCaptchaSolver' not found in D:\htdocs\gscraper.php:29
Stack trace:
#0 {main}
  thrown in D:\htdocs\gscraper.php on line 29

nsetiono commented 4 years ago

I got it to work, but another error appears. It seems the response->error condition is not working, because every proxy I tried to check shows OK.

It would be better to use HTTP code 429 to check whether the proxy is being shown a captcha, but when I try to use $temp->status, it shows me this error message ...

Notice: Trying to get property 'status' of non-object in D:\htdocs\gscraper.php on line 35
OK.
PHP Notice: Undefined variable: temp in D:\htdocs\gscraper.php on line 35

Athlon1600 commented 4 years ago

> the error is changed into this one ...
>
> Fatal error: Uncaught Error: Class 'SerpScraper\GoogleCaptchaSolver' not found in D:\htdocs\gscraper.php:29

yeah remove those two packages because they are already included with serpscraper: image

and do composer update again.

> I got it to work but there is another error that appear It's seem the response->error condition not working because every proxy i had trying to check showing all OK.
>
> It's better to use the http code 429 for checking if the proxy showing captcha or not when being used but when i try to use $temp->status, it show me this error messages ...
>
> Notice: Trying to get property 'status' of non-object in D:\htdocs\gscraper.php on line 35

429 IS the code that is being checked against to detect captcha: https://github.com/Athlon1600/SerpScraper/blob/master/src/Engine/GoogleSearch.php#L151

also don't know what line 35 looks like for you there. Do composer update
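As a rough illustration of the 429 check discussed here, the sketch below guards against the "non-object" notice before reading the status. It is not the library's code: `isCaptchaResponse` is a hypothetical name, and the only thing assumed about the response is a public `status` property, as shown in the `Curl\Response` dump elsewhere in this thread.

```php
<?php
// Minimal sketch: treat HTTP 429 as "captcha / rate limited".
// $response stands in for a Curl\Response object; only a public
// "status" property is assumed.
function isCaptchaResponse($response): bool
{
    // Null or any other non-object value is what triggers
    // "Trying to get property 'status' of non-object".
    if (!is_object($response) || !isset($response->status)) {
        return false;
    }

    return (int) $response->status === 429;
}

// Stand-in object used only for this example.
$fake = new stdClass();
$fake->status = 429;

var_dump(isCaptchaResponse($fake)); // bool(true)
var_dump(isCaptchaResponse(null));  // bool(false)
```

The "Undefined variable: temp" notice suggests that, on line 35 of the script in question, the variable had not been assigned at all, so a guard like this would hide the symptom without fixing the missing assignment.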

nsetiono commented 4 years ago

I fixed the error and added code to detect whether the proxy has a username and password. I added the code below to yours to make it work for my needs ...

```php
// explode the proxy to find out if there is username and password
// usually there is 4 array_count if there is username and password
$exproxy = explode(":", $proxy);
//print_r($exproxy);
if (count($exproxy) == 2)
{
    echo "Testing ".$proxy." ";
    $browser = $google->getBrowser();
    $browser->setProxy($proxy);
}

if (count($exproxy) == 4)
{
    $proxy = $exproxy[0].":".$exproxy[1];
    $proxyauth = $exproxy[2].":".$exproxy[3];
    echo "Testing ".$proxy." ";
    $browser = $google->getBrowser();
    $browser->setProxy($proxy);
    $browser->setProxyAuth($proxyauth);
}
```
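The same split can be factored into a small function, which makes the two cases easier to test in isolation. This is only a sketch: `splitProxy` is a hypothetical name, it assumes the `host:port` and `host:port:user:pass` formats described above, and it does not handle IPv6 addresses.

```php
<?php
// Hypothetical helper: split "host:port" or "host:port:user:pass"
// into the proxy address and optional credentials.
function splitProxy(string $proxy): ?array
{
    // Limit to 4 parts so a password that contains ":" stays intact.
    $parts = explode(':', trim($proxy), 4);

    if (count($parts) === 2) {
        return ['proxy' => $parts[0] . ':' . $parts[1], 'auth' => null];
    }

    if (count($parts) === 4) {
        return [
            'proxy' => $parts[0] . ':' . $parts[1],
            'auth'  => $parts[2] . ':' . $parts[3],
        ];
    }

    return null; // unrecognised format
}

print_r(splitProxy('78.141.214.27:8000'));
print_r(splitProxy('78.141.214.27:8000:user:secret'));
```

A caller would then pass `proxy` to `setProxy()` and, when `auth` is not null, pass it to whatever method sets the proxy credentials (`setProxyAuth()` in the snippet above).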

Everything is working fine, but it seems the connection to the 2captcha API is not working well and I don't know why. I need to debug more.

PS D:\htdocs> php .\gscraper.php
Testing 78.141.214.27:8000 Connection timed out after 10010 millisecondsOK
Testing 163.153.220.170:8080 Captcha detected for proxy 163.153.220.170:808 Solving captcha has failed...
Testing 35.232.175.142:8080 Connection timed out after 10009 millisecondsOK

nsetiono commented 4 years ago

I think I know why it fails when trying to solve the captcha. It seems related to the code below ...

// Are we using a proxy?
if ($this->proxy) {
    $request_data['proxy'] = $this->proxy;
    $request_data['proxytype'] = 'HTTP';
}

I will try modifying your code to check the proxy type, and whether the proxy has a username and password, before sending it to 2captcha. Hopefully it will work fine after that :)

nsetiono commented 4 years ago

when I dump the $temp variable, it shows me this ...

Testing 167.99.195.184:5050 Captcha detected for proxy 167.99.195.184:5050
Curl\Response Object
(
[status] => 429
[body] => <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

https://www.google.co.id/search?q=famous+people+born+in+1734&start=0&client=navclient&gbv=1&complete=0&num=100&pws=0&nfpr=1&ie=utf-8&oe=utf-8

About this page

Our systems have detected unusual traffic from your computer network. This page checks to see if it's really you sending the requests, and not a robot. Why did this happen?

IP address: 45.147.28.71
Time: 2020-07-22T13:59:53Z
URL: https://www.google.co.id/search?q=famous+people+born+in+1734&start=0&client=navclient&gbv=1&complete=0&num=100&pws=0&nfpr=1&ie=utf-8&oe=utf-8
[error] =>
[info] => Curl\CurlInfo Object
    (
        [info:protected] => Array
            (
                [url] => https://www.google.com/so
                [content_type] => text/html
                [http_code] => 429
                [header_size] => 503
                [request_size] => 518
                [filetime] => -1
                [ssl_verify_result] => 0
                [redirect_count] => 0
                [total_time] => 0.421849
                [namelookup_time] => 9.6E-5
                [connect_time] => 0.074496
                [pretransfer_time] => 0.31043
                [size_upload] => 265
                [size_download] => 3261
                [speed_download] => 7745
                [speed_upload] => 629
                [download_content_length] => 3261
                [upload_content_length] => 265
                [starttransfer_time] => 0.310452
                [redirect_time] => 0
                [redirect_url] =>
                [primary_ip] => 167.99.195.184
                [certinfo] => Array
                    (
                    )

                [primary_port] => 5050
                [local_ip] => 156.96.117.54
                [local_port] => 61761
                [http_version] => 3
                [protocol] => 2
                [ssl_verifyresult] => 0
                [scheme] => HTTPS
                [appconnect_time_us] => 310253
                [connect_time_us] => 74496
                [namelookup_time_us] => 96
                [pretransfer_time_us] => 310430
                [redirect_time_us] => 0
                [starttransfer_time_us] => 310452
                [total_time_us] => 421849
            )

    )

) Solving captcha has failed...

I changed sitekey to data-sitekey in Utils.php, but solving the captcha still fails.

Athlon1600 commented 4 years ago

is this only an issue with proxies that need username & password to authenticate or ALL proxies even public ones? Because I just tested this using a plain public proxy and it works just fine?

nsetiono commented 4 years ago

A plain proxy is also not working on my side. It seems the captcha solving does not run at all; even my balance on 2captcha has not changed lol

Or did I make a mistake in the configuration?

Athlon1600 commented 4 years ago

> or do i made a mistake on the configuration ?

no idea. Send me your full code again, so I can look at it.

nsetiono commented 4 years ago

file sent to your email

nsetiono commented 4 years ago

I tried to var_dump the $solution variable, but it returns an empty value. Weird, if it works properly on your side.

Athlon1600 commented 4 years ago

this has now become a separate issue caused by broken php-captcha-solver package, which stopped working because google has updated the way they do their captchas:
https://2captcha.com/blog/google-search-recaptcha
https://2captcha.com/blog/update-google-recaptcha

https://github.com/Athlon1600/php-captcha-solver

... which will take a while to fix. Give me a couple of days and this should be back to working

nsetiono commented 4 years ago

ok i will wait :)

Athlon1600 commented 4 years ago

it's finally been fixed: https://github.com/Athlon1600/SerpScraper

update your SerpScraper version to 4.0 and you should be good.

nsetiono commented 4 years ago

Can it work with multiple proxies now, or only one proxy?

Athlon1600 commented 4 years ago

what do you mean by multiple proxy? Your Browser instance that you would create here: image

can only use one proxy at a time.

nsetiono commented 4 years ago

ah ok, so it's still like the old version. By the way, is your code using a random user agent?

Athlon1600 commented 4 years ago

> by the way is your code using random user agent ?

Nope, it's using this:
https://github.com/Athlon1600/php-curl-client/blob/master/src/BrowserClient.php#L12

You can change that of course, by creating your own MySpecialCustomBrowser class that extends this base class here:
https://github.com/Athlon1600/SerpScraper/blob/master/src/Browser.php

and adding whatever custom headers you want.
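For the random user agent part, the selection itself is plain PHP and can be sketched independently of the library. `pickUserAgent` and the example strings below are hypothetical; how the chosen string is attached to requests depends on the Browser and BrowserClient classes linked above and is not shown here.

```php
<?php
// Hypothetical helper: pick one user agent string at random from a list.
function pickUserAgent(array $agents): string
{
    if ($agents === []) {
        throw new InvalidArgumentException('User agent list must not be empty.');
    }

    // array_rand() returns a random key of the given array.
    return $agents[array_rand($agents)];
}

// Example strings only; replace with current, real browser user agents.
$agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0',
    'Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/2.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) ExampleBrowser/3.0',
];

echo pickUserAgent($agents) . PHP_EOL;
```

In a custom class that extends the base Browser, a call like this could run once per request or once per proxy, depending on how much variation is wanted.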

nsetiono commented 4 years ago

ah ok, tomorrow i will try your new version :)

nsetiono commented 4 years ago

still not working bro, I'd better use XEvil to solve the reCAPTCHA v2 :)

PS D:\htdocs> php .\gscraper.php
Testing 207.148.74.54:31337 working
Testing 77.68.77.181:80 Connection timed out after 10001 milliseconds
Testing 66.119.99.22:80 Operation timed out after 10014 milliseconds with 0 out of 0 bytes received
Testing 167.99.195.184:5050 Captcha detected for proxy 167.99.195.184:5050 trying to solving captcha Solving captcha has failed...
Captcha detected for proxy 167.99.195.184:5050 trying to solving captcha Solving captcha has failed...
Captcha detected for proxy 167.99.195.184:5050 trying to solving captcha Solving captcha has failed...
(the same line keeps repeating for this proxy)

nsetiono commented 4 years ago

Hello Hudson, we got a problem here ...

object(Curl\Response)#11 (4) { ["status"]=> int(0) ["body"]=> bool(false) ["error"]=> string(63) "SSL certificate problem: unable to get local issuer certificate" ["info"]=> object(Curl\CurlInfo)#12 (1) { ["info":protected]=> array(37) { ["url"]=> string(27) "https://2captcha.com/in.php"

Athlon1600 commented 4 years ago

> Hello Hudson, we got a problem here ...
>
> ["error"]=> string(63) "SSL certificate problem: unable to get local issuer certificate"

This is a problem with your system, not this script. But this is a very common problem, so you should be able to solve this by following these steps here:
https://github.com/Athlon1600/useful#php--ssl

Athlon1600 commented 4 years ago

Are you sure you're using the latest version of SerpScraper? Because sometimes I too get the "Solving Captcha has failed" error, but that's due to 2captcha.com itself sending me back ERROR_CAPTCHA_UNSOLVABLE. But if you retry solving it again, it should solve just fine. Proof:

image

nsetiono commented 4 years ago

yep, I'm using the latest version of your code. I used CURLOPT_CAINFO => 'D:\htdocs\cacert\cacert.pem' and CURLOPT_CAPATH => 'D:\htdocs\cacert\cacert.pem' to make the SSL error disappear, but now there is another error message, which seems to come from PHP 7 ...

Warning: curl_setopt_array(): You must pass either an object or an array with the CURLOPT_HTTP200ALIASES argument in D:\htdocs\vendor\athlon1600\php-curl-client\src\Client.php on line 63

I tried searching Google for this but had no success, since there are not many code examples for CURLOPT_HTTP200ALIASES.

Maybe you have a clue about CURLOPT_HTTP200ALIASES?

nsetiono commented 4 years ago

Ignore my last comment. The main problem still seems to be SSL: "SSL certificate problem: unable to get local issuer certificate".

I have set up everything in both php.ini and the curl options, but the error is still shown.

Captcha detected for proxy 207.148.74.54:31337 trying to solving captcha.
object(Curl\Response)#14 (4) {
  ["status"]=> int(0)
  ["body"]=> bool(false)
  ["error"]=> string(63) "SSL certificate problem: unable to get local issuer certificate"
  ["info"]=> object(Curl\CurlInfo)#15 (1) {
    ["info":protected]=> array(37) {
      ["url"]=> string(27) "https://2captcha.com/in.php"
      ["content_type"]=> NULL
      ["http_code"]=> int(0)
      ["header_size"]=> int(0)
      ["request_size"]=> int(0)
      ["filetime"]=> int(-1)
      ["ssl_verify_result"]=> int(20)
      ["redirect_count"]=> int(0)
      ["total_time"]=> float(0.10792)
      ["namelookup_time"]=> float(0.001407)
      ["connect_time"]=> float(0.038352)
      ["pretransfer_time"]=> float(0)
      ["size_upload"]=> float(0)
      ["size_download"]=> float(0)
      ["speed_download"]=> float(0)
      ["speed_upload"]=> float(0)
      ["download_content_length"]=> float(-1)
      ["upload_content_length"]=> float(-1)
      ["starttransfer_time"]=> float(0)
      ["redirect_time"]=> float(0)
      ["redirect_url"]=> string(0) ""
      ["primary_ip"]=> string(13) "74.84.150.210"
      ["certinfo"]=> array(0) { }
      ["primary_port"]=> int(443)
      ["local_ip"]=> string(13) "x.x.x.x"
      ["local_port"]=> int(43197)
      ["http_version"]=> int(0)
      ["protocol"]=> int(2)
      ["ssl_verifyresult"]=> int(0)
      ["scheme"]=> string(5) "HTTPS"
      ["appconnect_time_us"]=> int(0)
      ["connect_time_us"]=> int(38352)
      ["namelookup_time_us"]=> int(1407)
      ["pretransfer_time_us"]=> int(0)
      ["redirect_time_us"]=> int(0)
      ["starttransfer_time_us"]=> int(0)
      ["total_time_us"]=> int(107920)
    }
  }
}

Athlon1600 commented 4 years ago

the only thing you should be modifying is your php.ini file.
https://stackoverflow.com/questions/24611640/curl-60-ssl-certificate-problem-unable-to-get-local-issuer-certificate

there's something you haven't set up correctly I'm sure, cause otherwise you wouldn't be getting that error.
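A quick way to see what PHP has actually picked up from php.ini is to print the relevant settings from the same environment that runs the scraper. The sketch below is a diagnostic under assumptions: `describeCaBundle` is a hypothetical helper, and `curl.cainfo` and `openssl.cafile` are the two standard ini directives that usually matter for this error.

```php
<?php
// Diagnostic sketch: describe what a CA bundle setting points at.
function describeCaBundle($path): string
{
    if (!is_string($path) || $path === '') {
        return 'not set';
    }
    if (is_dir($path)) {
        return 'is a directory, expected a .pem file';
    }
    if (!is_file($path)) {
        return 'file not found';
    }
    if (!is_readable($path)) {
        return 'file is not readable';
    }

    return 'ok';
}

// ini_get() returns false when a directive is unknown, e.g. if the
// curl extension is not loaded; describeCaBundle() reports that as "not set".
foreach (['curl.cainfo', 'openssl.cafile'] as $key) {
    $value = ini_get($key);
    echo $key . ' = ' . var_export($value, true) . ' (' . describeCaBundle($value) . ')' . PHP_EOL;
}

// The CLI and Apache can load different php.ini files.
echo 'Loaded php.ini: ' . (php_ini_loaded_file() ?: 'none') . PHP_EOL;
```

If the loaded php.ini is not the file that was edited, the change never takes effect for that environment, which could explain an error that persists after the setting appears to be correct.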

nsetiono commented 4 years ago

well, I put the path in php.ini and even in the curl options, but the error still shows. Weird. What makes it even weirder is that my server hangs when I run Apache from the XAMPP I installed earlier lol

nsetiono commented 4 years ago

ok, I got it to work 👍 thank you so much for your responses bro. If you need the modified code, just ask and I will send it to you right away :)

nsetiono commented 4 years ago

anyway, I added the code below to limit the captcha solving loop

if ($i<=10) {
    if (is_array($temp)) {
        if ($temp->status == 200) {
            echo "Captcha solved successfully!" . PHP_EOL;
            break 1;
        } else {
            echo 'Solving captcha has failed...' . PHP_EOL;
        }
    } elseif ($temp == "UNSOLVABLE") {
        echo 'UNSOLVABLE captcha...' . PHP_EOL;
        break 1;
    }
} else {
    // after 10 attempt we will set solving captcha progress is invalid to prevent endless looping for captcha solving
    echo $i.' attempt to solving captcha has failed, move to next proxy...' . PHP_EOL;
    break 1;
}
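The idea of capping the number of attempts can also be written as a small reusable loop. This is a sketch, not the code used in the scraper: `retryWithLimit` is a hypothetical name, and the closure stands in for the real captcha solving call. Note that the snippet above tests `is_array($temp)` and then reads `$temp->status`, which only works on an object; the sketch sidesteps that by having the attempt report success as a plain boolean.

```php
<?php
// Run $attempt up to $maxAttempts times.
// Returns the number of the attempt that succeeded, or 0 if all failed.
function retryWithLimit(callable $attempt, int $maxAttempts): int
{
    for ($i = 1; $i <= $maxAttempts; $i++) {
        if ($attempt($i) === true) {
            return $i;
        }
    }

    return 0; // every attempt failed
}

// Stand-in solver that fails twice and then succeeds.
$result = retryWithLimit(function (int $i): bool {
    return $i >= 3;
}, 10);

echo ($result > 0 ? "Solved on attempt $result" : 'Giving up, moving to next proxy') . PHP_EOL;
// prints "Solved on attempt 3"
```

Keeping the limit as a parameter makes it easy to use a smaller cap for proxies that are known to be slow.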