RobertGrubb / tiktok-php

TikTok scraper in PHP
MIT License
63 stars, 21 forks

How to resolve this? #40

Open isoraw-1 opened 3 years ago

isoraw-1 commented 3 years ago

Array ( [error] => 1 [message] => Unable to retrieve NEXT_DATA from DOM )

tegarkurniawan commented 3 years ago

Same issue.

RobertGrubb commented 3 years ago

@isoraw-1 @tegarkurniawan Will be looking into this soon. Apologies for the delay, haven't had the chance to look into this.

andersonrobert1313 commented 3 years ago

Hi @RobertGrubb, I've been waiting for your reply for the last 8 days.

When can you provide a solution? Thanks.

SlavaPWNZ commented 3 years ago

Guys, when you extract __NEXT_DATA__, add this to the script-tag regex: nonce=\"(.*?)\"

isoraw-1 commented 3 years ago

> Guys, when you get __NEXT_DATA__, add this regexp in script nonce="(.*?)"

Still getting the same issue: Array ( [error] => 1 [message] => Unable to retrieve NEXT_DATA from DOM )

MShoaibAkram commented 3 years ago

Same issue. Is it resolved?

SlavaPWNZ commented 3 years ago

> Same issue is it resolved...?

Look at my answer, dude, I already resolved this. Do this:

`if (preg_match_all('#\<script id=\"__NEXT_DATA__\" type=\"application/json\" nonce=\"(.*?)\" crossorigin=\"anonymous\">(.*?)\<\/script>#', $this->data, $out)) { return json_decode($out[2][0], true, 512, JSON_BIGINT_AS_STRING); }`

You need to add the "nonce" group to the regex.
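Here is the nonce-aware regex above, wrapped in a standalone function so it can be tried outside the library. The function name `extractNextData` and the sample HTML are illustrative assumptions, not tiktok-php code. The sketch also assumes TikTok keeps the exact attribute order `id`, `type`, `nonce`, `crossorigin` quoted in the comment. The second capture group is used because the first one now holds the nonce.

```php
<?php

/**
 * Hypothetical helper (not part of tiktok-php): pull the __NEXT_DATA__ JSON
 * out of a TikTok page. Assumes the attribute order id, type, nonce,
 * crossorigin quoted in the thread.
 */
function extractNextData(string $html): ?array
{
    $pattern = '#<script id="__NEXT_DATA__" type="application/json" nonce="(.*?)" crossorigin="anonymous">(.*?)</script>#s';

    if (!preg_match($pattern, $html, $m)) {
        return null; // no matching <script> tag
    }

    // Group 1 is the nonce, group 2 is the JSON payload.
    // JSON_BIGINT_AS_STRING keeps IDs that overflow PHP ints as strings.
    $data = json_decode($m[2], true, 512, JSON_BIGINT_AS_STRING);

    return is_array($data) ? $data : null;
}

// Usage with a made-up page fragment:
$html = '<script id="__NEXT_DATA__" type="application/json" nonce="abc123" '
      . 'crossorigin="anonymous">{"props":{"pageProps":{"userInfo":{"user":'
      . '{"id":68123456789012345678901,"uniqueId":"someuser"}}}}}</script>';

$data = extractNextData($html);
echo $data['props']['pageProps']['userInfo']['user']['uniqueId'], PHP_EOL; // someuser
```

`preg_match` is enough here because only the first tag is needed. With `preg_match_all`, as in the comment, the JSON is at `$out[2][0]`.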

SlavaPWNZ commented 3 years ago

> > Guys, when you get __NEXT_DATA__, add this regexp in script nonce="(.*?)"
>
> Still getting same issue: Array ( [error] => 1 [message] => Unable to retrieve NEXT_DATA from DOM )

Do you get $out[2]?

Look at the script in my message before this one.

MShoaibAkram commented 3 years ago

Yup, got it, but the issue is somewhere earlier. I debugged the code. In the UserRequest->details() method, with `$nextData = $this->instance->request->call($endpoint);//->extract(); return $nextData;` I get the following response: {"config":{"signMethod":"node","datafetchApiKey":"", "userAgent":"","timeout":20,"disableCookies":true},"data":"","cookies":[],"cookieJar":{}}

SlavaPWNZ commented 3 years ago

Well...

> I am getting following response {"config":{"signMethod":"node","datafetchApiKey":"", "userAgent":"","timeout":20,"disableCookies":true},"data":"","cookies":[],"cookieJar":{}}

Yes, because this code needs to go into extract(). You need to create a new function and call it.

SlavaPWNZ commented 3 years ago

[Screenshot from 2021-08-26 15-05-09]

MShoaibAkram commented 3 years ago

Thanks for helping. After uncommenting the extract() call, I still get an error: {"error":true,"type":"EMPTY_RESPONSE","message":"Empty Response"} :(

SlavaPWNZ commented 3 years ago

> EMPTY_RESPONSE

Give each error its own message ("message":"Empty Response1", "message":"Empty Response2", ...), and you can find where the problem is.

For me, the scraper doesn't work with cookies, so I run it without cookies.
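The debugging tip above (give every failure point its own message) can be sketched as a small check that reports which step failed. The function name, the error type strings and the order of the checks are hypothetical. In tiktok-php the equivalent checks would live inside call() and extract().

```php
<?php

/**
 * Hypothetical check that reports *which* step failed instead of one
 * generic "Empty Response" message.
 *
 * @param string|false $body      Raw body as returned by curl_exec().
 * @param string       $curlError Value of curl_error() for the same handle.
 */
function diagnoseResponse($body, string $curlError): array
{
    if ($body === false) {
        // curl itself failed (DNS, proxy, timeout, TLS, ...).
        return ['error' => true, 'type' => 'EMPTY_RESPONSE_1', 'message' => 'curl failed: ' . $curlError];
    }
    if (trim($body) === '') {
        // The request went through, but the server sent nothing back.
        return ['error' => true, 'type' => 'EMPTY_RESPONSE_2', 'message' => 'Empty body'];
    }
    if (strpos($body, '__NEXT_DATA__') === false) {
        // Got HTML, but not the page we expected (a captcha page, for example).
        return ['error' => true, 'type' => 'NO_NEXT_DATA', 'message' => 'Body has no __NEXT_DATA__ script'];
    }
    return ['error' => false];
}

print_r(diagnoseResponse(false, 'Connection timed out'));
print_r(diagnoseResponse('', ''));
```

Once each branch has a distinct type, a single log line tells you whether to look at the network layer, the server response, or the HTML parsing.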

SlavaPWNZ commented 3 years ago

> Thanks for helping.. I got same issue after uncommenting extract method {"error":true,"type":"EMPTY_RESPONSE","message":"Empty Response"} ... :(

^ See my previous comment.

RobertGrubb commented 3 years ago

I'll be honest, it's been a while since I've even messed with this repo. I will make it a point to spend some time on it later today to see what I can accomplish here.

MShoaibAkram commented 3 years ago

I have tried both with and without cookies and get the same response. I think TikTok may have changed its page structure, or something else is causing the empty response. Here is my scraper initialisation code:

```php
$scraper = new Scraper([
    'signMethod' => 'node',
    'datafetchApiKey' => '*******',
    'userAgent' => '',
    'timeout' => 20,
    'disableCookies' => true
]);
```

One more thing: is datafetchApiKey necessary? As I read the documentation, I can fetch 100 users per 15 minutes without an API key.

MShoaibAkram commented 3 years ago

> I'll be honest, it's been a while since I've even messed with this repo. I will make it a point to spend some time on it later today to see what I can accomplish here.

Thanks for the clarification, man. Waiting for your next response :)

SlavaPWNZ commented 3 years ago

I'm using this repo today, but a fork of it, and it's 100% stable. I just overrode:

  1. extract()
  2. call($endpoint, $customHeaders = []), to use curl_error
  3. details()

I don't use datafetchApiKey, only a rotating proxy.
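Below is a rough sketch of what an overridden call() might look like: plain curl, an optional proxy, and curl_error() surfaced instead of a silent empty response. It is not the code from SlavaPWNZ's fork, which was not posted. The function names, the return shape and the `$proxy` parameter are assumptions.

```php
<?php

/**
 * Turn ['Accept-Language' => 'en-US'] into ['Accept-Language: en-US'],
 * the list format CURLOPT_HTTPHEADER expects.
 */
function formatHeaders(array $customHeaders): array
{
    $lines = [];
    foreach ($customHeaders as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

/**
 * Hypothetical stand-in for call($endpoint, $customHeaders = []).
 * Returns the response body, or an error array carrying curl_error().
 * $proxy is an assumption, e.g. 'http://user:pass@host:port'.
 */
function fetchPage(string $endpoint, array $customHeaders = [], ?string $proxy = null, int $timeout = 20)
{
    $ch = curl_init($endpoint);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_HTTPHEADER     => formatHeaders($customHeaders),
        CURLOPT_ENCODING       => '',     // accept every encoding curl supports
    ]);
    if ($proxy !== null) {
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
    }

    $body   = curl_exec($ch);
    $error  = curl_error($ch);                       // '' when no transport error occurred
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 if no response was received

    if ($body === false || $body === '') {
        return [
            'error'   => true,
            'type'    => 'EMPTY_RESPONSE',
            'message' => $error !== '' ? $error : "HTTP $status with an empty body",
        ];
    }
    return $body;
}

// Example (network call, not run here):
// $result = fetchPage('https://www.tiktok.com/@someuser', ['Accept-Language' => 'en-US'], 'http://user:pass@proxy.example:8000');
```

Returning the curl error text instead of a bare "Empty Response" turns proxy failures and timeouts into distinguishable messages.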

MShoaibAkram commented 3 years ago

> I'm using this repo today, but a fork of it, and it's 100% stable. I just overrode extract(), call($endpoint, $customHeaders = []) (to use curl_error) and details().
>
> I don't use datafetchApiKey, only a rotating proxy.

Can you please share your call($endpoint, $customHeaders = []) code and your Scraper initialisation? I think the issue is within this part.

MShoaibAkram commented 3 years ago

When I var_dump the curl response, it's actually returning a captcha from TikTok. Is there any solution for it? I also copied my cookies into the scraper initialiser, but it's not working.

[Screenshot 2021-08-27 at 12 49 46]

SlavaPWNZ commented 3 years ago

> When I var_dump the curl response, it's actually returning a captcha from TikTok. Is there any solution for it?

You need an LTE/4G proxy with a rotating IP. One IP can do ~500 requests.

SlavaPWNZ commented 3 years ago

[Image: IMG_20210827_112302_050]

MShoaibAkram commented 3 years ago

Oops :) Isn't there any other solution? I mean, can I buy an online proxy or anything else?

SlavaPWNZ commented 3 years ago

> Oops :) Isn't there any other solution? I mean, can I buy an online proxy or anything else?

Yes, I can recommend this service: https://astroproxy.com/r/140345f8d716b04e1a. You need a mobile LTE proxy, rotated either by time (every ~5 min: the easy way, but not 100% stable) or by URL. When you get a captcha, you need to change the proxy: either buy ~5 proxies, or call the rotation URL, wait 30 seconds, and you can use this IP again.
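The rotate-on-captcha strategy described above can be sketched as a retry loop over a proxy pool. Fetching and captcha detection are passed in as callables because the thread specifies neither. The 30-second default wait mirrors the comment. All names, and the `'captcha'` marker used in the demo, are hypothetical placeholders rather than TikTok's real markup.

```php
<?php

/**
 * Hypothetical rotate-on-captcha wrapper.
 *
 * @param callable $fetch       function (string $url, string $proxy): string|false
 * @param callable $isCaptcha   function (string $body): bool
 * @param string[] $proxies     Pool of proxies to try in order
 * @param int      $waitSeconds Pause before retrying on the next proxy
 */
function fetchWithRotation(string $url, callable $fetch, callable $isCaptcha, array $proxies, int $waitSeconds = 30): ?string
{
    $proxies = array_values($proxies);
    foreach ($proxies as $i => $proxy) {
        $body = $fetch($url, $proxy);
        if ($body !== false && $body !== '' && !$isCaptcha($body)) {
            return $body; // a real page: done
        }
        // Captcha or empty body: wait, then retry on the next proxy (no wait after the last one).
        if ($waitSeconds > 0 && $i < count($proxies) - 1) {
            sleep($waitSeconds);
        }
    }
    return null; // every proxy in the pool was blocked
}

// Demo with fakes: 'proxy-a' is "blocked", 'proxy-b' works.
$fakeFetch = function (string $url, string $proxy) {
    return $proxy === 'proxy-b' ? '<html>profile page</html>' : '<html>captcha</html>';
};
$looksLikeCaptcha = function (string $body): bool {
    return strpos($body, 'captcha') !== false; // placeholder marker, not TikTok's real markup
};

echo fetchWithRotation('https://www.tiktok.com/@someuser', $fakeFetch, $looksLikeCaptcha, ['proxy-a', 'proxy-b'], 0), PHP_EOL;
// <html>profile page</html>
```

The "call the rotation URL" variant fits the same shape: have `$fetch` hit the provider's rotation URL before retrying on the same proxy.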

MShoaibAkram commented 3 years ago

Thanks a bunch for such great help :) Stay blessed.

MShoaibAkram commented 3 years ago

@SlavaPWNZ I've successfully fetched the user's profile. Now the issue is with the regular expression. Can you please help me with it? Here is my regex:

\<script id=\"__NEXT_DATA__\" type=\"application/json\" nonce="(.?)" crossorigin=\"anonymous\">(.*?)\<\/script>

It returns false, but __NEXT_DATA__ is present in the HTML.

SlavaPWNZ commented 3 years ago

> @SlavaPWNZ I have successfully get profile of user. Now the issue is with regular expression can you please help me in it. here is my regex expression
>
> it returns false but __NEXT_DATA__ is present in it....

Are you using my regex? Note that you now need $out[2][0], not $out[1][0]: the new nonce group is group 1, so the JSON moves to group 2.

MShoaibAkram commented 3 years ago

Yes, I am using your regex, but it returns false and never enters the if block. Here is my code: `if (preg_match_all("#\<script id=\"__NEXT_DATA__\" type=\"application/json\" nonce=\"(.?)\" crossorigin=\"anonymous\">(.*?)\<\/script\>#", $this->data, $out)) { return json_decode($out[2][0], true, 512, JSON_BIGINT_AS_STRING); }`
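One likely reason this pattern returns false: `(.?)` matches at most one character, so it cannot match a nonce that is longer than one character. The original suggestion uses `(.*?)`. The sketch below first demonstrates that difference. It then shows a looser variant that matches the script tag by its `id` alone, so a changed attribute order or a missing nonce does not break it. The function name and pattern are illustrative assumptions, not tiktok-php code.

```php
<?php

// Demo: the one-character group cannot match a multi-character nonce.
$tag = '<script id="__NEXT_DATA__" type="application/json" nonce="k9Xz" crossorigin="anonymous">{"ok":true}</script>';

var_dump(preg_match('#nonce="(.?)"#', $tag));   // int(0)
var_dump(preg_match('#nonce="(.*?)"#', $tag));  // int(1)

/**
 * Hypothetical order-independent variant: find the <script> whose id is
 * __NEXT_DATA__, whatever other attributes it carries.
 */
function extractNextDataLoose(string $html): ?array
{
    $pattern = '#<script\b[^>]*\bid=["\']__NEXT_DATA__["\'][^>]*>(.*?)</script>#s';
    if (!preg_match($pattern, $html, $m)) {
        return null;
    }
    // Only one capture group here, so the JSON is in $m[1].
    $data = json_decode($m[1], true, 512, JSON_BIGINT_AS_STRING);
    return is_array($data) ? $data : null;
}

var_dump(extractNextDataLoose($tag)['ok']); // bool(true)
```

The `s` modifier lets `.` cross newlines in case the JSON is ever pretty-printed.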

MShoaibAkram commented 3 years ago

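So this regex isn't working for me. I replaced it with a DOM-based lookup like the one below, and it works fine:

```php
try {
    $doc = new \DOMDocument();
    libxml_use_internal_errors(true);   // TikTok's HTML is not valid XML; silence parser warnings
    $doc->loadHTML($this->data);        // $this->data holds the fetched HTML
    libxml_clear_errors();
    libxml_use_internal_errors(false);

    $xpath = new \DOMXPath($doc);
    $id = "__NEXT_DATA__";
    foreach ($xpath->query('//script[@id="' . $id . '"]') as $node) {
        // Decode to objects (not arrays); keep large IDs as strings
        return json_decode($node->nodeValue, false, 512, JSON_BIGINT_AS_STRING);
    }
} catch (\Throwable $e) {
    // loadHTML() throws a ValueError on an empty string in PHP 8; fall through to the error below
}

// Reached when the script tag is missing or parsing failed
return (object) [
    'error' => true,
    'type' => 'NO_NEXT_DATA',
    'message' => 'Unable to retrieve NEXT_DATA from DOM'
];
```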

Also, with this code you need to switch from array access to object access in the User model, like below:

```php
public function fromNextData ($NEXT_DATA) {
    $instance = new self();

    // Validate the response data
    // if (count($NEXT_DATA) === 0) return $this->error('__NEXT_DATA__');
    if (!isset($NEXT_DATA->props)) return $this->error('__NEXT_DATA__[props]');
    if (!isset($NEXT_DATA->props->pageProps)) return $this->error('No __NEXT_DATA__[props][pageProps]: ' . json_encode($NEXT_DATA));

    // If this property is missing, the user does not exist.
    if (!isset($NEXT_DATA->props->pageProps->userInfo)) return $this->error('User does not exist', true);

    // Set user data (round-trip through JSON to get a plain stdClass copy)
    $userData = json_decode(json_encode($NEXT_DATA->props->pageProps->userInfo));

    // Set all keys from userData on the instance.
    foreach ($userData as $key => $val) $instance->{$key} = $val;

    // Backwards compatible
    $instance->userId = $userData->user->id;
    $instance->covers = [ $userData->user->avatarThumb ];
    $instance->coversMedium = [ $userData->user->avatarMedium ];
    $instance->nickName = $userData->user->nickname;
    $instance->following = $userData->stats->followingCount;
    $instance->fans = $userData->stats->followerCount;
    $instance->heart = $userData->stats->heartCount;
    $instance->video = $userData->stats->videoCount;
    $instance->verified = $userData->user->verified;
    $instance->digg = $userData->stats->diggCount;
    $instance->signature = $userData->user->signature;
    $instance->secUid = $userData->user->secUid;
    $instance->uniqueId = $userData->user->uniqueId;
    $instance->bioLink = false;

    // isset() on the whole chain is safe even if bioLink itself is missing
    if (isset($userData->user->bioLink->link)) {
        $instance->bioLink = $userData->user->bioLink->link;
    }

    return $instance;
}
```

Please give me your suggestions on it.