drawrowfly / tiktok-scraper

TikTok Scraper. Download video posts, collect user/trend/hashtag/music feed metadata, sign URL and etc.
4.36k stars 796 forks

Fail to fetch data #437

Closed Tankruslan closed 3 years ago

Tankruslan commented 3 years ago

I'm using the .user method with rotating proxies and headers. Everything worked fine until today, but now it makes me solve a captcha instead of returning the user page.

coder39248583 commented 3 years ago

@Tankruslan

Just the other day I began to experience random failures on the same method.

I noticed TikTok is rolling out changes. My Brazil proxies often bring up a redesigned TikTok page where a user's posts appear in a Twitter-like feed, with a button to toggle back to the traditional grid view.

The attached image shows the new page I get, with the toggled views.

Proxies that don't get this page seem to have no trouble fetching a user's posts.

[image: redesign]

obsidianart commented 3 years ago

The layout is controlled by the top-right icon. Probably, if you don't select one, TikTok switches between the layouts to see which one gets higher engagement from you.

coder39248583 commented 3 years ago

@obsidianart The top right icon is new, is what I'm saying. It used to be a grid view only, with no option to change the layout. But profiles are starting to appear with that top right icon, and the new layout option, and this seems to be associated (I think, but can't be certain) with tiktok-scraper's failure to fetch a user's posts using the .user method.

The .user method is still failing randomly for me. One proxy will fail (with no error given), but 10 minutes later the same proxy may succeed, with nothing changed. My failures seem to be limited to foreign proxies -- mostly Brazil, some Netherlands, etc. (but not the U.S.) -- and when I check TikTok in a browser using those proxies, the new layout option appears. When I check TikTok in a browser using proxies that haven't been failing, the new layout does not appear. I don't know if it's related, but it might be.

rizwan-rizu commented 3 years ago

I integrated this API about three weeks ago. It was working fine and giving me the data I need, but it suddenly stopped working and now gives me this error: Error: Can't extract user metadata from the html page. Make sure that user does exist and try to use proxy

Michaelmala commented 3 years ago

Hi @rizwan-rizu, try using the cookie "sessionid_ss" in the sessionList option.

rizwan-rizu commented 3 years ago

Hi @Michaelmala, thanks for your reply. Can you please guide me on where I should get this value from? In my case, I need to scrape the data of any random public user. Previously I didn't pass any options when making API calls with this module, TikTokScraper.getUserProfileInfo(username), and it worked fine, but it suddenly stopped and started giving an error.

Michaelmala commented 3 years ago

Hi @rizwan-rizu, sorry for the late reply. Go to any user's TikTok profile in a desktop browser and do the following:

  1. Open the inspector;
  2. Select the "Network" tab;
  3. Refresh the page (you will now see all the network requests made by the current page);
  4. Find one of the requests whose name starts with "?aid=1988&app" and open its details;
  5. Look at the "Request Headers" and find the "cookie" value;
  6. Find the "sessionid_ss" cookie value inside it.

Now call the library like this:

TikTokScraper.getUserProfileInfo(username, { sessionList: ["sessionid_ss=45fa•••••••••••••••••;"] });

The sessionid_ss value is tied to your logged-in TikTok account, so you have to log in to TikTok and solve the requested captcha before doing any of the previous steps. Remember that if you log out of TikTok, that cookie value will no longer be valid, so you need to stay logged in for it to work.
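As a rough illustration of steps 5 and 6, the copied "cookie" header could be parsed in code rather than by eye. This is only a sketch: `extractCookie` and `toSessionList` are hypothetical helper names, not part of tiktok-scraper, and the only assumption about the library is the `sessionList` entry format shown in the call above.

```javascript
// Hypothetical helper (not part of tiktok-scraper): read one cookie out of a
// raw "cookie" request-header string copied from the browser's Network tab.
function extractCookie(cookieHeader, name) {
  for (const part of cookieHeader.split(';')) {
    const eq = part.indexOf('=');
    if (eq === -1) continue; // skip empty or malformed fragments
    if (part.slice(0, eq).trim() === name) {
      return part.slice(eq + 1).trim();
    }
  }
  return null;
}

// Build a one-element array in the "sessionid_ss=<value>;" form used in the
// call above. Throws if the header has no sessionid_ss cookie.
function toSessionList(cookieHeader) {
  const value = extractCookie(cookieHeader, 'sessionid_ss');
  if (!value) {
    throw new Error('sessionid_ss not found: log in to TikTok and copy the header again');
  }
  return [`sessionid_ss=${value};`];
}
```

The result would then be passed as the option from the call above, for example `TikTokScraper.getUserProfileInfo(username, { sessionList: toSessionList(copiedHeader) })`, where `copiedHeader` is the string pasted from the browser.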

rizwan-rizu commented 3 years ago

Thank you @Michaelmala for providing information and taking the time to assist me. I very much appreciate that 🤝 .

rizwan-rizu commented 3 years ago

I need to do it programmatically on my website, where a user types a name and I need to show that user's public TikTok profile. Based on your response, @Michaelmala, that's not feasible for me, and it seems this scraper library is of no use to me 😞

Michaelmala commented 3 years ago

@rizwan-rizu you need to run the library on a server. You can set it up easily on Firebase with Cloud Functions; I did that for a React Native app. Otherwise you have to rewrite the library to be compatible with non-server environments, but that is very time-consuming and some utilities may be lost.

rizwan-rizu commented 3 years ago

Sorry @Michaelmala, I didn't quite follow your comment, as I am already using the library as an npm package in my Node app. PS: Currently a user types their TikTok name and I try to fetch the user's details via .getUserProfileInfo(username). What I actually want is for users to connect their TikTok account to the app so that their TikTok details are listed in our web app.

Michaelmala commented 3 years ago

@rizwan-rizu ahhh, sorry, I didn't get your problem. I suggest creating a TikTok account just for the purpose of logging in from a browser and getting the "sessionid_ss" cookie value to store in a db. In my app I store that cookie value in the Firestore db and retrieve it when I need to use the library. If you want to use the user's own TikTok account to retrieve their "sessionid_ss", I think you have to build a flow that lets them log in to TikTok and then retrieves the "sessionid_ss" cookie.

quangtuyennguyen commented 3 years ago

@Michaelmala thanks for the helpful answers! In my app I store three cookies in Firestore, then I provide them in the sessionList array. My question is: if I stay logged in, will my cookies expire?

Michaelmala commented 3 years ago

Hi @quangtuyendev, you are welcome. I haven't been able to verify it yet, but I think yes: in the Google Chrome cookie settings, that cookie has an expiration date some days out. I'm thinking about how to create an algorithm to refresh the "sessionid_ss" cookie value when it is no longer valid.
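Until such a refresh algorithm exists, one option might be to store each cookie together with the expiration date shown in the browser and skip expired entries when building sessionList. The sketch below assumes a hypothetical record shape, `{ value, expiresAt }`, with `expiresAt` in milliseconds since the epoch. It does not refresh anything, and the function names are illustrative rather than part of the library.

```javascript
// Hypothetical record shape: { value: string, expiresAt: number } where
// expiresAt (ms since epoch) is copied by hand from the browser's cookie settings.

// Keep only cookies that have not expired yet and format them for sessionList.
function buildSessionList(records, now = Date.now()) {
  return records
    .filter((record) => record.expiresAt > now)
    .map((record) => `sessionid_ss=${record.value};`);
}

// True when a stored cookie expires within marginMs (default: one day),
// i.e. someone should log in again and store a fresh value soon.
function needsRefresh(record, now = Date.now(), marginMs = 24 * 60 * 60 * 1000) {
  return record.expiresAt - now <= marginMs;
}
```

With three stored cookies, as in the question above, `buildSessionList(records)` would return only the entries that are still valid, and `needsRefresh` could drive a reminder to log in again before the last one expires.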

quangtuyennguyen commented 3 years ago

Yes, @Michaelmala! Do you use proxies in your app? If so, do they work well?

Michaelmala commented 3 years ago

@quangtuyendev I have never used proxies with this library so far, but some time ago I used Zyte (formerly called Crawlera), which is a very good service that gives you a very large pool of proxies. The best part is that it automatically assigns you a proxy and automatically changes it if it is banned or blocked, so calls made through their service should succeed!

quangtuyennguyen commented 3 years ago

@Michaelmala, thank you so much! This answer is very helpful to me.

Now I can't use the videoUrl from .getVideoMeta. I used Puppeteer to scrape the HTML and got a videoUrl, but it doesn't work. Can you suggest a solution or a cheap API service? Thank you so much!

Michaelmala commented 3 years ago

@quangtuyendev i found the problem right now!

The .getVideoMeta() method doesn't provide an option to set custom cookies, so the videoUrl property will never be correct: to get the right URL you have to use the "tt_webid_v2" cookie value.

You can get a working videoUrl by editing the method "getVideoMetadata()" in the library file "TikTok.js" from:

    async getVideoMetadata(url = '') {
        const videoData = /tiktok.com\/(@[\w.-]+)\/video\/(\d+)/.exec(url || this.input);
        if (videoData) {
            const videoUsername = videoData[1];
            const videoId = videoData[2];
            const options = {
                method: 'GET',
                uri: `https://www.tiktok.com/node/share/video/${videoUsername}/${videoId}`,
                json: true,
            };
            try {
                const response = await this.request(options);
                if (response.statusCode === 0) {
                    return response.itemInfo.itemStruct;
                }
            }
            catch (_a) { }
        }
        throw new Error(`Can't extract video metadata: ${this.input}`);
    }

to:

    async getVideoMetadata(url = '') {
        const videoData = /tiktok.com\/(@[\w.-]+)\/video\/(\d+)/.exec(url || this.input);
        if (videoData) {
            const videoUsername = videoData[1];
            const videoId = videoData[2];
            const options = {
                method: 'GET',
                uri: `https://www.tiktok.com/node/share/video/${videoUsername}/${videoId}`,
                json: true,
                // added: send the tt_webid_v2 cookie with the request
                headers: { cookie: "tt_webid_v2=yourcookievalue;" }
            };
            try {
                const response = await this.request(options);
                if (response.statusCode === 0) {
                    return response.itemInfo.itemStruct;
                }
            }
            catch (_a) { }
        }
        throw new Error(`Can't extract video metadata: ${this.input}`);
    }

obviously this is a temporary solution, I'm working to make a pull request for a final solution.
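For anyone who would rather not hard-code the cookie while waiting for that pull request, the options object could be built by a small function that takes the cookie as a parameter. This is only a sketch under assumptions: `buildVideoRequestOptions` is a hypothetical name, not the library's API, and it reuses only the regex, URL and header shape from the snippet above.

```javascript
// Hypothetical helper, not the library's API: builds the request options used
// by getVideoMetadata() above, adding the cookie header only when a
// tt_webid_v2 value is supplied.
function buildVideoRequestOptions(url, ttWebidV2) {
  const videoData = /tiktok.com\/(@[\w.-]+)\/video\/(\d+)/.exec(url);
  if (!videoData) return null; // not a recognizable video URL
  const [, videoUsername, videoId] = videoData;
  const options = {
    method: 'GET',
    uri: `https://www.tiktok.com/node/share/video/${videoUsername}/${videoId}`,
    json: true,
  };
  if (ttWebidV2) {
    options.headers = { cookie: `tt_webid_v2=${ttWebidV2};` };
  }
  return options;
}
```

Inside the patched method, `const options = buildVideoRequestOptions(url || this.input, cookieValue)` could replace the inline object, with `cookieValue` read from wherever the application stores it.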

quangtuyennguyen commented 3 years ago

@Michaelmala Thanks bro!

Pezhvak commented 3 years ago

> obviously this is a temporary solution, I'm working to make a pull request for a final solution.

Hi, any updates on this PR?

drawrowfly commented 3 years ago

@1.4.33

kcinnxy commented 2 years ago

Did you manage to find a way to refresh this session id cookie?