drawrowfly / tiktok-scraper

TikTok Scraper. Download video posts, collect user/trend/hashtag/music feed metadata, sign URL and etc.

Can't extract Tac value #160

Closed vsyw closed 4 years ago

vsyw commented 4 years ago

Error message: "Can't extract Tac value" when using

 TikTokScraper.user('USERNAME', { number: 100 });

drawrowfly commented 4 years ago

Well, they've probably changed the signature method.

vsyw commented 4 years ago

Yes, I think so. Do you have any idea on this? thank you

thibaultleouay commented 4 years ago

Have a look at this: https://sf16-muse-va.ibytedtos.com/obj/rc-web-sdk-gcs/acrawler.js and this:

        window.byted_acrawler.init({
            aid: 1988,
            dfp: false,
            boe: false,
            intercept: true,
            enablePathList: ["/api/user/detail", "/api/music/detail", "/api/item/detail", "/api/challenge/detail/", "/share/item/list", "/api/item_list/", "/api/comment/list/", "/api/comment/list/reply/", "/api/discover/user/", "/api/commit/follow/user/", "/api/recommend/user/", "/api/impression/write/", "/share/item/explore/list", "/api/commit/item/digg/", "/node/share/*", "/discover/render/*"],
        });

vsyw commented 4 years ago

Hi @thibaultleouay, thank you for sharing. But how exactly to use it?

drawrowfly commented 4 years ago

They just moved away from the tac signature.

thibaultleouay commented 4 years ago

They are using acrawler.js to generate the signature

It might be similar to what they are doing on Douyin; see:

https://github.com/elvisyjlin/media-scraper/issues/9

drawrowfly commented 4 years ago

Brrrr. Old info; lots of things have changed since 2019. Please do not rush to conclusions.

ThibaultJanBeyer commented 4 years ago

1. What does "Tac" mean?
2. ~By "they" you mean TikTok?~ probably yes
3. ~And that means the scraper is broken because TikTok changed something?~ probably yes
4. Are you working on this?

:)

ThibaultJanBeyer commented 4 years ago

I had a quick look, but as I have no clue what Tac is, I don't think I can help :( This code looks very cryptic to me. I suppose you're somehow faking an authentication or something?

aymather commented 4 years ago

Also experiencing this issue. Only getting it when scraping info on posts though, still able to get a user's profile information... I'm assuming they just operate differently?

sharif9876 commented 4 years ago

This guy seems to have a solution that works: https://github.com/carcabot/tiktok-signature

Updated 4 hours ago. Might be worth taking a look at the changes he just made

drawrowfly commented 4 years ago

@ThibaultJanBeyer The Tac value served as a "secret/key" in the signature generation process. Before, it was rendered in the web page itself. @thibaultleouay is right about the way the signature is being generated right now. I have implemented it on my side, but no luck yet: the signature is bad.

I was also able to generate a valid tac, but it only works with the new endpoints.

@sharif9876 is using puppeteer and that is way too "heavy". This is the "last resort" solution

sharif9876 commented 4 years ago

@drawrowfly Not sure if this helps you, but if you inspect TikTok's webpage, they call this file: https://sf16-muse-va.ibytedtos.com/obj/rc-web-sdk-gcs/acrawler.js

If you run that code in your own project, it exposes a new byted_acrawler.sign() method that you can pass the URL to. That is what the tiktok-signature package is using to generate a signature.

He is using puppeteer but his signature generation code is still valid here: https://github.com/carcabot/tiktok-signature/blob/master/index.js#L82

drawrowfly commented 4 years ago

"I have implemented it on my side, but no luck yet: the signature is bad."

still working on it

thibaultleouay commented 4 years ago

I'm trying to inject the new script with jsdom on my side too.

The other TikTok signature repo also seems to have an issue with the old endpoint:

https://github.com/carcabot/tiktok-signature/issues/52

drawrowfly commented 4 years ago

Patience is very important. Love challenges :)

leehanse commented 4 years ago

Thanks, waiting for this issue to be resolved.

charlyBerthet commented 4 years ago

Hi folks!! I am also investigating. Do you guys know what these red dot characters in the acrawler.js script are?

[Screenshot, 2020-06-05 at 10:30:07]

subokita commented 4 years ago

Hi folks!! I am also investigating. Do you guys know what these red dot characters in the acrawler.js script are?

[Screenshot, 2020-06-05 at 10:30:07]

Seems to be ASCII code 16 (Data Link Escape). Probably just extra padding the obfuscator adds; it shouldn't have any impact on the logic of the code.
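
To check which control characters a script actually contains, a small scan over the character codes is enough. This is only a sketch: the `sample` string below is a made-up stand-in, not a line from acrawler.js, and `findControlChars` is a hypothetical helper name.

```javascript
// Report every C0 control character in `text`, ignoring tab, LF and CR.
function findControlChars(text) {
    const found = [];
    for (let i = 0; i < text.length; i++) {
        const code = text.charCodeAt(i);
        const isWhitespaceControl = code === 0x09 || code === 0x0a || code === 0x0d;
        if (code < 0x20 && !isWhitespaceControl) {
            found.push({ index: i, code });
        }
    }
    return found;
}

// Made-up sample containing one Data Link Escape (code 16) character.
const sample = 'var a\u0010= 1;';
console.log(findControlChars(sample));
```

Running it over the real file contents (read as a string) would list the position and code of each such character.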

szokeptr commented 4 years ago

I have a solution that allows calling byted_acrawler.sign() in Node environment natively, without puppeteer. Let me know if that helps.

drawrowfly commented 4 years ago

I did this yesterday through jsdom, but signature is invalid

drawrowfly commented 4 years ago

And if someone has a solution to any problem, do not ask whether we need it. Test it and post the solution here or submit a pull request!

szokeptr commented 4 years ago

Sorry, I just started working with TikTok yesterday. I don't know anything about it and I am just working from experience with Instagram. I didn't know if you already knew that solution (but don't like it or whatever), so I thought I'd ask before posting.

So my solution is like this: I took acrawler.js and executed it in Node.js, which showed me that the script populates the global scope with 3 properties: TAC, oprand and byted_acrawler:

szokeptr@Peters-Mac-Pro /tmp
  % node ./acrawler.js
[
  'global',
  'clearInterval',
  'clearTimeout',
  'setInterval',
  'setTimeout',
  'queueMicrotask',
  'clearImmediate',
  'setImmediate',
  'TAC',
  'oprand',
  'byted_acrawler'
]
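
A listing like the one above can also be narrowed down to just the names a script adds, by comparing the global keys before and after running it. The sketch below is an assumption about how one might do that: it runs a tiny stand-in script through `new Function` rather than the real acrawler.js, and `globalsAddedBy` is a hypothetical helper name.

```javascript
// Return the names of enumerable global properties that appear after
// running `source`.
function globalsAddedBy(source) {
    const before = new Set(Object.keys(globalThis));
    // Code created with new Function runs in sloppy mode, so assignments to
    // undeclared names create globals, as in a plain browser script.
    new Function(source)();
    return Object.keys(globalThis).filter((key) => !before.has(key));
}

// Hypothetical stand-in for acrawler.js, which is not reproduced here.
const standInScript = `
    TAC = function () {};
    oprand = {};
    byted_acrawler = { sign: function () { return ''; } };
`;

console.log(globalsAddedBy(standInScript));
```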

The byted_acrawler object has a method sign that will error if called in node environment since it relies on some browser-only APIs like navigator and location. If you manually provide these, the method can be called and a correct signature is made:

… contents of acrawler.js …

// Populate the global scope with missing APIs
location = {
    href: 'https://www.tiktok.com/',
    protocol: 'https:'
};

navigator = {
    userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36'
};

function signUrl(url) {
    return byted_acrawler.sign({ url });
}

I tested this with the following endpoints:

Forgot to mention: the signed request should have the same UA as the one used to sign the request, no other headers are required (only UA).
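
A minimal sketch of the point about the user agent: keep one UA string and use it both for the mocked `navigator` and for the request headers. Several things here are assumptions rather than anything confirmed in this thread: the `_signature` query parameter name, the `buildSignedRequest` helper, the example URL, and the `fakeSign` stand-in that replaces `byted_acrawler.sign`.

```javascript
// One user agent string, shared by the signing environment and the request.
const USER_AGENT =
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 ' +
    '(KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36';

// Values a signer would read (same shape as the globals populated above).
const signingEnvironment = {
    location: { href: 'https://www.tiktok.com/', protocol: 'https:' },
    navigator: { userAgent: USER_AGENT },
};

// Build a request description. `sign` is injected so the sketch does not
// depend on acrawler.js; the `_signature` parameter name is an assumption.
function buildSignedRequest(url, sign) {
    const signedUrl = new URL(url);
    signedUrl.searchParams.set('_signature', sign({ url }));
    return {
        url: signedUrl.toString(),
        // Same UA as the one the signature was generated with.
        headers: { 'User-Agent': signingEnvironment.navigator.userAgent },
    };
}

// Hypothetical stand-in signer, used only to show the call shape.
const fakeSign = ({ url }) => `fake-${url.length}`;
console.log(buildSignedRequest('https://example.com/api/items?count=30', fakeSign));
```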

drawrowfly commented 4 years ago

The signature won't be valid

Actually it is valid in some way

szokeptr commented 4 years ago

The server accepts it with the endpoints I listed above.

drawrowfly commented 4 years ago

I've tested with the old API. It actually accepts 1 request out of 3. Weird. Anyway, I will update the repo shortly.

EmilioBarradas commented 4 years ago

@szokeptr Although your solution works for some endpoints, other endpoints require more complex properties (createElement, HTMLCanvasElement, GPUINFO, plugins) in order to generate a signature; you would need to implement those properties into the global scope as well, which is a hassle. It seems like some jsdom implementation is the solution to this, but I have not been able to generate any valid signatures for an endpoint other than the trending page.

drawrowfly commented 4 years ago

I will push a working version within a few hours.

szokeptr commented 4 years ago

@EmilioBarradas I only need the two mentioned endpoints at this moment, but I will continue investigating since it was a lot of fun 😃

areltiyan commented 4 years ago

Working well in the new version! Wow @drawrowfly

drawrowfly commented 4 years ago

@szokeptr The only thing I have learned from TikTok is that there are too many bugs and daily updates that mess up all the tests. If you hadn't mentioned that your tests were working, I would have abandoned this approach, as it didn't work for me before!!!

Cheers!

PS

A new update was pushed, but there is a "but":

1. The code is a bit ugly for now.
2. The tests require an update.
3. Scraping data from the hashtag feed won't be stable, as I wasn't able to find the new endpoint for the hashtag feed the way I did for the music feed, for example.

If someone creates an issue regarding the hashtag feed, please point them to this message :)

charlyBerthet commented 4 years ago

@szokeptr I am wondering, how did you figure out that global vars location and navigator were needed?

drawrowfly commented 4 years ago

This is similar to the previous signature method. They are using the user agent to bind the signature to the browser.

szokeptr commented 4 years ago

@charlyBerthet since I knew that the sign method worked perfectly in browser but not in Node env, I created recursive mocks of the browser APIs in the global scope using the awesome JS Proxy and logged all property access attempts.
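
A minimal sketch of that idea, assuming nothing about the real script: a Proxy whose every property is another Proxy, with each access path pushed to a log. `createRecordingMock` and `codeUnderInspection` are hypothetical names, and the inspected function is a stand-in for whatever code you want to observe.

```javascript
// Build a mock whose every property is another mock, and record the path of
// each property that gets read.
function createRecordingMock(path, accessLog) {
    // A function target keeps the mock callable, so method calls don't throw.
    const target = function () {};
    return new Proxy(target, {
        get(_target, prop) {
            // Skip symbol lookups such as Symbol.toPrimitive.
            if (typeof prop === 'symbol') return undefined;
            const childPath = `${path}.${prop}`;
            accessLog.push(childPath);
            return createRecordingMock(childPath, accessLog);
        },
        apply() {
            // Calling a mocked method yields another mock.
            return createRecordingMock(`${path}()`, accessLog);
        },
    });
}

const accessLog = [];
const navigatorMock = createRecordingMock('navigator', accessLog);
const locationMock = createRecordingMock('location', accessLog);

// Hypothetical stand-in for the code under inspection.
function codeUnderInspection(nav, loc) {
    nav.userAgent;
    loc.href;
    nav.plugins.length;
}

codeUnderInspection(navigatorMock, locationMock);
console.log(accessLog);
```

The logged paths show which browser APIs the code touches, which is the list you then have to provide real values for.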

szokeptr commented 4 years ago

@drawrowfly I see that you implemented something similar to what I mentioned. Does that mean that this works for more than those 2 endpoints? (I don’t actively use your module, so I am not sure how everything works)

EmilioBarradas commented 4 years ago

@szokeptr Just tested with '/api/user/detail/' and '/api/music/detail/' and it doesn't seem like it does :(

Edit: For context, this is what I am using: test.js

drawrowfly commented 4 years ago

@szokeptr

Basically all Web API endpoints that require the signature

Old API endpoints can be signed, but they won't work the way you expect: they will randomly return an empty response, which means that something is wrong on their side :)

EmilioBarradas commented 4 years ago

@drawrowfly What are the new endpoints?

ThibaultJanBeyer commented 4 years ago

Old API endpoints can be signed, but they won't work the way you expect: they will randomly return an empty response, which means that something is wrong on their side :)

So TikTok has a bug in their code now and you expect it to be resolved soon? :D How do you know it's not some anti-scraping shit?

drawrowfly commented 4 years ago

The only "anti-scraping shit" is captcha

And the scraper is working fine except for the hashtag feed!

drawrowfly commented 4 years ago

And it's not a bug, it's "TikTok" :). There were lots of times when the API behavior was random.

Every few weeks they update the UI, and that can affect the API.

drawrowfly commented 4 years ago

@EmilioBarradas Just the usual stuff, the ones that are used right now in TikTok Web; you can explore them through the inspector.

tarikhagustia commented 4 years ago

u're awesome bro @drawrowfly ..

ThibaultJanBeyer commented 4 years ago

I see, thanks for the clarification, great work! :)

youngvo commented 4 years ago

@drawrowfly Thanks for your great tool.

And the scraper is working fine except for the hashtag feed!

So did you find any hints to configure the hashtag feed?

geco commented 4 years ago

@drawrowfly Thanks for your great tool.

And the scraper is working fine except for the hashtag feed!

So did you find any hints to configure the hashtag feed?

Which invocation method is the most similar for scraping generic posts? Trending? Thank you in advance.

youngvo commented 4 years ago

Which invocation method is the most similar for scraping generic posts? Trending? Thank you in advance.

@geco What do you mean? I don't follow you on this. Please explain more. My point is that, at the moment, the hashtag search is not stable. We need to scrape at least 3 times to fetch the list of media objects tagged with a specific hashtag.
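
A minimal sketch of that workaround: call the feed again until it returns a non-empty list, up to a fixed number of attempts. `fetchPage` is a hypothetical callback standing in for whatever call you use to load the hashtag feed; it is not part of the scraper's API, and `fetchUntilNonEmpty` is a made-up helper name.

```javascript
// Call `fetchPage` until it yields a non-empty array or attempts run out.
async function fetchUntilNonEmpty(fetchPage, maxAttempts = 3) {
    for (let attempts = 1; attempts <= maxAttempts; attempts++) {
        const items = await fetchPage();
        if (Array.isArray(items) && items.length > 0) {
            return { items, attempts };
        }
    }
    // Every attempt came back empty.
    return { items: [], attempts: maxAttempts };
}

// Stand-in feed that answers with an empty list twice before returning data.
let remainingEmptyResponses = 2;
const flakyFeed = async () =>
    remainingEmptyResponses-- > 0 ? [] : [{ id: 'example-post' }];

fetchUntilNonEmpty(flakyFeed).then((result) => console.log(result));
```

Adding a short delay between attempts may be worth considering, but that is left out here to keep the sketch small.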

geco commented 4 years ago

I see. Is there a way of using "search for hashtag" (3 times, or however many it takes to get results) but getting new results at every invocation? Right now I keep getting the same results.