elvisyjlin / media-scraper

Scrapes all photos and videos in a web page / Instagram / Twitter / Tumblr / Reddit / pixiv / TikTok
MIT License
385 stars, 49 forks

How to handle the __signature field in tiktok share link #9

Open liudongdonggoforit opened 5 years ago

elvisyjlin commented 5 years ago

Hi, the _signature field seems to be a new verification mechanism. I'm working on it.

toyinstark commented 5 years ago

Any luck with how the _signature is generated?

Also, how long does it take for a _signature to expire?

elvisyjlin commented 5 years ago

If you read Chinese, this article (https://blog.csdn.net/swukong_/article/details/80887940) illustrates how to dig into the douyin signature, which is similar to the tiktok signature. The idea is to find the JavaScript code that performs sign(). However, I observed that the signature mechanism behaves differently from what the article describes, i.e., the JavaScript seems different.

toyinstark commented 5 years ago

I think this solves the problem? https://github.com/loadchange/amemv-crawler/blob/master/fuck-byted-acrawler.js

elvisyjlin commented 5 years ago

It seems to use the same method the article describes. I would be grateful if you could try it and see whether it works. I haven't had time to look into it.

carcabot commented 5 years ago

If you follow this you will get the correct signature.

import os

from Naked.toolshed.shell import muterun_js  # third-party "Naked" package


def generate_signature(value):
    # Run the bundled signing script under Node.js, passing the value as its argument.
    root_dir = os.path.dirname(os.path.abspath(__file__))
    response = muterun_js(os.path.join(root_dir, 'scripts', 'byted-acrawler.js'), value)
    if response.exitcode == 0:
        # the command was successful, handle the standard output (bytes on Python 3)
        return response.stdout.decode().rstrip()
    # the command failed or the executable was not present, handle the standard error
    # (format() avoids the TypeError from concatenating the integer exit code to a string)
    print('Exit Status {}: {}'.format(response.exitcode, response.stderr.decode()))
    return None

using this byted crawler

dacopan commented 5 years ago

This does not work for me. Any updates?

carl-jin commented 5 years ago

The only way I found is to use Google Puppeteer. Did you figure out another way?

dacopan commented 5 years ago

Yes, use this awesome project by the Scrapinghub team: https://github.com/scrapy-plugins/scrapy-splash

rubik commented 5 years ago

@elvisyjlin Did you solve this? I'm using the byted crawler tool, but I always get the same response:

{
    "content": "",
    "contentType": "text/html",
    "statusCode": 200
}

I am passing the same headers that I see in my browser, but still it does not work.
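
A minimal sketch for catching this case in code, assuming the response is available as a dictionary shaped like the one above. The helper name `looks_rejected` is hypothetical, and treating an empty 200 body as a rejected signature is an inference from this thread, not documented behavior.

```python
def looks_rejected(response):
    """Return True for a 200 response whose body is empty."""
    # An empty body with status 200 is the symptom reported in this thread.
    return response.get("statusCode") == 200 and not response.get("content")


sample = {"content": "", "contentType": "text/html", "statusCode": 200}
print(looks_rejected(sample))
```

Checking for this explicitly avoids silently parsing an empty string as if it were a valid result.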

carcabot commented 5 years ago

The signature was changed again.

Take a look at my solution, tiktok-signature.

tojobac commented 5 years ago

@carcabot I tried the same solution (I decoded the JS myself but ended up with the same code you use), but it doesn't work. I get HTTP status 200 but no content :(

carcabot commented 5 years ago

@tojobac make sure the user agent you use for the request is the same one you used to generate the signature.
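
One way to make that hard to get wrong is to pass a single user-agent value through both steps. This is only a sketch: `build_signed_request`, the `sign` callable and its `(url, user_agent)` signature, and the placeholder user-agent string are all assumptions for illustration, not part of tiktok-signature.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Placeholder value; use the real user agent of the browser that signs the URL.
USER_AGENT = "Mozilla/5.0 (example)"


def build_signed_request(url, sign, user_agent=USER_AGENT):
    """Return (signed_url, headers) that share one user agent.

    `sign` is a hypothetical callable (url, user_agent) -> signature string.
    """
    signature = sign(url, user_agent)
    parts = urlsplit(url)
    # Keep empty parameters such as shareUid= when rebuilding the query.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("_signature", signature))
    signed_url = urlunsplit(parts._replace(query=urlencode(query)))
    # The same user agent that was used for signing goes into the request headers.
    return signed_url, {"User-Agent": user_agent}
```

The returned headers can then be handed to whatever HTTP client sends the request.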

tojobac commented 5 years ago

@carcabot Thanks for replying. I use the same user agent. The URL I use to generate the signature looks like this:

https://www.tiktok.com/share/item/list?secUid=MS4wLjABAAAANHCJe1MPWoZluaFMZg9-uhBgkahSg59GtiFDQtGSICI6NKpsy2eDdVrCu0odeC0t&id=<user_id>&type=1&count=30&minCursor=0&maxCursor=0&shareUid=

I don't know whether the issue is related to the URL. Just in case, I tested the example from the tiktok-signature repo with Postman, using the same URL, signature, referer, and all other headers, and got the same result. In both cases (mine and the example) I got:

{ "statusCode": 200, "contentType": "application/json", "content": "" }

carcabot commented 5 years ago

I'm not sure what URL that is, but I'm sure it is not correct. Also it seems that there is a which should be removed. The correct URL should be https://www.tiktok.com/share/item/list?secUid=&id=&type=5&count=30&minCursor=0&maxCursor=0&shareUid=
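
For illustration, the URL above can be assembled from its parameters so that the order and the empty fields stay consistent. The function name `build_item_list_url` and its arguments are hypothetical; only the endpoint and parameter names come from the URL quoted in this comment.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.tiktok.com/share/item/list"


def build_item_list_url(sec_uid="", user_id="", count=30):
    """Build the item-list URL with type=5, as suggested in this comment."""
    params = [
        ("secUid", sec_uid),
        ("id", user_id),
        ("type", 5),
        ("count", count),
        ("minCursor", 0),
        ("maxCursor", 0),
        ("shareUid", ""),
    ]
    return BASE_URL + "?" + urlencode(params)


print(build_item_list_url())
```

With the default arguments this reproduces the URL given above, with empty secUid and id fields.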

tojobac commented 5 years ago

You mean type should be 5?

tojobac commented 5 years ago

@carcabot I tried the solution you made in the tiktok-signature project. When I run node chrome.js <url>, it always generates the same signature, VIm6dAAgEBYZFjzZxqkSy1SJu2AAAlc, regardless of the URL I pass to it (even when I give no URL). I tried those 3 URLs, plus a 4th try without a URL.

Note that it generates the same signature because of this window.tac=<value>. When I remove it, it generates different signatures, but none of them work.

tojobac commented 5 years ago

@carcabot I generated it the way you described. By the way, after testing, it seems that the "tac" is the key: I take the "tac" from the main page of my target user and set "window.tac" in my own script, and it works. Thanks @carcabot, I would not have noticed the usefulness of this famous "tac" without your code ;) Now I just have to take the "tac" from each user's main page :) Thx
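
A rough sketch of that step, under a loud assumption: that the user's page embeds the value as a single-quoted assignment of the form `tac='...'`. The thread does not show the actual markup, so the regular expression, the sample HTML, and both function names are hypothetical.

```python
import json
import re

# Assumed page format: tac='<value>' somewhere in an inline script.
TAC_PATTERN = re.compile(r"tac\s*=\s*'([^']*)'")


def extract_tac(html):
    """Return the tac value found in the page, or None if it is absent."""
    match = TAC_PATTERN.search(html)
    return match.group(1) if match else None


def window_tac_statement(tac):
    """Build the JavaScript line that sets window.tac in the signing script."""
    # json.dumps yields a safely quoted JavaScript string literal.
    return "window.tac = " + json.dumps(tac) + ";"


sample_html = "<script>tac='abc123'</script>"  # made-up example page
print(window_tac_statement(extract_tac(sample_html)))
```

If the page format differs, only the pattern needs to change; the rest of the flow stays the same.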

carcabot commented 5 years ago

@tojobac are you sure it does not work with the default tac provided? My tests show that it works without needing to fetch a new tac.

tojobac commented 5 years ago

@carcabot What do you mean by default tac? I get the tac (in a