LifeActor / ykdl

A video downloader focused on mainland China video sites.
https://github.com/zhangn1985/ykdl

[Temporary] Douyu JS replacement #417

Closed airdge closed 4 years ago

airdge commented 5 years ago

The previous approach was too cumbersome.

oog=oog(otg+ttg+htg); var otc="test"; var cns=otc;tog=oog;
...
...
...
var re=oog[otc](tsg[wtf]);v[216]^=lk[0];!re && (function(){for(var i=0;i<v.length;i++){v[i]=i;}})();
var re=oog[cns](tsg[wtf]);v[238]=(v[238]<<(lk[0]%16))|(v[238]>>>(32-(lk[0]%16)));!re && (function(){ wea=400;})();
var re=tog[cns](tsg[wtf]);re && (function(){v[275]^=lk[1];})();
var re=tog[otc](sck[dtf]);v[298]=(v[298]<<(lk[0]%16))|(v[298]>>>(32-(lk[0]%16)));!re && (function(){for(var i=0;i<v.length;i++){v[i]=i;}})();

The function assigns values to v depending on whether oog['test'](obj) is truthy, so it is enough to force oog['test'](obj) to return true.

                      //eval(Ee); 
                      var newString = "'" + Ee + "';";
                      var evalString = eval(newString);
                      evalString = evalString.replace(/(.*oog\(.*)/, "$1oog['test'] = function(a) { return 1; }");
                      eval(evalString);
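
As a rough illustration of the replacement above, here is a minimal Python sketch that performs the same string injection before the decoded script is handed to a JS engine. The function name `inject_test_override` and the sample input are hypothetical; only the regular expression and the injected override are taken from the snippet above.

```python
import re

# The override injected by the JavaScript snippet above.
OVERRIDE = "oog['test'] = function(a) { return 1; }"


def inject_test_override(decoded_js):
    """Append OVERRIDE to the first line that contains a call to oog(...)."""
    # count=1 mirrors JavaScript's String.replace with a non-global regex.
    # A function replacement avoids backslash processing of OVERRIDE.
    return re.sub(
        r"(.*oog\(.*)",
        lambda m: m.group(1) + OVERRIDE,
        decoded_js,
        count=1,
    )


# Hypothetical decoded script, shaped like the excerpt at the top of this issue.
sample = 'oog=oog(otg+ttg+htg); var otc="test";\nvar re=oog[otc](tsg[wtf]);'
patched = inject_test_override(sample)
```

Because `.` does not match a newline, only the line that calls `oog(...)` is extended; the later `oog[otc](...)` checks are left untouched and simply see the overridden `test`.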

========================================================================

The rough flow is: match the encrypt argument of eval(encrypt), then use

var  enc ="'" + encrypt + "';";
var  decrypt = eval(enc);
eval(decrypt);

to obtain the restored code decrypt. Then analyze decrypt to see what the Douyu JS is about to execute, and apply further replacement and restoration.

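This does not work every time, and the problem is not fully solved yet. So far I have run into only three forms and handled two of them; the remaining one is relatively troublesome. Because the data returned by homeH5Enc is not the same every time, refreshing a few times or waiting a few minutes should let parsing succeed. Use this as a temporary measure for now; I will analyze the rest when I have time.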

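This does not affect the existing py code; it only runs when ub98484234 cannot be resolved. However, the DOM objects must be added for every room, which does not affect parsing of normal rooms.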

        js_ctx = JSEngine(js_md5) 

        # A DOM must be added to the JS here, otherwise ub98484234 raises an error because the DOM is missing; this does not affect parsing of other rooms
        dom = "let window = {},document  = {};" 
        js_ctx.eval(dom)

        js_ctx.eval(js_enc)
        did = uuid.uuid4().hex
        tt = str(int(time.time())) 
        ub98484234 = js_ctx.call('ub98484234', self.vid, did, tt) 
        if not ub98484234:
            # Match the eval argument inside function ub98484234
            func = match1(js_enc,'function ub98484234(.*)')
            workflow= match1(func,'eval\((\w+)\)') 
            # Replacement string
            replaceString = ''' 
                    // eval(%(workflow)s); 
                    // let workfolw = %(workflow)s ; 
                    var newString = "'" + %(workflow)s + "';";
                    // Get the restored code
                    var evalString = eval(newString);
                    // DOM detection
                    if (/(\w+)=(window|document)/g.exec(evalString)) {
                        var execWin = /(\w+)=window/g.exec(evalString);
                        var execDoc = /(\w+)=document/g.exec(evalString);
                        window.isWin = execWin ? execWin[1] : '';
                        document.isDoc = execDoc ? execDoc[1] : '';
                    } 
                    // If the restored code still contains an eval() call
                    if (/eval\(/g.exec(evalString)) {
                        var encString=evalString.replace(/eval\(\w+\)/g,'');
                        eval(encString);
                        if (/function/g.exec(%(workflow)s) && (%(workflow)s.indexOf(window.isWin) > 0 || %(workflow)s.indexOf(document.isDoc) > 0)) {
                            var reString = /var (\w+)=/g.exec(%(workflow)s);
                            reString && (evalString = %(workflow)s.replace(reString[0], reString[0] + "!"));
                            // If the restored code also contains an obfuscated code segment
                            let reEval = /\(function\(\)\{(.*)\}\)\(\)/g.exec(encString);
                            if (reEval && />>>|<<<|^/g.exec(reEval[1])) { 
                                  var xxxxxxxxxxx = function() {
                                      eval(reEval[1]);
                                      var funcString = /var (\w+)=/g.exec(%(workflow)s);
                                      funcString && (evalString = %(workflow)s.replace(reString[0], reString[0] + "!"));
                                      return evalString;
                                  }();
                              }
                        }
                    } else {
                        // If the restored code contains a function and references the DOM, negate the test logic (a = b -> a = !b)
                        if (/function/g.exec(evalString) && (evalString.indexOf(window.isWin) > 0 || evalString.indexOf(document.isDoc) > 0)) {
                            var reString = /var (\w+)=/g.exec(evalString);
                            reString && (evalString = evalString.replace(reString[0], reString[0] + "!"));
                        }
                    }
                    eval(evalString);
            '''% {'workflow': workflow}

            js_Dom = js_enc.replace('eval('+workflow+');', replaceString)
            js_ctx.eval(js_Dom)  
            ub98484234 = js_ctx.call('ub98484234', self.vid, did, tt) 
        self.logger.debug('ub98484234: ' + ub98484234)
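
The Python side of the fallback above comes down to two string operations: find the name passed to eval() inside ub98484234, then substitute that call with the patch. The following is a minimal sketch using only the standard `re` module instead of ykdl's `match1`; the helper names, the `re.S` flag, and the sample script are assumptions made for illustration.

```python
import re


def find_eval_argument(js_enc):
    """Return the identifier passed to eval() inside ub98484234, or None."""
    # re.S is an assumption so that a multi-line function body is searched too.
    func = re.search(r"function ub98484234(.*)", js_enc, re.S)
    if func is None:
        return None
    arg = re.search(r"eval\((\w+)\)", func.group(1))
    return arg.group(1) if arg else None


def patch_eval_call(js_enc, patch_template):
    """Replace 'eval(<name>);' with patch_template filled in with <name>."""
    name = find_eval_argument(js_enc)
    if name is None:
        return js_enc  # nothing to patch, leave the script unchanged
    patch = patch_template % {"workflow": name}
    return js_enc.replace("eval(" + name + ");", patch)


# Hypothetical, heavily shortened stand-in for the homeH5Enc script.
sample = 'function ub98484234(a,b,c){var Ee="...";eval(Ee);return rt;}'
patched = patch_eval_call(sample, "var s = %(workflow)s; eval(s);")
```

Returning the input unchanged when no eval argument is found keeps the caller from formatting the patch with `None`, which would otherwise produce a script that references an undefined name.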

douyu/live.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from ykdl.util.html import get_content, add_header
from ykdl.util.match import match1, matchall
from ykdl.util.jsengine import JSEngine, javascript_is_supported
from ykdl.extractor import VideoExtractor
from ykdl.videoinfo import VideoInfo
from ykdl.compact import urlencode

import time
import json
import uuid

douyu_match_pattern = [ 'class="hroom_id" value="([^"]+)',
                        'data-room_id="([^"]+)'
                      ]

class Douyutv(VideoExtractor):
    name = u'斗鱼直播 (DouyuTV)'

    stream_ids = ['BD10M', 'BD8M', 'BD4M', 'BD', 'TD', 'HD', 'SD']
    profile_2_id = {
        u'蓝光10M': 'BD10M',
        u'蓝光8M': 'BD8M',
        u'蓝光4M': 'BD4M',
        u'蓝光': 'BD',
        u'超清': 'TD',
        u'高清': 'HD',
        u'流畅': 'SD'
     }

    def prepare(self):
        assert javascript_is_supported, "No JS Interpreter found, can't parse douyu live!"

        info = VideoInfo(self.name, True)
        add_header("Referer", 'https://www.douyu.com')

        html = get_content(self.url)
        self.vid = match1(html, 'room_id\s*=\s*(\d+)',
                                '"room_id.?":(\d+)',
                                'data-onlineid=(\d+)')
        title = match1(html, 'Title-headlineH2">([^<]+)<')
        artist = match1(html, 'Title-anchorName" title="([^"]+)"')

        if not title or not artist:
            html = get_content('https://open.douyucdn.cn/api/RoomApi/room/' + self.vid)
            room_data = json.loads(html)
            if room_data['error'] == 0:
                room_data = room_data['data']
                title = room_data['room_name']
                artist = room_data['owner_name']

        info.title = u'{} - {}'.format(title, artist)
        info.artist = artist

        html_h5enc = get_content('https://www.douyu.com/swf_api/homeH5Enc?rids=' + self.vid)
        data = json.loads(html_h5enc)
        assert data['error'] == 0, data['msg']
        js_enc = data['data']['room' + self.vid]

        try:
            # try load local .js file first
            # from https://cdnjs.com/libraries/crypto-js
            from pkgutil import get_data
            js_md5 = get_data(__name__, 'crypto-js-md5.min.js')
            if isinstance(js_md5, bytes):
                js_md5 = js_md5.decode()
        except IOError:
            js_md5 = get_content('https://cdnjs.cloudflare.com/ajax/libs/crypto-js/3.1.9-1/crypto-js.min.js')

        js_ctx = JSEngine(js_md5) 

        # A DOM must be added to the JS here, otherwise ub98484234 raises an error because the DOM is missing; this does not affect parsing of other rooms
        dom = "let window = {},document  = {};" 
        js_ctx.eval(dom)

        js_ctx.eval(js_enc)
        did = uuid.uuid4().hex
        tt = str(int(time.time())) 
        ub98484234 = js_ctx.call('ub98484234', self.vid, did, tt) 
        if not ub98484234:
            # Match the eval argument inside function ub98484234
            func = match1(js_enc,'function ub98484234(.*)')
            workflow= match1(func,'eval\((\w+)\)') 
            # Replacement string
            replaceString = ''' 
                    // eval(%(workflow)s); 
                    // let workfolw = %(workflow)s ; 
                    var newString = "'" + %(workflow)s + "';";
                    // Get the restored code
                    var evalString = eval(newString);
                    // DOM detection
                    if (/(\w+)=(window|document)/g.exec(evalString)) {
                        var execWin = /(\w+)=window/g.exec(evalString);
                        var execDoc = /(\w+)=document/g.exec(evalString);
                        window.isWin = execWin ? execWin[1] : '';
                        document.isDoc = execDoc ? execDoc[1] : '';
                    } 
                    // If the restored code still contains an eval() call
                    if (/eval\(/g.exec(evalString)) {
                        var encString=evalString.replace(/eval\(\w+\)/g,'');
                        eval(encString);
                        if (/function/g.exec(%(workflow)s) && (%(workflow)s.indexOf(window.isWin) > 0 || %(workflow)s.indexOf(document.isDoc) > 0)) {
                            var reString = /var (\w+)=/g.exec(%(workflow)s);
                            reString && (evalString = %(workflow)s.replace(reString[0], reString[0] + "!"));
                            // If the restored code also contains an obfuscated code segment
                            let reEval = /\(function\(\)\{(.*)\}\)\(\)/g.exec(encString);
                            if (reEval && />>>|<<<|^/g.exec(reEval[1])) { 
                                  var xxxxxxxxxxx = function() {
                                      eval(reEval[1]);
                                      var funcString = /var (\w+)=/g.exec(%(workflow)s);
                                      funcString && (evalString = %(workflow)s.replace(reString[0], reString[0] + "!"));
                                      return evalString;
                                  }();
                              }
                        }
                    } else {
                        // If the restored code contains a function and references the DOM, negate the test logic (a = b -> a = !b)
                        if (/function/g.exec(evalString) && (evalString.indexOf(window.isWin) > 0 || evalString.indexOf(document.isDoc) > 0)) {
                            var reString = /var (\w+)=/g.exec(evalString);
                            reString && (evalString = evalString.replace(reString[0], reString[0] + "!"));
                        }
                    }
                    eval(evalString);
            '''% {'workflow': workflow}

            js_Dom = js_enc.replace('eval('+workflow+');', replaceString)
            js_ctx.eval(js_Dom)  
            ub98484234 = js_ctx.call('ub98484234', self.vid, did, tt) 
        self.logger.debug('ub98484234: ' + ub98484234)
        params = {
            'v': match1(ub98484234, 'v=(\d+)'),
            'did': did,
            'tt': tt,
            'sign': match1(ub98484234, 'sign=(\w{32})'),
            'cdn': '',
            'iar': 0,
            'ive': 0
        }

        def get_live_info(rate=0):
            params['rate'] = rate
            data = urlencode(params)
            if not isinstance(data, bytes):
                data = data.encode()
            html_content = get_content('https://www.douyu.com/lapi/live/getH5Play/{}'.format(self.vid), data=data)
            self.logger.debug(html_content)

            live_data = json.loads(html_content)
            if live_data['error']:
                return live_data['msg']

            live_data = live_data["data"]
            real_url = '{}/{}'.format(live_data['rtmp_url'], live_data['rtmp_live'])
            rate_2_profile = dict((rate['rate'], rate['name']) for rate in live_data['multirates'])
            video_profile = rate_2_profile[live_data['rate']]
            stream = self.profile_2_id[video_profile]
            if stream in info.streams:
                return
            info.stream_types.append(stream)
            info.streams[stream] = {
                'container': 'flv',
                'video_profile': video_profile,
                'src' : [real_url],
                'size': float('inf')
            }

            error_msges = []
            if rate == 0:
                rate_2_profile.pop(0, None)
                rate_2_profile.pop(live_data['rate'], None)
                for rate in rate_2_profile:
                    error_msg = get_live_info(rate)
                    if error_msg:
                        error_msges.append(error_msg)
            if error_msges:
                return ', '.join(error_msges)

        error_msg = get_live_info()
        assert len(info.stream_types), error_msg
        info.stream_types = sorted(info.stream_types, key=self.stream_ids.index)
        return info

    def prepare_list(self):

        html = get_content(self.url)
        return matchall(html, douyu_match_pattern)

site = Douyutv()
SeaHOH commented 5 years ago

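There is too much uncertainty here; this will only be merged if someone commits to maintaining it. In fact, simply replacing the call with the following JS is enough to make it work.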

+++if (Ee.indexOf("!re &&") == -1) {
+++    var Ee = Ee.replace("re &&", "!re &&");
+++} else {
+++    var Ee = Ee.replace("!re &&", "re &&");
+++}
eval(Ee);
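
For readers who would rather patch the string on the Python side, the toggle above can be sketched as follows. The function name `flip_re_guard` and the sample strings are hypothetical. As in the JS version, only the first occurrence is changed, because JavaScript's `String.replace` with a string pattern replaces a single match.

```python
def flip_re_guard(code):
    """Toggle the first 're &&' guard to '!re &&', or the reverse."""
    if "!re &&" not in code:
        # No negated guard present: negate the first plain guard.
        return code.replace("re &&", "!re &&", 1)
    # A negated guard exists: remove the negation from the first one.
    return code.replace("!re &&", "re &&", 1)


# Hypothetical guards shaped like the obfuscated lines quoted in this issue.
plain = "var re=tog[cns](tsg[wtf]);re && (function(){v[275]^=lk[1];})();"
negated = "var re=oog[otc](tsg[wtf]);!re && (function(){wea=400;})();"
```
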
yoyosnart commented 5 years ago

Where in live.py does this need to be added for it to work?

SeaHOH commented 5 years ago

The code @airdge posted works as is. What I posted is an extremely simplified version with very poor fault tolerance.

DayChan commented 5 years ago

Thanks very much!

airdge commented 5 years ago

Added a check for eval(a)(b,c,d). Since the restored forms are all basically similar, I did not add any further checks.

            # Replacement string
            replaceString = ''' 
                    // eval(%(workflow)s); 
                    // let workfolw = %(workflow)s ; 
                    var endEval = '';
                    var newString = "'" + %(workflow)s + "';";
                    // Get the restored code
                    var evalString = eval(newString);
                    // DOM detection
                    if (/(\w+)=(window|document)/g.exec(evalString)) {
                        var execWin = /(\w+)=window/g.exec(evalString);
                        var execDoc = /(\w+)=document/g.exec(evalString);
                        window.isWin = execWin ? execWin[1] : '';
                        document.isDoc = execDoc ? execDoc[1] : '';
                    } 
                    // If the restored code contains a self-invoking eval: eval(a)(b,c,d)
                    if (/eval\(\w+\)\(/g.exec(evalString)) {
                          endEval = 1;
                    }
                    // If the restored code still contains eval();
                    else if (/eval\(/g.exec(evalString)) {
                        var encString=evalString.replace(/eval\(\w+\)/g,'');
                        eval(encString);
                        if (/function/g.exec(%(workflow)s) && (%(workflow)s.indexOf(window.isWin) > 0 || %(workflow)s.indexOf(document.isDoc) > 0)) {
                            var reString = /var (\w+)=/g.exec(%(workflow)s);
                            reString && (evalString = %(workflow)s.replace(reString[0], reString[0] + "!"));
                            // If the restored code also contains an obfuscated code segment
                            let reEval = /\(function\(\)\{(.*)\}\)\(\)/g.exec(encString);
                            if (reEval && />>>|<<<|^/g.exec(reEval[1])) { 
                                  var xxxxxxxxxxx = function() {
                                      eval(reEval[1]);
                                      var funcString = /var (\w+)=/g.exec(%(workflow)s);
                                      funcString && (evalString = %(workflow)s.replace(reString[0], reString[0] + "!"));
                                      return evalString;
                                  }();
                              }
                        }
                        // If the restored code does not contain a function
                        else if (/>>>|<<<|^/g.exec(%(workflow)s)) { 
                             endEval = 1;
                         }
                    } else {
                        // If the restored code contains a function and references the DOM, negate the test logic (a = b -> a = !b)
                        if (/function/g.exec(evalString) && (evalString.indexOf(window.isWin) > 0 || evalString.indexOf(document.isDoc) > 0)) {
                            var reString = /var (\w+)=/g.exec(evalString);
                            reString && (evalString = evalString.replace(reString[0], reString[0] + "!"));
                        }
                    }
                    endEval ? eval(%(workflow)s) : eval(evalString);
            '''% {'workflow': workflow}
SeaHOH commented 5 years ago

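Even though this is labeled "temporary", I will ask anyway: would you be interested in maintaining it on an ongoing basis? I mean only this kind of JavaScript patch.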

SeaHOH commented 5 years ago

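Douyu has switched all of its live streams to this kind of obfuscation, so it looks like there is no way around changing this...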

bigmangos commented 5 years ago

Everything requires the DOM now. Douyu's web client is really hard to deal with, and it changes frequently.

incharges commented 5 years ago

After applying the code provided by @airdge, the original exception 'document' is not defined was resolved, but a new exception 'window' is not defined appeared.

SeaHOH commented 5 years ago

I rewrote a version that looks less complicated. It uses /;(!?)(\w+ && \(function\()/g to decide where to flip the condition, and it adds debugging code.

If there is a better approach, please keep suggesting it. Many thanks!
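
The rewritten patch itself is not shown in this thread. As one possible reading of "flip", the sketch below applies the regular expression quoted above with Python's `re.sub`, adding a `!` where none is present and removing it where one is. The function name `flip_guards` and the sample string are hypothetical, and the real patch may do something different.

```python
import re

# Pattern quoted in the comment above, written as a Python raw string.
GUARD = re.compile(r";(!?)(\w+ && \(function\()")


def flip_guards(code):
    """Invert every ';re && (function(' / ';!re && (function(' guard."""
    def _flip(match):
        bang, rest = match.group(1), match.group(2)
        # Drop the '!' if present, add one if absent.
        return ";" + ("" if bang else "!") + rest

    return GUARD.sub(_flip, code)


# Hypothetical input shaped like the obfuscated lines in the first comment.
sample = "x=1;!re && (function(){a();})();y=2;re && (function(){b();})();"
flipped = flip_guards(sample)
```

Unlike the single string replace earlier in the thread, this form touches every guard in the script and does not depend on the variable being named `re`.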

incharges commented 5 years ago

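In testing this is basically perfect and recording works normally. Only occasionally the following message appears: Unexpected identifier after numeric literal. Still, many thanks to everyone for the help.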

incharges commented 5 years ago

Thanks again to @SeaHOH for the revised code. In testing, the Unexpected identifier after numeric literal message no longer appears. This time it really is perfect.

SeaHOH commented 5 years ago

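The method used is the one @airdge provided; the rewrite only targets a different point, which simplifies the logic a little. And that bug was introduced by my use of random names; it is not a result of the obfuscation.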

xiatiantiantian commented 5 years ago

YKDL https://www.douyu.com/288016 https://www.douyu.com/998 https://www.douyu.com/987

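1. I picked a few rooms that are currently live; only the first one gets recorded, none of the others. (I don't know how to do this; please advise, thanks.)

2. Occasionally the stream disconnects (even though opening the room on a phone shows it is still live).
   989

3. Can a room be monitored so that recording restarts automatically after the stream drops?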

bigmangos commented 5 years ago
PS F:\> ykdl https://www.douyu.com/998
site:                斗鱼直播 (DouyuTV)
title:               重播丨15日总决eStar vs RNG - 王者荣耀官方赛事
artist:              王者荣耀官方赛事
streams:
    - format:        BD10M
      container:     flv
      video-profile: 蓝光10M
    # download-with: ykdl --format=BD10M [URL]
Now downloading: 重播丨15日总决eStar vs RNG - 王者荣耀官方赛事_BD10M_2019-06-11T19-09-41.871858.flv

Tested; it works normally.

SeaHOH commented 5 years ago

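1. I picked a few rooms that are currently live; only the first one gets recorded, none of the others.

Both direct playback and recording work fine in my tests.

2. Occasionally the stream disconnects (even though opening the room on a phone shows it is still live). 3. Can a room be monitored so that recording restarts automatically after the stream drops?

No, that is not possible.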

xiatiantiantian commented 5 years ago

I:\新建文件夹 (4)\ykdl\新建文件夹> YKDL https://www.douyu.com/288016 https://www.douyu.com/998 https://www.douyu.com/987
site:                斗鱼直播 (DouyuTV)
title:               LPL夏季赛FPXvs RW - 英雄联盟赛事
artist:              英雄联盟赛事
streams:

It still cannot record several rooms at the same time. I am not sure whether I am doing something wrong; all three rooms are live.

bigmangos commented 5 years ago

Open three command-line windows.

SeaHOH commented 5 years ago

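Open three command-line windows.

That's right. A single command cannot play or record several streams at once; each one has to be run separately.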
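
Instead of opening the windows by hand, the same effect can be approximated with one process per room. This is only a sketch under the assumption that a `ykdl` executable is on the PATH and is invoked as `ykdl URL`, as in the session shown earlier in this thread; the helper names are hypothetical, and restarting after a dropped stream is not handled.

```python
import subprocess


def build_commands(urls, program="ykdl"):
    """Return one command line per room URL."""
    return [[program, url] for url in urls]


def launch_all(urls):
    """Start one ykdl process per URL and wait for all of them to exit."""
    procs = [subprocess.Popen(cmd) for cmd in build_commands(urls)]
    return [proc.wait() for proc in procs]


# Room URLs taken from the question above.
rooms = [
    "https://www.douyu.com/288016",
    "https://www.douyu.com/998",
    "https://www.douyu.com/987",
]
commands = build_commands(rooms)
# launch_all(rooms) would start the three recordings side by side.
```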

a67878813 commented 5 years ago

On Ubuntu 16.04, with the zip package downloaded just now, every room I try produces the following error:

Gjs-Message: JS WARNING: [/tmp/execjsze1d78o3.js 5]: variable n redeclares argument
Gjs-Message: JS WARNING: [/tmp/execjsze1d78o3.js 10]: assignment to undeclared variable result
Gjs-Message: JS WARNING: [/tmp/execjsrry1siij.js 6]: variable n redeclares argument
Gjs-Message: JS WARNING: [/tmp/execjsrry1siij.js 11]: assignment to undeclared variable hqznH926
Gjs-Message: JS WARNING: [/tmp/execjsrry1siij.js 12]: "window" is read-only
Gjs-Message: JS WARNING: [/tmp/execjsrry1siij.js 13]: assignment to undeclared variable document
Gjs-Message: JS WARNING: [/tmp/execjsrry1siij.js 10]: assignment to undeclared variable result
Gjs-Message: JS WARNING: [/tmp/execjsu02q3fly.js 6]: variable n redeclares argument
Gjs-Message: JS WARNING: [/tmp/execjsu02q3fly.js 11]: assignment to undeclared variable hqznH926
Gjs-Message: JS WARNING: [/tmp/execjsu02q3fly.js 12]: "window" is read-only
Gjs-Message: JS WARNING: [/tmp/execjsu02q3fly.js 13]: assignment to undeclared variable document
Traceback (most recent call last):
  File "/usr/local/bin/ykdl", line 9, in load_entry_point('ykdl==1.6.3', 'console_scripts', 'ykdl')()
  File "/usr/local/lib/python3.5/dist-packages/ykdl-1.6.3-py3.5.egg/cykdl/main.py", line 178, in main
  File "/usr/local/lib/python3.5/dist-packages/ykdl-1.6.3-py3.5.egg/ykdl/extractor.py", line 21, in parser
  File "/usr/local/lib/python3.5/dist-packages/ykdl-1.6.3-py3.5.egg/ykdl/extractors/douyu/live.py", line 126, in prepare
  File "/usr/local/lib/python3.5/dist-packages/ykdl-1.6.3-py3.5.egg/ykdl/util/jsengine.py", line 135, in eval
  File "/usr/local/lib/python3.5/dist-packages/ykdl-1.6.3-py3.5.egg/ykdl/util/jsengine.py", line 171, in _eval
  File "/usr/local/lib/python3.5/dist-packages/ykdl-1.6.3-py3.5.egg/ykdl/util/jsengine.py", line 187, in _exec
ykdl.util.jsengine.ProgramError: SyntaxError: illegal character

a67878813 commented 5 years ago

@TaoziDB Which character should I replace?

TaoziDB commented 5 years ago

The only character that could be illegal is the ` (back-tick) in js_patch, because some JS engines have limited support for template strings. In that case, replacing every ` (back-tick) with " (double quote) should work too.

a67878813 commented 5 years ago

I replaced back-tick with " in js_patch and it failed.

ykdl returned: ykdl.util.jsengine.ProgramError: SyntaxError: unterminated string literal

It seems there are too many " characters for the ''' ''' string syntax to work. What should I do?

TaoziDB commented 5 years ago

There are only two ` characters; don't confuse them with single quotes.

a67878813 commented 5 years ago

It just did not work.

SeaHOH commented 5 years ago

A " string does not support line breaks; try deleting the line breaks.
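
The two suggestions above (swap the back-ticks for double quotes, then remove the line breaks) can be sketched as a single text transformation. The contents of js_patch are not shown in this thread, so the function name and sample below are hypothetical. The sketch also assumes the template string contains no `${...}` placeholders, and it joins lines with a space rather than nothing so that adjacent tokens do not run together.

```python
import re


def backticks_to_double_quotes(js_source):
    """Rewrite `...` template strings as single-line "..." strings."""
    def _convert(match):
        body = match.group(1)
        # A double-quoted JS string cannot span lines, so join them.
        body = " ".join(line.strip() for line in body.splitlines())
        # Escape double quotes that are now inside a double-quoted string.
        body = body.replace('"', '\\"')
        return '"' + body + '"'

    return re.sub(r"`([^`]*)`", _convert, js_source)


# Hypothetical two-line template string.
sample = 'var p = `a = 1;\n  b = "x";`;'
converted = backticks_to_double_quotes(sample)
```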

a67878813 commented 5 years ago

@SeaHOH @TaoziDB Thanks. Tested successfully on Ubuntu 16.04.

SeaHOH commented 5 years ago

Simplified again: now only the DOM detection function is modified.

zxdong262 commented 5 years ago

I tried it, and it seems to cut off after a very short time? Update: in my tests, 4M and above drop out, while 超清 (TD) holds up.

TaoziDB commented 5 years ago

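There are three lines: main line ws-h5, backup line 5 tct-h5, and backup line 6 ali-h5. Only backup line 5 (tct-h5) does not drop; the other two drop after a few seconds. I suspect the stream file format is non-standard.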

SeaHOH commented 5 years ago

As for the line issue, it depends only on the user's network. Even on the same line, playback behaves differently for different users.

This project will not force a choice. If you need one, modify the cdn parameter in the source code yourself.
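
A minimal sketch of what such a local change might look like, based on the params dict in douyu/live.py quoted earlier in this thread. The helper name `build_play_params` and the placeholder values are hypothetical, and whether the endpoint accepts a line name such as tct-h5 as the cdn value is an assumption drawn from the line names listed above.

```python
from urllib.parse import urlencode


def build_play_params(v, did, tt, sign, rate=0, cdn=""):
    """Build the POST body for getH5Play; an empty cdn lets the server choose."""
    params = {
        "v": v,
        "did": did,
        "tt": tt,
        "sign": sign,
        "cdn": cdn,  # assumption: a line name such as "tct-h5" goes here
        "iar": 0,
        "ive": 0,
        "rate": rate,
    }
    return urlencode(params).encode()


# Placeholder values only; real ones come from the ub98484234 result.
body = build_play_params("0", "0" * 32, "0", "f" * 32, cdn="tct-h5")
```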

guest6379 commented 5 years ago

@airdge @SeaHOH Geniuses, practically gods. Thank you for your commits.

xiatiantiantian commented 4 years ago

@SeaHOH

ROOMID: 7020364 fails with this error: kinako鏌効 鐨勭洿鎾棿 - kinako鏌効_BD4M_2020-03-13T00-16-51.395533.flv: Invalid argument

xiatiantiantian commented 4 years ago

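douyu.com/g_yz (pick any stream that is broadcast from a phone with two people in a PK; all of the portrait-orientation streams are broadcast from phones)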

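Whenever a stream is broadcast from a phone and two people are in a PK, only one person's picture can be downloaded. (Streams broadcast from a computer do not have this problem.)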

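When a two-person PK starts, a new video source appears (the two-person PK picture), while the original source drops to a lower bitrate and stays valid (but always shows a single person). Once the PK ends, the new video source expires automatically.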