blackjack4494 / youtube-dlc

Command-line program to download various media from YouTube.com and other sites
https://blackjack4494.github.io/youtube-dlc/
The Unlicense
1.21k stars, 13 forks

Chinese video websites? #79

Open faissaloo opened 4 years ago

faissaloo commented 4 years ago

Description

So there are a bunch of these Chinese video streaming sites, and at the moment the only way for me to download videos from them is to use https://weibomiaopai.com/. Some sites will give you MP4s straight up, but others give you m3u8 playlist files linking to a bunch of TS files, which then need to be spliced together using FFmpeg or something similar. The connection is also often extremely unstable on sites that only provide TS files, so you'll need to run those downloads in parallel or it will take forever.
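A minimal sketch of the m3u8 workflow described above, assuming a simple, unencrypted media playlist (not a master playlist). The function names and the `example.invalid` URLs in the usage note are hypothetical. Segments are fetched in parallel and then concatenated byte for byte, which often works for plain MPEG-TS segments; remuxing with FFmpeg afterwards is still the safer route.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urljoin
from urllib.request import urlopen


def parse_segments(playlist_text, base_url):
    """Return absolute segment URLs from a simple m3u8 media playlist."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments; the rest are segment URIs.
        if line and not line.startswith("#"):
            urls.append(urljoin(base_url, line))
    return urls


def fetch(url, retries=3):
    """Download one segment, retrying because the connection may be unstable."""
    for attempt in range(retries):
        try:
            with urlopen(url, timeout=30) as resp:
                return resp.read()
        except OSError:
            if attempt == retries - 1:
                raise


def download_playlist(playlist_text, base_url, out_path, fetch_fn=fetch, workers=8):
    """Fetch all segments in parallel and write them to out_path in playlist order."""
    urls = parse_segments(playlist_text, base_url)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(fetch_fn, urls))  # map() preserves input order
    with open(out_path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
    return len(urls)
```

Usage would look like `download_playlist(text, "https://example.invalid/video/index.m3u8", "out.ts")`, where `text` is the playlist body you already downloaded.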

Notaghost9997 commented 4 years ago

You can easily see which URL is being played in the Network tab of your browser's developer tools. To download videos from this site, try removing the range parameter from the URL. For example, this is one of the requests:

https://akcdnoversea.inter.71edge.com/videos/v1/20200907/e9/fc/6baf7f350821e403288add0048547a78.f4v?key=02a988dac39af40f4ebcdc24c31c1503e&dis_k=8184abe4185a30d37ec0a088b0e0a255&dis_t=1600016670&dis_dz=OVERSEA-PK&dis_st=103&src=iqiyi.com&dis_hit=0&uuid=273b16a6-5f5e511e-1a9&client=&su=49ad731d5d279b85213bf1c80d7e5482&mi=tv_1851097964520300_1851097964520300_492d24920f3f9bf13ed51c3769e98a59&pv=0.1&qyid=49ad731d5d279b85213bf1c80d7e5482&qd_vipdyn=0&z=&e=&ve=&ct=2&bt=&qd_aid=1851097964520300&qd_tvid=1851097964520300&qd_stert=0&qd_p=273b16a6&qd_src=01010031010000000000&qd_tm=1600016644671&qd_k=c389d056554ab45660d534ed00b0217c&qd_index=1&qd_vip=0&qd_uid=&qd_vipres=0&tn=0.397130052099691&range=7021446-7557305

From this URL, just remove &range=7021446-7557305 and the full file should start downloading as normal. It should work on other videos as well.
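The same edit can be done programmatically. This is a small sketch, with a hypothetical helper name, that drops the `range` parameter and leaves every other part of the query string byte-for-byte unchanged (re-encoding the query could alter signed values). The URL in the usage note is a shortened placeholder, not a real link.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query_param(url, name="range"):
    """Return url with every `name=...` pair removed from its query string."""
    parts = urlsplit(url)
    kept = [
        pair for pair in parts.query.split("&")
        if pair.split("=", 1)[0] != name
    ]
    return urlunsplit(parts._replace(query="&".join(kept)))
```

For example, `strip_query_param("https://example.invalid/v.f4v?key=abc&client=&range=7021446-7557305")` returns the same URL without the trailing `&range=...` part.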

blackjack4494 commented 4 years ago

If there isn't an API, we can try using a simple regex to find the URL(s) and then process those. I haven't checked whether there are extractors for these sites already. However, you can try to collect some more (different) URLs in a list so that we can create a regex pattern (used to check whether a URL is valid and also to extract possible IDs). The most basic approach is always to look for any links in the page source pointing to the video or to a manifest that contains the videos. If you want, you can give it a try. Let me know if you need a basic template to start with.
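As a rough illustration of such a pattern, here is a sketch that validates a page URL and pulls out an ID. The pattern is an assumption based only on the single iqiyi page URL that appears later in this thread (https://www.iqiyi.com/v_j4moctuoe8.html); it is not the pattern used by any existing extractor and would need more sample URLs to be reliable.

```python
import re

# Hypothetical pattern: matches https://www.iqiyi.com/v_<id>.html style pages only.
VALID_URL = re.compile(r"https?://(?:www\.)?iqiyi\.com/(?P<id>v_[0-9a-z]+)\.html")


def extract_id(url):
    """Return the page ID if the URL matches the pattern, otherwise None."""
    match = VALID_URL.match(url)
    return match.group("id") if match else None
```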

faissaloo commented 4 years ago

I don't think you'll be able to use a simple regex; it looks like they've obfuscated the information you need. I had a brief look, and in the sources you can see "tvId":1851097964520300,"vid":"492d24920f3f9bf13ed51c3769e98a59", both of which can be found in the download URL:

https://akcdnoversea.inter.71edge.com/videos/v1/20200907/e9/fc/6baf7f350821e403288add0048547a78.f4v?key=08e90eb21215157b2e33d3752dbbc07bd&dis_k=abd85840ecc4476b94aae9a2cd266b40&dis_t=1600092824&dis_dz=OVERSEA-GB&dis_st=103&src=iqiyi.com&dis_hit=0&uuid=521a251d-5f5f7a98-273&client=&su=d7294d2469d23e23c66677a94da2ee91&retry=1&qd_vipdyn=0&mi=tv_1851097964520300_1851097964520300_492d24920f3f9bf13ed51c3769e98a59&qyid=d7294d2469d23e23c66677a94da2ee91&qd_uid=&pv=0.1&qd_tm=1600092766861&e=&ve=&ct=2&bt=&qd_aid=1851097964520300&qd_tvid=1851097964520300&qd_stert=0&qd_p=521a251d&qd_src=01010031010000000000&qd_k=9c9e6f31d73d1500e72f72764d8a8059&qd_index=1&qd_vip=0&z=&qd_vipres=0&tn=0.9496795250511764

However, I'm still not sure where they're getting the 6baf7f350821e403288add0048547a78 from, since I can't find it in the sources. It's 16 bytes, so I'm guessing it's an MD5 of something?
To add to that, it also seems like the download URLs expire, since the one @moonmuaaz000 posted no longer works.
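The observations above can be checked with a short sketch. The download URL below is trimmed to the parameters that matter here, and the helper names are hypothetical. It pulls `tvId` and `vid` out of the page snippet, confirms they appear in the URL, and confirms that the file name is 32 hex characters (16 bytes), which is the length of an MD5 digest but does not prove it is one.

```python
import re
from urllib.parse import urlsplit, parse_qs

PAGE_SNIPPET = '"tvId":1851097964520300,"vid":"492d24920f3f9bf13ed51c3769e98a59"'

# Trimmed copy of the download URL quoted in the thread (most parameters omitted).
DOWNLOAD_URL = (
    "https://akcdnoversea.inter.71edge.com/videos/v1/20200907/e9/fc/"
    "6baf7f350821e403288add0048547a78.f4v"
    "?qd_tvid=1851097964520300"
    "&mi=tv_1851097964520300_1851097964520300_492d24920f3f9bf13ed51c3769e98a59"
)


def ids_from_page(source):
    """Return (tvId, vid) as strings, or None if the snippet is not found."""
    match = re.search(r'"tvId":(\d+),"vid":"([0-9a-f]{32})"', source)
    return (match.group(1), match.group(2)) if match else None


def ids_in_url(url, tv_id, vid):
    """Check that qd_tvid equals tv_id and that the mi parameter ends with the vid."""
    query = parse_qs(urlsplit(url).query)
    return query.get("qd_tvid") == [tv_id] and query.get("mi", [""])[0].endswith("_" + vid)


def file_stem(url):
    """Return the file name without its extension, e.g. the 32-character hex string."""
    return urlsplit(url).path.rsplit("/", 1)[-1].split(".", 1)[0]
```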

blackjack4494 commented 4 years ago

The video that shows is an advertisement. You need to log in (paid subscription needed?) or something like that. However, for the first site there is a dash API containing all the information. If you debug the JS module, you can see how authKey is generated. It uses MD5 twice, along with some parameters like tvid (which is exposed in the HTML). I did some similar reverse engineering for the login in the SoundCloud extractor (which was not an easy ride).
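To make the "MD5 twice" idea concrete, here is a purely illustrative sketch. The thread does not say which inputs are hashed or in what order, so the seed, the timestamp, and the concatenation below are all assumptions and should not be read as the site's actual algorithm.

```python
import hashlib


def md5_hex(text):
    """Return the hex MD5 digest of a UTF-8 string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def auth_key_sketch(tvid, tm, seed=""):
    """Hypothetical nested-MD5 key: hash a seed, append tm and tvid, hash again.

    The inputs and their order are placeholders; debug the site's JS module
    to find the real ones.
    """
    return md5_hex(md5_hex(seed) + str(tm) + str(tvid))
```

Whatever the real inputs turn out to be, the result has the same shape as the `authKey` values seen in the requests: a 32-character hex string.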

blackjack4494 commented 4 years ago

The qq site is definitely more complex. There is a script called htmlframe (triggered by proxyhttp?). That script is basically where all the requests are put together.

Notaghost9997 commented 4 years ago

The URLs do expire: every URL stops working after some time. Just go to the Network tab of the developer tools, play the video, and wait for a bunch of URLs to load. Then pick any one of them and remove the range parameter. There are also a bunch of APIs, I think.

This API has a direct f4v download link for the video, but I don't know whether it will expire soon:

https://pcw-data.video.iqiyi.com/videos/v1/20200907/e9/fc/6baf7f350821e403288add0048547a78.f4v?qd_tvid=1851097964520300&qd_vipres=0&qd_index=1&qd_aid=1851097964520300&qd_stert=0&qd_scc=ea698942a210dc0ba498f577264302c7&qd_sc=d2b07433228424e1d89d676caf6091f7&ve=&qd_p=273b0cdb&qd_k=63976291c4784d0861784461e19ef3e5&qd_src=01010031010000000000&qd_vipdyn=0&qd_uid=&qd_tm=1600104515914&qd_vip=0&tn=0.40946958470327255&su=49ad731d5d279b85213bf1c80d7e5482&pv=0.1&qyid=49ad731d5d279b85213bf1c80d7e5482&client=&z=&mi=tv_1851097964520300_1851097964520300_492d24920f3f9bf13ed51c3769e98a59&bt=&ct=2&e=

That link is built from this dash API file:

https://cache.video.iqiyi.com/dash?tvid=1851097964520300&bid=300&vid=492d24920f3f9bf13ed51c3769e98a59&src=01010031010000000000&vt=0&rs=1&uid=&ori=pcw&ps=0&k_uid=49ad731d5d279b85213bf1c80d7e5482&pt=0&d=0&s=&lid=&cf=&ct=&authKey=7c23ae024111054bdb93fbdf0a3c9c46&k_tag=1&ost=0&ppt=0&dfp=a0b108b4928eb44334880ea4d0c2246da773441e34dfb3a6b610bd2165bb04a537&locale=zh_cn&prio=%7B%22ff%22%3A%22f4v%22%2C%22code%22%3A2%7D&pck=&k_err_retries=0&up=&qd_v=2&tm=1600104512145&qdy=a&qds=0&k_ft1=706436220846084&k_ft4=1161084347621380&k_ft5=1&bop=%7B%22version%22%3A%2210.0%22%2C%22dfp%22%3A%22a0b108b4928eb44334880ea4d0c2246da773441e34dfb3a6b610bd2165bb04a537%22%7D&ut=0&vf=63976291c4784d0861784461e19ef3e5

It requires tons of parameters to access.
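To see what the dash request actually carries, the query string can be unpacked. This is a sketch over a trimmed copy of the URL above (only a few of its parameters are kept), with a hypothetical helper name; it also decodes the percent-encoded JSON carried in parameters such as `prio`.

```python
import json
from urllib.parse import urlsplit, parse_qs

# Trimmed copy of the dash URL quoted in the thread.
DASH_URL = (
    "https://cache.video.iqiyi.com/dash?tvid=1851097964520300&bid=300"
    "&vid=492d24920f3f9bf13ed51c3769e98a59"
    "&authKey=7c23ae024111054bdb93fbdf0a3c9c46"
    "&prio=%7B%22ff%22%3A%22f4v%22%2C%22code%22%3A2%7D"
    "&tm=1600104512145"
    "&vf=63976291c4784d0861784461e19ef3e5"
)


def dash_params(url):
    """Return the query parameters as a dict, decoding JSON-looking values."""
    raw = parse_qs(urlsplit(url).query, keep_blank_values=True)
    params = {}
    for key, values in raw.items():
        value = values[0]  # parse_qs has already undone the percent-encoding
        params[key] = json.loads(value) if value.startswith("{") else value
    return params
```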

If you go to the HTML source of the video page, https://www.iqiyi.com/v_j4moctuoe8.html, you will find these parameters:

param['parentId'] = 'flashbox'; param['albumid'] = "0"; param['tvid'] = "1851097964520300"; param['vid'] = "492d24920f3f9bf13ed51c3769e98a59"; param['albumId'] = "0"; param['channelID'] = "30"; param['isMember'] = "false"; param['isNew'] = true; param['qiyiProduced'] = '0'; param['exclusive'] = '0'; param['origin'] = 'fla

Now go to the following URL, which uses the tvid and uid from above:

https://nl-rcd.iqiyi.com/apis/urc/getvplay?tvId=1851097964520300&agent_type=1&ckuid=49ad731d5d279b85213bf1c80d7e5482

You'll see something like this, which gives you the uid for the video:

"{\"code\":\"A00000\",\"data\":{\"uid\":\"49ad731d5d279b85213bf1c80d7e5482\",\"tvId\":1851097964520300,\"videoPlayTime\":26}}"

I don't have much time, but good luck with finding the other stuff.
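The steps above (read the page parameters, build the getvplay URL, decode the response) could be sketched as follows. The helper names are hypothetical, and the code assumes the response really is a JSON string that itself contains JSON, as it appears in the comment above; if the API returns a plain JSON object, the second decoding step is simply skipped. No network request is made here.

```python
import json
import re
from urllib.parse import urlencode

PAGE_SOURCE = '''param['tvid'] = "1851097964520300"; param['vid'] = "492d24920f3f9bf13ed51c3769e98a59";'''
RESPONSE = r'"{\"code\":\"A00000\",\"data\":{\"uid\":\"49ad731d5d279b85213bf1c80d7e5482\",\"tvId\":1851097964520300,\"videoPlayTime\":26}}"'


def page_params(source):
    """Collect quoted param['name'] = "value" assignments from the page source."""
    pattern = r'''param\['(\w+)'\]\s*=\s*["']([^"']*)["']'''
    return dict(re.findall(pattern, source))


def getvplay_url(tv_id, ckuid):
    """Build the getvplay URL with the same parameters as the one quoted above."""
    query = urlencode({"tvId": tv_id, "agent_type": 1, "ckuid": ckuid})
    return "https://nl-rcd.iqiyi.com/apis/urc/getvplay?" + query


def decode_response(body):
    """Decode the body, unwrapping one extra layer if the JSON is string-encoded."""
    data = json.loads(body)
    if isinstance(data, str):
        data = json.loads(data)
    return data
```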

faissaloo commented 4 years ago

> The video that shows is an advertisement. You need to login (paid subscription needed?) or so.

@blackjack4494 You need to click the "skip ad" button, which counts down in the bottom right.


webber-g commented 4 years ago

There is this other project called "ykdl" that deals exclusively with Chinese video sites. Maybe you can check it out.

https://github.com/zhangn1985/ykdl

My experience with it was not all that good the last time I tried it a couple months ago. It seemed to generate the right links and start downloading all the parts of videos, but then would timeout or somehow just not download them completely.

blackjack4494 commented 4 years ago

> There is this other project called "ykdl" that deals exclusively with Chinese video sites. Maybe you can check it out.
>
> https://github.com/zhangn1985/ykdl
>
> My experience with it was not all that good the last time I tried it a couple months ago. It seemed to generate the right links and start downloading all the parts of videos, but then would timeout or somehow just not download them completely.

Yup, I found you-get a while ago (the project you linked is a fork of you-get). However, I have already found a solution for iqiyi; it's just a mess right now :D But sure, it's worth checking their routines to see how they got the videos :) We just have to be sure what license they use. The main project uses Go, I believe?