LifeActor / ykdl

A video downloader focused on mainland China video sites.
https://github.com/LifeActor/ykdl

Warning: risky patch uploaded. If you find any issue, please comment below. #125

Closed: zhangn1985 closed this issue 7 years ago

zhangn1985 commented 7 years ago

https://github.com/zhangn1985/ykdl/commit/8a6e40f67162c762c3906b5c541e32b97cbddf01 detail:

diff --git a/ykdl/common.py b/ykdl/common.py
index 5cf476f..961e681 100644
--- a/ykdl/common.py
+++ b/ykdl/common.py
@@ -18,34 +18,40 @@ alias = {
         'douyutv' : 'douyu',
         'aixifan' : 'acfun'
 }
+exclude_list = ['com', 'net', 'org']
 def url_to_module(url):
-    video_host = match1(url, 'https?://([^/]+)/')
-    video_url = match1(url, 'https?://[^/]+(.*)')
-    assert video_host and video_url, 'invalid url: ' + url
-
-    if video_host.endswith('.com.cn'):
-        video_host = video_host[:-3]
-    domain = match1(video_host, '(\.[^.]+\.[^.]+)$') or video_host
-    assert domain, 'unsupported url: ' + url
-    k = match1(domain, '([^.]+)')
-    if k in alias.keys():
-        k = alias[k]
+    if not url.startswith("http"):
+        logger.warning("> url not starts with http(s) " + url)
+        logger.warning("> assume http connection!")
+        url = "http://" + url
+    video_host = url.split('/')[2]
+    host_list = video_host.split('.')
+    if host_list[-2] in exclude_list:
+        short_name = host_list[-3]
+    else:
+        short_name = host_list[-2]
+    logger.debug('video_host> ' + video_host)
+    logger.debug('short_name> ' + short_name)
+    if short_name in alias.keys():
+        short_name = alias[short_name]
     try:
-        m = import_module('.'.join(['ykdl','extractors', k]))
+        m = import_module('.'.join(['ykdl','extractors', short_name]))
         if hasattr(m, "get_extractor"):
             site = m.get_extractor(url)
         else:
             site = m.site
         return site, url
     except(ImportError):
+        logger.debug('> Try HTTP Redirection!')
         from ykdl.compact import HTTPConnection
         conn = HTTPConnection(video_host)
-        conn.request("HEAD", video_url, headers=fake_headers)
+        conn.request("HEAD", url, headers=fake_headers)
         res = conn.getresponse()
         location = res.getheader('location')
         if location is None:
+            logger.debug('> NO HTTP Redirection')
+            logger.debug('> Go Generalembed')
             return import_module('ykdl.extractors.generalembed').site, url
-        elif location != url:
-            return url_to_module(location)
         else:
-            raise ConnectionResetError(url)
+            logger.debug('New Location> ' + location)
+            return url_to_module(location)

Because this patch changes the core algorithm for extracting the domain, it may cause some issues, even though I have done some testing. If you run into a problem, please comment below. This issue will stay open until the end of next week.

rosynirvana commented 7 years ago

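The currently supported sites should be fine, and sites in mainland China are mostly unproblematic. The problem is with domains like co.uk and co.jp (though ykdl may not care about such sites).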
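To make the co.uk / co.jp concern concrete, here is a minimal sketch that isolates the host-splitting logic from the patch above. The function name `short_name_from_url` and the example URLs are hypothetical; only the splitting rules and `exclude_list` come from the diff.

```python
# Same values as exclude_list in the patch above.
EXCLUDE_LIST = ['com', 'net', 'org']


def short_name_from_url(url):
    """Mirror the patch's logic for deriving the extractor name from a URL."""
    # The patch assumes plain http when no scheme is given.
    if not url.startswith("http"):
        url = "http://" + url
    video_host = url.split('/')[2]      # e.g. 'v.youku.com'
    host_list = video_host.split('.')   # e.g. ['v', 'youku', 'com']
    # Skip one extra label for hosts such as 'news.sina.com.cn'.
    if host_list[-2] in EXCLUDE_LIST:
        return host_list[-3]
    return host_list[-2]


print(short_name_from_url("http://v.youku.com/v_show/id_x.html"))  # youku
print(short_name_from_url("http://news.sina.com.cn/x"))            # sina
print(short_name_from_url("http://www.example.co.uk/video/1"))     # co
```

The last call shows the failure mode: `co` is not in the exclude list, so it is returned as the site name instead of `example`.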

I think there are two ways to "solve" this problem. One is to maintain a long exclude_list that contains all the top-level domains. The other is:

supported_sites = ['youku', 'tudou', 'iqiyi', 'qq']
for site in supported_sites:
    if site in url:
        pass  # do something with the matched site
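A minimal sketch of the second approach follows. It is a variation on the snippet above rather than the same thing: it compares each supported name against the dot-separated labels of the host instead of searching the whole URL, so a short name like `qq` appearing in a path or query string does not trigger a false match. The function name `match_supported_site` and the example URLs are hypothetical, and the site list is just the four names from the comment.

```python
# Site names taken from the comment above; a real list would be longer.
SUPPORTED_SITES = ['youku', 'tudou', 'iqiyi', 'qq']


def match_supported_site(url):
    """Return the first supported site found among the host labels, else None."""
    if not url.startswith("http"):
        url = "http://" + url
    # Take the host part and drop an optional ':port' suffix.
    host = url.split('/')[2].split(':')[0]
    labels = host.lower().split('.')
    for site in SUPPORTED_SITES:
        if site in labels:
            return site
    return None


print(match_supported_site("http://v.qq.com/x/page/a.html"))    # qq
print(match_supported_site("http://tudou.co.jp/x"))             # tudou
print(match_supported_site("http://www.example.co.uk/qq/1"))    # None
```

Because the lookup never depends on where the name sits relative to the suffix, hosts under co.uk or co.jp need no special handling. The trade-off is that the list of supported names has to be maintained by hand.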