Planning to submit a PR that adds Mikan Project (蜜柑计划) as a data source

trim21 commented 7 years ago

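Yesterday, when I added 恋爱禁止的世界, what actually got fetched was 捏造陷阱NTR. More importantly, New Game still isn't available. The data on the bangumi.moe side seems a bit inaccurate; the tags look like they are added by automatic recognition.

Mikan Project (蜜柑计划, http://mikanani.me/) has more accurate data, and it already separates bangumi from subtitle groups. I plan to submit a PR that adds it as a data source, so data can be fetched from there as well.

Without New Game to watch I'm going to die.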

RicterZ commented 7 years ago

New Game will show up, but you'll need to wait a bit…

恋爱禁止世界 coming back as NTR is probably a mistake made when the subtitle group published the release; there's nothing I can do about upstream data problems.

trim21 commented 7 years ago

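I was going to file an issue yesterday, but then I realized it's actually an upstream data problem, so I plan to add a data source myself. This issue is mainly to ask whether you'd mind, and whether you'd be willing to merge it once it's done. If so, is there anything about the implementation you'd object to, such as adding dependencies?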

RicterZ commented 7 years ago

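I don't plan to switch data sources for now. It really comes down to changing fetch.py; almost nothing else needs to be touched…

I'm considering whether to add an interface so that people can implement their own data source parsing, which would make it easier to extend and switch.

I don't have a real fix for upstream data problems either. All I can do is beg the subtitle groups not to make mistakes, and then quietly add a filter…

If you want to add it, you can create a fetch_xxx.py that is disabled by default and can be switched in manually (mv it to fetch.py), as long as it follows the interface.

I suspect there are plenty of pitfalls too: the data structure may change, so the database would need corresponding changes.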
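
The kind of filter mentioned above could look roughly like the following. This is only a sketch under assumptions: the function name `filter_episodes` and its `include`/`exclude` parameters are hypothetical, the episode dicts are assumed to carry a `title` key, and none of this is BGmi's actual filter code.

```python
def filter_episodes(episodes, include=(), exclude=()):
    """Keep episodes whose title contains every `include` keyword
    and none of the `exclude` keywords (hypothetical helper)."""
    kept = []
    for episode in episodes:
        title = episode["title"]
        has_all = all(word in title for word in include)
        has_bad = any(word in title for word in exclude)
        if has_all and not has_bad:
            kept.append(episode)
    return kept


episodes = [
    {"title": "[SubGroup] 恋爱禁止的世界 [01][720P]", "episode": 1},
    {"title": "[SubGroup] 捏造陷阱NTR [01][720P]", "episode": 1},
]
# Drop the wrongly tagged release by excluding its title keyword.
wanted = filter_episodes(episodes, exclude=["捏造陷阱"])
```

A filter like this only hides bad entries after the fact; it does not correct the upstream tagging.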

w3eee commented 7 years ago

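Piggybacking on this thread with a question: how does a subscription match a bangumi to its torrent files? Is it just a name comparison? What about torrents whose names don't follow the usual conventions?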

RicterZ commented 7 years ago

There is a parser for that.
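
As an illustration of how a parser can recover an episode number from a torrent title, here is a minimal sketch. The function name `guess_episode` and the two patterns are assumptions chosen for this example, not BGmi's actual parser, and returning 0 for unrecognized titles is likewise an assumption.

```python
import re

# Patterns tried in order; both are illustrative assumptions.
EPISODE_PATTERNS = [
    re.compile(r"第\s*(\d+)\s*[话話集]"),  # e.g. "第12话"
    re.compile(r"\[(\d{1,3})(?:v\d)?\]"),  # e.g. "[07]" or "[07v2]"
]


def guess_episode(title):
    """Return the episode number found in `title`, or 0 if none matches."""
    for pattern in EPISODE_PATTERNS:
        match = pattern.search(title)
        if match:
            return int(match.group(1))
    return 0
```

Titles that match neither pattern fall through to 0, so irregularly named torrents would still need a manual rule or a filter.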

trim21 commented 7 years ago

I rewrote fetch.py and abstracted fetching from a data source into three methods:

class BangumiMoe(BaseWebsite):
    cover_url = ''

    def search_by_keyword(self, keyword, count):
        return []

    def fetch_bangumi_calendar_and_subtitle_group(self):
        return [], []

    def fetch_episode_of_bangumi(self, bangumi_id, subtitle_list=None, max_page=MAX_PAGE):
        return []

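To change the data source, you only need to override these three methods. The data source cannot be switched while BGmi is in use.

The changes are fairly large, and it feels like they partly overlap with what script.py does....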

RicterZ commented 7 years ago

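Hmm, for bgmi script I plan to add a custom model feature; I'm still thinking it through. My current idea is that your Mikan source could act as an API: a script passes in parameters, calls it, and gets the results back, which would be very convenient..

from xx import get_bangumi

class Script(xx):
    ...
    def get_bangumi_data(self, x):
        return get_bangumi(x)

or something along those lines..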

trim21 commented 7 years ago

After a lot of back and forth, fetch.py ended up looking like this.. it seems close to your idea? I added WEBSITE_NAME to the config options, defaulting to bangumi_moe.

# coding=utf-8
from __future__ import print_function, unicode_literals

from bgmi.config import WEBSITE_NAME

from bgmi.website.bangumimoe import BangumiMoe
from bgmi.website.mikan import Mikanani

if WEBSITE_NAME == 'mikan_project':
    website = Mikanani()
else:
    website = BangumiMoe()
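
One possible variation on the if/else above is a lookup table keyed by the configured name, so that adding a source means adding one entry. The sketch below is self-contained: `BangumiMoe` and `Mikanani` are empty stand-ins for the real classes, and `WEBSITES` and `get_website` are hypothetical names that are not part of BGmi.

```python
class BangumiMoe(object):
    """Stand-in for bgmi.website.bangumimoe.BangumiMoe."""


class Mikanani(object):
    """Stand-in for bgmi.website.mikan.Mikanani."""


# Maps the configured WEBSITE_NAME to a data source class.
WEBSITES = {
    "bangumi_moe": BangumiMoe,
    "mikan_project": Mikanani,
}


def get_website(name):
    """Instantiate the data source for `name`, defaulting to BangumiMoe."""
    return WEBSITES.get(name, BangumiMoe)()


website = get_website("mikan_project")
```

Falling back to `BangumiMoe` for unknown names mirrors the `else` branch of the original snippet.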

trim21 commented 7 years ago

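bangumimoe.py: to add a data source now, you only need to import BaseWebsite from bgmi.website.base and implement three methods. Filtering, storing data and the like all live in BaseWebsite. I also added a few lines to main.py so that the data source is chosen on first launch...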

from bgmi.website.base import BaseWebsite

class BangumiMoe(BaseWebsite):
    cover_url = COVER_URL

    def search_by_keyword(self, keyword, count):
        """
        return a list of dicts with at least 4 keys: download, name, title, episode
        example:
        ```
            [
                {
                    'name':"路人女主的养成方法",
                    'download': 'magnet:?xt=urn:btih:what ever',
                    'title': "[澄空学园] 路人女主的养成方法 第12话 MP4 720p  完",
                    'episode': 12
                },
            ]
        ```
        :param keyword: search keyword
        :type keyword: str
        :param count: how many pages to fetch from the website
        :type count: int
        :return: list of episode search results
        :rtype: list[dict]
        """
        return []

    def fetch_episode_of_bangumi(self, bangumi_id, subtitle_list=None, max_page=MAX_PAGE):
        """
        get all episodes by bangumi id
        example:
        ```
            [
                {
                    "download": "magnet:?xt=urn:btih:e43b3b6b53dd9fd6af1199e112d3c7ff15cab82c",
                    "name": "来自深渊",
                    "subtitle_group": "58a9c1c9f5dc363606ab42ec",
                    "title": "【喵萌奶茶屋】★七月新番★[来自深渊/Made in Abyss][07][GB][720P]",
                    "episode": 0,
                    "time": 1503301292
                },
            ]
        ```
        :param bangumi_id: bangumi_id
        :param subtitle_list: list of subtitle groups
        :type subtitle_list: list
        :param max_page: how many pages you want to crawl if there is no subtitle list
        :type max_page: int
        :return: list of episodes
        :rtype: list[dict]
        """
        return []

    def fetch_bangumi_calendar_and_subtitle_group(self):
        """
        return a list of all bangumi and a list of all subtitle groups

        bangumi dict:
        update time should be one of ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']
        example:
        ```
            [
                {
                    "status": 0,
                    "subtitle_group": [
                        "123",
                        "456"
                    ],
                    "name": "名侦探柯南",
                    "keyword": "1234", #bangumi id
                    "update_time": "Sat",
                    "cover": "data/images/cover1.jpg"
                },
            ]
        ```

        subtitle group dict:
        example:
        ```
            [
                {
                    'id': '233',
                    'name': 'bgmi字幕组'
                }
            ]
        ```

        :return: list of bangumi, list of subtitle groups
        :rtype: (list[dict], list[dict])
        """
        return [], []
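
To make the interface above concrete, here is a small self-contained sketch of a custom data source. The `BaseWebsite` defined here is only a stand-in so the example runs without BGmi installed, and `DummySource`, `REQUIRED_EPISODE_KEYS` and `check_episodes` are hypothetical names; the returned dicts follow the shapes given in the docstrings.

```python
class BaseWebsite(object):
    """Stand-in for bgmi.website.base.BaseWebsite (assumption)."""


# Keys the search_by_keyword docstring requires in every result.
REQUIRED_EPISODE_KEYS = {"download", "name", "title", "episode"}


class DummySource(BaseWebsite):
    """Hypothetical data source that serves hard-coded data."""

    cover_url = ""

    def search_by_keyword(self, keyword, count):
        return [{
            "name": keyword,
            "download": "magnet:?xt=urn:btih:what ever",
            "title": "[bgmi字幕组] %s [01][720P]" % keyword,
            "episode": 1,
        }]

    def fetch_bangumi_calendar_and_subtitle_group(self):
        bangumi = [{
            "status": 0,
            "subtitle_group": ["233"],
            "name": "名侦探柯南",
            "keyword": "1234",  # bangumi id
            "update_time": "Sat",
            "cover": "data/images/cover1.jpg",
        }]
        subtitle_groups = [{"id": "233", "name": "bgmi字幕组"}]
        return bangumi, subtitle_groups

    def fetch_episode_of_bangumi(self, bangumi_id, subtitle_list=None, max_page=1):
        return []


def check_episodes(episodes):
    """Return True if every episode dict has the required keys."""
    return all(REQUIRED_EPISODE_KEYS <= set(ep) for ep in episodes)


source = DummySource()
results = source.search_by_keyword("名侦探柯南", 1)
```

A check like `check_episodes` is one way to catch a source that returns incomplete dicts before they reach the shared filtering and storage code.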

RicterZ commented 7 years ago

seems good

RicterZ commented 7 years ago

Could you add the datasource configuration to the README?

trim21 commented 7 years ago

I've already added it to the README...

Additional config

DATA_SOURCE: data source, now supports :code:`bangumi_moe` (default) and :code:`mikan_project`

trim21 commented 7 years ago

Just found a bug in parse_episode.. fixing it now..

RicterZ commented 7 years ago

No rush.

RicterZ / BGmi

Planning to submit a PR that adds Mikan Project (蜜柑计划) as a data source #74