RicterZ / BGmi

BGmi is a cli tool for subscribed bangumi.
https://bgmi.ricterz.me

Planning to submit a PR that adds Mikan Project (蜜柑计划) as a data source #74

Closed trim21 closed 7 years ago

trim21 commented 7 years ago

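Yesterday, when I added 恋爱禁止的世界, what actually got fetched was 捏造陷阱NTR. More importantly, there is still no New Game. The data on bangumi.moe seems to be rather inaccurate; the tags appear to be added by automatic recognition.

The data on Mikan Project (蜜柑计划, http://mikanani.me/) is more accurate, and it already separates bangumi from subtitle groups. I plan to submit a PR that adds it as a data source, so data can be fetched from there as well.

I'm dying here without New Game to watch.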

RicterZ commented 7 years ago

New Game will show up, you just need to wait a bit…

Getting NTR back for 恋爱禁止世界 is most likely a mistake the subtitle group made when publishing the release. I can't do anything about upstream data problems either.

trim21 commented 7 years ago

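I was originally going to file an issue yesterday, then realized it is actually an upstream data problem, so I plan to add a data source myself. This issue is mainly to ask whether you mind, and whether you would be willing to merge it once it is done. If you are, is there anything about the implementation you would object to, such as adding dependencies?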

RicterZ commented 7 years ago

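I have no plans to switch data sources for now. It really just comes down to changing fetch.py; almost nothing else needs to be touched…

I'll think about whether to add an interface so people can implement their own data source parsing, which would make extending and switching easier.

I don't have a real fix for upstream data problems either. All I can do is beg the subtitle groups not to make mistakes, and then quietly add a filter…

If you want to add one, you can create a fetch_xxx.py that is disabled by default and can be switched in manually (mv it to fetch.py), as long as it follows the interface.

I suspect there are plenty of pitfalls as well: the data structure may change, so the database would need corresponding changes.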
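
One way such a filter could work, as a rough sketch under stated assumptions: drop fetched entries whose title does not mention the subscribed name, which is the kind of mismatch reported at the top of this thread. `filter_by_name` and the sample entries are hypothetical and are not taken from BGmi.

```python
def filter_by_name(entries, name):
    """Keep only entries whose title contains the subscribed bangumi name.

    The comparison is case-insensitive and ignores surrounding whitespace.
    An empty name disables filtering.
    """
    needle = name.strip().lower()
    if not needle:
        return list(entries)
    return [e for e in entries if needle in e.get("title", "").lower()]


# Hypothetical fetched entries; only the first one matches the subscription.
fetched = [
    {"title": "[SubGroup] New Game!! [01][720P]", "download": "magnet:?xt=urn:btih:aaa"},
    {"title": "[SubGroup] Some Other Show [01][720P]", "download": "magnet:?xt=urn:btih:bbb"},
]
kept = filter_by_name(fetched, "new game")
print([e["title"] for e in kept])
```

A plain substring check is deliberately crude. It cannot help when the upstream title itself is wrong, only when an entry is filed under the wrong bangumi.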

w3eee commented 7 years ago

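Piggybacking on this thread with a question: how does a subscription match a bangumi to its torrent files? Is it just a name comparison? And what about torrents whose names don't follow the usual conventions?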

RicterZ commented 7 years ago

There is a parser.
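
For readers wondering what a title parser of this kind might look like, here is a minimal sketch. It is not BGmi's actual parser: the function name `guess_episode` and both patterns are assumptions, chosen only to cover the two title styles quoted in this thread. Real release names vary far more.

```python
import re

# Patterns are tried in order; the first match wins.
# Both are illustrative assumptions, not BGmi's real rules.
_EPISODE_PATTERNS = [
    re.compile(r"第\s*(\d+)\s*[话話集]"),      # e.g. 第12话
    re.compile(r"[\[【](\d{1,3})[\]】]"),      # e.g. [07], but not [720P]
]


def guess_episode(title):
    """Return the episode number found in a release title, or 0 if none."""
    for pattern in _EPISODE_PATTERNS:
        match = pattern.search(title)
        if match:
            return int(match.group(1))
    return 0


print(guess_episode("[澄空学园] 路人女主的养成方法 第12话 MP4 720p  完"))
print(guess_episode("【喵萌奶茶屋】★七月新番★[来自深渊/Made in Abyss][07][GB][720P]"))
```

Returning 0 for "unknown" is a choice made for this sketch. A title with a purely numeric bracketed tag, such as a bare resolution, would still be misread as an episode number.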

trim21 commented 7 years ago

I rewrote fetch.py and abstracted fetching data from the source into three methods:

class BangumiMoe(BaseWebsite):
    cover_url = ''
    def search_by_keyword(self, keyword, count):
        return []

    def fetch_bangumi_calendar_and_subtitle_group(self):
        return [], []

    def fetch_episode_of_bangumi(self, bangumi_id, subtitle_list=None, max_page=MAX_PAGE):
        return []

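To change the data source, you only need to override these three methods. The data source cannot be switched once BGmi is in use.

The change is fairly large, and it feels like it partly overlaps with what script.py does....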
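
As an illustration of that contract, here is a minimal, self-contained sketch. It is not the code from the PR: this `BaseWebsite` is a stand-in built on `abc`, `MAX_PAGE = 2` is an arbitrary value, and `InMemorySite` with its sample data (including the `bangumi_id` key) is invented purely so the example runs without network access.

```python
from abc import ABC, abstractmethod

MAX_PAGE = 2  # assumed value for this sketch only


class BaseWebsite(ABC):
    """Stand-in base class: a data source must provide these three methods."""

    cover_url = ''

    @abstractmethod
    def search_by_keyword(self, keyword, count):
        """Return a list of episode dicts matching ``keyword``."""

    @abstractmethod
    def fetch_bangumi_calendar_and_subtitle_group(self):
        """Return ``(bangumi_list, subtitle_group_list)``."""

    @abstractmethod
    def fetch_episode_of_bangumi(self, bangumi_id, subtitle_list=None, max_page=MAX_PAGE):
        """Return a list of episode dicts for one bangumi."""


class InMemorySite(BaseWebsite):
    """Hypothetical data source backed by a hard-coded list."""

    _episodes = [
        {'name': 'Demo Show', 'title': '[Group] Demo Show [01]', 'episode': 1,
         'download': 'magnet:?xt=urn:btih:demo01', 'bangumi_id': '1'},
    ]

    def search_by_keyword(self, keyword, count):
        return [e for e in self._episodes if keyword in e['title']]

    def fetch_bangumi_calendar_and_subtitle_group(self):
        bangumi = [{'name': 'Demo Show', 'keyword': '1', 'update_time': 'Sat'}]
        groups = [{'id': '233', 'name': 'Group'}]
        return bangumi, groups

    def fetch_episode_of_bangumi(self, bangumi_id, subtitle_list=None, max_page=MAX_PAGE):
        return [e for e in self._episodes if e['bangumi_id'] == bangumi_id]


site = InMemorySite()
print(site.search_by_keyword('Demo', count=1))
```

Because the base class marks all three methods as abstract, a subclass that forgets one of them fails at instantiation rather than at first use.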

RicterZ commented 7 years ago

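Hmm, for bgmi script I plan to add a custom model feature; I'm still working out the design. The current idea is that your Mikan source could act as an API, so a script can call it with parameters and get the results back, which would be very convenient..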

from xx import get_bangumi
class Script(xx):
    ...
    def get_bangumi_data(self, x):
        return get_bangumi(x)

Something along those lines..

trim21 commented 7 years ago

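After a lot of back-and-forth changes, fetch.py ended up like this.. It seems pretty close to your idea? I added WEBSITE_NAME to the config options, defaulting to bangumi_moe.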

# coding=utf-8
from __future__ import print_function, unicode_literals

from bgmi.config import WEBSITE_NAME

from bgmi.website.bangumimoe import BangumiMoe
from bgmi.website.mikan import Mikanani

if WEBSITE_NAME == 'mikan_project':
    website = Mikanani()
else:
    website = BangumiMoe()
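
The same selection could also be written table-driven, so that adding a source means adding one dictionary entry. This is only a sketch: the two classes below are empty stand-ins for the real ones, and `DATA_SOURCES` and `get_website` are hypothetical names, not part of BGmi.

```python
class BangumiMoe(object):
    """Stand-in for bgmi.website.bangumimoe.BangumiMoe."""


class Mikanani(object):
    """Stand-in for bgmi.website.mikan.Mikanani."""


# Map each supported WEBSITE_NAME value to the class that handles it.
DATA_SOURCES = {
    'bangumi_moe': BangumiMoe,
    'mikan_project': Mikanani,
}


def get_website(name, default='bangumi_moe'):
    """Instantiate the data source for ``name``, falling back to the default."""
    return DATA_SOURCES.get(name, DATA_SOURCES[default])()


website = get_website('mikan_project')
print(type(website).__name__)
```

Unknown names fall back to bangumi_moe here, which mirrors the `else` branch in the snippet above.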
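
trim21 commented 7 years ago

bangumimoe.py: to add a data source now, you only need to import BaseWebsite from bgmi.website.base and implement three methods. Filtering, storing data and the like all live in BaseWebsite. I also added a few lines to main.py so that the data source is chosen on first launch...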

from bgmi.website.base import BaseWebsite

class BangumiMoe(BaseWebsite):
    cover_url = COVER_URL

    def search_by_keyword(self, keyword, count):
        """
        return a list of dict with at least 4 keys: download, name, title, episode
        example:
        ```
            [
                {
                    'name': "路人女主的养成方法",
                    'download': 'magnet:?xt=urn:btih:what ever',
                    'title': "[澄空学园] 路人女主的养成方法 第12话 MP4 720p  完",
                    'episode': 12
                },
            ]
        ```
        :param keyword: search key word
        :type keyword: str
        :param count: how many page to fetch from website
        :type count: int
        :return: list of episode search result
        :rtype: list[dict]
        """
        return []

    def fetch_episode_of_bangumi(self, bangumi_id, subtitle_list=None, max_page=MAX_PAGE):
        """
        get all episode by bangumi id
        example
        ```
            [
                {
                    "download": "magnet:?xt=urn:btih:e43b3b6b53dd9fd6af1199e112d3c7ff15cab82c",
                    "name": "来自深渊",
                    "subtitle_group": "58a9c1c9f5dc363606ab42ec",
                    "title": "【喵萌奶茶屋】★七月新番★[来自深渊/Made in Abyss][07][GB][720P]",
                    "episode": 0,
                    "time": 1503301292
                },
            ]
        ```
        :param bangumi_id: bangumi_id
        :param subtitle_list: list of subtitle group
        :type subtitle_list: list
        :param max_page: how many page you want to crawl if there is no subtitle list
        :type max_page: int
        :return: list of bangumi
        :rtype: list[dict]
        """
        return []

    def fetch_bangumi_calendar_and_subtitle_group(self):
        """
        return a list of all bangumi and a list of all subtitle group

        bangumi dict:
        update time should be one of ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']
        example:
        ```
            [
                {
                    "status": 0,
                    "subtitle_group": [
                        "123",
                        "456"
                    ],
                    "name": "名侦探柯南",
                    "keyword": "1234", #bangumi id
                    "update_time": "Sat",
                    "cover": "data/images/cover1.jpg"
                },
            ]
        ```

        subtitle group dict:
        example:
        ```
            [
                {
                    'id': '233',
                    'name': 'bgmi字幕组'
                }
            ]
        ```

        :return: list of bangumi, list of subtitle group
        :rtype: (list[dict], list[dict])
        """

        return [], []
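
To make the calendar shape concrete, the sketch below groups a bangumi list by `update_time`, as a weekly calendar view might. `group_by_weekday` and the second sample entry are hypothetical. Only the `name` and `update_time` keys and the weekday abbreviations come from the docstring above.

```python
from collections import OrderedDict

WEEKDAYS = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']


def group_by_weekday(bangumi_list):
    """Return an ordered mapping of weekday -> list of bangumi names.

    Raises ValueError when an entry carries an unknown ``update_time``.
    """
    calendar = OrderedDict((day, []) for day in WEEKDAYS)
    for bangumi in bangumi_list:
        day = bangumi['update_time']
        if day not in calendar:
            raise ValueError('unexpected update_time: %r' % (day,))
        calendar[day].append(bangumi['name'])
    return calendar


sample = [
    {'name': '名侦探柯南', 'update_time': 'Sat'},
    {'name': 'Demo Show', 'update_time': 'Mon'},  # invented entry
]
calendar = group_by_weekday(sample)
print(calendar['Sat'])
```

Rejecting unknown weekday strings early is one possible design choice. It surfaces a misbehaving data source at fetch time instead of as a silently missing calendar entry.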

RicterZ commented 7 years ago

seems good

RicterZ commented 7 years ago

Could you add the datasource configuration to the README?

trim21 commented 7 years ago

I already added it to the README...

Additional config

DATA_SOURCE: data source, now supports :code:`bangumi_moe` (default) and :code:`mikan_project`

trim21 commented 7 years ago

Just found a bug in parse_episode.. fixing it..

RicterZ commented 7 years ago

No rush.