Rock-Candy-Tea / hexo-circle-of-friends

Python gets the friend's articles from hexo's friend-links
Apache License 2.0

Server deployment error #25

Closed Nesxc closed 2 years ago

Nesxc commented 2 years ago

ubuntu 20.04.3 LTS x64 Python 3.8.10 MySQL 5.7.34

Error output:

[parameters: {'title': '2022高考加油💪', 'created': '2022-02-26', 'updated': '2022-02-26', 'link': 'https://www.jipa.work/2022gk/', 'author': 'JIPA233の小窝', 'avatar': 'https://img.cdn.nesxc.com/2022/03/1647358231690-20220315233030.webp', 'rule': 'rss20', 'createAt': datetime.datetime(2022, 3, 19, 9, 58, 14, 965015)}]
(Background on this error at: https://sqlalche.me/e/14/9h9h) (Background on this error at: https://sqlalche.me/e/14/7s2a)
2022-03-19 02:06:14 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://hesifan.top/atom.xml> (failed 3 times): User timeout caused connection failure: Getting https://hesifan.top/atom.xml took longer than 15.0 seconds..
2022-03-19 02:06:18 [scrapy.core.scraper] ERROR: Error processing {'author': "Haobo's Blog", 'avatar': 'https://img.cdn.nesxc.com/2022/02/202202052207248webp', 'rule': 'atom10', 'title': '【数学】到底什么是信息论 施工中~', 'created': '2022-02-25', 'updated': '2022-02-25', 'link': 'https://discover304.top/2022/02/25/2022q1/144-information-theory/'}
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py", line 858, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/usr/local/lib/python3.8/dist-packages/scrapy/utils/defer.py", line 150, in f
    return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
  File "/home/nserver/circle-of-friends/hexo-circle-of-friends/hexo_circle_of_friends/pipelines/sql_pipe.py", line 73, in process_item
    self.friendpoor_push(item)
  File "/home/nserver/circle-of-friends/hexo-circle-of-friends/hexo_circle_of_friends/pipelines/sql_pipe.py", line 153, in friendpoor_push
    self.session.commit()
  File "<string>", line 2, in commit
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 1431, in commit
    self._transaction.commit(_to_root=self.future)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 827, in commit
    self._assert_active(prepared_ok=True)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 601, in _assert_active
    raise sa_exc.PendingRollbackError(
sqlalchemy.exc.PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (pymysql.err.DataError) (1366, "Incorrect string value: '\\xF0\\x9F\\x92\\xAA' for column 'title' at row 1")
[SQL: INSERT INTO posts (title, created, updated, link, author, avatar, rule, `createAt`) VALUES (%(title)s, %(created)s, %(updated)s, %(link)s, %(author)s, %(avatar)s, %(rule)s, %(createAt)s)]
[parameters: {'title': '2022高考加油💪', 'created': '2022-02-26', 'updated': '2022-02-26', 'link': 'https://www.jipa.work/2022gk/', 'author': 'JIPA233の小窝', 'avatar': 'https://img.cdn.nesxc.com/2022/03/1647358231690-20220315233030.webp', 'rule': 'rss20', 'createAt': datetime.datetime(2022, 3, 19, 9, 58, 14, 965015)}]
(Background on this error at: https://sqlalche.me/e/14/9h9h) (Background on this error at: https://sqlalche.me/e/14/7s2a)
2022-03-19 02:06:18 [scrapy.core.engine] ERROR: Scraper close failure
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py", line 858, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/home/nserver/circle-of-friends/hexo-circle-of-friends/hexo_circle_of_friends/pipelines/sql_pipe.py", line 81, in close_spider
    self.friendlist_push()
  File "/home/nserver/circle-of-friends/hexo-circle-of-friends/hexo_circle_of_friends/pipelines/sql_pipe.py", line 140, in friendlist_push
    self.session.commit()
  File "<string>", line 2, in commit
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 1431, in commit
    self._transaction.commit(_to_root=self.future)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 827, in commit
    self._assert_active(prepared_ok=True)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 601, in _assert_active
    raise sa_exc.PendingRollbackError(
sqlalchemy.exc.PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (pymysql.err.DataError) (1366, "Incorrect string value: '\\xF0\\x9F\\x92\\xAA' for column 'title' at row 1")
[SQL: INSERT INTO posts (title, created, updated, link, author, avatar, rule, `createAt`) VALUES (%(title)s, %(created)s, %(updated)s, %(link)s, %(author)s, %(avatar)s, %(rule)s, %(createAt)s)]
[parameters: {'title': '2022高考加油💪', 'created': '2022-02-26', 'updated': '2022-02-26', 'link': 'https://www.jipa.work/2022gk/', 'author': 'JIPA233の小窝', 'avatar': 'https://img.cdn.nesxc.com/2022/03/1647358231690-20220315233030.webp', 'rule': 'rss20', 'createAt': datetime.datetime(2022, 3, 19, 9, 58, 14, 965015)}]
(Background on this error at: https://sqlalche.me/e/14/9h9h) (Background on this error at: https://sqlalche.me/e/14/7s2a)
Nesxc commented 2 years ago

settings.py:

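################################ Edit the settings below ################################
# Friend-links page URL
# Parameters:
# link: required; the URL of your friend-links page
# theme: required; the scraping strategy for the friend-links page. Specify the page's theme; the options below are the themes currently supported:
#   - common: generic theme, see: https://fcircle-doc.js.cool/#/developmentdoc?id=友链页适配
#   - butterfly: butterfly theme
#   - fluid: fluid theme
#   - matery: matery theme
#   - nexmoe: nexmoe theme
#   - stun: stun theme
#   - sakura: sakura theme
#   - volantis: volantis theme
#   - Yun: Yun theme
#   - stellar: stellar theme
# Multiple friend-links pages can be configured, each with its own theme strategy and each wrapped in {}; they are crawled together and their data is stored together. ***Configure at least one***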
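LINK = [
    {
        "link": "https://www.nesxc.com/link/",  # friend-links page URL 1; change this to your own friend-links page URL
        "theme": "common"
    },
    #     {
    #     "link": "https://noionion.top/link/",  # friend-links page URL 2
    #     "theme": "butterfly",  # scraping strategy for the friend-links page
    # },
    #     {
    #     "link": "https://immmmm.com/about/",  # friend-links page URL 3
    #     "theme": "common",  # scraping strategy for the friend-links page
    # }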

]

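# Friend links defined in this config
# enable: whether to enable configured friend links, True/False (for users whose theme is not yet supported or who need customization)
# json_api: fetch the configured friend links from an API; the response must have the form {"friends":[[friend1],[friend2],[friend3],[friend4]....]}, with each entry in the same format as the list field
# list entry format: ["name", "link", "avatar", "suffix"], where:
#       name: required; the friend's name
#       link: required; the friend's home page URL
#       avatar: required; the avatar URL
#       suffix: optional; custom feed suffix, mainly for sites with non-standard feed paths, see example 2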
SETTINGS_FRIENDS_LINKS = {
    "enable": True,
    "json_api": "",
    "list": [
        ["小N同学", "https://www.nesxc.com", "https://img.cdn.nesxc.com/upload/wordpress/f3ccdd27d200-1.jpg"],
        ["2BROEAR", "https://blog.2broear.com", "https://img.cdn.nesxc.com/2022/01/202201302330071.png"],
        ["Adil", "https://blog.adil.com.cn", "https://img.cdn.nesxc.com/2022/02/202202052139016webp"],
        ["Akilarの糖果屋", "https://akilar.top", "https://img.cdn.nesxc.com/2022/01/202201302317352.png"],
        ["Android", "https://android99.me", "https://img.cdn.nesxc.com/2022/02/202202052128963webp"],
        ["CC的部落格", "https://blog.ccknbc.cc", "https://img.cdn.nesxc.com/2022/01/202201302337383.png"],
        ["Codeanime", "https://codeanime.cc", "https://img.cdn.nesxc.com/2022/02/202202052157089webp"],
        ["Dragon犬’s blog", "https://blog.furrysp.top", "https://img.cdn.nesxc.com/2022/02/202202052148062webp"],
        ["Dreamy.Xiam'say Blog", "https://blog.dreamyxiam.xyz", "https://img.cdn.nesxc.com/2022/02/202202060712146webp"],
        ["Ethan.Tzy", "https://tzy1997.com", "https://img.cdn.nesxc.com/2022/02/202202232028515webp"],
        ["ethanyi", "https://ethanyi9.gitee.io", "https://img.cdn.nesxc.com/2022/02/202202052140634webp"],
        ["Haobo's Blog", "https://discover304.top", "https://img.cdn.nesxc.com/2022/02/202202052207248webp"],
        ["Heo", "https://blog.zhheo.com", "https://img.cdn.nesxc.com/2022/03/1646950385285-20220311061303.webp"],
        ["Heyiki’Bolg", "https://heyiki.top", "https://img.cdn.nesxc.com/2022/02/202202052142686webp"],
        ["iMaeGoo’s Blog", "https://imaegoo.com", "https://img.cdn.nesxc.com/2022/02/202202052052603.png"],
        ["Internet Bug's blog", "https://myhosts.site", "https://img.cdn.nesxc.com/2022/02/202202141030024webp"],
        ["itsNekoDeng", "https://dyfa.top", "https://img.cdn.nesxc.com/2022/01/202201302333212.png"],
        ["Jasonの小窝", "https://blog.catrol.cn", "https://img.cdn.nesxc.com/2022/02/202202052150175webp"],
        ["Lete乐特", "https://blog.lete114.top/", "https://img.cdn.nesxc.com/2022/01/202201302314242.png"],
        ["OY", "https://oy6090.top", "https://img.cdn.nesxc.com/2022/02/202202052135983webp"],
        ["PT的小破站", "https://sqdpt.top", "https://img.cdn.nesxc.com/2022/02/202202052154989webp"],
        ["Qingxu", "https://blog.linioi.com", "https://img.cdn.nesxc.com/2022/02/202202052132797webp"],
        ["Revincx", "https://blog.revincx.icu", "https://img.cdn.nesxc.com/2022/02/202202061028615webp"],
        ["Sady'Blog", "https://sady0.com", "https://img.cdn.nesxc.com/2022/03/1646832464083-20220309212742.webp"],
        ["Seeker", "https://snow.js.org", "https://img.cdn.nesxc.com/2022/02/202202052117350webp"],
        ["starsのblog", "https://blog.cnortles.top", "https://img.cdn.nesxc.com/2022/02/202202231433081webp"],
        ["Throwable", "https://throwx.cn", "https://img.cdn.nesxc.com/2022/02/202202052201827webp"],
        ["WUMOER", "https://wumoer.com", "https://img.cdn.nesxc.com/2022/02/202202052127764webp"],
        ["wxydejoy", "https://c.undf.top", "https://img.cdn.nesxc.com/2022/01/202201302337548.png"],
        ["Xc's Blog", "https://6ing.xyz", "https://img.cdn.nesxc.com/2022/02/202202052203103webp"],
        ["Zane Liu", "https://  lza59.com", "https://img.cdn.nesxc.com/2022/02/202202052208104webp"],
        ["ZHIHUIのBLONG", "https://hinuohui.com", "https://img.cdn.nesxc.com/2022/02/202202120506100webp"],
        ["凡尘纪", "https://hesifan.top", "https://img.cdn.nesxc.com/2022/02/202202052053092.png"],
        ["十玖八柒", "https://ahzoo.cn", "https://img.cdn.nesxc.com/2022/02/202202231434355webp"],
        ["卓越科技的Blog", "https://zykj.js.org", "https://img.cdn.nesxc.com/2022/02/202202052133081webp"],
        ["呆逼の博客", "https://blog.keepdai.cn", "https://img.cdn.nesxc.com/2022/02/202202052155526webp"],
        ["哀殿first", "https://aidianfirst.top", "https://img.cdn.nesxc.com/2022/02/202202052145343webp"],
        ["墨初博客", "https://mochu.co", "https://img.cdn.nesxc.com/2022/02/202202140355129webp"],
        ["天昕", "https://sutianxin.top", "https://img.cdn.nesxc.com/2022/02/202202052136493webp"],
        ["小冰博客", "https://zfe.space", "https://img.cdn.nesxc.com/2022/01/202201302318766.png"],
        ["小嘉的部落格", "https://blog.imzjw.cn", "https://img.cdn.nesxc.com/2022/02/202202052131354webp"],
        ["小孙同学", "https://sunguoqi.com", "https://img.cdn.nesxc.com/2022/02/202202052201529webp"],
        ["小康博客", "https://antmoe.com", "https://img.cdn.nesxc.com/2022/01/202201311858656.png"],
        ["小胖墩er", "https://chubbyduner.top", "https://img.cdn.nesxc.com/2022/02/202202052202482webp"],
        ["小飞博客", "https://xffjs.com", "https://img.cdn.nesxc.com/2022/02/202202251701039webp"],
        ["常青园晚", "https://blog.catrol.cn", "https://img.cdn.nesxc.com/2022/02/202202052150596webp"],
        ["御网尚书", "https://hack-gov.com.cn", "https://img.cdn.nesxc.com/2022/01/202201302324538.png"],
        ["忽然笔记", "https://blog.huran.xyz", "https://img.cdn.nesxc.com/2022/02/202202241406955webp"],
        ["林木木", "https://immmmm.com", "https://img.cdn.nesxc.com/2022/02/202202061022473webp"],
        ["流浪银河", "https://zero-pointer.com", "https://img.cdn.nesxc.com/2022/02/202202052203328webp"],
        ["灰鸿的空间", "https://space.greyh.cn", "https://img.cdn.nesxc.com/2022/02/202202052149194webp"],
        ["皮皮凛の小窝", "https://owomoe.net", "https://img.cdn.nesxc.com/2022/02/202202052125775webp"],
        ["笑笑的博客", "https://xiaoxiao-love.gitee.io", "https://img.cdn.nesxc.com/2022/02/202202052144600webp"],
        ["花猪のBlog", "https://cnhuazhu.top", "https://img.cdn.nesxc.com/2022/02/202202052136912webp"],
        ["葱苓的小窝", "https://www.itciraos.cn", "https://img.cdn.nesxc.com/2022/02/202202231617872webp"],
        ["虫不知喔", "https://blog.ssykawa.com", "https://img.cdn.nesxc.com/2022/03/202203071216447webp"],
        ["超逸の技术博客", "https://yangchaoyi.vip", "https://img.cdn.nesxc.com/2022/01/202201302335743.png"],
        ["陈YF的博客", "https://blog.cyfan.top", "https://img.cdn.nesxc.com/2022/02/202202061014976webp"],
        ["飞鸟", "https://lzxjack.top", "https://img.cdn.nesxc.com/2022/02/202202052142630web"],
        ["FiveFireX的博客", "https://fivefirex.github.io/", "https://img.cdn.nesxc.com/2022/03/1647358146995-20220315232905.webp"],
        ["JIPA233の小窝", "https://www.jipa.work", "https://img.cdn.nesxc.com/2022/03/1647358231690-20220315233030.webp"],
        ["赤蓝紫", "https://clz.vercel.app/", "https://img.cdn.nesxc.com/2022/03/1647358340804-20220315233219.webp"],
        ["LanYunのBlog", "https://lanyundev.vercel.app/", "https://img.cdn.nesxc.com/2022/03/1647358382860-20220315233301.webp"],
    ]
}

# get friend links from gitee issues
GITEE_FRIENDS_LINKS = {
    "enable": False,  # set to True to enable gitee issue support
    "type": "normal",  # volantis/stellar users should enter volantis here
    "owner": "ccknbc",  # your gitee username
    "repo": "blogroll",  # your gitee repository name
    "state": "open"  # state of the issues to fetch (open/closed)
}

# get friend links from github issues
GITHUB_FRIENDS_LINKS = {
    "enable": False,  # set to True to enable github issue support
    "type": "normal",  # volantis/stellar users should enter volantis here
    "owner": "ccknbc",  # your github username
    "repo": "ccknbc-actions",  # your github repository name
    "state": "open"  # state of the issues to fetch (open/closed)
}

# block site list
# add sites to block here
BLOCK_SITE = [
    # "https://example.com/",
    # "https://example.com/",
]

# To enable the HTTP proxy, set this to True and add an environment variable named PROXY with the value [IP]:[port], e.g. 192.168.1.106:8080
HTTP_PROXY = False

# Purge articles older than this many days
OUTDATE_CLEAN = 60

# Storage backend; options: leancloud, mysql, sqlite, mongodb; default: leancloud
DATABASE = "mysql"

# Deployment type; options: github, server, docker; default: github
DEPLOY_TYPE = "server"

################################ Edit the settings above ################################

############################## Do not modify anything below unless you understand this project ################################

VERSION = "4.3.1"

# debug
# debug mode
DEBUG = False

# lc
# used in debug mode

#LC_APPID = "MTXYmy79JiLLO9VafgeAn8A-MdYXbMMI"
#LC_APPKEY = "08N7lfcelf7Lkpy7Wp9amsiA"

# proxy
# HTTP_PROXY_URL = "192.168.1.106:10809"
HTTP_PROXY_URL = ""

# debug blog link url
# used in debug mode

# https://yun.yunyoujun.cn/demo/ , Yun
# FRIENDPAGE_LINK = [
#     "https://www.yyyzyyyz.cn/link/",  # butterfly
#     "https://akilar.top/link/",  # butterfly
#     "https://www.zyoushuo.cn/friends/",  # volantis
# ]
#FRIENDPAGE_LINK = ["https://www.yyyzyyyz.cn/link/"]

BOT_NAME = 'hexo_circle_of_friends'
LOG_LEVEL = "ERROR"
SPIDER_MODULES = ['hexo_circle_of_friends.spiders']
NEWSPIDER_MODULE = 'hexo_circle_of_friends.spiders'
USER_AGENT_LIST = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",
    "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
    "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)",
    "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)",
    "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)",
    "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0",
    "Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5",
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko Fedora/1.9.0.8-1.fc10 Kazehakase/0.5.6",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 LBBROWSER",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; QQBrowser/7.0.3698.400)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; 360SE)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
    "Mozilla/5.0 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8C148 Safari/6533.18.5",
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0b13pre) Gecko/20110307 Firefox/4.0b13pre",
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
    "Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10",
]
ROBOTSTXT_OBEY = False
CONCURRENT_REQUESTS = 128
DOWNLOAD_TIMEOUT = 15
COOKIES_ENABLED = False
DOWNLOADER_MIDDLEWARES = {
    # 'hexo_circle_of_friends.middlewares.HexoCircleOfFriendsDownloaderMiddleware': 543,
    'hexo_circle_of_friends.middlewares.RandomUserAgentMiddleware': 400,
    'hexo_circle_of_friends.middlewares.BlockSiteMiddleware': 300,
    'hexo_circle_of_friends.middlewares.ProxyMiddleware': 299,

}

ITEM_PIPELINES = {
    'hexo_circle_of_friends.pipelines.pipelines.DuplicatesPipeline': 200,
}

RETRY_ENABLED = True
Nesxc commented 2 years ago

Full crawler.log: https://pan.nesxc.com/s/GLua Project files used: https://pan.nesxc.com/s/e0tX

hiltay commented 2 years ago

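This is a database encoding issue. The title contains the '💪' emoji, which the current character set cannot store. Switch the database to the utf8mb4 character set and run it again.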
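
To illustrate why the insert fails, here is a minimal sketch. It assumes the `posts` table uses MySQL's legacy 3-byte `utf8` (utf8mb3) character set, which is inferred from the 1366 error and not shown in the log. The helper `needs_utf8mb4` is hypothetical and not part of the project; the SQL statement and connection URL in the comments are examples with placeholder credentials.

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character takes 4 bytes in UTF-8.

    Code points above U+FFFF are encoded as 4 bytes, which MySQL's
    3-byte utf8 (utf8mb3) character set cannot store.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


# The title from the failing INSERT in the log above.
title = "2022高考加油💪"

# The emoji encodes to the exact bytes reported in error 1366.
print("💪".encode("utf-8"))           # b'\xf0\x9f\x92\xaa'
print(needs_utf8mb4(title))           # True
print(needs_utf8mb4("2022高考加油"))  # False: CJK characters take 3 bytes

# One possible fix on the MySQL side (example statement; adjust the
# table name and collation to your setup):
#   ALTER TABLE posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
#
# The client connection also has to use utf8mb4. With SQLAlchemy and
# PyMySQL this is commonly set in the URL (placeholder credentials):
#   mysql+pymysql://user:password@localhost/dbname?charset=utf8mb4
```

This also fits the second traceback: once the first commit fails, the session stays in a rolled-back state, so later commits for unrelated items raise `PendingRollbackError` until `Session.rollback()` is called.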

hiltay commented 2 years ago

This is now explained in the documentation.