Thisisnotgoingpublished commented 9 months ago

Running it fails with: UnicodeEncodeError: 'gbk' codec can't encode character '\U0001f525' in position 400: illegal multibyte sequence

This is a Unicode encoding problem: the 'gbk' codec cannot represent some of the characters being processed. The character '\U0001f525' is the 🔥 emoji.
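To confirm that the problem is the output codec and not the text itself, the stdlib-only sketch below tries to encode the same character with GBK and with UTF-8. The helper name `can_encode` is illustrative only; it is not part of weibo-search.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text has a mapping in the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


fire = "\U0001f525"  # 🔥, the character from the error message

print(can_encode(fire, "gbk"))    # False: GBK has no mapping for this emoji
print(can_encode(fire, "utf-8"))  # True: UTF-8 can encode any Unicode character
print(can_encode("中文", "gbk"))  # True: ordinary Chinese text is fine in GBK
```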

Thisisnotgoingpublished commented 9 months ago

First, install the emoji package:

pip install emoji

Then add import emoji at the top of .\weibo\spiders\search.py

Then comment out (with #) the second-to-last line, print(weibo), and replace it with the following:

                text_to_demj = weibo.get('text', '')  # the post's text field, '' if missing
                clean_text = emoji.demojize(text_to_demj)  # turn emoji into :name: codes, e.g. ":fire:"
                print(clean_text)

Thisisnotgoingpublished commented 9 months ago

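Also, I don't know how to change this back so it prints the original output; I can't program. I hope the author takes a look. As a beginner, I think all text should be handled consistently as either UTF-8 or GBK; this half-and-half handling isn't good.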
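One way to get the "UTF-8 everywhere" behaviour suggested here is to switch Python's own stdout to UTF-8. In the command below, output is redirected to a.txt, so Python most likely falls back to the Windows locale encoding (GBK on a Chinese system) instead of the console's UTF-8 handling. On Python 3.7+ text streams such as `sys.stdout` have a `reconfigure()` method. The helper `force_utf8` below is hypothetical, and where to call it (e.g. near the top of search.py) is an assumption, not something from the project.

```python
import io


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure() (Python 3.7+).

    Hypothetical helper: call it once, before anything has been printed.
    """
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
    return stream


# Demonstration: an in-memory stream that starts out as GBK, standing in for
# stdout redirected to a file on a Chinese-locale Windows machine.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="gbk")
force_utf8(out)
out.write("\U0001f525 中文")  # would raise UnicodeEncodeError if the stream were still GBK
out.flush()
print(raw.getvalue())  # the UTF-8 bytes that reached the "file"

# In the spider, the equivalent call would be force_utf8(sys.stdout).
```

Without touching the code, setting `$env:PYTHONIOENCODING = "utf-8"` (or `$env:PYTHONUTF8 = "1"`) in the same PowerShell session before `scrapy crawl ...` should have a similar effect. Whatever reads the output (console or a.txt viewer) then also has to interpret it as UTF-8, otherwise the text may look garbled instead of raising an error.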

Thisisnotgoingpublished commented 9 months ago

No luck, I'm out of ideas; it still throws errors:

PS H:\weibo-search-master\weibo> scrapy crawl search -s JOBDIR=crawls/search >> ./a.txt
2023-12-13 11:27:43 [scrapy.core.scraper] ERROR: Spider error processing <GET https://s.weibo.com/weibo?q=<redacted>&typeall=1&suball=1&timescope=custom:2023-12-12-0:2023-12-13-0&page=1> (referer: https://s.weibo.com/weibo?q=<redacted>&typeall=1&suball=1&timescope=custom:2023-12-11-0:2023-12-14-0)
Traceback (most recent call last):
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\utils\defer.py", line 279, in iter_errback
    yield next(it)
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\utils\python.py", line 350, in __next__
    return next(self.data)
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\utils\python.py", line 350, in __next__
    return next(self.data)
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    for r in iterable:
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 28, in <genexpr>
    return (r for r in result or () if self._filter(r, spider))
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    for r in iterable:
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 352, in <genexpr>
    return (self._set_referer(r, response) for r in result or ())
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    for r in iterable:
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 27, in <genexpr>
    return (r for r in result or () if self._filter(r, spider))
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    for r in iterable:
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 31, in <genexpr>
    return (r for r in result or () if self._filter(r, response, spider))
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    for r in iterable:
  File "H:\weibo-search-master\weibo\spiders\search.py", line 154, in parse_by_day
    for weibo in self.parse_weibo(response):
  File "H:\weibo-search-master\weibo\spiders\search.py", line 542, in parse_weibo
    print(clean_text)
UnicodeEncodeError: 'gbk' codec can't encode character '\ufffc' in position 58: illegal multibyte sequence
2023-12-13 11:27:54 [scrapy.core.scraper] ERROR: Spider error processing <GET https://s.weibo.com/weibo?q=<redacted>&typeall=1&suball=1&timescope=custom:2023-12-11-0:2023-12-12-0&page=1> (referer: https://s.weibo.com/weibo?q=<redacted>&typeall=1&suball=1&timescope=custom:2023-12-11-0:2023-12-14-0)
Traceback (most recent call last):
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\utils\defer.py", line 279, in iter_errback
    yield next(it)
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\utils\python.py", line 350, in __next__
    return next(self.data)
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\utils\python.py", line 350, in __next__
    return next(self.data)
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    for r in iterable:
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 28, in <genexpr>
    return (r for r in result or () if self._filter(r, spider))
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    for r in iterable:
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 352, in <genexpr>
    return (self._set_referer(r, response) for r in result or ())
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    for r in iterable:
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 27, in <genexpr>
    return (r for r in result or () if self._filter(r, spider))
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    for r in iterable:
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 31, in <genexpr>
    return (r for r in result or () if self._filter(r, response, spider))
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    for r in iterable:
  File "H:\weibo-search-master\weibo\spiders\search.py", line 154, in parse_by_day
    for weibo in self.parse_weibo(response):
  File "H:\weibo-search-master\weibo\spiders\search.py", line 542, in parse_weibo
    print(clean_text)
UnicodeEncodeError: 'gbk' codec can't encode character '\ue662' in position 11: illegal multibyte sequence
2023-12-13 11:29:44 [scrapy.core.scraper] ERROR: Spider error processing <GET https://s.weibo.com/weibo?q=<redacted>&typeall=1&suball=1&timescope=custom:2023-12-11-0:2023-12-14-0> (referer: None)
Traceback (most recent call last):
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\utils\defer.py", line 279, in iter_errback
    yield next(it)
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\utils\python.py", line 350, in __next__
    return next(self.data)
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\utils\python.py", line 350, in __next__
    return next(self.data)
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    for r in iterable:
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 28, in <genexpr>
    return (r for r in result or () if self._filter(r, spider))
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    for r in iterable:
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 352, in <genexpr>
    return (self._set_referer(r, response) for r in result or ())
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    for r in iterable:
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 27, in <genexpr>
    return (r for r in result or () if self._filter(r, spider))
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    for r in iterable:
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 31, in <genexpr>
    return (r for r in result or () if self._filter(r, response, spider))
  File "C:\Users\<redacted>\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    for r in iterable:
  File "H:\weibo-search-master\weibo\spiders\search.py", line 110, in parse
    for weibo in self.parse_weibo(response):
  File "H:\weibo-search-master\weibo\spiders\search.py", line 542, in parse_weibo
    print(clean_text)
UnicodeEncodeError: 'gbk' codec can't encode character '\xb9' in position 35: illegal multibyte sequence
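The log suggests that `emoji.demojize` only rewrites emoji: the remaining failures are for '\ufffc', '\ue662' and '\xb9', which are not emoji but still have no GBK mapping. A stdlib-only sketch like the one below can list which characters in a post would trip the GBK codec. The function name `non_gbk_chars` is hypothetical.

```python
import unicodedata


def non_gbk_chars(text: str, encoding: str = "gbk"):
    """Return (character, code point, Unicode name) for each character the codec cannot encode."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # Private-use characters such as U+E662 have no Unicode name, hence the default
            bad.append((ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>")))
    return bad


sample = "正常文字\U0001f525\ufffc"
for ch, cp, name in non_gbk_chars(sample):
    print(cp, name)
```

For this sample it reports the fire emoji and U+FFFC (OBJECT REPLACEMENT CHARACTER), while the ordinary Chinese characters pass.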

Thisisnotgoingpublished commented 9 months ago

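Forget all the code I wrote earlier. Simply comment out the second-to-last line and put the following below it, which skips all the errors. That works for now, until the author comes online to fix it.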

                # print(weibo)
                try:
                    print(str(weibo))
                except UnicodeEncodeError as e:
                    # Skip printing this item instead of crashing; it is still yielded below
                    print("Error occurred while encoding:", e)
                yield {'weibo': weibo, 'keyword': keyword}
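The try/except above drops the whole printout for any post containing an unencodable character. A possible alternative (a different technique from the workaround above, not taken from the project) is to encode with `errors="replace"`, so only the offending characters become `?` and the rest of the item is still printed. `safe_print` is a hypothetical helper; it takes the target encoding from the stream and falls back to UTF-8.

```python
import io
import sys


def safe_print(obj, stream=None):
    """Print obj, replacing any character the stream's codec cannot encode with '?'."""
    stream = stream or sys.stdout
    encoding = getattr(stream, "encoding", None) or "utf-8"
    text = str(obj)
    # Round-trip through the target codec so unencodable characters become '?'
    safe = text.encode(encoding, errors="replace").decode(encoding)
    print(safe, file=stream)


# Demonstration with a GBK stream, similar to redirected stdout on Chinese Windows
raw = io.BytesIO()
gbk_out = io.TextIOWrapper(raw, encoding="gbk", newline="\n")
safe_print({"text": "hot \U0001f525"}, stream=gbk_out)
gbk_out.flush()
print(raw.getvalue().decode("gbk"))  # the emoji has been replaced by '?'
```

In search.py, `safe_print(weibo)` would take the place of `print(weibo)`, with no try/except needed.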

dataabc commented 9 months ago

Thanks for the detailed feedback. I can't debug this right now; I'll look into it when I have time. Thanks.

dataabc / weibo-search

Emoji cannot be encoded under GBK #440
