Closed chimaoshu closed 2 years ago
Here is an example:
https://raw.githubusercontent.com/chimaoshu/chimaoshu.github.io/master/iptest.txt
This is a txt file that contains GBK-encoded characters along with an IP address. The index4 entry in the configuration file is:
"index4": "url:https://raw.githubusercontent.com/chimaoshu/chimaoshu.github.io/master/iptest.txt"
Calling decode('utf8') directly on this content fails with:
ERROR:root:'utf-8' codec can't decode byte 0xb8 in position 3: invalid start byte
ERROR:root:Fail to get ipv4 address!
decode('utf8', 'ignore'), by contrast, skips the GBK bytes that cannot be decoded, and the IP is matched.
Some websites that display your IP address serve content in a non-UTF-8 encoding such as GBK, which makes the script fail when it decodes the response as UTF-8. The bytes that cannot be decoded (Chinese characters, for example) do not affect the job, because the script only needs to match the IP address in the decoded text, so they can be ignored during decoding.
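The difference between the two decode calls can be reproduced without any network access. The following is a minimal sketch, not the script's actual code: the helper name `extract_ipv4`, the regular expression, and the sample bytes (a GBK-encoded label followed by an address from the documentation range) are all assumptions made for illustration.

```python
import re

# Simplified IPv4 pattern, assumed for this sketch; it does not validate
# that each octet is in the 0-255 range.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_ipv4(raw: bytes):
    """Return the first IPv4-looking string in raw, or None (hypothetical helper)."""
    # errors="ignore" drops byte sequences that are not valid UTF-8.
    # ASCII digits and dots are always valid UTF-8, so the address survives.
    text = raw.decode("utf8", errors="ignore")
    match = IPV4_PATTERN.search(text)
    return match.group(0) if match else None


# Hypothetical page content: GBK-encoded Chinese text followed by an IP.
raw = "当前IP:203.0.113.7".encode("gbk")

try:
    raw.decode("utf8")  # strict decoding is the default
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc)

print("matched:", extract_ipv4(raw))
```

One caveat with this approach: some GBK byte pairs happen to form valid UTF-8 sequences, so the decoded text may contain a few stray characters rather than nothing at all. That is harmless here, since only the ASCII address is matched.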