RSS feeds in Chapter 4 (Naive Bayes) no longer work

AIkikaze commented 11 months ago

Problem description

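The third small experiment in Chapter 4 (Naive Bayes) uses the feedparser module to parse two RSS feeds and obtain text data. I found that both feed URLs no longer work: the text lists the code builds from them are empty.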

Opening the feed URLs in a browser shows the following:

Your request has been blocked.

If you have questions, please contact us.
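When a site blocks the request like this, feedparser typically does not raise an exception; it returns a result whose `entries` list is empty, so the problem only surfaces later as empty lists. The sketch below checks a parsed feed before it is used. It assumes the fields feedparser normally exposes (`status` for HTTP fetches, `bozo` and `bozo_exception` for malformed content, `entries`); the helper name `describe_feed` is hypothetical, and it accepts any dict with those keys so it can be tried offline.

```python
def describe_feed(parsed):
    """Return a list of reasons why a feedparser result looks unusable.

    `parsed` is the dict-like object returned by feedparser.parse();
    a plain dict with the same keys also works.
    """
    problems = []
    # "status" is only present when the feed was fetched over HTTP.
    status = parsed.get("status")
    if status is not None and status >= 400:
        problems.append(f"HTTP status {status}")
    # feedparser sets "bozo" when the document is not well-formed,
    # e.g. when an HTML "request blocked" page comes back instead of RSS.
    if parsed.get("bozo"):
        problems.append(f"malformed feed: {parsed.get('bozo_exception')!r}")
    # An empty entry list is what makes localWords produce empty output.
    if not parsed.get("entries"):
        problems.append("no entries")
    return problems
```

For example, `describe_feed(feedparser.parse(url))` on each of the two Craigslist URLs should report the problem instead of silently returning empty data; an empty list means the feed looks usable.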

Location of the affected resource

Chapter 4 - Naive Bayes

Screenshot of where the problem occurs

[Screenshot: bayes_issue]

Code used to reproduce

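# Assumes the chapter's bayes.py is importable from this directory and that
# textParse and calMostFreq are defined elsewhere in this script.
import bayes

def localWords(feed1, feed0):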
    docList = []
    classList = []
    fullText = []
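    # Use the same number of entries from each feed; this is 0 if either feed is empty,
    # in which case the loop below never runs.
    minLen = min(len(feed1["entries"]), len(feed0["entries"]))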

    # 1. Collect the text and gather word statistics
    for i in range(minLen):
        # Class 1: take one entry from the first RSS feed per iteration
        wordList = textParse(feed1["entries"][i]["summary"])
        docList.append(wordList)
        fullText.extend(wordList)
        classList.append(1)
        # Class 0: take one entry from the second RSS feed per iteration
        wordList = textParse(feed0["entries"][i]["summary"])
        docList.append(wordList)
        fullText.extend(wordList)
        classList.append(0)
    vocabList = bayes.createVocabList(docList)
    top30Words = calMostFreq(vocabList, fullText)

    print(f"打印获取的文本:\n{docList}")  # "Fetched text:"
    print(f"打印单词列表:\n{vocabList}")  # "Vocabulary list:"

if __name__ == "__main__":
    import feedparser as fp # type: ignore
    ny = fp.parse('http://newyork.craigslist.org/stp/index.rss')
    sf = fp.parse('http://sfbay.craigslist.org/stp/index.rss')
    localWords(ny, sf)
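Because `minLen` is 0 when either feed comes back empty, the loop in `localWords` never executes and everything downstream is silently empty. A minimal sketch of the same pairing loop that fails fast instead is shown below. `build_documents` and `simple_text_parse` are hypothetical names; `simple_text_parse` is only an assumed stand-in that behaves roughly like the chapter's `textParse` (split on non-word characters, lowercase, drop short tokens), not the book's code.

```python
import re


def simple_text_parse(text):
    # Hypothetical stand-in for textParse: split on non-word characters,
    # lowercase, and keep only tokens longer than 2 characters.
    return [tok.lower() for tok in re.split(r"\W+", text) if len(tok) > 2]


def build_documents(feed1, feed0, text_parse=simple_text_parse):
    """Pair entries from two feeds as localWords does, but raise on empty feeds."""
    n1, n0 = len(feed1["entries"]), len(feed0["entries"])
    min_len = min(n1, n0)
    if min_len == 0:
        # With min_len == 0 the loop would never run and every list stays [].
        raise ValueError(
            f"feed1 has {n1} entries, feed0 has {n0}; need at least 1 in each"
        )
    doc_list, class_list, full_text = [], [], []
    for i in range(min_len):
        # Alternate class 1 (feed1) and class 0 (feed0), one entry each.
        for feed, label in ((feed1, 1), (feed0, 0)):
            words = text_parse(feed["entries"][i]["summary"])
            doc_list.append(words)
            full_text.extend(words)
            class_list.append(label)
    return doc_list, class_list, full_text
```

With the blocked feeds, this raises a `ValueError` stating how many entries each feed returned, which points straight at the data source rather than at the classifier.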

Output

(py38) D:\PROJECT\ml>C:/tools/Anaconda3/envs/py38/python.exe d:/PROJECT/ml/4_bayes/rss.py
打印获取的文本:
[]
打印单词列表:
[]

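Suggestions

Replace the feeds with new ones that still work,
or only show the experiment's results and let readers find their own sources to test the algorithm.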
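For the second suggestion, `localWords` only reads `feed["entries"][i]["summary"]`, so any text can be wrapped in that shape and passed in without a network connection. A minimal sketch, assuming one document per `.txt` file in a local directory; `as_feed`, `feed_from_dir`, and the directory layout are hypothetical.

```python
from pathlib import Path


def as_feed(texts):
    """Wrap plain strings in the minimal shape localWords reads:
    feed["entries"][i]["summary"]."""
    return {"entries": [{"summary": t} for t in texts]}


def feed_from_dir(directory, pattern="*.txt"):
    # Treat each matching file as one entry; sort paths so the order is reproducible.
    paths = sorted(Path(directory).glob(pattern))
    return as_feed(p.read_text(encoding="utf-8") for p in paths)
```

Usage would look like `localWords(feed_from_dir("class1_texts"), feed_from_dir("class0_texts"))`, where the two directory names are placeholders for whatever text the reader collects.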

jiangzhonglian commented 11 months ago

For how to ask questions, you can refer to this: https://github.com/apachecn/ailearning/issues/649

jiangzhonglian commented 11 months ago

Don't get stuck on this, just skip it. It doesn't affect your learning!

apachecn / ailearning

RSS feeds in Chapter 4 (Naive Bayes) no longer work #648