apachecn / ailearning

AiLearning: Data Analysis + Machine Learning in Practice + Linear Algebra + PyTorch + NLTK + TF2
http://ailearning.apachecn.org/

RSS feeds in Chapter 4 (Naive Bayes) no longer work #648

Closed AIkikaze closed 11 months ago

AIkikaze commented 11 months ago

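Problem description

The third small experiment in Chapter 4 (Naive Bayes) uses the feedparser module to parse two RSS feeds and obtain text data. When I checked, both feed URLs no longer work, and the list of retrieved texts is empty.

Opening either feed URL in a browser shows the following: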

Your request has been blocked.

If you have questions, please contact us.
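A quick way to confirm this from code is to look at what the parsed result reports. The sketch below is only an illustration: `summarize_feed` is a hypothetical helper name, and the 403 status in the example is an assumed value, not something observed in this issue. It relies on the result of `feedparser.parse()` being dict-like, with `entries`, `bozo`, and (when an HTTP response was recorded) `status` keys.

```python
def summarize_feed(feed):
    """Return a small diagnostic summary for a parsed feed.

    `feed` is anything dict-like, such as the result of feedparser.parse().
    """
    entries = feed.get("entries", [])
    return {
        # HTTP status code; None if no HTTP response was recorded
        "status": feed.get("status"),
        # True when the parser flagged the content as malformed
        "bozo": bool(feed.get("bozo", 0)),
        "entry_count": len(entries),
    }


if __name__ == "__main__":
    # Hypothetical result resembling a blocked request (values are assumed).
    blocked = {"status": 403, "bozo": 1, "entries": []}
    print(summarize_feed(blocked))
```

An `entry_count` of 0 is enough to explain why no text is retrieved, whatever the status code turns out to be.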

Resource with the problem

Chapter 4: Naive Bayes

Screenshot of the problem location

bayes_issue

Test code

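# Note: textParse, calMostFreq, and the bayes module are not shown here;
# they are presumably defined or imported elsewhere in rss.py.
def localWords(feed1, feed0):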
    docList = []
    classList = []
    fullText = []
    minLen = min(len(feed1["entries"]), len(feed0["entries"]))

    # 1. Collect the texts and gather statistics
    for i in range(minLen):
        # Class 1: read one entry from the first RSS feed per iteration
        wordList = textParse(feed1["entries"][i]["summary"])
        docList.append(wordList)
        fullText.extend(wordList)
        classList.append(1)
        # Class 0: read one entry from the second RSS feed per iteration
        wordList = textParse(feed0["entries"][i]["summary"])
        docList.append(wordList)
        fullText.extend(wordList)
        classList.append(0)
    vocabList = bayes.createVocabList(docList)
    top30Words = calMostFreq(vocabList, fullText)

    print(f"Collected texts:\n{docList}")
    print(f"Vocabulary list:\n{vocabList}")

if __name__ == "__main__":
    import feedparser as fp # type: ignore
    ny = fp.parse('http://newyork.craigslist.org/stp/index.rss')
    sf = fp.parse('http://sfbay.craigslist.org/stp/index.rss')
    localWords(ny, sf)

Output

(py38) D:\PROJECT\ml>C:/tools/Anaconda3/envs/py38/python.exe d:/PROJECT/ml/4_bayes/rss.py
Collected texts:
[]
Vocabulary list:
[]
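The empty lists follow directly from the code: when either feed has no entries, `minLen` is 0, so the `for` loop body never runs and nothing is appended. A minimal sketch of a fail-fast check that could run before `localWords` is shown here; `require_entries` is a hypothetical helper, not part of the tutorial code.

```python
def require_entries(feed1, feed0):
    """Return the number of usable entry pairs, or raise if either feed is empty."""
    n1 = len(feed1.get("entries", []))
    n0 = len(feed0.get("entries", []))
    if min(n1, n0) == 0:
        raise ValueError(
            f"feed1 has {n1} entries and feed0 has {n0}; "
            "at least one feed returned no data"
        )
    return min(n1, n0)
```

Calling `require_entries(ny, sf)` before `localWords(ny, sf)` would turn the silent empty output into an explicit error.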

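Suggestions

  1. Replace the feeds with new ones that work
  2. Or show only the experiment results and let readers find their own feeds to test the algorithm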
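
For the second suggestion, readers do not strictly need an RSS feed at all. The test code only reads `feed["entries"][i]["summary"]`, so any dict with that shape should work. The sketch below is an assumption-laden illustration: `make_feed` is a hypothetical helper, and the sample sentences are invented placeholders, not data from the original feeds.

```python
def make_feed(summaries):
    """Wrap plain strings in the {"entries": [{"summary": ...}]} shape that localWords reads."""
    return {"entries": [{"summary": text} for text in summaries]}


# Placeholder texts invented for illustration; replace them with your own corpus.
ny = make_feed([
    "looking for a hiking partner in brooklyn",
    "coffee and chess in queens",
])
sf = make_feed([
    "anyone up for a bike ride near the bay",
    "board games night in oakland",
])

# localWords(ny, sf) can then be called with these instead of parsed RSS results.
print(len(ny["entries"]), len(sf["entries"]))
```

Two short texts per class are far too few for a meaningful classifier; this only shows the shape of the input.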

jiangzhonglian commented 11 months ago

You can refer to this one when filing an issue: https://github.com/apachecn/ailearning/issues/649

jiangzhonglian commented 11 months ago

Don't get hung up on it, just skip it. It won't affect your learning!