NanmiCoder / MediaCrawler

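Xiaohongshu note | comment crawler, Douyin video | comment crawler, Kuaishou video | comment crawler, Bilibili video | comment crawler, Weibo post | comment crawler, Baidu Tieba post | Baidu Tieba comment-reply crawler, Zhihu Q&A article | comment crawler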

Can Xiaohongshu no longer be crawled? #356

Closed oorsuioi closed 1 month ago

kunhai-88 commented 1 month ago

The parameters of the "/api/sns/web/v1/feed" endpoint have changed. The request now needs an xsec_token, which can be obtained from the search endpoint.

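async def get_note_by_id(self, note_id: str, xsec_token: str) -> Dict:
    """
    Get note details from the feed API.

    Args:
        note_id: ID of the note
        xsec_token: token obtained from the search endpoint

    Returns:
        The note_card dict with note_id added, or an empty dict if no item is returned.
    """
    data = {
        "source_note_id": note_id,
        "extra": {"need_body_topic": "1"},
        "image_formats": ["jpg", "webp", "avif"],
        "xsec_source": "pc_search",
        "xsec_token": xsec_token,
    }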
    uri = "/api/sns/web/v1/feed"
    res = await self.post(uri, data)

    if res and res.get("items"):
        res_dict: Dict = res["items"][0]["note_card"]
        res_dict["note_id"] = note_id
        return res_dict
    utils.logger.error(f"[XiaoHongShuClient.get_note_by_id] get note empty and res:{res}")
    return dict()
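
To make the changed request body easier to check in isolation, here is a minimal sketch that builds the same payload outside the client class. The helper name build_feed_payload and the early ValueError checks are assumptions for illustration, not part of MediaCrawler; the field names and values are the ones from the snippet above.

```python
from typing import Any, Dict


def build_feed_payload(note_id: str, xsec_token: str,
                       xsec_source: str = "pc_search") -> Dict[str, Any]:
    """Build the JSON body for /api/sns/web/v1/feed (hypothetical helper)."""
    # Fail early instead of sending a request that is missing a required field.
    if not note_id:
        raise ValueError("note_id is required")
    if not xsec_token:
        raise ValueError("xsec_token is required; take it from the search endpoint")
    return {
        "source_note_id": note_id,
        "extra": {"need_body_topic": "1"},
        "image_formats": ["jpg", "webp", "avif"],
        "xsec_source": xsec_source,
        "xsec_token": xsec_token,
    }


# Placeholder values, not real IDs or tokens.
payload = build_feed_payload("note123", "token-abc")
```

The resulting dict is what would be passed as data to self.post(uri, data).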
NanmiCoder commented 1 month ago

Fixed.

laienliang commented 1 month ago

May I ask how this was solved? This code is different every time.

fruitswordman commented 1 month ago
async def fetch_creator_notes_detail(self, note_list: List[Dict]):
    """
    Concurrently obtain the specified post list and save the data
    """
    semaphore = asyncio.Semaphore(config.MAX_CONCURRENCY_NUM)
    task_list = [
        self.get_note_detail(
            note_id=post_item.get("id"),
            xsec_source=post_item.get("xsec_source"),
            xsec_token=post_item.get("xsec_token"),
            semaphore=semaphore
        )
        for post_item in note_list
    ]

    note_details = await asyncio.gather(*task_list)
    for note_detail in note_details:
        if note_detail:
            await xhs_store.update_xhs_note(note_detail)

Here, shouldn't note_id=post_item.get("id") be note_id=post_item.get("note_id")?

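After the latest update, running the code fails with a "source_note_id not found" error. I traced it to this spot, and after making this change it ran fine. Could the author please check?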
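
A minimal sketch of the key lookup discussed here, assuming the items in note_list carry the ID under "note_id". If post_item.get("id") returns None, the request presumably goes out with an empty source_note_id, which would match the error. The helper names pick_note_id and build_detail_kwargs, the fallback to "id", and the skip-on-missing behavior are illustrative assumptions, not the project's code.

```python
from typing import Dict, List, Optional


def pick_note_id(post_item: Dict) -> Optional[str]:
    """Return the note ID from a list item, or None if it has none."""
    # "note_id" is the key proposed in this thread; the fallback to "id"
    # is an assumption added here for illustration.
    return post_item.get("note_id") or post_item.get("id")


def build_detail_kwargs(note_list: List[Dict]) -> List[Dict]:
    """Collect keyword arguments for get_note_detail, skipping unusable items."""
    kwargs_list = []
    for post_item in note_list:
        note_id = pick_note_id(post_item)
        if not note_id:
            # Skip rather than send a request with an empty source_note_id.
            continue
        kwargs_list.append({
            "note_id": note_id,
            "xsec_source": post_item.get("xsec_source"),
            "xsec_token": post_item.get("xsec_token"),
        })
    return kwargs_list


# Placeholder items, not real data.
creator_items = [
    {"note_id": "a1", "xsec_source": "pc_search", "xsec_token": "t1"},
    {"xsec_token": "t2"},  # no usable ID, so it is skipped
]
detail_kwargs = build_detail_kwargs(creator_items)
```

Each dict in detail_kwargs could then be expanded into self.get_note_detail(**kwargs, semaphore=semaphore) when building task_list.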

NanmiCoder commented 1 month ago

You're right, there is a problem. You're welcome to contribute the code. I copied it wrong when I was fixing this bug.

fruitswordman commented 1 month ago

#360

I've submitted a pull request 😊

NanmiCoder commented 1 month ago

Merged. Thanks for the contribution.

Ang0426 commented 1 month ago

Could someone explain how xsec_token is obtained? I don't quite understand it. Many thanks.
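
As noted earlier in the thread, xsec_token can be taken from the search endpoint. The sketch below shows one way to pull (note ID, token) pairs out of a search response. The response shape used here, a top-level "items" list whose entries have "id" and "xsec_token" keys, is an assumption and should be checked against a real response; extract_note_tokens is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def extract_note_tokens(search_res: Dict) -> List[Tuple[str, str]]:
    """Return (note_id, xsec_token) pairs from a search response.

    Assumed shape: {"items": [{"id": ..., "xsec_token": ...}, ...]}.
    """
    pairs = []
    for item in search_res.get("items", []):
        note_id = item.get("id")
        token = item.get("xsec_token")
        # Keep only entries that carry both values.
        if note_id and token:
            pairs.append((note_id, token))
    return pairs


# Placeholder response, not real data.
sample = {"items": [{"id": "n1", "xsec_token": "t1"}, {"id": "n2"}]}
pairs = extract_note_tokens(sample)
```

Each pair can then be passed on as get_note_by_id(note_id, xsec_token). Since the token was reported above to differ every time, it is probably safer to read it from a fresh search response than to hard-code it.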