dataabc / weiboSpider

Sina Weibo spider: crawl Sina Weibo data with Python

issues_feature_post_api_576 Implement pushing data to a custom API via POST #577

Closed myshero closed 2 months ago

myshero commented 2 months ago

Adding the following configuration enables pushing scraped data to a custom API via POST:

    "write_mode": ["post"],
    "post_config": {
        "api_url": "https://api.example.com/weibo/post",
        "api_token": ""
    }
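For reference, here is a minimal sketch of what a writer consuming this post_config could look like; the class name, method signature, and retry details are my own illustration, not necessarily the PR's actual post_writer.py. It also shows where the sleep and RequestException imports discussed below come in:

import requests
from time import sleep
from requests.exceptions import RequestException


class PostWriter:
    """Illustrative sketch: POST scraped records to api_url."""

    def __init__(self, api_url, api_token=''):
        self.api_url = api_url
        self.api_token = api_token

    def write(self, records, retries=3, backoff=5):
        # Send one batch of records as JSON, retrying on network errors.
        headers = {}
        if self.api_token:
            headers['Authorization'] = 'Bearer ' + self.api_token
        for attempt in range(retries):
            try:
                resp = requests.post(self.api_url, json=records,
                                     headers=headers, timeout=10)
                resp.raise_for_status()
                return
            except RequestException:
                if attempt == retries - 1:
                    raise  # give up after the last attempt
                sleep(backoff)  # back off before retrying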
myshero commented 2 months ago

The following imports may need to be added in ./weibo_spider/writer/post_writer.py:

from time import sleep  # new import
from requests.exceptions import RequestException  # new import

That said, my tests ran fine both locally and on an Alibaba Cloud server.

The Dockerfile I used:

FROM python:3.10-slim

WORKDIR /app

RUN pip3 install --upgrade pip
RUN apt-get update

COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

COPY . .

# Keep the container running
RUN /bin/echo "Test log at $(date)" >> /tmp/test.log
CMD ["tail", "-f", "/tmp/test.log"]

And the docker-compose.yml:

version: '3'

services:
  weibo:
    image: weibo_spider:latest
    container_name: weibo_spider_python
    user: "root:root"
    volumes:
      - ./weibo/:/app/weibo/
      - ./config.json:/app/config.json
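The compose file references a locally built image, so (as an assumed workflow, not spelled out in the thread) you would build the image with the tag the compose file expects and then start the stack:

docker build -t weibo_spider:latest .
docker-compose up -d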

Test command:

docker exec -it weibo_spider_python bash -c "cd /app/ && python3 -m weibo_spider"
dataabc commented 2 months ago

Thanks for contributing the code, this is a very useful new feature. However, the following does indeed need to be added in ./weibo_spider/writer/post_writer.py:

from time import sleep  # new import
from requests.exceptions import RequestException  # new import

Once that change is made it can be merged. Thanks.

myshero commented 2 months ago

Added. I also improved the newline-matching regex from yesterday's commit.

./weibo_spider/parser/comment_parser.py

- new_content = re.sub(r'\n+', '\n', new_content)
+ new_content = re.sub(r'\n+\s*', '\n', new_content)
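To illustrate the difference (a quick standalone check, not code from the repository): the old pattern collapses runs of newlines but keeps the spaces and tabs that follow them, while the new one swallows that trailing whitespace too, since \s also matches newlines inside whitespace-only lines:

import re

text = 'line1\n\n   line2\n \n\tline3'
re.sub(r'\n+', '\n', text)     # 'line1\n   line2\n \n\tline3'
re.sub(r'\n+\s*', '\n', text)  # 'line1\nline2\nline3'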
dataabc commented 2 months ago

Merged. Thanks again.