Boris-code / feapder

🚀🚀🚀feapder is an easy-to-use, powerful Python crawler framework. It ships with four spider classes, AirSpider, Spider, TaskSpider and BatchSpider, to cover different scenarios, and supports resumable crawling, monitoring and alerting, browser rendering, and large-scale data deduplication. The feaplat crawler management system provides convenient deployment and scheduling.
http://feapder.com

How do I save data to a CSV file? #216

Closed. liuchangfu closed this issue 1 year ago.

liuchangfu commented 1 year ago


Boris-code commented 1 year ago

Take a look at https://feapder.com/#/source_code/pipeline

liuchangfu commented 1 year ago

Take a look at https://feapder.com/#/source_code/pipeline

I don't quite follow it. How are those two parameters passed in? Could you explain?

Boris-code commented 1 year ago

They are passed in by the framework. What you need to do is:

  1. Configure it in setting
    # Pipelines used to store the data; multiple pipelines are supported
    ITEM_PIPELINES = [
        "pipeline.Pipeline"  # pipeline module name.class name
    ]
  2. Implement the pipeline

    from feapder.pipelines import BasePipeline
    from typing import Dict, List


    class Pipeline(BasePipeline):
        """
        The pipeline is single-threaded and saves data in batches.
        Do not make network requests here, such as downloading images.
        """

        def save_items(self, table, items: List[Dict]) -> bool:
            """
            Save the data
            Args:
                table: table name
                items: data, [{}, {}, ...]

            Returns: whether the save succeeded, True / False
                     If False, this batch is not added to the dedup store, so it can be stored again later

            """

            # save the data to a file or database here

            return True

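For the CSV file this issue asks about, here is a minimal sketch of such a pipeline, assuming every item in a batch shares the same keys and that appending to a file named after the table is acceptable; the CsvPipeline class name and the output path are illustrative, not part of feapder's API:

    import csv
    import os
    from typing import Dict, List

    from feapder.pipelines import BasePipeline


    class CsvPipeline(BasePipeline):
        """Appends each batch of items to <table>.csv, writing the header row only once."""

        def save_items(self, table, items: List[Dict]) -> bool:
            if not items:
                return True
            try:
                path = f"{table}.csv"
                write_header = not os.path.exists(path)
                with open(path, "a", newline="", encoding="utf-8") as f:
                    # use the keys of the first item as the CSV header
                    writer = csv.DictWriter(f, fieldnames=list(items[0].keys()))
                    if write_header:
                        writer.writeheader()
                    writer.writerows(items)
                return True
            except Exception as e:
                print(e)
                # returning False keeps this batch out of the dedup store so it can be saved again
                return False

It would be registered the same way as above, e.g. ITEM_PIPELINES = ["pipeline.CsvPipeline"].
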
liuchangfu commented 1 year ago

Figured it out, thanks. This framework is much easier to use than scrapy, and the documentation is clearly written, easy to learn, and easy to get started with.