Boris-code / feapder

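🚀🚀🚀 feapder is an easy-to-use, powerful Python crawler framework. It ships with four spider types (AirSpider, Spider, TaskSpider, and BatchSpider) to cover different scenarios, and it supports resumable crawling, monitoring and alerting, browser rendering, and large-scale data deduplication. The companion crawler management system, feaplat, provides convenient deployment and scheduling.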
http://feapder.com
Other
2.94k stars 480 forks

Error when using the Playwright downloader #265

Open wangfan002 opened 2 weeks ago

wangfan002 commented 2 weeks ago

Problem: Running the example test_playwright.py fails with playwright._impl._errors.Error: It looks like you are using Playwright Sync API inside the asyncio loop. Please use the Async API instead.

Python environment: conda, Python 3.10, feapder[render] 1.9.0

Screenshot: (image)

Code:

import time

from playwright.sync_api import Page

import feapder
from feapder.utils.webdriver import PlaywrightDriver

class TestPlaywright(feapder.AirSpider):
    __custom_setting__ = dict(
        RENDER_DOWNLOADER="feapder.network.downloader.PlaywrightDownloader",
    )

    def start_requests(self):
        yield feapder.Request("https://www.baidu.com", render=True)

    def parse(self, request, response):
        driver: PlaywrightDriver = response.driver
        page: Page = driver.page

        page.type("#kw", "feapder")
        page.click("#su")
        page.wait_for_load_state("networkidle")
        time.sleep(1)

        html = page.content()
        response.text = html  # make the response load the latest page content
        for data_container in response.xpath("//div[@class='c-container']"):
            print(data_container.xpath("string(.//h3)").extract_first())

if __name__ == "__main__":
    TestPlaywright(thread_count=1).run()

QingShan-Xu commented 1 week ago

Hi, have you solved this?

QingShan-Xu commented 1 week ago

I don't know what the cause is, but it runs after executing playwright install --with-deps.
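On the possible cause: the error text says the Sync API was called while an asyncio event loop was already running in the same thread. Below is a minimal standard-library sketch for checking that condition. The helper name `loop_is_running` is hypothetical and is not part of Playwright or feapder, and nothing here is confirmed as the actual root cause of this issue.

```python
import asyncio


def loop_is_running() -> bool:
    """Return True if an asyncio event loop is running in the current thread."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # get_running_loop() raises RuntimeError when no loop is running.
        return False
    return True


async def _check_inside_loop() -> bool:
    # A coroutine only executes while a loop is running in this thread.
    return loop_is_running()


outside = loop_is_running()                 # called from plain synchronous code
inside = asyncio.run(_check_inside_loop())  # called from within a running loop
print(outside, inside)
```

If a check like this returned True at the point where the spider first touches the Playwright page, the Sync API would be expected to refuse to start with the message above. Why a loop would be running there is not established in this thread.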

wangfan002 commented 1 week ago

Not yet 🥹

QingShan-Xu commented 1 week ago

After a night's sleep it no longer works for me either.

QingShan-Xu commented 1 week ago

storage_state_path=None,  # path for saving the browser state

After I set this to None, the error went away.

wangfan002 commented 1 week ago

Wow, impressive. I'll try it tomorrow 😎

wangfan002 commented 1 week ago

Like this? My test still raises the error:

class TestPlaywright(feapder.AirSpider):
    __custom_setting__ = dict(
        RENDER_DOWNLOADER="feapder.network.downloader.PlaywrightDownloader",
        PLAYWRIGHT=dict(
            storage_state_path=None
        )
    )

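QingShan-Xu commented 1 week ago

__custom_setting__ = dict(
    PLAYWRIGHT=dict(
        user_agent=None,  # a string, or a zero-argument function that returns the user agent
        proxy=None,  # xxx.xxx.xxx.xxx:xxxx, or a zero-argument function that returns the proxy address
        headless=False,  # whether to run the browser headless
        driver_type="chromium",  # chromium, firefox, or webkit
        args=["--lang=zh-CN"],
        timeout=60,  # request timeout
        window_size=(1024, 800),  # window size
        executable_path=None,  # browser executable path; None uses the default
        download_path=None,  # path for downloaded files
        render_time=0,  # render wait: after opening the page, wait this long before reading the source
        wait_until="networkidle",
        use_stealth_js=True,  # use stealth.min.js to hide browser fingerprints
        storage_state_path=None,  # path for saving the browser state
        page_on_event_callback=dict(response=on_response),
        save_all=False,
    ),
)

This is what I have, and it doesn't raise the error. Environment: WSL2 with Debian.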

wangfan002 commented 1 week ago

Indeed, with your configuration it works for me too. Thanks!
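For anyone copying the configuration above: it references `on_response`, which is not defined anywhere in the thread. Below is a minimal self-contained sketch of the same settings with a placeholder callback. The callback body and the name `PLAYWRIGHT_SETTINGS` are assumptions for illustration, and the thread does not establish which individual setting made the error disappear.

```python
def on_response(response):
    """Hypothetical placeholder callback: print the URL of each page response."""
    print(response.url)


# Same values as the configuration reported to work in this thread.
PLAYWRIGHT_SETTINGS = dict(
    user_agent=None,
    proxy=None,
    headless=False,
    driver_type="chromium",
    args=["--lang=zh-CN"],
    timeout=60,
    window_size=(1024, 800),
    executable_path=None,
    download_path=None,
    render_time=0,
    wait_until="networkidle",
    use_stealth_js=True,
    storage_state_path=None,  # the setting singled out earlier in the thread
    page_on_event_callback=dict(response=on_response),
    save_all=False,
)

# In the spider class (requires feapder[render]):
#   __custom_setting__ = dict(
#       RENDER_DOWNLOADER="feapder.network.downloader.PlaywrightDownloader",
#       PLAYWRIGHT=PLAYWRIGHT_SETTINGS,
#   )
```

Keeping the settings in a named dict makes it easier to switch individual keys back one at a time and see which one actually matters in a given environment.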