felix-cao / Blog


Puppeteer usage tips #185

felix-cao opened this issue 4 years ago

felix-cao commented 4 years ago

1. Configuration

1.1 Creating a browser instance

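const puppeteer = require('puppeteer');

const browser = await puppeteer.launch({
  slowMo: 500,           // slow down every Puppeteer operation by 500 ms
  headless: false,       // show the browser window
  devtools: false,       // don't open DevTools automatically
  defaultViewport: null, // disable the default 800x600 viewport; use the window size
  args: ['--window-size=1920,1080'],
});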

1.2 Navigating and waiting until the page has finished loading

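const timeout = 90000; // navigation timeout in ms
// 'networkidle0': navigation counts as finished once there have been
// no network connections for at least 500 ms
const options = { timeout, waitUntil: 'networkidle0' };
await page.goto(`http://www.baidu.com`, options).catch(e => console.error(e));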

2. Monitoring

2.1 Handling page dialogs

// page.once registers the handler for the next dialog only; `expect` comes from Jest
page.once('dialog', async dialog => {
  expect(dialog.message()).toBe(`upload successfully!`);
  await page.waitForTimeout(500);
  await dialog.accept();
});

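Don't call page.on inside a loop: every iteration adds another listener, so later dialogs get handled several times. Split the logic up, or use page.once instead.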
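Puppeteer's Page exposes an event-emitter interface (`on`, `once`, `off`), so the difference can be reproduced with Node's built-in `EventEmitter` standing in for `page` (an assumption made so the sketch runs without a browser). Each loop iteration registers a handler and then triggers one dialog:

```javascript
const { EventEmitter } = require('events');

// Simulate a loop where every iteration registers a 'dialog' handler and then
// triggers exactly one dialog. Returns the number of handler calls per dialog.
function callsPerDialog(method, iterations) {
  const page = new EventEmitter(); // stand-in for a Puppeteer Page
  const counts = [];
  let calls = 0;
  for (let i = 0; i < iterations; i++) {
    page[method]('dialog', () => { calls += 1; }); // registered inside the loop
    calls = 0;
    page.emit('dialog'); // this iteration opens one dialog
    counts.push(calls);
  }
  return counts;
}

console.log(callsPerDialog('on', 3));   // [ 1, 2, 3 ]: listeners pile up
console.log(callsPerDialog('once', 3)); // [ 1, 1, 1 ]: each listener removes itself
```

With real dialogs, the second call to `dialog.accept()` on an already-handled dialog fails, which is why the `on` version breaks from the second iteration onward.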

2.2 Waiting for a response

Basic usage

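// Wait for a response from a specific URL
const firstResponse = await page.waitForResponse(
  'https://example.com/resource'
);
// Wait for a response that matches a predicate (URL and status)
const finalResponse = await page.waitForResponse(
  (response) =>
    response.url() === 'https://example.com' && response.status() === 200
);
// The predicate may also be async, e.g. to inspect the body
const htmlResponse = await page.waitForResponse(async (response) => {
  return (await response.text()).includes('<html>');
});
console.log(htmlResponse.ok()); // true when the status is in the 200-299 range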
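The predicate form is just a function from a response to a boolean (or a Promise of one), so the URL/status check can be wrapped in a small reusable factory. `matchResponse` is a hypothetical helper name, and the sketch is exercised with fake objects that implement only `url()` and `status()`, the two response methods it calls:

```javascript
// Build a predicate for page.waitForResponse that matches an exact URL
// and, optionally, an expected HTTP status code.
function matchResponse(expectedUrl, expectedStatus) {
  return (response) =>
    response.url() === expectedUrl &&
    (expectedStatus === undefined || response.status() === expectedStatus);
}

// Fake responses exposing only the methods the predicate uses.
const fake = (url, status) => ({ url: () => url, status: () => status });

const isHomeOk = matchResponse('https://example.com', 200);
console.log(isHomeOk(fake('https://example.com', 200)));   // true
console.log(isHomeOk(fake('https://example.com', 404)));   // false
console.log(isHomeOk(fake('https://example.com/a', 200))); // false
```

In a test it would be used as `await page.waitForResponse(matchResponse('https://example.com', 200))`.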

Ajax polling

Resolve once a response body contains `SimulationProgress === 100`:

const finalResponse = await page.waitForResponse(async (response) => {
  try {
    const body = JSON.parse(await response.text());
    console.log('---body---', body);
    return body.SimulationProgress === 100;
  } catch (e) {
    return false; // not a JSON body (HTML, script, image, ...): keep waiting
  }
});
console.log('--outjson--', await finalResponse.text());
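Waiting on responses works when the page itself does the polling. When the test code has to poll on its own, a generic retry loop does the job. `pollUntil`, its `interval`/`timeout` options, and the fake progress source below are illustrative assumptions, not Puppeteer APIs:

```javascript
// Call `check` repeatedly until it returns a truthy value or `timeout` ms pass.
async function pollUntil(check, { interval = 500, timeout = 10000 } = {}) {
  const deadline = Date.now() + timeout;
  while (Date.now() < deadline) {
    const result = await check();
    if (result) return result;
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
  throw new Error(`pollUntil: condition not met within ${timeout} ms`);
}

// Fake progress source: each call advances SimulationProgress by 25.
let progress = 0;
const fetchProgress = async () => ({ SimulationProgress: (progress += 25) });

pollUntil(async () => {
  const body = await fetchProgress();
  return body.SimulationProgress === 100 ? body : null;
}, { interval: 10, timeout: 1000 }).then((body) => console.log(body));
// logs { SimulationProgress: 100 }
```

In a real test, `check` would fetch the progress, for example through `page.evaluate`, instead of calling the fake source.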

3. Page input

3.1 Simulating keyboard input

Taking Baidu search as an example, type 合肥创新产业园 (Hefei Innovation Industrial Park) into the search box:

await page.goto(`http://www.baidu.com`);
await page.focus("#kw"); // the search input
await page.waitForTimeout(300);
await page.keyboard.type('合肥创新产业园', {delay: 300}); // 300 ms between keystrokes
await page.click('#su', {delay: 300}); // the search button

Clearing the existing text in the input first:

const kw = await page.$("#kw");
await kw.click({clickCount: 3}); // a triple-click selects the existing text
await page.waitForTimeout(300);
await page.keyboard.type('合肥创新产业园', {delay: 300}); // typing replaces the selection
await page.click('#su', {delay: 300});

See: How to delete existing text from input using Puppeteer?

3.2 Setting a select drop-down

await page.select(selector, "2"); // "2" is the `value` of the `option` to pick in the select drop-down

3.3 Setting and reading localStorage

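When unit testing with Jest, I needed to keep some data at global scope. Jest favors mocked data, and updating a global-level variable from inside an individual test file isn't supported (each test file runs in its own sandbox). Storing the data in the page's localStorage, which lives in the browser, is a handy workaround.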

Set localStorage:

await page.evaluate(() => {
  localStorage.setItem('token', 'example-token');
});

Read localStorage:

const token = await page.evaluate(() => localStorage.getItem("token"));

3.4 Passing an argument into the page

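utils.ATS = { token: '', BankReports: [] }; // Auto Test Store (utils is the test suite's own helper module)
// Arguments after the page function are serialized and passed into it
await page.evaluate(ATS => localStorage.setItem('ATS', ATS), JSON.stringify(utils.ATS));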
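`localStorage` only stores strings: a non-string value is converted with `String()`, so an object saved directly comes back as `"[object Object]"`. That is why the object goes through `JSON.stringify` before it is sent into the page, and why it must be `JSON.parse`d when read back. The `FakeStorage` class below is an assumption that mimics that coercion so the round trip can be checked in Node:

```javascript
// Minimal in-memory stand-in for the browser's Storage: like the real
// localStorage, it converts every value to a string in setItem.
class FakeStorage {
  constructor() { this.map = new Map(); }
  setItem(key, value) { this.map.set(key, String(value)); }
  getItem(key) { return this.map.has(key) ? this.map.get(key) : null; }
}

const storage = new FakeStorage();
const ATS = { token: 'example-token', BankReports: [] }; // Auto Test Store

storage.setItem('raw', ATS);                 // stored as "[object Object]"
storage.setItem('ATS', JSON.stringify(ATS)); // stored as JSON text

console.log(storage.getItem('raw'));             // [object Object]
console.log(JSON.parse(storage.getItem('ATS'))); // { token: 'example-token', BankReports: [] }
```

Reading it back in a test would look like `JSON.parse(await page.evaluate(() => localStorage.getItem('ATS')))`.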

3.5 Reading and setting checkboxes

Click the checkbox only if it isn't checked yet:

const selectorDOM = ``; // selector of the checkbox
const funding = await page.$(selectorDOM);
const isCheckBoxChecked = await (await funding.getProperty("checked")).jsonValue();
if (!isCheckBoxChecked) {
  await page.click(selectorDOM, {delay: 300});
}

Check every checkbox by setting its `checked` property directly (nothing is clicked, so no click or change events fire):

await page.$$eval("input[type='checkbox']", checks => checks.forEach(c => c.checked = true));

Click every checkbox in turn:

await page.$$eval('input[type="checkbox"]', checkboxes => {
  checkboxes.forEach(chbox => chbox.click())
});
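The two snippets differ in more than whether a click happens: setting `checked = true` leaves every box checked, while clicking toggles each box, so boxes that were already checked end up unchecked. A pure-JavaScript model of the two behaviors, with plain objects standing in for `<input type="checkbox">` elements (an assumption for illustration):

```javascript
// Plain objects standing in for checkbox elements.
const makeBoxes = () => [{ checked: false }, { checked: true }, { checked: false }];

// What the "check all" $$eval snippet does: force the property to true.
function checkAll(boxes) {
  boxes.forEach((c) => { c.checked = true; });
  return boxes.map((c) => c.checked);
}

// What clicking does to a checkbox: toggle its state.
function clickAll(boxes) {
  boxes.forEach((c) => { c.checked = !c.checked; });
  return boxes.map((c) => c.checked);
}

console.log(checkAll(makeBoxes())); // [ true, true, true ]
console.log(clickAll(makeBoxes())); // [ true, false, true ]
```

To click only the unchecked boxes (so the page's click handlers still run), filter first inside `$$eval`: `checkboxes.filter(c => !c.checked).forEach(c => c.click())`.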

4. Reading page data

4.1 Getting an element's text

const textStr = await page.$eval(selector, el => el.innerText);

4.2 Clicking an element that contains given text

Method 1: page.$x (XPath)

On the Baidu home page, click the `a` element whose text contains '更多' ("More"):

await page.goto(`http://www.baidu.com`);
const [ele] = await page.$x("//a[contains(text(), '更多')]"); // first matching <a>, if any
ele && (await ele.click());

See: How to click on element with text in Puppeteer

Method 2: page.evaluate

Loop over the candidate nodes and compare each one's innerText:

// humanCycle: the exact text to look for; cycleLi: selector of the candidate nodes
// (both defined by the surrounding test). The XPath version would be:
// const [ele] = await page.$x(`//span[contains(text(), '${humanCycle}')]`);
// ele && (await ele.click());
await page.evaluate((humanCycle, cycleLi) => {
  document.querySelectorAll(cycleLi).forEach(item => (item.innerText.trim() === humanCycle) && item.click());
}, humanCycle, cycleLi);
await page.waitForTimeout(500);
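The two methods don't match the same things: XPath `contains(text(), '更多')` is a substring test, while the `evaluate` version compares the trimmed `innerText` for equality. The two rules, pulled out into hypothetical helpers and tested on plain strings (in recent Puppeteer releases `page.$x` has been removed, and the same XPath can be given to `page.$` as a `::-p-xpath(...)` selector; check the version you run):

```javascript
// Substring match, like XPath contains(text(), needle).
const containsText = (text, needle) => text.includes(needle);

// Exact match after trimming, like the page.evaluate version.
const equalsText = (text, needle) => text.trim() === needle;

const labels = ['更多', ' 更多 ', '更多产品'];
console.log(labels.filter((t) => containsText(t, '更多')).length); // 3
console.log(labels.filter((t) => equalsText(t, '更多')).length);   // 2
```

With the substring form, a link reading 更多产品 ("more products") also matches, and `[ele]` picks whichever match comes first in document order.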

4.3 Getting the class attribute

const el = await page.$(selectorDOM);
const className = await (await el.getProperty("className")).jsonValue();

See: elementHandle.getProperty(propertyName), elementHandle.jsonValue(), and How do I get the ElementHandle's class name when using Puppeteer?

4.4 Getting cookies

Get all cookies for the current domain and store the one we need in localStorage:

const _ = require('lodash');

utils.ATS = { token: '', cookies: {}, BankReports: {} }; // Auto Test Store
global.baseUrl = `http://www.baidu.com`;
const cookies = await page.cookies(global.baseUrl);
// Get the cookie named .AspNet.ApplicationCookie (fall back to {} if it is missing)
const { name = '', value = '' } = _.find(cookies, {name: '.AspNet.ApplicationCookie'}) || {};
utils.ATS.cookies = { name, value };
await page.evaluate(ATS => localStorage.setItem('ATS', ATS), JSON.stringify(utils.ATS));

Sending a POST request that carries the cookie:

const _ = require('lodash');
const axios = require('axios');

// 1. Get the cookie named .AspNet.ApplicationCookie from the page
const cookies = await page.cookies(global.baseUrl);
const { name = '', value = '' } = _.find(cookies, {name: '.AspNet.ApplicationCookie'}) || {};

// 2. POST with the cookie in the request header
// (rootUrl, siteID, BankID, args and utils.getCycle come from the surrounding test)
const url = `${rootUrl}/Report/${siteID}/${BankID}/${utils.getCycle(args[2])}`;
const header = { headers: {'Content-Type': 'application/json', Cookie: `${name}=${value}`}};
const params = { BankID, Asof: utils.getCycle(args[2], '-'), RevisionID: 0, Status: 1, SuccessCount: 0, FailedCount: 0 };
await axios.post(url, JSON.stringify(params), header)
  .then(res => console.log('post in---', res))
  .catch(e => console.error(e));
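Two details in the cookie code are easy to get wrong: `_.find` returns `undefined` when the cookie is absent (and destructuring `undefined` throws a TypeError), and the `Cookie` request header is just `name=value` pairs joined by `; `. A dependency-free sketch using `Array.prototype.find` instead of lodash; `cookieHeader` is a hypothetical helper, and the sample array only mimics the `{ name, value }` shape of `page.cookies()` results:

```javascript
// Build a Cookie request header from cookies shaped like page.cookies() output.
// Throws if a requested cookie is missing instead of sending an empty value.
function cookieHeader(cookies, names) {
  return names
    .map((name) => {
      const cookie = cookies.find((c) => c.name === name);
      if (!cookie) throw new Error(`cookie not found: ${name}`);
      return `${cookie.name}=${cookie.value}`;
    })
    .join('; ');
}

const sample = [
  { name: '.AspNet.ApplicationCookie', value: 'abc123', domain: 'www.baidu.com' },
  { name: 'lang', value: 'zh-CN', domain: 'www.baidu.com' },
];

console.log(cookieHeader(sample, ['.AspNet.ApplicationCookie']));
// .AspNet.ApplicationCookie=abc123
console.log(cookieHeader(sample, ['.AspNet.ApplicationCookie', 'lang']));
// .AspNet.ApplicationCookie=abc123; lang=zh-CN
```

Failing loudly on a missing cookie makes an expired login show up as a clear error rather than as a request that the server silently rejects.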

4.5 Checking whether a radio button is selected

const radioObj = await page.$(selector);
// await radioObj.click();
const isRadioSelected = await (await radioObj.getProperty("checked")).jsonValue();

4.6 Checking whether an element is visible

Call the function below:

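// Returns true if the element exists and its computed style is not
// display: none, visibility: hidden or opacity: 0.
// Only this element's own style is read, so an ancestor with display: none is not detected.
async function isDOMVisible(page, selector) {
  return page.evaluate(selector => {
    const e = document.querySelector(selector);
    if (!e) {
      return false;
    }

    const style = window.getComputedStyle(e);
    return style && style.display !== 'none' && style.visibility !== 'hidden' && style.opacity !== '0';
  }, selector);
}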

4.7 Getting an element's next sibling

Use Element.nextElementSibling:

const last = await page.$('.item:last-child');
const next = await page.evaluateHandle(el => el.nextElementSibling, last);

See: Getting the sibling of an elementHandle in Puppeteer

5. Miscellaneous

5.1 Uploading and downloading files

Upload

const elementHandle = await page.$(selector); // input selector
await elementHandle.uploadFile(filePath);

Download

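const downloadPath = ''; // directory to save downloaded files into
// page._client is private (and a method rather than a property in newer versions);
// a CDP session created from the page's target is the public way to send this command
const client = await page.target().createCDPSession();
await client.send('Page.setDownloadBehavior', {behavior: 'allow', downloadPath});
await page.click(selector, {delay: 500});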

Batch download

const DOWNDOM = 'span[id^="download"] > a'; // every download link

const client = await page.target().createCDPSession();
await client.send('Page.setDownloadBehavior', {behavior: 'allow', downloadPath});
const nodes = await page.$$(DOWNDOM);
for (let i = 0; i < nodes.length; i++) {
  await nodes[i].click();
  await page.waitForTimeout(1500); // give each download time to start
}

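Note: the loop above can't be replaced with forEach. forEach doesn't wait for the Promise an async callback returns, so all the clicks would fire at once instead of one after another.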
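The timing difference can be reproduced in plain Node. `clickAndWait` is a hypothetical stand-in for "click one link, then wait", shortened to 20 ms:

```javascript
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for `await nodes[i].click(); await page.waitForTimeout(...)`.
const clickAndWait = () => delay(20);

async function elapsedWithForEach(n) {
  const t0 = Date.now();
  // forEach discards the Promise each async callback returns...
  Array.from({ length: n }).forEach(async () => { await clickAndWait(); });
  return Date.now() - t0; // ...so this runs before any "click" has finished
}

async function elapsedWithForLoop(n) {
  const t0 = Date.now();
  for (let i = 0; i < n; i++) {
    await clickAndWait(); // each iteration waits for the previous one
  }
  return Date.now() - t0;
}

elapsedWithForEach(3).then((ms) => console.log('forEach:', ms, 'ms'));  // close to 0
elapsedWithForLoop(3).then((ms) => console.log('for loop:', ms, 'ms')); // roughly 60 or more
```

A `for...of` loop with `await` in its body behaves like the indexed `for` loop here.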

5.2 Controlling a newly opened window

This mainly means working with the page that window.open() creates:

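// The browser reports the normalized URL, including the trailing slash
const targetUrl = 'https://www.baidu.com/';
await page.click(selector); // this click opens the new window
// page.browser() returns the Browser the page belongs to
const newTarget = await page.browser().waitForTarget((target) => target.url() === targetUrl);
const newPage = await newTarget.page();
// ...operate on newPage here...
await newPage.close();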
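An exact `target.url() === targetUrl` comparison is fragile because the browser reports the normalized URL: `https://www.baidu.com` comes back as `https://www.baidu.com/`, so an unnormalized expected URL never matches and `waitForTarget` times out. Normalizing both sides with the WHATWG `URL` class avoids that; `sameUrl` is a hypothetical helper:

```javascript
// Compare two URLs after WHATWG normalization (adds the root "/",
// lower-cases the host, drops default ports).
function sameUrl(a, b) {
  try {
    return new URL(a).href === new URL(b).href;
  } catch {
    return false; // e.g. an empty or malformed URL string
  }
}

console.log(new URL('https://www.baidu.com').href);                       // https://www.baidu.com/
console.log(sameUrl('https://www.baidu.com', 'https://www.baidu.com/'));  // true
console.log(sameUrl('https://WWW.baidu.com:443/', 'https://www.baidu.com')); // true
console.log(sameUrl('https://www.baidu.com/s?wd=a', 'https://www.baidu.com/')); // false
```

Usage would be `browser.waitForTarget((target) => sameUrl(target.url(), 'https://www.baidu.com'))`.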

5.3 Capturing page JS errors and failed requests

See: How do I capture browser errors

const color = require('ansi-colors');
const msgStr = `${color.red.inverse(' Page error ')}`;
// utils.error is the test suite's own logging helper
page
  .on('pageerror', err => utils.error(msgStr, err.toString()))
  .on('requestfailed', req => utils.error('Request Failed to load', `${req.url()}  ${req.failure().errorText}`))
  .on('console', msg => {
    // report only console errors, skipping known noisy messages
    const throwErrs = ['VIDEOJS', 'Failed to load resource'];
    const isThrow = throwErrs.some(str => msg.text().indexOf(str) >= 0);
    if (msg.type() !== 'error' || isThrow) {
      return;
    }
    // _remoteObject is a private Puppeteer field and may change between versions
    msg.args().forEach(arg => utils.error('Capture console error', arg._remoteObject));
  });
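The filtering rule inside the `console` handler can be pulled out into a pure function so it can be unit-tested without a browser. `shouldReport` and its `ignore` parameter are hypothetical names; the inputs mimic `msg.type()` and `msg.text()`:

```javascript
// Decide whether a console message should be reported, mirroring the
// handler above: only 'error' messages, minus known noisy ones.
function shouldReport(type, text, ignore = ['VIDEOJS', 'Failed to load resource']) {
  if (type !== 'error') return false;
  return !ignore.some((str) => text.includes(str));
}

console.log(shouldReport('error', 'Uncaught TypeError: x is undefined')); // true
console.log(shouldReport('log', 'page loaded'));                          // false
console.log(shouldReport('error', 'Failed to load resource: 404'));       // false
```

Inside the handler the check then becomes `if (!shouldReport(msg.type(), msg.text())) return;`.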


felix-cao commented 2 years ago

Four methods for extracting data from a page

page.$(selector)

Query a single node

The method runs document.querySelector within the page. If no element matches the selector, the return value resolves to null.

page.$$(selector)

Query multiple nodes

The method runs document.querySelectorAll within the page. If no elements match the selector, the return value resolves to [].

page.$eval(selector, pageFunction[,...args])

Run a function on a single node

This method runs document.querySelector within the page and passes it as the first argument to pageFunction. If there's no element matching selector, the method throws an error.

If pageFunction returns a Promise, then page.$eval would wait for the promise to resolve and return its value.

Examples:

const searchValue = await page.$eval('#search', (el) => el.value);
const preloadHref = await page.$eval('link[rel=preload]', (el) => el.href);
const html = await page.$eval('.main-container', (e) => e.outerHTML);

("evaluate" means to compute a value; it is where the "eval" in $eval comes from.)

page.$$eval(selector, pageFunction[,...args])

Run a function on multiple nodes

This method runs Array.from(document.querySelectorAll(selector)) within the page and passes it as the first argument to pageFunction.

If pageFunction returns a Promise, then page.$$eval would wait for the promise to resolve and return its value.

Examples:

const divCount = await page.$$eval('div', (divs) => divs.length);
const options = await page.$$eval('div > span.options', (options) =>
  options.map((option) => option.textContent)
);

Running a function in the page

page.evaluate(pageFunction[, ...args])

If the function passed to the page.evaluate returns a Promise, then page.evaluate would wait for the promise to resolve and return its value.

If the function passed to the page.evaluate returns a non-serializable value, then page.evaluate resolves to undefined. DevTools Protocol also supports transferring some additional values that are not serializable by JSON: -0, NaN, Infinity, -Infinity, and bigint literals.
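These values need special handling because plain JSON cannot represent them: `JSON.stringify` turns `NaN` and the infinities into `null`, drops the sign of `-0`, and throws on a BigInt. A quick check in Node:

```javascript
// JSON has no encoding for these values, so a plain JSON round trip loses them.
console.log(JSON.stringify([NaN, Infinity, -Infinity]));    // [null,null,null]
console.log(JSON.stringify(-0));                            // 0
console.log(Object.is(JSON.parse(JSON.stringify(-0)), -0)); // false: the sign is gone

let bigintError = null;
try {
  JSON.stringify(10n); // BigInt has no JSON representation
} catch (err) {
  bigintError = err;
}
console.log(bigintError instanceof TypeError); // true
```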

Passing arguments to pageFunction:

const result = await page.evaluate((x) => {
  return Promise.resolve(8 * x);
}, 7);
console.log(result); // prints "56"
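The function is not called in Node: Puppeteer sends its source text to the browser and runs it there, so variables it closes over on the Node side don't exist in the page and have to be passed as extra arguments, as `7` is above. Rebuilding a function from its source with `new Function` is only an approximation of what happens in the page, but it shows the effect; `rebuild` and `demo` are hypothetical names:

```javascript
// Approximate the page side: recreate a function from its source text in a
// fresh global scope, with no access to the variables around the original.
const rebuild = (fn) => new Function(`return (${fn.toString()});`)();

function demo() {
  const factor = 8;
  const usesClosure = (x) => x * factor; // relies on a Node-side variable
  const usesArgs = (x, f) => x * f;      // receives everything as arguments

  const viaArgs = rebuild(usesArgs)(7, factor);
  let closureError = null;
  try {
    rebuild(usesClosure)(7);
  } catch (err) {
    closureError = err.name;
  }
  return { viaArgs, closureError };
}

console.log(demo()); // { viaArgs: 56, closureError: 'ReferenceError' }
```

The same rule explains the `page.evaluate((humanCycle, cycleLi) => ..., humanCycle, cycleLi)` pattern in 4.2: the values travel as arguments, not through the closure.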

A string can also be passed in instead of a function:

console.log(await page.evaluate('1 + 2')); // prints "3"
const x = 10;
console.log(await page.evaluate(`1 + ${x}`)); // prints "11"

ElementHandle instances can be passed as arguments to the page.evaluate:

const bodyHandle = await page.$('body');
const html = await page.evaluate((body) => body.innerHTML, bodyHandle);
await bodyHandle.dispose();