adriancooney / puppeteer-heap-snapshot

API and CLI tool to fetch and query Chrome DevTools heap snapshots.
MIT License
1.35k stars 68 forks

Unknown or unsupported object with type 'Location' #1

Open yohikofox opened 2 years ago

yohikofox commented 2 years ago


I am trying to capture hrefs from the website below:

with the following code snippet:

const Puppeteer = require("puppeteer");
const { captureHeapSnapshot, findObjectsWithProperties } = require("puppeteer-heap-snapshot");

const start = async () => {
    const browser = await Puppeteer.launch();
    const page = await browser.newPage();

    await page.goto("");

    // Capture a heap snapshot from the page's target.
    const heapSnapshot = await captureHeapSnapshot(await page.target());

    // Query the snapshot for objects that have an `href` property.
    console.log('heapSnapshot:', findObjectsWithProperties(heapSnapshot, ['href']));

    await browser.close();
};

start();


I got this error:

(node:38964) UnhandledPromiseRejectionWarning: Error: Unknown or unsupported object with type 'Location'
    at compileGraphNodeObject (C:\ws\white-label\code\test\pupeteer-heap-snapshot\node_modules\puppeteer-heap-snapshot\dist\cjs\src\build-object.js:75:19) 
    at buildObjectFromNodeId (C:\ws\white-label\code\test\pupeteer-heap-snapshot\node_modules\puppeteer-heap-snapshot\dist\cjs\src\build-object.js:34:12)  
    at C:\ws\white-label\code\test\pupeteer-heap-snapshot\node_modules\puppeteer-heap-snapshot\dist\cjs\src\query.js:16:57
    at (<anonymous>)
    at findObjectsWithProperties (C:\ws\white-label\code\test\pupeteer-heap-snapshot\node_modules\puppeteer-heap-snapshot\dist\cjs\src\query.js:14:20)     
    at start (C:\ws\white-label\code\test\pupeteer-heap-snapshot\index.js:16:34)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
(Use `node --trace-warnings ...` to show where the warning was created)
(node:38964) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see (rejection id: 1)
(node:38964) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Thanks for replies.

jmitchel3 commented 2 years ago

I had a look at the website -- it runs Cloudflare, which protects against bots and scraping.

You might consider trying:

const Puppeteer = require("puppeteer");
const { captureHeapSnapshot, findObjectsWithProperties } = require("puppeteer-heap-snapshot");
const randomUseragent = require('random-useragent'); // npm install random-useragent

const start = async () => {
    const browser = await Puppeteer.launch();
    const page = await browser.newPage();

    // Replace the default headless user agent with a random one.
    const agent = randomUseragent.getRandom();
    await page.setUserAgent(`${agent}`);

    await page.goto("");

    // Capture a heap snapshot from the page's target.
    const heapSnapshot = await captureHeapSnapshot(await page.target());

    console.log('heapSnapshot:', findObjectsWithProperties(heapSnapshot, ['href']));

    await browser.close();
};

start();

Remember that headless Puppeteer tells the requested server that it is a headless version of Chrome: by default its user agent string contains HeadlessChrome. Servers and services like Cloudflare can block this very easily.

adriancooney commented 2 years ago

Sorry for the delay in replying here; I haven't had much time to look into this issue. From what I can see, puppeteer-heap-snapshot simply does not know how to deserialize the Location type from the heap snapshot. I'm happy to accept PRs if anyone wants to tackle understanding the data type and deserializing it. It's surprising that this data has its own data type rather than being a primitive string or a plain object.

dotnetCarpenter commented 2 years ago

You mean the Location DOM object? It looks like src/build-object.ts does not handle any DOM objects, so I expect it will fail on any DOM reference held in JS. Vue/React apps will not have this issue, since they use a virtual DOM and cannot* reference DOM elements directly.

Calling .toString() on any unknown object is safe. It will give you [object NAME]. If that is not useful, then you need specific code for that object. You are also missing all of the built-in JS objects: TypedArray, Temporal, Date, Map, Set, etc. (maybe I missed some here).

* it's a simplification...
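
A minimal sketch of that fallback, assuming a simplified node shape. None of these names come from puppeteer-heap-snapshot: `SUPPORTED_TYPES`, `compileNodeWithFallback` and the `{ type, value }` node layout are hypothetical and only illustrate returning a placeholder instead of throwing. A heap snapshot holds no live objects, so the placeholder is built from the type name. For live objects, `Object.prototype.toString.call` produces the same tag format even for types such as Date that override `.toString()`.

```javascript
// Hypothetical sketch; none of these names exist in puppeteer-heap-snapshot.
// Types the (imaginary) deserializer knows how to rebuild.
const SUPPORTED_TYPES = new Set(["Object", "Array"]);

// `node` is an assumed shape: { type: constructor name, value: decoded value if supported }.
function compileNodeWithFallback(node) {
    if (SUPPORTED_TYPES.has(node.type)) {
        return node.value;
    }
    // Unknown type: return a placeholder string instead of throwing.
    return `[object ${node.type}]`;
}

// For live objects, this yields the same "[object NAME]" tag format.
const liveTag = (value) => Object.prototype.toString.call(value);

console.log(compileNodeWithFallback({ type: "Location" })); // [object Location]
console.log(liveTag(new Map())); // [object Map]
```

The placeholder keeps a query from failing on the first unsupported node, at the cost of losing that node's contents.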

Nedgeva commented 2 years ago

I agree with @dotnetCarpenter. I guess we could simply pass a list of blacklisted object names in the options, so that matching graph nodes are skipped instead of compiled. I'm not sure, but some list of Web API globals could serve as the default.

So after a little tweak, the same code as above gives me a proper result on "" (just an example of an SPA that runs on top of Gatsby):

  { children: 'privacy policy', href: '/privacy/' },
  { children: 'disclaimer', href: '/disclaimer/' },
  { children: 'Integrations', href: '/integrations/' },
  { children: 'Testimonials', href: '/testimonials/' },
  { children: 'Download', href: '/download/' },
  { children: 'Integrations', href: '/integrations/' },
  { children: 'Docs', href: '/docs/' },
  { children: 'Site Quality', href: '/site-quality/' },
  { children: 'Accessibility', href: '/accessibility/' },
  {
    children: 'Accessibility Statement',
    href: '/accessibility-statement/'
  },
  { children: 'Disclaimer', href: '/disclaimer/' },
  { children: 'Legal', href: '/legal/' },
  { children: 'Home', href: '/' },
  { children: 'All free tools', href: '/resources/' },
  {
    children: 'Responsive design glossary',
    href: '/responsive-design-glossary/'
  },
  { children: 'Create Polypane workspace', href: '/create-workspace/' },
  { children: 'Color contrast checker', href: '/color-contrast/' },
  { children: 'For Marketers', href: '/marketers/' },
  { children: 'For Agencies', href: '/agencies/' },
  { children: 'For QA', href: '/quality-assurance/' },
  { children: 'Pricing', href: '/pricing/' },
  { children: 'privacy policy', href: '/privacy/' },
  { children: 'Privacy', href: '/privacy/' },
  { children: 'disclaimer', href: '/disclaimer/' },
  /* more results */
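
The blacklist idea could be sketched roughly as below. This is not the actual tweak and not an existing puppeteer-heap-snapshot option: `ignoredTypes`, `DEFAULT_IGNORED_TYPES`, `findWithProperties` and the `{ type, properties }` node shape are all hypothetical, chosen only to show skipping nodes by type name before they are compiled.

```javascript
// Hypothetical sketch; `ignoredTypes` is not an existing puppeteer-heap-snapshot option.
// An assumed default list of Web API globals to skip.
const DEFAULT_IGNORED_TYPES = ["Location", "Window", "HTMLDocument"];

// `nodes` is an assumed shape: [{ type: constructor name, properties: plain object }].
function findWithProperties(nodes, properties, options = {}) {
    const ignored = new Set(options.ignoredTypes ?? DEFAULT_IGNORED_TYPES);
    return nodes
        // Drop blacklisted types before any attempt to compile them.
        .filter((node) => !ignored.has(node.type))
        // Keep only nodes that carry every requested property.
        .filter((node) => properties.every((name) => name in node.properties))
        .map((node) => node.properties);
}

const nodes = [
    { type: "Location", properties: { href: "https://example.com/" } },
    { type: "Object", properties: { children: "Docs", href: "/docs/" } },
];

// Only the plain Object node is returned; the Location node is skipped.
console.log(findWithProperties(nodes, ["href"]));
```

Passing `{ ignoredTypes: [] }` turns the filter off, which is one way a caller could opt back in to the strict behaviour.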