cyrus-and / chrome-har-capturer

Capture HAR files from a Chrome instance
MIT License

Problems fetching content with Chrome 84.0.4147.125 #82

Closed p4fg closed 4 years ago

p4fg commented 4 years ago

Hi, thanks for a cool tool!

I think something has broken with newer versions of Chrome. With Chrome 84.0.4147.125 and chrome-har-capturer 0.13.7 on Ubuntu 20.04:

This works: nodejs cli.js "https://www.nasa.gov"

But this fails: nodejs cli.js --content "https://www.nasa.gov"

Output from the failing command:

- https://www.nasa.gov/ ✗
  No data found for resource with given identifier
{
    "log": {
        "version": "1.2",
        "creator": {
            "name": "Chrome HAR Capturer",
            "version": "0.13.7",
            "comment": "https://github.com/cyrus-and/chrome-har-capturer"
        },
        "pages": [],
        "entries": []
    }
}

Running with a visible Chrome window shows a difference between the two runs: with --content the window pops up briefly and closes almost immediately, while without --content the window stays open slightly longer.

p4fg commented 4 years ago

Debugging the problem further (by adding printouts of which requests are not fetched) gives some insight:

Most requests are captured fine, but for www.nasa.gov the following request always fails: https://fonts.gstatic.com/s/titilliumweb/v8/NaPDcZTIAOhVxoMyOr9n_E7ffHjDGItzYw.woff2

The request is actually sent and a response is received (verified by tunneling all Chrome requests through a proxy).

The internal state of the request is:

{ requestParams:
   { requestId: '1066142.35',
     loaderId: '80151A20E13ED7581750FC0012116287',
     documentURL: 'https://www.nasa.gov/',
     request:
      { url:
         'https://fonts.gstatic.com/s/titilliumweb/v8/NaPDcZTIAOhVxoMyOr9n_E7ffHjDGItzYw.woff2',
        method: 'GET',
        headers: [Object],
        mixedContentType: 'none',
        initialPriority: 'VeryHigh',
        referrerPolicy: 'no-referrer-when-downgrade' },
     timestamp: 98817.846909,
     wallTime: 1597746684.032241,
     initiator:
      { type: 'parser',
        url:
         'https://fonts.googleapis.com/css?family=Titillium+Web:400,600,700' },
     type: 'Font',
     frameId: 'DF74E03844AED370EF4A6A0E482161B0',
     hasUserGesture: false },
  responseParams:
   { requestId: '1066142.35',
     loaderId: '80151A20E13ED7581750FC0012116287',
     timestamp: 98818.080681,
     type: 'Font',
     response:
      { url:
         'https://fonts.gstatic.com/s/titilliumweb/v8/NaPDcZTIAOhVxoMyOr9n_E7ffHjDGItzYw.woff2',
        status: 200,
        statusText: 'OK',
        headers: [Object],
        mimeType: 'font/woff2',
        connectionReused: false,
        connectionId: 10278,
        remoteIPAddress: '127.0.0.1',
        remotePort: 9090,
        fromDiskCache: false,
        fromServiceWorker: false,
        fromPrefetchCache: false,
        encodedDataLength: 613,
        timing: [Object],
        protocol: 'http/1.1',
        securityState: 'insecure',
        securityDetails: [Object] },
     frameId: 'DF74E03844AED370EF4A6A0E482161B0' },
  responseLength: 11720,
  encodedResponseLength: 12333,
  responseFinishedS: 98818.062151,
  responseBody: undefined,
  responseBodyIsBase64: undefined,
  newPriority: undefined }

Another URL, www.aftonbladet.se, gives the following failed request: https://gfx.aftonbladet-cdn.se/abstrap/fonts/1.0.5/abicon/abicon.woff

Using www.whitehouse.gov gives failed requests for:

https://fonts.gstatic.com/s/sourcesanspro/v13/6xKydSBYKcSV-LCoeQqfX1RYOo3i54rwlxdu.woff2
https://fonts.gstatic.com/s/sourcesanspro/v13/6xKydSBYKcSV-LCoeQqfX1RYOo3ig4vwlxdu.woff2
https://fonts.gstatic.com/s/sourcesanspro/v13/6xK3dSBYKcSV-LCoeQqfX1RYOo3qOK7l.woff2
https://fonts.gstatic.com/s/merriweather/v21/u-440qyriQwlOrhSvowK_l5-fCZM.woff2
https://fonts.gstatic.com/s/merriweather/v21/u-4n0qyriQwlOrhSvowK_l52xwNZWMf6.woff2
https://fonts.gstatic.com/s/merriweather/v21/u-4m0qyriQwlOrhSvowK_l5-eRZOf-I.woff2

So it appears this is caused either by .woff/.woff2 files not being stored or by something specific to fonts.gstatic.com. I will investigate a bit further and report back if I find anything.

p4fg commented 4 years ago

It seems to affect .woff/.woff2 files in general, even when they are hosted on other sites.
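
For illustration, the kind of debugging printout described above could be sketched as follows. This is only a sketch, not code from chrome-har-capturer: the helper name groupMissingBodiesByExtension and the sample data are hypothetical, and the entry shape is assumed to match the internal state dump shown earlier (requestParams.request.url and responseBody).

```javascript
// Hypothetical helper: group the URLs of entries whose body was not captured
// by file extension. The entry shape is assumed to mirror the dump above.
function groupMissingBodiesByExtension(entries) {
    const groups = new Map();
    for (const entry of entries) {
        if (entry.responseBody !== undefined) {
            continue; // body was captured, nothing to report
        }
        const url = entry.requestParams.request.url;
        const {pathname} = new URL(url);
        const lastSegment = pathname.split('/').pop();
        const dot = lastSegment.lastIndexOf('.');
        const extension = dot === -1 ? '(none)' : lastSegment.slice(dot).toLowerCase();
        if (!groups.has(extension)) {
            groups.set(extension, []);
        }
        groups.get(extension).push(url);
    }
    return groups;
}

// Sample data: two of the failing font URLs from this thread plus one
// made-up successful entry (its body string is a placeholder).
const sampleEntries = [
    {
        requestParams: {request: {url: 'https://fonts.gstatic.com/s/titilliumweb/v8/NaPDcZTIAOhVxoMyOr9n_E7ffHjDGItzYw.woff2'}},
        responseBody: undefined
    },
    {
        requestParams: {request: {url: 'https://gfx.aftonbladet-cdn.se/abstrap/fonts/1.0.5/abicon/abicon.woff'}},
        responseBody: undefined
    },
    {
        requestParams: {request: {url: 'https://www.nasa.gov/'}},
        responseBody: '<html></html>'
    }
];

const missing = groupMissingBodiesByExtension(sampleEntries);
for (const [extension, urls] of missing) {
    console.log(`${extension}: ${urls.length} request(s) without a body`);
}
```

Grouping by extension rather than by host makes it easier to tell a font-format problem apart from a problem with a single CDN.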

cyrus-and commented 4 years ago

Thanks for the in-depth report. It really does seem to be a problem with WOFF resources; this minimal website reproduces the issue: https://tecfa.unige.ch/guides/WOFF/woff-blisssym-example.html

Let me dig into this...

cyrus-and commented 4 years ago

OK, it looks like the content of WOFF files is not retrieved, for whatever reason. Even if you save the above URL via DevTools ("Save all as HAR with content"), the entry has no content text:

"content": {
  "size": 13664,
  "mimeType": "application/font-woff",
  "compression": 0
},
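
A saved HAR can be checked for such entries with a few lines of Node.js. The sketch below assumes the standard HAR 1.2 layout (log.entries[].response.content with size and an optional text field); the function name findEntriesWithoutContentText and the sample URL are hypothetical.

```javascript
// Hypothetical helper: list the URLs of HAR entries that report a non-zero
// content size but carry no "text" field, like the WOFF entry above.
function findEntriesWithoutContentText(har) {
    return har.log.entries
        .filter(({response}) => response.content.size > 0 && response.content.text === undefined)
        .map(({request}) => request.url);
}

// Minimal sample HAR: the first entry reuses the content object shown above
// (the URL is a placeholder), the second one is a made-up entry with a body.
const sampleHar = {
    log: {
        version: '1.2',
        entries: [
            {
                request: {url: 'https://example.com/font.woff'},
                response: {content: {size: 13664, mimeType: 'application/font-woff', compression: 0}}
            },
            {
                request: {url: 'https://example.com/index.html'},
                response: {content: {size: 13, mimeType: 'text/html', text: '<html></html>'}}
            }
        ]
    }
};

console.log(findEntriesWithoutContentText(sampleHar));
```

To run it against a real capture, the sample object can be replaced with JSON.parse(fs.readFileSync(path, 'utf8')) for a HAR file of your own.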

I guess I can simply make errors about fetching content non-fatal...
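
A rough sketch of what "non-fatal" could mean here, assuming the body is fetched with the DevTools protocol method Network.getResponseBody (which takes a requestId and returns body and base64Encoded). This is not the actual change made in chrome-har-capturer: fetchContentNonFatal and fakeNetwork are hypothetical names, and the fake stands in for a real protocol client so the example runs without Chrome.

```javascript
// Fetch a response body, but turn a failure into a missing body plus a
// warning instead of an error that aborts the whole page capture.
async function fetchContentNonFatal(Network, requestId, warn = console.warn) {
    try {
        const {body, base64Encoded} = await Network.getResponseBody({requestId});
        return {responseBody: body, responseBodyIsBase64: base64Encoded};
    } catch (err) {
        warn(`Could not fetch content for request ${requestId}: ${err.message}`);
        return {responseBody: undefined, responseBodyIsBase64: undefined};
    }
}

// Stand-in for the protocol's Network domain (an assumption for this sketch):
// it fails for the font request from this thread and succeeds otherwise.
const fakeNetwork = {
    async getResponseBody({requestId}) {
        if (requestId === '1066142.35') {
            throw new Error('No data found for resource with given identifier');
        }
        return {body: '<html></html>', base64Encoded: false};
    }
};

(async () => {
    const failed = await fetchContentNonFatal(fakeNetwork, '1066142.35');
    const ok = await fetchContentNonFatal(fakeNetwork, 'some-other-id');
    console.log('font body captured:', failed.responseBody !== undefined);
    console.log('page body captured:', ok.responseBody !== undefined);
})();
```

With this approach the page still ends up in the HAR; only the affected entries lack their content text, which matches what DevTools itself produces for the WOFF file.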

cyrus-and commented 4 years ago

Please give it a try, thanks.