asnunes / notion-page-to-html

NodeJS tool to convert public Notion pages to HTML from page ID
MIT License

Content Fetch Broken; Issue with JSON Parsing #40

Open ltrademark opened 1 month ago

ltrademark commented 1 month ago

Error Output:

undefined:1

[Base64 string that resolves into an image from the Notion page. This base64 string does start with "R", but it's valid.]
^

SyntaxError: Unexpected token R in JSON at position 0

    at JSON.parse (<anonymous>)
    at IncomingMessage.<anonymous> (/rbd/pnpm-volume/5cc784b6-0d5a-4458-b4c1-1c03663539f5/node_modules/notion-page-to-html/dist/utils/use-cases/http-get/node-http-get.js:74:44)
    at IncomingMessage.emit (node:events:538:35)
    at endReadableNT (node:internal/streams/readable:1345:12)
    at processTicksAndRejections (node:internal/process/task_queues:83:21)

Location:

https://github.com/asnunes/notion-page-to-html/blob/0bd3d1d0c9ff1f11188d64152b093aa9f9bd1be8/src/utils/use-cases/http-get/node-http-get.ts#L33

Explanation & Reproduction:

This error happens when I run await NotionPageToHtml.convert() to fetch data. It happens regardless of which of the { title, icon, cover, html } values I destructure: omitting some, or using only one of them, gave me the same result. What's strange is that it was working without issue before and then suddenly stopped. This could be a change on Notion's side that breaks the way this package fetches and renders content from Notion.

To reproduce this error, all I did was call const { title, icon, cover, html } = await NotionPageToHtml.convert([Notion page URL with ID]) and place console logs before and after that line. Only the log before it is printed; then the error is thrown and the code exits immediately.

I can try rebuilding node_modules in case it's a Node issue, but I don't see much I can do to fix this on my project's end. If anyone else is having a similar issue, it would be a great help to hear how you resolved it.

bahrudint commented 1 month ago

Same issue here: `undefined:1 Rm91bmQuIFJlZGlyZWN0aW5nIHRvIGh0dHBzOi8vaW1nLm5vdGlvbnVzZXJjb250ZW50LmNvbS9zMy9wcm9kLWZpbGVzLXNlY3VyZSUyRjdiMTZjYmRiLTg5M2ItNDEzNC05MzQ5LTJhYjRmNjM1ZmRmYSUyRmQ5MjI4ZjY2LTNhZTYtNDljMS1iYjMwLWQyMjAxOWZjN2Y2MCUyRmhlcGVrdjIucG5nL3NpemUvP2V4cD0xNzI2OTc2MTQzJnNpZz1zTzFUOU1nY2hCNmJERXl3TFBiYWFnakNqUFl6S0N3ZVAtdjRmNFh4Tk9r ^

SyntaxError: Unexpected token 'R', "Rm91bmQuIF"... is not valid JSON at JSON.parse (<anonymous>) at IncomingMessage.<anonymous> (/home/runner/workspace/node_modules/notion-page-to-html/dist/utils/use-cases/http-get/node-http-get.js:74:44)`
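The string that `JSON.parse` chokes on is base64, not JSON. Decoding the first few characters of the string in this report shows it is the plain-text body of a redirect response rather than image data. A minimal sketch using Node's `Buffer`; the helper name `decodeBase64` is illustrative and not part of the package:

```typescript
import { Buffer } from "node:buffer";

// Hypothetical helper: decode a base64 string into UTF-8 text.
function decodeBase64(encoded: string): string {
  return Buffer.from(encoded, "base64").toString("utf8");
}

// Prefix copied from the error output above.
const prefix = "Rm91bmQuIFJlZGlyZWN0aW5nIHRvIGh0dHBz";
console.log(decodeBase64(prefix)); // prints "Found. Redirecting to https"
```

Decoding the full string the same way yields the rest of the redirect target on img.notionusercontent.com.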

VanDerLars commented 1 month ago

I logged res:

It seems that Notion changed the way they deliver images. I don't know how it worked previously.

The thing is, the URL in res.headers.location is still valid (this is one example image from our Notion page).

But res.status is 302 and res.stringData is undefined.

HTTP 302 means "Found", i.e. a redirect: the image is there, but at the URL given in the location header, and stringData seems to have been removed.

res.status: 302
res.stringData: undefined
res.headers: {
  date: 'Fri, 27 Sep 2024 09:59:23 GMT',
  'content-type': 'text/plain; charset=utf-8',
  'content-length': '264',
  connection: 'close',
  location: 'https://img.notionusercontent.com/s3/prod-files-secure%2F2c40536c-9db1-438a-93f5-d0c2f3308d68%2F6124774b-2370-4cb6-90f4-c9008c99a9d1%2FCleanShot_2024-08-19_at_17.01.002x.png/size/?exp=1727517564&sig=dqGCeLbZUFzPwFTNBJc-4JX4n1v97fisNr2v1629_AA',
  'cf-ray': '8c9aa403cdbdd368-FRA',
  'cf-cache-status': 'DYNAMIC',
  'cache-control': 'public,max-age=86340,immutable',
  'set-cookie': [
    'notion_browser_id=b9e166bf-d52a-4fe6-abc1-066223f7f577; Domain=www.notion.so; Path=/; Expires=Sat, 27 Sep 2025 09:59:23 GMT; Secure',
    'notion_check_cookie_consent=true; Domain=www.notion.so; Path=/; Expires=Sat, 28 Sep 2024 09:59:23 GMT; Secure',
    '__cf_bm=ujBjdNF1GXX1FFJZfZArcBZo3bccBvZI4ggjdjypOdU-1727431163-1.0.1.1-w_PXGp2e_K7s6EvDnC9RVF59LY.L11aJkCFjR4Q16kslthrIewlZ21Iif7Dk.FMWnZYOESOo7oosY7hvPnO8gg; path=/; expires=Fri, 27-Sep-24 10:29:23 GMT; domain=.notion.so; HttpOnly; Secure',
    '_cfuvid=Khgv069K7JKy07cf22LJf_E2tQnWeQS312i0XGBOaOw-1727431163693-0.0.1.1-604800000; path=/; domain=.notion.so; HttpOnly; Secure; SameSite=None'
  ],
  'strict-transport-security': 'max-age=31536000; includeSubDomains; preload',
  vary: 'Accept, Accept-Encoding',
  'content-security-policy': "script-src 'self' 'unsafe-inline' 'unsafe-eval' https://gist.github.com https://apis.google.com https://cdn.amplitude.com https://api.amplitude.com https://dev-embed.notion.co https://embed.notion.co https://static.zdassets.com https://api.smooch.io\t https://solve-widget.forethought.ai https://decagon.ai https://logs-01.loggly.com https://http-inputs-notion.splunkcloud.com https://cdn.segment.com https://analytics.pgncs.notion.so https://*.sentry.io https://checkout.stripe.com https://js.stripe.com https://embed.typeform.com https://admin.typeform.com https://public.profitwell.com https://static.profitwell.com https://js.sentry-cdn.com https://js.chilipiper.com https://platform.twitter.com https://cdn.syndication.twimg.com https://accounts.google.com https://vimeo.com https://player.vimeo.com https://youtube.com https://www.youtube.com https://www.googletagmanager.com https://www.googleadservices.com https://googleads.g.doubleclick.net https://cdn.metadata.io https://platformapi.metadata.io https://api-gw.metadata.io https://d2hrivdxn8ekm8.cloudfront.net https://d1lu3pmaz2ilpx.cloudfront.net https://dvqigh9b7wa32.cloudfront.net https://d330aiyvva2oww.cloudfront.net https://transcend-cdn.com https://cdn01.boxcdn.net https://cdn.sprig.com https://assets.customer.io https://code.gist.build https://www.google.com https://www.gstatic.com https://challenges.cloudflare.com;connect-src 'self' data: blob: https://img.notionusercontent.com https://cdn.amplitude.com https://api.amplitude.com https://www.notion.so https://api.embed.ly https://dev-embed.notion.co https://embed.notion.co https://ekr.zdassets.com https://ekr.zendesk.com\t https://makenotion.zendesk.com\t https://api.smooch.io\t wss://api.smooch.io\t https://api.forethought.ai https://logs-01.loggly.com https://http-inputs-notion.splunkcloud.com https://cdn.segment.com https://api.segment.io https://analytics.pgncs.notion.so https://api.pgncs.notion.so https://*.sentry.io 
https://checkout.stripe.com https://js.stripe.com https://cdn.contentful.com https://preview.contentful.com https://images.ctfassets.net https://www2.profitwell.com https://tracking.chilipiper.com https://api.chilipiper.com https://api.unsplash.com https://api.giphy.com/ https://giphy-analytics.giphy.com/ https://media0.giphy.com/ https://media1.giphy.com/ https://media2.giphy.com/ https://media3.giphy.com/ https://media4.giphy.com/ https://media5.giphy.com/ https://media6.giphy.com/ https://media7.giphy.com/ https://media8.giphy.com/ https://media9.giphy.com/ https://media10.giphy.com/ https://boards-api.greenhouse.io https://accounts.google.com https://oauth2.googleapis.com https://vimeo.com https://player.vimeo.com https://youtube.com https://www.youtube.com https://www.googletagmanager.com https://analytics.google.com https://www.googleadservices.com https://googleads.g.doubleclick.net https://region1.google-analytics.com https://region1.analytics.google.com https://www.google-analytics.com https://cdn.metadata.io https://platformapi.metadata.io https://api-gw.metadata.io https://d2hrivdxn8ekm8.cloudfront.net https://d1lu3pmaz2ilpx.cloudfront.net https://dvqigh9b7wa32.cloudfront.net https://d330aiyvva2oww.cloudfront.net https://transcend-cdn.com https://telemetry.transcend.io https://api.statuspage.io https://pgncd.notion.so https://api.statsig.com https://statsigapi.net https://exp.notion.so https://api.box.com https://*.mux.com https://api.sprig.com https://storage.googleapis.com https://cdn.sprig.com https://cdn.userleap.com https://track.customer.io https://*.api.gist.build https://*.cloud.gist.build https://api.palette.dev wss://msgstore.www.notion.so https://msgstore.www.notion.so https://audioprocessor.www.notion.so wss://audioprocessor.www.notion.so ws://localhost:* ws://127.0.0.1:* https://prod-files-secure.s3.us-west-2.amazonaws.com https://notion-emojis.s3-us-west-2.amazonaws.com https://s3-us-west-2.amazonaws.com https://s3.us-west-2.amazonaws.com 
https://notion-production-snapshots-2.s3.us-west-2.amazonaws.com https://file.notion.so notion://file.notion.so https://www.notion.com;font-src 'self' data: https://cdnjs.cloudflare.com https://cdn01.boxcdn.net;img-src 'self' data: blob: https: https://img.notionusercontent.com https://images.ctfassets.net https://platform.twitter.com https://syndication.twitter.com https://pbs.twimg.com https://ton.twimg.com https://region1.google-analytics.com https://region1.analytics.google.com https://*.mux.com https://track.customer.io https://file.notion.so notion://file.notion.so;style-src 'self' 'unsafe-inline' https://cdnjs.cloudflare.com https://github.githubassets.com https://js.chilipiper.com https://platform.twitter.com https://ton.twimg.com https://accounts.google.com https://transcend-cdn.com https://cdn01.boxcdn.net https://code.gist.build;frame-ancestors 'self';worker-src 'self' blob:;child-src 'self' blob:;media-src blob: https: http: https://*.mux.com https://file.notion.so notion://file.notion.so;frame-src https: http: https://accounts.google.com https://renderer.gist.build https://code.gist.build https://challenges.cloudflare.com https://identity.notion.so",
  'document-policy': 'js-profiling',
  'referrer-policy': 'strict-origin-when-cross-origin',
  'server-timing': 'r;dur=717',
  'x-content-type-options': 'nosniff',
  'x-dns-prefetch-control': 'off',
  'x-download-options': 'noopen',
  'x-frame-options': 'SAMEORIGIN',
  'x-notion-image-debug': '1727431163611,86400',
  'x-notion-request-id': '12d50d29-85c1-4965-814b-021bc6728855',
  'x-permitted-cross-domain-policies': 'none',
  'x-xss-protection': '0',
  server: 'cloudflare'
}
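Given the fields logged above, a client could detect this case by checking for a 3xx status and reading the `location` header instead of `stringData`. A minimal sketch; the `LoggedResponse` shape is an assumption modeled on the logged `res` object, not the package's actual response type, and `redirectTarget` is a hypothetical name:

```typescript
import { URL } from "node:url";

// Assumed shape, modeled on the logged fields (status, stringData, headers).
interface LoggedResponse {
  status: number;
  stringData?: string;
  headers: Record<string, string | string[] | undefined>;
}

// Returns the absolute redirect target, or null if this is not a redirect.
function redirectTarget(res: LoggedResponse, requestUrl: string): string | null {
  const isRedirect = res.status >= 300 && res.status < 400;
  const location = res.headers["location"];
  if (!isRedirect || typeof location !== "string") return null;
  // Location may be relative, so resolve it against the URL that was requested.
  return new URL(location, requestUrl).toString();
}
```

For a 302 like the one above this returns the img.notionusercontent.com URL; for a normal 200 response it returns null and the caller can keep using `stringData`.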
VanDerLars commented 1 month ago

Idea of solving the problem:

VanDerLars commented 1 month ago

Okay, that is the solution. The image URL is now only in the location header, not in stringData anymore. Notion changed how they hand over image URLs.

The fix: I created a pull request, @asnunes. Could you please merge it and publish a rebuilt version of the module?

The pull request: https://github.com/asnunes/notion-page-to-html/pull/41

cornzz commented 1 month ago

I believe Notion changed the domain of their user content servers, so images are no longer located at www.notion.so/img but at img.notionusercontent.com, which the original URL now redirects to. The right fix would probably be to make another request to the URL in the location header, following redirects recursively.
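A minimal sketch of that approach, assuming plain Node `http`/`https` requests. `getFollowingRedirects` and its `maxRedirects` limit are hypothetical and not taken from either pull request in this thread:

```typescript
import * as http from "node:http";
import * as https from "node:https";
import { Buffer } from "node:buffer";
import { URL } from "node:url";

// Hypothetical helper (not the package's API): GET `url` and, while the server
// answers with a 3xx status and a Location header, request that URL instead.
function getFollowingRedirects(url: string, maxRedirects = 5): Promise<Buffer> {
  return new Promise<Buffer>((resolve, reject) => {
    const onResponse = (res: http.IncomingMessage): void => {
      const status = res.statusCode ?? 0;
      const location = res.headers.location;

      if (status >= 300 && status < 400 && location) {
        // Drain and discard the redirect body (e.g. "Found. Redirecting to ...").
        res.resume();
        if (maxRedirects <= 0) {
          reject(new Error(`Too many redirects at ${url}`));
          return;
        }
        // Location may be absolute or relative; resolve it against the current URL.
        const next = new URL(location, url).toString();
        resolve(getFollowingRedirects(next, maxRedirects - 1));
        return;
      }

      // Non-redirect: collect the raw bytes and let the caller decide how to decode them.
      const chunks: Buffer[] = [];
      res.on("data", (chunk: Buffer) => chunks.push(chunk));
      res.on("end", () => resolve(Buffer.concat(chunks)));
      res.on("error", reject);
    };

    const req =
      new URL(url).protocol === "http:"
        ? http.get(url, onResponse)
        : https.get(url, onResponse);
    req.on("error", reject);
  });
}
```

The redirect limit guards against loops, and resolving `Location` with `new URL(location, url)` handles both absolute targets like the img.notionusercontent.com URL above and relative paths. A fuller version would probably also reject on 4xx/5xx statuses instead of returning their bodies.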

cornzz commented 1 month ago

Fix here --> https://github.com/asnunes/notion-page-to-html/pull/42