asnunes / notion-page-to-html

NodeJS tool to convert public Notion pages to HTML from page ID
MIT License

Bug with uploaded images #15

Closed Soneji closed 3 years ago

Soneji commented 3 years ago

Hi, I seem to be having an issue with uploaded images.

This is the URL I am trying to use with the module: https://www.notion.so/dhavalsoneji/2c5dd1f8b26840d7ba882d1490a4a917

I get this error:

error - uncaughtException: SyntaxError: Unexpected token P in JSON at position 0
    at JSON.parse (<anonymous>)
    at IncomingMessage.<anonymous> (/Users/dhaval/git-clones/portfolio/node_modules/notion-page-to-html/dist/utils/usecases/http-get/node-http-get.js:72:44)
    at IncomingMessage.emit (node:events:406:35)
    at IncomingMessage.emit (node:domain:475:12)
    at endReadableNT (node:internal/streams/readable:1343:12)
    at processTicksAndRejections (node:internal/process/task_queues:83:21)

This happens because the module tries to JSON-parse stringData, but its value is:

PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPEVycm9yPjxDb2RlPkFjY2Vzc0RlbmllZDwvQ29kZT48TWVzc2FnZT5BY2Nlc3MgRGVuaWVkPC9NZXNzYWdlPjxSZXF1ZXN0SWQ+WENKU0tNQzM5VEhQMTU5RjwvUmVxdWVzdElkPjxIb3N0SWQ+bjV2T3I4NEpURk5CWE5Ed1RKeWN4U0FocU5pUDJqSkd2U2dEcFl6ckcwQU5uQ1B6cUNUWHZLODJxZndFOE1OakwxOFFuNlpxSkU4PTwvSG9zdElkPjwvRXJyb3I+

Which Base64-decodes to:

<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>AccessDenied</Code>
  <Message>Access Denied</Message>
  <RequestId>XCJSKMC39THP159F</RequestId>
  <HostId>n5vOr84JTFNBXNDwTJycxSAhqNiP2jJGvSgDpYzrG0ANnCPzqCTXvK82qfwE8MNjL18Qn6ZqJE8=</HostId>
</Error>
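
To illustrate the failure mode above, here is a minimal sketch of a parse step that does not throw on a non-JSON body. The helper name `parseJsonOrDescribe` is hypothetical and not part of notion-page-to-html, and treating a non-JSON body as Base64 is an assumption based only on this report. The sample string is the first part of the payload above (the XML prolog).

```javascript
// Hypothetical helper, not part of notion-page-to-html.
// Tries JSON first; on failure, returns the Base64-decoded text for debugging.
function parseJsonOrDescribe(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    // Assumption: a non-JSON body may be Base64 text, as in this report.
    const decoded = Buffer.from(text, "base64").toString("utf8");
    return { ok: false, error: err.message, decoded };
  }
}

// First part of the payload from the report (the XML prolog only).
const sample = "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4K";
const result = parseJsonOrDescribe(sample);

console.log(result.ok);      // whether the body was valid JSON
console.log(result.decoded); // decoded text when it was not
```

With a check like this, the caller could surface the AccessDenied response instead of crashing with an uncaught SyntaxError.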

I believe this is caused by an image I uploaded to Notion as the page cover. By default, the API doesn't give us a usable URL for the image. It returns something like https://s3-us-west-2.amazonaws.com/secure.notion-static.com/40b79211-1ae6-427f-8b3f-85216732792a/Untitled.png, which is inaccessible.

I think a solution is to check whether the image URL points to Notion's AWS storage and, if so, use Notion's image endpoint instead:

if (image.includes("amazonaws.com") && image.includes("secure.notion-static.com")) {
    image = "https://www.notion.so/image/" + encodeURIComponent(image) + "?table=block&id=" + id
}

Where id is the ID of the page.

This should give something like: https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F40b79211-1ae6-427f-8b3f-85216732792a%2FUntitled.png?table=block&id=2c5dd1f8-b268-40d7-ba88-2d1490a4a917

This URL is properly accessible.
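
The proposal above can be sketched as a standalone function. The names `toDashedId` and `toNotionImageUrl` are hypothetical and are not the code merged into the library. Converting the page ID to dashed UUID form is an assumption taken from the example URL above, which shows the ID with dashes.

```javascript
// Hypothetical helpers sketching the proposed rewrite; not the library's code.

// Formats a 32-character hex ID as 8-4-4-4-12. Other inputs are returned as-is.
function toDashedId(id) {
  const hex = id.replace(/-/g, "");
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) return id;
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

// Rewrites Notion-hosted S3 image URLs to go through Notion's image endpoint.
function toNotionImageUrl(image, pageId) {
  const isNotionUpload =
    image.includes("amazonaws.com") &&
    image.includes("secure.notion-static.com");
  if (!isNotionUpload) return image; // external images are left untouched
  return (
    "https://www.notion.so/image/" +
    encodeURIComponent(image) +
    "?table=block&id=" +
    toDashedId(pageId)
  );
}

const rewritten = toNotionImageUrl(
  "https://s3-us-west-2.amazonaws.com/secure.notion-static.com/40b79211-1ae6-427f-8b3f-85216732792a/Untitled.png",
  "2c5dd1f8b26840d7ba882d1490a4a917"
);
console.log(rewritten);
```

For the cover image and page ID from this report, the sketch produces the same URL as the example above.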

asnunes commented 3 years ago

Hi, @Soneji. Thank you for reporting the issue in detail!

I am going to try to reproduce this problem and analyze your PR #16.

asnunes commented 3 years ago

Fixed in v1.1.2 (PRs #16 and #17). @Soneji, here is your rendered HTML example after the fixed conversion:

[Image: rendered HTML example]

Nice content, by the way. I hope notion-page-to-html keeps helping people publish content like this.

Soneji commented 2 years ago

Thank you for the kind words! 🙌 And thank you for the amazing tool that makes this possible!!

(if you want to subscribe 👀 https://dhavalsoneji.com/blog )