asyncapi / studio

Visually design your AsyncAPI files and event-driven architecture.
https://studio.asyncapi.com
Apache License 2.0

Open Graph link preview image according to the document to open #224

smoya opened this issue 2 years ago

smoya commented 2 years ago


Reason/Context

Thanks to the ?url=<url-of-file> and ?base64=<base64-encoded-doc> query params, Studio can load most files (yes, not all of them; see https://github.com/asyncapi/studio/issues/127). I expect users will use this to share their AsyncAPI docs.

Whenever a link to Studio (with or without those query params) is pasted into social media (Twitter, LinkedIn, Facebook, Slack...), the preview image is this one:

https://user-images.githubusercontent.com/1083296/148680670-98b88679-c2b4-449b-a671-d03aa06c1f83.png

It is a great pic; however, it says nothing about the file being shared.

What if we could dynamically generate the preview image based on the file being shared? For example, the title, description and some stats could be shown.

I created a POC based on https://github.com/vercel/og-image (deprecated atm), available in my fork (it's just a POC). It is a server that generates dynamic images to be used in Open Graph image meta tags. It works by generating dynamic HTML, taking a screenshot of it with headless Chromium, and serving the resulting image.

The server accepts a ?base64=<base64-encoded-doc> query param and generates an image containing the AsyncAPI doc's title, description, and the number of servers, channels, and messages.
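
For illustration, here is a minimal sketch of that extraction step using the AsyncAPI parser-js (assuming its v2+ `Parser` API; `extractStats` is just a hypothetical helper name, and the exact accessors may differ between parser versions):

```typescript
// Hypothetical helper: decode the base64 doc, parse it with @asyncapi/parser
// (parser-js v2+ API assumed; accessor names may differ between versions),
// and pull out the fields shown on the OG image.
import { Parser } from '@asyncapi/parser';

const parser = new Parser();

export async function extractStats(base64Doc: string) {
  const raw = Buffer.from(base64Doc, 'base64').toString('utf-8');
  const { document, diagnostics } = await parser.parse(raw);

  if (!document) {
    // Invalid document: let the caller fall back to the default OG image.
    throw new Error(`Could not parse AsyncAPI document (${diagnostics.length} diagnostics)`);
  }

  return {
    title: document.info().title(),
    description: document.info().description() ?? '',
    servers: document.servers().length,
    channels: document.channels().length,
    messages: document.allMessages().length,
  };
}
```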

Despite the horrible design, the service is able to generate the following:

Based on the following AsyncAPI doc:

```yaml
asyncapi: '2.2.0'
info:
  title: Account Service
  version: 1.0.0
  description: This service is in charge of processing user signups
channels:
  user/signedup:
    subscribe:
      message:
        $ref: '#/components/messages/UserSignedUp'
components:
  messages:
    UserSignedUp:
      payload:
        type: object
        properties:
          displayName:
            type: string
            description: Name of the user
          email:
            type: string
            format: email
            description: Email of the user
```

[Open in Studio](https://studio.asyncapi.com/?base64=YXN5bmNhcGk6ICcyLjIuMCcKaW5mbzoKICB0aXRsZTogQWNjb3VudCBTZXJ2aWNlCiAgdmVyc2lvbjogMS4wLjAKICBkZXNjcmlwdGlvbjogVGhpcyBzZXJ2aWNlIGlzIGluIGNoYXJnZSBvZiBwcm9jZXNzaW5nIHVzZXIgc2lnbnVwcwpjaGFubmVsczoKICB1c2VyL3NpZ25lZHVwOgogICAgc3Vic2NyaWJlOgogICAgICBtZXNzYWdlOgogICAgICAgICRyZWY6ICcjL2NvbXBvbmVudHMvbWVzc2FnZXMvVXNlclNpZ25lZFVwJwpjb21wb25lbnRzOgogIG1lc3NhZ2VzOgogICAgVXNlclNpZ25lZFVwOgogICAgICBwYXlsb2FkOgogICAgICAgIHR5cGU6IG9iamVjdAogICAgICAgIHByb3BlcnRpZXM6CiAgICAgICAgICBkaXNwbGF5TmFtZToKICAgICAgICAgICAgdHlwZTogc3RyaW5nCiAgICAgICAgICAgIGRlc2NyaXB0aW9uOiBOYW1lIG9mIHRoZSB1c2VyCiAgICAgICAgICBlbWFpbDoKICAgICAgICAgICAgdHlwZTogc3RyaW5nCiAgICAgICAgICAgIGZvcm1hdDogZW1haWwKICAgICAgICAgICAgZGVzY3JpcHRpb246IEVtYWlsIG9mIHRoZSB1c2Vy)

Studio will need to modify the og:image tag so it points to this new service.

<meta property="og:image" content="http://<service-url>/*.png?theme=light&base64=YXN5bmNhcGk6ICcyLjIuMCcKaW5mbzoKICB0aXRsZTogQWNjb3VudCBTZXJ2aWNlCiAgdmVyc2lvbjogMS4wLjAKICBkZXNjcmlwdGlvbjogVGhpcyBzZXJ2aWNlIGlzIGluIGNoYXJnZSBvZiBwcm9jZXNzaW5nIHVzZXIgc2lnbnVwcwpjaGFubmVsczoKICB1c2VyL3NpZ25lZHVwOgogICAgc3Vic2NyaWJlOgogICAgICBtZXNzYWdlOgogICAgICAgICRyZWY6ICcjL2NvbXBvbmVudHMvbWVzc2FnZXMvVXNlclNpZ25lZFVwJwpjb21wb25lbnRzOgogIG1lc3NhZ2VzOgogICAgVXNlclNpZ25lZFVwOgogICAgICBwYXlsb2FkOgogICAgICAgIHR5cGU6IG9iamVjdAogICAgICAgIHByb3BlcnRpZXM6CiAgICAgICAgICBkaXNwbGF5TmFtZToKICAgICAgICAgICAgdHlwZTogc3RyaW5nCiAgICAgICAgICAgIGRlc2NyaXB0aW9uOiBOYW1lIG9mIHRoZSB1c2VyCiAgICAgICAgICBlbWFpbDoKICAgICAgICAgICAgdHlwZTogc3RyaW5nCiAgICAgICAgICAgIGZvcm1hdDogZW1haWwKICAgICAgICAgICAgZGVzY3JpcHRpb246IEVtYWlsIG9mIHRoZSB1c2Vy" />
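
For reference, a minimal sketch (hypothetical, not the final implementation) of how Studio could rewrite that tag at runtime based on the current query params; `<service-url>` and the image path are placeholders:

```typescript
// Hypothetical client-side snippet: point og:image at the generator service,
// forwarding the document from the current URL. <service-url> is a placeholder.
const params = new URLSearchParams(window.location.search);
const base64 = params.get('base64');

if (base64) {
  const ogImage = document.querySelector('meta[property="og:image"]');
  ogImage?.setAttribute(
    'content',
    `https://<service-url>/og-image.png?theme=light&base64=${encodeURIComponent(base64)}`
  );
}
```

Since crawlers don't execute JavaScript, this only works in combination with the pre-rendering setup described below.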

The preview image would then look like this (note that https://shaggy-stingray-56.loca.lt/ was a local tunnel to my localhost serving a simple HTML page with the og:image tag):

https://user-images.githubusercontent.com/1083296/148680677-a6c037fd-f2be-476e-94be-eecada0702bb.png

By the way, all of this could run on serverless functions such as Netlify Functions (which are AWS Lambda under the hood), available in the free tier :)

Description

Here is a sequence diagram showing the big picture of the flow that a request made by an Open Graph crawler (the crawlers that fetch the Open Graph image whenever you share a link) will follow:

```mermaid
sequenceDiagram
Open Graph Crawler->>+Studio: /?base64=<encoded_doc>
Studio->>Studio: Set og:title and og:description metatags. Set og:image to <og-generator-service>/generate.png?title=foo&description=bar&operations=4&servers=2
Studio->>-Open Graph Crawler: Pre-rendered Studio HTML webpage
Open Graph Crawler->>+OpenGraph Generator: <og-generator-service>/generate.png?title=foo&description=bar&operations=4&servers=2
OpenGraph Generator->>-Open Graph Crawler: og-image.png
```

Note that, as explained in this comment, we would need to configure pre-rendering in Netlify so the og:image content URL replacement happens on each request made by a crawler.

Alternatively, whatever technology we use (for example Next.js), the flow for rendering the Studio page would be something like the following:

```mermaid
flowchart TD
    A[User] --> B(https://studio.asyncapi.com)
    B --> C{contains ?base64 or ?url}
    C -->|No| D[Static rendering]
    C -->|Yes| E[Dynamic rendering]
    E --> F(Parsing AsyncAPI doc + etc)
```

In case the image can't be generated for whatever reason, the default AsyncAPI Studio image should be served instead: https://studio.asyncapi.com/img/meta-studio-og-image.jpeg

What you will need to do

Note that the design of the Open Graph image card is also part of this task. Ask @Mayaleeeee for help on this (Thanks! 🙌 ).

Prerequisites

  1. Fork Studio.
  2. Deploy it to your own Netlify free account. I recommend doing this via Netlify’s website UI rather than the Netlify CLI. With a few clicks, your site will be configured to deploy on each push to the branch you specify.
  3. Enable Prerendering in your new Netlify site. This will allow web crawlers (such as the ones used for fetching the Open Graph meta tags) to receive a fully rendered version of the website, including content loaded by JavaScript.

Work to do

  1. Create a new GitHub repository where your Open Graph image generator service will be tracked.
  2. Then create a new service that exposes an HTTP API that generates an Open Graph image based on a few query params (use the names you want, the following are just suggestions):

    1. doc_url: a URL pointing to a raw AsyncAPI document.
    2. doc_base64: an AsyncAPI document encoded in base64.

    Some hints:

    • You will need to use the AsyncAPI Parser-JS to parse your document and extract the data you need from it
    • In order to generate the image, you can use the @vercel/og package (og-image is deprecated now). Documentation on how to use it is available here. Alternatively, if that package is not compatible in a non-Vercel world, you might want to take a look at https://github.com/vercel/satori, which is what that package uses under the hood. A rough sketch of such a function is included right after this list.
  3. Deploy this new service somewhere. I recommend deploying it via Netlify Functions. Even better if we can get it as a Netlify Edge Function (support for npm packages is still experimental), since I believe we would be able to implement a caching mechanism easily.
  4. Once we have a public URL for that service, include JavaScript code somewhere in the Studio website that modifies the og:image meta tag content so it points to the new service URL, including the doc_url or doc_base64 query param with the right content. That is the trick that will make the Open Graph image show dynamically based on those parameters.
  5. Performance is a must. Serving the Open Graph tags plus generating the image should not take more than a few seconds (~3); otherwise, crawlers will time out (for example, Slack's crawler times out at 5 seconds).
  6. Investigate caching. If hosted as a Netlify function, I believe we could simply rely on cached responses. See https://docs.netlify.com/platform/caching/#supported-cache-control-headers. Otherwise, we could give Netlify Blobs a try and store each generated image keyed by a hash of the base64 doc (or some other reproducible hash), so every new request first checks whether that image has already been generated and, if so, serves the blob directly (not 100% sure this use case is supported, but I guess it is).

    Anyway, more investigation on how to implement the service is needed, so please do not take my words here as the right way to do it, as I didn't spend time on it when I created this issue.
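
To make the above more concrete, here is a rough, non-authoritative sketch of what the generator could look like as a Netlify Edge Function, assuming @vercel/og can run outside Vercel. `extractStats` is the hypothetical helper sketched earlier in this issue; the query param names, card layout, and cache policy are only suggestions, not decisions:

```typescript
// Rough sketch of the generator as a Netlify Edge Function, assuming
// @vercel/og works outside Vercel (otherwise, satori directly).
// extractStats is the hypothetical helper sketched earlier in this issue.
import React from 'react';
import { ImageResponse } from '@vercel/og';
import { extractStats } from './extract-stats';

export default async function handler(request: Request): Promise<Response> {
  try {
    const url = new URL(request.url);
    const base64 = url.searchParams.get('doc_base64');
    if (!base64) throw new Error('Missing doc_base64 query param');

    const stats = await extractStats(base64);

    // Build a simple card; the real design is part of the task (see above).
    const card = React.createElement(
      'div',
      { style: { display: 'flex', flexDirection: 'column', width: '100%', height: '100%', padding: 48, background: 'white' } },
      React.createElement('h1', { style: { fontSize: 64 } }, stats.title),
      React.createElement('p', { style: { fontSize: 32 } }, stats.description),
      React.createElement('p', { style: { fontSize: 24 } },
        `${stats.servers} servers · ${stats.channels} channels · ${stats.messages} messages`)
    );

    const image = new ImageResponse(card, { width: 1200, height: 630 });
    // Let Netlify's edge cache reuse the image for identical documents.
    image.headers.set('Cache-Control', 'public, max-age=86400');
    return image;
  } catch {
    // Fallback: serve the default Studio OG image instead (see description above).
    return Response.redirect('https://studio.asyncapi.com/img/meta-studio-og-image.jpeg', 302);
  }
}
```

The catch branch implements the fallback mentioned in the description: if parsing or rendering fails for any reason, crawlers are redirected to the default Studio OG image.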

GSoC 2024

This issue got accepted as part of the GSoC 2024. @helios2003 is assigned as mentee.

We are using the following read-only Project board to track the current status of their work: https://github.com/orgs/asyncapi/projects/49/views/1

Athul0491 commented 8 months ago

> Oh, I see now it works slow, but it works. 👍 Thanks for sharing. Anyway, I encourage you to debug where the time is spent and see how it can be improved.

I optimized my code and tested the hosted API with Postman. On average, it takes about 2.5 seconds for the response now. The image generation part of the code takes less than 0.01 seconds, and the rest of the time is spent on parsing.

GiteshDewangan commented 8 months ago

@smoya Hello sir, I started working on this (https://github.com/asyncapi/studio/issues/224#top)...

tihom4537 commented 8 months ago

Greetings sir @smoya, this project (#224) seems very interesting to work on. I have been researching and working on it and will surely come up with my proposal.

Athul0491 commented 8 months ago

It was mentioned in the application template that I could submit a draft proposal. How can I submit a draft proposal for this project? I wanted to get feedback regarding my proposal.

smoya commented 8 months ago

> It was mentioned in the application template that I could submit a draft proposal. How can I submit a draft proposal for this project? I wanted to get feedback regarding my proposal.

EDIT: Hi @Athul0491. Please see my message below. 🚀

smoya commented 8 months ago

FYI, you all have the following GSoC Application Template in case you want to craft impressive proposals. You can submit a draft proposal early to get feedback and iterate early. Be sure to read Google's guide to writing a proposal.

You can share the proposal via DM (to me) in Slack, or share it here (depending on whether it contains sensitive data).

smoya commented 7 months ago

Added a note about the possibility of using https://github.com/vercel/satori in case @vercel/og is not compatible with a non-Vercel environment (I doubt it, but just in case).

> Alternatively, if that package is not compatible in a non-Vercel world, you might want to take a look at https://github.com/vercel/satori, which is what that package uses under the hood.

smoya commented 7 months ago

Added a mention of the possibility of using cached responses when using Netlify Functions as a first possible solution for caching.

> If hosted as a Netlify function, I believe we could simply rely on cached responses. See https://docs.netlify.com/platform/caching/#supported-cache-control-headers.

smoya commented 7 months ago

@RegretfulWinter are you finally applying? The deadline is today https://developers.google.com/open-source/gsoc/timeline#april_2_-_1800_utc

smoya commented 6 months ago

Just for the record, @helios2003 got selected as a GSoC 2024 mentee and is working on this issue.

helios2003 commented 5 months ago

I have run some performance tests comparing the current production instance of studio-next with my deployed instance of studio, which parses the content at the Studio level (see the issue description) to dynamically set the Open Graph tags. My deployed instance of studio can be found here: https://studio-helios2003.netlify.app.

Without base64 document in the URL params

| Metric | https://studio-next.netlify.app | https://studio-helios2003.netlify.app |
| --- | --- | --- |
| Time to first byte | 356 ms | 345 ms |
| First Contentful Paint | 606 ms | 549 ms |
| Onload time | 1400 ms | 1300 ms |
| Largest Contentful Paint | 2900 ms | 2600 ms |
| Time to be Interactive | 3200 ms | 2900 ms |
| Fully loaded time | 4100 ms | 4100 ms |

With base64 document in the URL params

| Metric | https://studio-next.netlify.app | https://studio-helios2003.netlify.app |
| --- | --- | --- |
| Time to first byte | 284 ms | 2600 ms |
| First Contentful Paint | 426 ms | 2800 ms |
| Onload time | 1200 ms | 3500 ms |
| Largest Contentful Paint | 4100 ms | 6700 ms |
| Time to be Interactive | 3200 ms | 5800 ms |
| Fully loaded time | 4200 ms | 6700 ms |

The base64 doc used can be found here: https://tinyurl.com/57dexzrd

KhudaDad414 commented 5 months ago

@helios2003 what was the approach here? Parsing the document twice?

helios2003 commented 5 months ago

Nope, the document is being parsed once.

smoya commented 5 months ago

After checking @helios2003's tests, I ran a simple comparison test measuring the response time from https://studio.asyncapi.com and https://studio-next.netlify.app.

The Studio URL was always the same one @helios2003 provided in their tests, which loads an AsyncAPI doc via the base64 query param.

Look at the results:

| Version | Cache-Status | Time Total (seconds) |
| --- | --- | --- |
| Studio (regular) | `"Netlify Edge"; fwd=miss` | 0.316334 |
| Studio-next (NextJS) | `"Next.js"; hit, "Netlify Edge"; fwd=stale` | 4.156618 |

As you can see, the Next.js version takes almost 4 seconds more than the regular version. I could understand the timing, because almost everything is rendered at the server level. However, I don't understand the Cache-Status response header then. It says there is a cache HIT. But how so? If it's a hit, I would expect the response to be a prebuilt one, in which case it should take much less time than that.

Any idea why this is happening? @KhudaDad414 @Amzani

smoya commented 5 months ago

@helios2003 What about intercepting the requests made by Open Graph crawlers and, in that case, printing only the headers? No extra JavaScript would need to be rendered, so I expect the response time should be way lower than 5s.
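
To illustrate the idea, here is a hedged sketch (assuming a Netlify Edge Function with `context.next()`; the crawler list, `<service-url>`, and the param names are placeholders, not a final design):

```typescript
// Hedged sketch of the interception idea: an edge function that detects known
// Open Graph crawlers by User-Agent and answers with a minimal HTML document
// carrying only the meta tags, skipping the full Studio app.
const CRAWLER_PATTERN = /slackbot|twitterbot|facebookexternalhit|linkedinbot|discordbot/i;

export default async function handler(
  request: Request,
  context: { next: () => Promise<Response> }
): Promise<Response> {
  const userAgent = request.headers.get('user-agent') ?? '';
  if (!CRAWLER_PATTERN.test(userAgent)) {
    // Regular users keep getting the normal Studio page.
    return context.next();
  }

  const base64 = new URL(request.url).searchParams.get('base64') ?? '';
  const imageUrl = `https://<service-url>/og-image.png?doc_base64=${encodeURIComponent(base64)}`;

  return new Response(
    `<!doctype html>
<html>
  <head>
    <meta property="og:title" content="AsyncAPI Studio" />
    <meta property="og:image" content="${imageUrl}" />
  </head>
  <body></body>
</html>`,
    { headers: { 'content-type': 'text/html' } }
  );
}
```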

helios2003 commented 5 months ago

> @helios2003 What about intercepting the requests made by Open Graph crawlers and, in that case, printing only the headers? No extra JavaScript would need to be rendered, so I expect the response time should be way lower than 5s.

On doing that, this is the result (see the attached screenshot). The time taken is much larger for the first request but reduces substantially for subsequent calls to the same endpoint. This uses the doc specified in the above message.

smoya commented 5 months ago

> On doing that, this is the result.

~5 seconds to parse the doc + print basic headers is too much. Are you sure only the required headers are being printed (no headers loading scripts, etc.)?

> The time taken is much larger for the first request but reduces substantially for subsequent calls to the same endpoint.

That's due to the cache at Netlify's edge. If you print the cache-status response header, you will notice a hit.

smoya commented 5 months ago

Update: the difference in response time between Netlify and Vercel is very noticeable. After the creation of https://github.com/asyncapi/studio/issues/1118, @helios2003 is going to keep working on the main assigned task (this issue) and will most probably keep deploying their changes to both Netlify and Vercel to avoid unexpected performance issues.

Meanwhile, I expect the owners of Studio to prioritize the investigation.

cc @magicmatatjahu @KhudaDad414 @Amzani

smoya commented 4 months ago

@Mayaleeeee seems to be out until the first week of August (as per her absence in Slack and her Slack status message). I hope she can then work on providing the design for the OG card and stay on time for the GSoC timeline.

cc @helios2003