Azure / static-web-apps

Azure Static Web Apps. For bugs and feature requests, please create an issue in this repo. For community discussions, latest updates, kindly refer to the Discussions Tab. To know what's new in Static Web Apps, visit https://aka.ms/swa/ThisMonth
https://aka.ms/swa
MIT License

Prerender for crawlers feature request #196

Open Swimburger opened 3 years ago

Swimburger commented 3 years ago

Many static web apps (JS, Blazor WASM, etc.) require pre-rendering to be more SEO friendly. Google's crawler handles JS apps quite well, but it blocks DLLs, so Blazor WASM isn't loaded properly. Many other search engine crawlers don't execute JavaScript at all.

There are ways to serve pre-rendered content when a crawler is detected, but this requires a backend or webserver supporting rewriting the request. It would be a powerful selling point for Azure Static Web Apps to natively support pre-rendering of pages and serving them to crawlers.

The Chrome org has this project called Rendertron which renders webpages including executing JavaScript (works with Blazor WASM) and returns a static rendered version (without JS). Integrating something like that or similar tech could add a lot of value IMO.

Alternatively, it would be great to extend the SWA routing system to allow rewriting the request for crawlers.

miwebst commented 3 years ago

Thanks for the suggestion! This is certainly something for us to look into. I feel like allowing the SWA routing to route based on crawler (user agent) wouldn't be too bad and would unblock this scenario.

There are some changes in progress to make our routing more robust and add a number of new scenarios. I'll add this to our backlog for exploration!

Taras-Tyrsa commented 3 years ago

@anthonychu Hi. Are there any plans to implement this feature? Is there a workaround to redirect crawlers to my server for server-side rendering?

anthonychu commented 3 years ago

@Taras-Tyrsa @Swimburger If we were to allow user agent based routing, what would identify crawlers?

Taras-Tyrsa commented 3 years ago

@anthonychu There are documented lists of user agents for Bing, Google, Yahoo, etc., but I believe that's not something the Azure team should maintain and keep in sync. Moreover, crawlers are not the only use case. The best option would be to allow developers to configure redirects based on user agent (regexp?), client IP/range, request URL, headers, etc., somewhere in the Azure portal or, even better, via a config file in the deployment.

Swimburger commented 3 years ago

My use case is focused on SEO, so whatever would help me achieve Google's JavaScript SEO best practices described here: https://developers.google.com/search/docs/guides/dynamic-rendering

They are using User Agents to detect crawlers, so we should at least be able to do that.
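
As a rough illustration of user-agent-based detection, here is a minimal TypeScript sketch. The function name `isCrawler` and the token list are assumptions made for this example; the list is deliberately short and not exhaustive, and none of this is part of Static Web Apps.

```typescript
// Illustrative, non-exhaustive list of lowercase crawler User-Agent tokens.
// This list is an assumption for the example; a real one needs maintenance.
const CRAWLER_TOKENS: readonly string[] = [
  "googlebot",
  "bingbot",
  "duckduckbot",
  "facebookexternalhit",
  "twitterbot",
];

// Returns true when the User-Agent header contains a known crawler token.
// Matching is case-insensitive; a missing header counts as "not a crawler".
function isCrawler(userAgent: string | undefined): boolean {
  if (!userAgent) {
    return false;
  }
  const ua = userAgent.toLowerCase();
  return CRAWLER_TOKENS.some((token) => ua.indexOf(token) !== -1);
}
```

Substring matching like this is easy to spoof, so it is only suitable for deciding which rendering to serve, not for anything security-sensitive.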

smakinson commented 2 years ago

Besides SEO, I also see Open Graph-related URL sharing needing prerender.

I believe Netlify is the main comparison: https://docs.netlify.com/site-deploys/post-processing/prerendering/#app

If Azure were to pull it off, you would be way ahead of Amplify (opened in 2019): https://github.com/aws-amplify/amplify-console/issues/91

Saj1234 commented 2 years ago

This will be a game changer if Azure Static Web Apps can support pre-rendering for crawlers. Most client-rendered JavaScript apps written in React, Angular, Vue, etc. have the same issue with SEO/crawlers/Open Graph. I have seen various hacks, custom redirects, etc. to overcome this, but nothing solid out there.

Frameworks like Next.js still need server-side rendering, and some support pre-rendering pages at build time or ISR. Still, in most cases, you don't want to encapsulate/rewrite your app in a third-party framework for this. It's also not the best option if your site has a lot of dynamically rendered components and you want to host it as a serverless static site.

Would be great if Azure static web apps can offer this feature. 👍

anthonychu commented 2 years ago

Sounds like there are some different requests here. Let me list them out and let me know what I've missed.

smakinson commented 2 years ago

@anthonychu For the first bullet point (and maybe the second), Google is probably less of an issue. I have seen a need for prerender when Open Graph images, titles, etc. are based on an API call and Facebook, etc. does not give it a chance to load and fails to show the image when a URL is shared. I've only started looking at Azure recently, so I apologize if I have missed some possibilities. If it is possible to use something like https://docs.prerender.io/article/12-middlewares with Azure Static Web Apps, that will likely work for me, but ideally it would be nice to simply press a button to enable prerender and forget about it.

smakinson commented 2 years ago

This may be helpful: https://github.com/netlify/prerender to see what netlify is doing.

Saj1234 commented 2 years ago

@anthonychu

  • If a request is from a crawler (specific user agents), do something else (call a function?)

Guessing this will be a server-side redirect similar to staticwebapp.config.json redirects? Perhaps an option in the routes section to specify that a rule applies to crawlers. Client redirects get treated as cloaking, which is why it would be cool if this were supported.

smakinson commented 2 years ago

@anthonychu

Is it possible to configure a rewrite similar to this in staticwebapp.config.json to use prerender.io?

https://www.formition.com/blog/prerender-with-azure

Exclude images, etc., match either the user agent or _escaped_fragment_, and send a match over to https://service.prerender.io

For single page apps, I imagine this would happen under navigationFallback. The docs show we can use exclude, but can we filter on user agent or query string params, pass the token header and create the URL similar to https://service.prerender.io/https://{HTTP_HOST}{REQUEST_URI}
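
The rewrite logic described above can be sketched in TypeScript as follows. This is only an illustration of the decision and URL construction, not something staticwebapp.config.json can express today. The `IncomingRequest` shape, the function names, and the bot and extension lists are all assumptions made for this example.

```typescript
// Hypothetical request shape; a real function would read these from the incoming request.
interface IncomingRequest {
  host: string;         // e.g. "www.example.com"
  pathAndQuery: string; // e.g. "/about?x=1"
  userAgent: string;
}

const PRERENDER_SERVICE = "https://service.prerender.io/";
// Illustrative, non-exhaustive lists (assumptions for this sketch).
const BOT_TOKENS: readonly string[] = ["googlebot", "bingbot", "facebookexternalhit"];
const STATIC_EXTENSIONS: readonly string[] = [".js", ".css", ".png", ".jpg", ".svg", ".ico", ".dll", ".wasm"];

// Splits "/path?query" into its path and query parts.
function splitPathAndQuery(pathAndQuery: string): { path: string; query: string } {
  const q = pathAndQuery.indexOf("?");
  return q === -1
    ? { path: pathAndQuery, query: "" }
    : { path: pathAndQuery.substring(0, q), query: pathAndQuery.substring(q + 1) };
}

// True when the request is not for a static asset and either the user agent
// looks like a bot or the query string carries _escaped_fragment_.
function shouldPrerender(req: IncomingRequest): boolean {
  const { path, query } = splitPathAndQuery(req.pathAndQuery);
  const lowerPath = path.toLowerCase();
  const isStatic = STATIC_EXTENSIONS.some(
    (ext) => lowerPath.length >= ext.length && lowerPath.lastIndexOf(ext) === lowerPath.length - ext.length
  );
  if (isStatic) {
    return false;
  }
  const ua = req.userAgent.toLowerCase();
  const isBot = BOT_TOKENS.some((token) => ua.indexOf(token) !== -1);
  return isBot || query.indexOf("_escaped_fragment_") !== -1;
}

// Builds the upstream URL in the https://service.prerender.io/https://{HTTP_HOST}{REQUEST_URI} form.
function buildPrerenderUrl(req: IncomingRequest): string {
  return `${PRERENDER_SERVICE}https://${req.host}${req.pathAndQuery}`;
}
```

The token header mentioned above would still have to be attached to the upstream request by whatever performs the fetch.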

For reference: https://prerender.io/how-to-install-prerender/

Hmm 🤔 , or perhaps it would be better to do something along the lines of (though I'm looking at nodejs): https://anthonychu.ca/post/azure-functions-serve-html/

mix it with navigationFallback and either return the normal single page app index.html or get the prerender from prerender.io. How much would this affect response time?

martinmogusu commented 2 years ago

I support this feature request. It would be great to allow prerendering on Azure static Web apps.

This would, for example, allow one to host a Blazor WebAssembly app on Azure Static Web Apps while still having prerendering enabled, so that one doesn't have to use ASP.NET Core hosting to achieve the same functionality.

mentorfloat commented 2 years ago

  • If a request is from a crawler (specific user agents), do something else (call a function?)

Even if it is just to update the page <title> (title is now possible in .NET 6, I believe) and <meta name="description">, that would be such a great leap already.

There are hacks to make this work and some great blog posts around, but I feel they are not bulletproof enough to suit all cases. Official prerendering support in Azure Static Web Apps would be fantastic.

smakinson commented 2 years ago

@anthonychu When calling an api function from a route in staticwebapp.config.json which will return html, is it possible to get the original response headers that would have been sent if it had not called an api function and then modify them as needed for the api function html response? I'm interested in nodejs api functions in particular.

jonlighthill commented 2 years ago

@smakinson I noticed you utilize a prerender function on #519 here https://github.com/Azure/static-web-apps/issues/519#issuecomment-976510735. Are you able to share what you did for the function? I am trying to use prerender as well.

Thanks in advance.

smakinson commented 2 years ago

@jonlighthill I was planning to adjust the prerender express middleware for azure functions (https://github.com/prerender/prerender-node) and run all static web app requests through a function that decided if it should use the prerender or not. But what I ended up doing instead is using a cloudflare worker: https://github.com/prerender/prerender-cloudflare-worker with no change needed on the Azure end.

jonlighthill commented 2 years ago

@smakinson Thanks for the response. @anthonychu any progress/updates on this feature?

therealmarkber commented 2 years ago

Since this has already been achieved by IIS ARR https://docs.prerender.io/docs/18-integrate-with-iis, is it possible to achieve something similar just within re-write rules in staticwebapp.config.json instead of having to roll separate middleware?

SimonDania commented 1 year ago

I would be very excited as well, if this was possible. @anthonychu Any update on how far you are with this feature? I'd love to implement it on my website.

foconnor-DS commented 1 year ago

I think the ability to rewrite the url based on HTTP_USER_AGENT or QUERY_STRING is all that's necessary to achieve these requirements.

Right now you could rewrite the URL to bounce to a function, test the HTTP_USER_AGENT for botness, and handle the prerender scenario and/or return Open Graph meta tags. But if it's not a bot, you'd have to client redirect to another route that skips the function check. Pretty sure this is considered cloaking, and in some cases the redirect isn't honored.

Ideally you would only rewrite the route if it's a bot.

Edit: As a temporary workaround, you could redirect to Azure Function, test for botness and then handle your prerender/head-meta and then use something like YARP to suck in the static site. I guess in the case of static JS sites, you would only need to respond with a modified index.html so no real need for a full reverse proxy system. You can just HTTPClient the file, or grab it with blob API.
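
To illustrate the "respond with a modified index.html" part of that workaround, here is a minimal TypeScript sketch that injects Open Graph meta tags before the closing head tag. The names `OpenGraphTags`, `escapeAttribute`, and `injectOpenGraphTags` are hypothetical. How the function obtains index.html (HTTP fetch, blob API) is left out on purpose.

```typescript
// Hypothetical shape for the per-request Open Graph values.
interface OpenGraphTags {
  title: string;
  description: string;
  image: string;
  url: string;
}

// Escapes characters that would break out of a double-quoted HTML attribute.
function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Inserts og:* meta tags just before </head>. If the document has no
// closing head tag, the HTML is returned unchanged.
function injectOpenGraphTags(indexHtml: string, tags: OpenGraphTags): string {
  const meta = [
    `<meta property="og:title" content="${escapeAttribute(tags.title)}">`,
    `<meta property="og:description" content="${escapeAttribute(tags.description)}">`,
    `<meta property="og:image" content="${escapeAttribute(tags.image)}">`,
    `<meta property="og:url" content="${escapeAttribute(tags.url)}">`,
  ].join("\n");
  const headClose = indexHtml.search(/<\/head>/i);
  if (headClose === -1) {
    return indexHtml;
  }
  return indexHtml.substring(0, headClose) + meta + "\n" + indexHtml.substring(headClose);
}
```

Because the SPA shell is otherwise untouched, bots and humans get the same application and only the head metadata differs per URL.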

StephenEhlers commented 1 year ago

This is still badly needed. I am setting up a new site and dealing with a lot of deficiencies in trying to get SEO running properly on a Blazor SPA hosted through Azure Static Web Apps. Google says it runs two bot passes, one that's a raw HTML indexing pass and another, later and more infrequent, that loads and executes JavaScript; however, I have yet to see it pick up my homepage correctly. Additionally, unless I am missing the latest documentation, Bing has the same issue, as stated here: it does not always run JavaScript like a browser against a webpage. https://blogs.bing.com/webmaster/october-2018/bingbot-Series-JavaScript,-Dynamic-Rendering,-and-Cloaking-Oh-My

Agree completely with other comments here that something should be added to staticwebapp.config.json that allows for a botNavigationFallback or userAgentNavigationFallback setting to set an alternative root path to serve up files from based on user agent. This way the Azure Static Webapp could serve up alternative pre-rendered files based on the route names. Something like this would be really slick:

{
  "navigationFallback": {
    // main spa page for human users
    "rewrite": "/index.html"
  },
  "userAgentNavigationFallback": {
    // list of user agents to re-route
    "targetUserAgents": ["GoogleBot", "Bingbot", "DuckDuckBot"],
    // path to pre-rendered html files.
    "prerender_path": "/pre/",
    // tells the site to load from an html file with the same name (ex: /about would serve wwwroot/pre/about.html)
    "isRouteStaticHtml": "true"
  }
}

I'm sure there is a more elegant way to do this, just putting these pieces in here to illustrate the issue/need. Would love to be involved in a beta program and try this feature out or give more input if you guys are seriously looking into use case requirements for this feature.
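
To make the proposed behaviour concrete, here is a small TypeScript sketch of how such a setting might be resolved. `userAgentNavigationFallback` and its fields are the hypothetical proposal from this comment, not an existing Static Web Apps feature, and `resolvePrerenderedFile` is an invented name. Mapping the root route to `index.html` is an extra assumption.

```typescript
// Mirrors the *proposed* (not real) setting from the comment above.
interface UserAgentNavigationFallback {
  targetUserAgents: string[];
  prerender_path: string; // e.g. "/pre/"
}

// Returns the pre-rendered file to serve for a matching user agent,
// or null when the request should fall through to the normal SPA fallback.
function resolvePrerenderedFile(
  route: string,
  userAgent: string,
  config: UserAgentNavigationFallback
): string | null {
  const ua = userAgent.toLowerCase();
  const isTarget = config.targetUserAgents.some(
    (agent) => ua.indexOf(agent.toLowerCase()) !== -1
  );
  if (!isTarget) {
    return null;
  }
  // Drop the query string, then leading and trailing slashes.
  const q = route.indexOf("?");
  const path = (q === -1 ? route : route.substring(0, q)).replace(/^\/+|\/+$/g, "");
  // Assumption: the root route maps to "index".
  const name = path === "" ? "index" : path;
  const base = config.prerender_path.replace(/\/+$/, "");
  return base + "/" + name + ".html";
}
```

With the example configuration, a Googlebot request for `/about` would resolve to `/pre/about.html`, while a regular browser request would return null and use `navigationFallback`.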

KennyNuggets commented 1 year ago

Is there any update on this? Specifically if static web app route can detect bots?

jonlighthill commented 1 year ago

Is there any update on this? Specifically if static web app route can detect bots?

Hasn't been an update for like 2 years. Where the stale bot at?

foconnor-DS commented 1 year ago

Yup. This is still difficult to do. Sometimes I stand up an app service just to handle opengraph tags, then rewrite the index.html. So they hit my app service iis just to load a spa index hosted on azure swa.

But OG is now more popular, what with Teams, sms chats, any number of apps trying to render a nice looking preview of a posted url

thomasgauvin commented 1 year ago

Thanks for the feedback everyone. Based on this documentation https://developers.google.com/search/docs/crawling-indexing/javascript/dynamic-rendering, prerendering based on crawlers seems to no longer be the recommendation, and instead opting for a more long-term solution such as using server-side/static rendering is the recommended path. Using a framework like Nuxt/Next/SvelteKit provides this functionality which is more in line with what you are looking for, and these can be hosted on Static Web Apps.

We are still noting the feedback, and we may look into providing a more managed offering routing requests through some type of compute for prerendering before hitting Static Web Apps when we have more distributed compute options available.

elmalakai commented 1 year ago

Thomas, I'm fine closing this request because it has "prerender" in its title, but this request was necessary for dynamic rendering (server side) as well. We're currently limited in how, with a SWA, we can intercept the HTTP GET for index.html, engage our backend API, and return modified HTML. This is necessary to perform server-side dynamic rendering.

In order for my SWA URL to be richly shareable on social media, I have to wrap the React app in HTML with the appropriate meta tags. I currently have no way to do this, w/o standing up a full app service and handling it there.

Am I missing an easy way to rewrite a SWA React app so that it dynamically outputs OG meta tags in the raw index.html based on the request? And if I could do that only for 'bot' requests, even better, allowing the most used path to be the full SWA path, and the less used path the one with the dynamic OG tags.

Power-Maverick commented 1 year ago

+1. We need dynamic rendering, probably as an option under staticwebapp.config.json.