Closed cartogram closed 1 year ago
Really love where this is heading
I have a SEO requirement that I am hoping we can meet with this build. I want to make sure that
We have the exact same problem on Hydrogen 1, and having a list of bots to prevent content streaming isn't the way to go. The framework should not have to keep up with an ever-changing, ever-growing list of user agents.
Issue on Hydrogen as reference:
Slack conversation around this topic:
[Slack Thread](https://shopify.slack.com/archives/C044LCP35TJ/p1664990319747069?thread_ts=1664990319.747069&cid=C044LCP35TJ) |
---|
**[Helen Lin](https://github.com/wizardlyhel)** [2022-10-05 05:18PM UTC] Curious what @ryanflorence and `@jacob.ebey` think about this … I am constantly finding it difficult to draw the line between “what” to render on the first chunk of html. But ideally, this is what I would like to achieve. loader - makes 2 query calls • SEO and above-the-fold contents (product title, description, first image) <-- cannot be delayed • The full product info query. The idea is kinda like critical CSS but for data queries .. but then this makes 2 API calls. The reason why I have this thought is because there are still crawlers out there that won’t wait for JavaScript to finish .. sometimes not even waiting for response streaming to finish. They will stop right after they find an instance of `<meta property="og:url" />` .. and it is quite often that we have a placeholder value for that. But then the full product info query gets massive and would delay the generation of the entire response |
**[Juan Pablo Prieto](https://github.com/juanpprieto)** [2022-10-05 05:30PM UTC] I was thinking about this too. Guess the strategy is to await a small SEO-specific product query and defer another fuller product query |
**[Helen Lin](https://github.com/wizardlyhel)** [2022-10-05 05:32PM UTC] I’m also thinking to put longer cache control on the seo specific query |
**[Juan Pablo Prieto](https://github.com/juanpprieto)** [2022-10-05 05:35PM UTC] The challenge is that browser caching is per loader. I was playing around with server caching and splitting all product queries on this ~[branch](https://github.com/Shopify/h2-demo-store/blob/fe69035dddb8c625b6dd93f2391b901b6f01a9b5/app/routes/products/%24productHandle.tsx#L30)~ [here](https://github.com/Shopify/h2-demo-store/blob/80847651d8801e5b5ca8cb53f413e0fac76476de/app/routes/products/%24handle.tsx#L47) and comparing performance against the current `getFullProductData` |
**[Ryan Florence](https://github.com/ryanflorence)** [2022-10-07 04:49PM UTC] seems to me splitting queries by “above/below the fold” is less relevant than splitting up queries by volatility/cache-ability. With a SWR server-side caching strategy, fetch the whole product because that should come from cache 99.9% of the time (even when TTL is up because of swr), and then the volatile stuff you can put in a different query (like inventory) and defer it. |
**[Helen Lin](https://github.com/wizardlyhel)** [2022-10-07 04:50PM UTC] good point about the cache-ability |
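The strategy Ryan describes can be sketched roughly as follows. This is a framework-free illustration, not Hydrogen or Remix code: `SwrCache`, `productLoader`, and the `api` object are all hypothetical names, and the cache is a naive in-memory map. In a Remix loader, the returned object would presumably be handed to something like `defer` so that the un-awaited promise streams in later.

```typescript
// Hypothetical types; none of these names come from Hydrogen or Remix.
type Product = { handle: string; title: string; description: string };
type Inventory = { handle: string; available: number };
type CacheEntry<T> = { value: T; storedAt: number };

/** Naive in-memory stale-while-revalidate cache (illustrative only). */
class SwrCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async get(key: string, fetcher: () => Promise<T>): Promise<T> {
    const entry = this.entries.get(key);
    if (!entry) {
      // Cold cache: the only case where the caller waits on the API.
      const value = await fetcher();
      this.entries.set(key, { value, storedAt: this.now() });
      return value;
    }
    if (this.now() - entry.storedAt > this.ttlMs) {
      // Stale: serve the old value now and refresh in the background.
      void fetcher()
        .then((value) => this.entries.set(key, { value, storedAt: this.now() }))
        .catch(() => {
          /* keep serving the stale value if the refresh fails */
        });
    }
    return entry.value;
  }
}

type StorefrontApi = {
  product(handle: string): Promise<Product>;
  inventory(handle: string): Promise<Inventory>;
};

async function productLoader(
  handle: string,
  api: StorefrontApi,
  cache: SwrCache<Product>,
): Promise<{ product: Product; inventory: Promise<Inventory> }> {
  // Awaited: cacheable data that the SEO tags depend on.
  const product = await cache.get(handle, () => api.product(handle));
  // Not awaited: volatile data, returned as a promise so it can arrive later.
  const inventory = api.inventory(handle);
  return { product, inventory };
}
```

The split here follows cache-ability rather than the fold: everything that changes rarely is awaited (and should almost always come from cache), and only the volatile query is left pending.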
👍 as long as `export function meta` is used, the content will be returned like normal tags AFAIK. HUGE benefit to Hydrogen v2 - no more competing with streaming boundaries, no more nesting `<Head>` tags, no more dealing with SEO bot detection 🙃
Another thing that we've been discussing is the API: whether to produce a combined object that describes the Seo information using the existing `<Meta />` component (as is done in this PR currently), or to embrace the pattern of combining `export const handle` and `useMatches` on a root-level component (as seen in the LDJson Schema component in this PR). This might look like a `<Seo {...defaults} />` component that a user renders at the root, and individual routes could provide their own config via the `handle` export.
Example:
```tsx
/** root.tsx */
/** ... */
<head>
  <Meta />
  <Links />
  {/* render the component with some defaults */}
  <Seo
    defaultTitle="Hydrogen"
    titleTemplate="%s - Hydrogen"
    omitGoogleBots
  />
</head>

/** $productHandle.tsx */
export const handle = {
  seo: (data) => ({
    /**
     * Most routes would not need to define this because we would make a best-guess at the right Seo primitives to use.
     * For example, if a particular piece of information requires a custom `<meta />` tag, a `<link />` component
     * and/or some LD-JSON key, we would handle that in the Seo component and accept any overrides here.
     */
    title: `Custom title - ${data.seo.title}`,
    disableTitleTemplate: true,
  }),
};
```
The benefit to this approach is that it gives users less to think about when it comes to Seo and can work to establish a good pattern for extending Remix that we may use in other aspects of Hydrogen.
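To make the `handle` + `useMatches` idea more concrete, here is a minimal sketch of the merging logic a root-level `<Seo />` component might run. Everything in it is an assumption for illustration: `resolveSeo`, `SeoConfig`, and `RouteMatch` are hypothetical names, `RouteMatch` only models the `data` and `handle` fields that `useMatches()` exposes, and the rule that the title template is skipped for the default title is a guess, not something this PR specifies.

```typescript
// Hypothetical shapes; `RouteMatch` mirrors only the parts of a
// `useMatches()` entry that this sketch needs.
type SeoConfig = {
  title?: string;
  description?: string;
  disableTitleTemplate?: boolean;
};

type RouteMatch = {
  data: any;
  handle?: { seo?: (data: any) => SeoConfig };
};

type SeoDefaults = { defaultTitle: string; titleTemplate: string };

function resolveSeo(
  matches: RouteMatch[],
  defaults: SeoDefaults,
): SeoConfig & { title: string } {
  // Merge route configs from root to leaf so that deeper routes win.
  const merged: SeoConfig = {};
  for (const match of matches) {
    const seo = match.handle?.seo;
    if (seo) Object.assign(merged, seo(match.data));
  }

  const rawTitle = merged.title ?? defaults.defaultTitle;
  // Assumption: the template wraps route-provided titles only.
  const useTemplate = merged.title !== undefined && !merged.disableTitleTemplate;
  const title = useTemplate
    ? defaults.titleTemplate.replace('%s', () => rawTitle)
    : rawTitle;

  return { ...merged, title };
}
```

A root component could call this with the result of `useMatches()` and the props passed to `<Seo />`, then render the corresponding `<title>` and `<meta>` tags. Keeping the merge in a plain function also makes it easy to unit test outside of React.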
I personally like the API of rendering a component at the root, and it feels like that is why `handle` and `useMatches` exist. I think the combination could be pretty powerful: one could easily imagine an `analytics` key and `<Analytics />` component pair, and even the dynamic share image example I put together above could mostly be abstracted into a `<ShareImage {...defaultOgData} />` component and a `shareImage` key, etc.
The drawback might be that it does introduce multiple ways of doing something in Remix (namely adding meta/link tags). A simple lint rule and good docs could probably go a long way in preventing most of the confusion of where to add the Seo info.
cc @Shopify/hydrogen
I think the `handle` x root component approach would really help us in creating a solid notion of what Hydrogen is and how we document what it is. eg: Analytics, Seo, LdJson, Scripts could all fall into this model.
nit: I think `<Schema />` or `<LdJson />` is often rendered at the end of the `<body>`. It might be better to keep it separate from `<Seo />`. This could also make it easier for end users to opt in/out and customize them independently.
@Shopify/hydrogen This is still slight WIP, but is ready for another look. I'm now using the README.md file to describe what this PR is adding.
I would try giving it a run locally and looking at the PDP, changing some of the values on the `handle.seo` object and using the debugger to see the results. All feedback is welcome at this point.
In addition to this PR, you can also check out the companion PRs I've added to support it.
I really like this - I can also see SSR rendering of the meta tags fully working as well 🎉 NO MORE BOTS LIST!
We noticed that this PR either modifies or introduces usage of the `dangerouslySetInnerHTML` attribute, which can cause cross-site scripting (XSS) vulnerabilities when user-controlled values are passed in.
We recommend reviewing your code to ensure that this is what you intended to use and that there is not a safe alternative available.
Docs are available here.
If unavoidable, we recommend using an HTML sanitizer like DOMPurify to sanitize content before rendering it as HTML.
If you have any questions or are unsure about how to move forward with this, ping #help-appsec and we would be happy to help you out! cc: @Shopify/xss-extermination-squad
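For the LD-JSON script tag specifically, one common mitigation is to escape the characters that could close the `<script>` element before the string is passed to `dangerouslySetInnerHTML`. The sketch below assumes that approach and is not what this PR or the bot prescribes; `serializeJsonLd` is a hypothetical helper name.

```typescript
// Hypothetical helper: serializes schema data so that it cannot terminate the
// surrounding <script> tag. The escapes are valid JSON string escapes, so the
// output still parses back to the same value.
function serializeJsonLd(schema: Record<string, unknown>): string {
  return JSON.stringify(schema)
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026');
}
```

A schema component could then render `<script type="application/ld+json" dangerouslySetInnerHTML={{__html: serializeJsonLd(schema)}} />`. Because `<`, `>`, and `&` can only occur inside JSON strings, replacing them with `\u` escapes changes the markup seen by the HTML parser but not the data seen by a JSON parser.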
@cartogram — I was positive I left a long comment on here but couldn't find it.
The TL;DR:
root.tsx
?
UPDATE: OCT 25, 2022
The below description is no longer fully relevant! Use the README.md file in the changed files to learn about what this PR is adding.
This PR shows off some initial ideas around rendering SEO meta tags and structured schema data for remix/hydrogen storefronts. This is a Work In Progress, but has the general ideas that I am leaning towards, described below in three parts:
The above `getProductSeo` function can be used in a `meta` function on the Product Details Page, or the user is free to compose the individual functions as they want.
I am also thinking that we could provide a completely turn-key root-level function for the `meta` export that uses location data to render the correct SEO helper at each route automatically, but left that off this PR as a potential next step to explore. Interested in what level of abstraction we should be after here, and how much magic we want to cast.
`Schema` that is rendered at the root and outputs the script tag for schema data. Similar to the above, I've added very basic functions for producing the schema based on a given resource and these can be added to a route via the `handle` export.
`og-image` route and returning an svg response for the rendered component.
All of the above is powered by storefront data.
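As a rough illustration of the `og-image` idea, the sketch below builds an SVG string from product data and pairs it with the matching content type. It is an assumption-laden example, not the implementation in this PR: `renderOgImage`, `escapeXml`, and the `OgImageInput` shape are hypothetical, and the layout values are arbitrary. In a resource route loader, the result would presumably be wrapped as `new Response(body, {headers})`.

```typescript
// Hypothetical input shape; in practice the values would come from storefront data.
type OgImageInput = { title: string; price?: string };

/** Escapes text so it is safe inside SVG/XML text nodes and attributes. */
function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

/** Returns the SVG body plus the headers a route would send with it. */
function renderOgImage(input: OgImageInput): {
  body: string;
  headers: Record<string, string>;
} {
  const priceLine = input.price
    ? `<text x="60" y="400" font-size="48" font-family="sans-serif">${escapeXml(input.price)}</text>`
    : '';
  const body =
    `<svg xmlns="http://www.w3.org/2000/svg" width="1200" height="630" viewBox="0 0 1200 630">` +
    `<rect width="1200" height="630" fill="#ffffff"/>` +
    `<text x="60" y="300" font-size="72" font-family="sans-serif">${escapeXml(input.title)}</text>` +
    priceLine +
    `</svg>`;
  return { body, headers: { 'Content-Type': 'image/svg+xml' } };
}
```

Escaping the product fields matters here for the same reason as in the schema script tag: titles come from merchant data and may contain characters such as `&` or `<` that would otherwise break the markup.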