Shopify / hydrogen

Hydrogen lets you build faster headless storefronts in less time, on Shopify.
https://hydrogen.shop
MIT License

Seo #80

Closed cartogram closed 1 year ago

cartogram commented 1 year ago

UPDATE: OCT 25, 2022

The description below is no longer fully relevant! Use the README.md file in the changed files to learn what this PR is adding.


This PR shows off some initial ideas around rendering SEO meta tags and structured schema data for Remix/Hydrogen storefronts. It is a work in progress, but it captures the general ideas I am leaning towards, described below in three parts:

  1. For the meta tags, I've created small, composable functions for generating the meta tags for a specific topic, purpose or platform, while also wrapping those up in a function for a specific resource. For example:
export function getProductSeo(data: Data): HtmlMetaDescriptor {
  return {
    ...getPageMeta(data),
    ...getTwitterMeta(data),
    ...getOpenGraphMeta(data),
    ...getRobotsMeta(data),
  };
}

The above getProductSeo function can be used in a meta function on the Product Details Page, or the user is free to compose the individual functions as they want.

// $productHandle.tsx

export const meta = getProductSeo;
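
To make the composition concrete, here is a minimal sketch of what the individual helpers could look like. The shape of `Data`, the helper bodies, and the specific tag values are assumptions for illustration; the PR does not show them here.

```typescript
// Hypothetical shapes: neither `Data` nor the helper bodies come from the PR.
type HtmlMetaDescriptor = Record<string, string>;

interface Data {
  title: string;
  description: string;
  url: string;
  imageUrl?: string;
  noIndex?: boolean;
}

// Basic page-level tags.
function getPageMeta(data: Data): HtmlMetaDescriptor {
  return { title: data.title, description: data.description };
}

// Twitter card tags; use the large card only when an image is available.
function getTwitterMeta(data: Data): HtmlMetaDescriptor {
  return {
    "twitter:card": data.imageUrl ? "summary_large_image" : "summary",
    "twitter:title": data.title,
    "twitter:description": data.description,
  };
}

// Open Graph tags; og:image is only emitted when an image is available.
function getOpenGraphMeta(data: Data): HtmlMetaDescriptor {
  const meta: HtmlMetaDescriptor = {
    "og:title": data.title,
    "og:description": data.description,
    "og:url": data.url,
  };
  if (data.imageUrl) meta["og:image"] = data.imageUrl;
  return meta;
}

// Robots directive, indexable by default.
function getRobotsMeta(data: Data): HtmlMetaDescriptor {
  return { robots: data.noIndex ? "noindex,nofollow" : "index,follow" };
}

// Resource-level wrapper: later spreads win if two helpers emit the same key.
function getProductSeo(data: Data): HtmlMetaDescriptor {
  return {
    ...getPageMeta(data),
    ...getTwitterMeta(data),
    ...getOpenGraphMeta(data),
    ...getRobotsMeta(data),
  };
}
```

Because each helper returns a plain object, a route that only wants Open Graph tags can call `getOpenGraphMeta` on its own and skip the rest.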

I am also thinking that we could provide a completely turn-key root-level function for the meta export that uses location data to render the correct SEO helper at each route automatically, but I left that out of this PR as a potential next step to explore. I'm interested in what level of abstraction we should be after here, and how much magic we want to cast.

// root.tsx

export const meta: MetaFunction = (data) => {
  const seo = getDefaultSeo(data)
  return {
    ...seo,
    charset: "utf-8",
    viewport: "width=device-width,initial-scale=1",
  };
};
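
One way such a location-based `getDefaultSeo` could work is a small ordered table of path patterns mapped to helpers. This is only a sketch: the `MetaArgs` shape, the route patterns, and the helper names are all assumptions, not something this PR implements.

```typescript
type HtmlMetaDescriptor = Record<string, string>;

// Hypothetical subset of the arguments a meta function receives.
interface MetaArgs {
  data: { title?: string } | undefined;
  location: { pathname: string };
}

type SeoHelper = (args: MetaArgs) => HtmlMetaDescriptor;

// Hypothetical per-resource helpers.
const productSeo: SeoHelper = ({ data }) => ({
  title: data?.title ?? "Product",
  "og:type": "product",
});

const collectionSeo: SeoHelper = ({ data }) => ({
  title: data?.title ?? "Collection",
  "og:type": "website",
});

const fallbackSeo: SeoHelper = ({ data }) => ({
  title: data?.title ?? "Hydrogen",
});

// Ordered route table: the first pattern that matches the pathname wins.
const seoByRoute: Array<[RegExp, SeoHelper]> = [
  [/^\/products\/[^/]+\/?$/, productSeo],
  [/^\/collections\/[^/]+\/?$/, collectionSeo],
];

function getDefaultSeo(args: MetaArgs): HtmlMetaDescriptor {
  const entry = seoByRoute.find(([pattern]) =>
    pattern.test(args.location.pathname),
  );
  const helper = entry ? entry[1] : fallbackSeo;
  return helper(args);
}
```

The trade-off is the "magic" mentioned above: the route table hard-codes URL conventions, so a storefront with custom paths would need a way to extend or replace it.
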
  2. For the structured data, I created a new component called Schema that is rendered at the root and outputs the script tag for schema data. Similar to the above, I've added very basic functions for producing the schema based on a given resource, and these can be added to a route via the handle export.
export const handle = {
  schema: getProductSchema,
};
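
As a rough sketch of the idea, a root-level component could walk the matched routes, call each `handle.schema` function with that route's data, and serialize the results for a JSON-LD script tag. The `RouteMatch` shape below is a simplified assumption modeled on what `useMatches` returns, and the body of `getProductSchema` is invented for illustration.

```typescript
// Simplified, assumed shape of a matched route.
interface RouteMatch {
  data: unknown;
  handle?: { schema?: (data: any) => Record<string, unknown> };
}

// Hypothetical schema function for a product route.
function getProductSchema(data: { title: string; description: string }) {
  return {
    "@context": "https://schema.org",
    "@type": "Product",
    name: data.title,
    description: data.description,
  };
}

// Collect one schema object per matched route that exports handle.schema.
function collectSchemas(matches: RouteMatch[]): Record<string, unknown>[] {
  const schemas: Record<string, unknown>[] = [];
  for (const match of matches) {
    const schemaFn = match.handle?.schema;
    if (schemaFn) schemas.push(schemaFn(match.data));
  }
  return schemas;
}

// Serialize for embedding in <script type="application/ld+json">.
// Escaping "<" keeps a "</script>" inside the data from closing the tag.
function serializeLdJson(schema: Record<string, unknown>): string {
  return JSON.stringify(schema).replace(/</g, "\\u003c");
}
```

The escaped output is still valid JSON, so crawlers parse it to the original values.
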
  3. For the shareable image, we provide users a component to render (and cache) as an SVG by adding an og-image route and returning an SVG response for the rendered component.

All of the above is powered by storefront data.
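
For the share image, a minimal sketch of the route's output might look like the following. It builds the SVG with string templates rather than rendering a component, and the props, dimensions, and cache lifetime are assumptions. In a real route the `body` and `init` values would be passed to `new Response(body, init)`.

```typescript
// Hypothetical props for the share image.
interface ShareImageProps {
  title: string;
  price?: string;
}

// Escape characters that are significant in XML text and attribute values.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a 1200x630 SVG card with the title and an optional price line.
function renderShareImage({ title, price }: ShareImageProps): string {
  const priceText = price
    ? `<text x="60" y="420" font-size="48" fill="#555">${escapeXml(price)}</text>`
    : "";
  return [
    `<svg xmlns="http://www.w3.org/2000/svg" width="1200" height="630" viewBox="0 0 1200 630">`,
    `<rect width="1200" height="630" fill="#fff"/>`,
    `<text x="60" y="320" font-size="72" fill="#111">${escapeXml(title)}</text>`,
    priceText,
    `</svg>`,
  ].join("");
}

// Body plus response init with an SVG content type and a cache header.
function buildOgImageResponse(props: ShareImageProps) {
  return {
    body: renderShareImage(props),
    init: {
      status: 200,
      headers: {
        "Content-Type": "image/svg+xml",
        "Cache-Control": "public, max-age=3600", // assumed lifetime
      },
    },
  };
}
```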

benjaminsehl commented 1 year ago

Really love where this is heading

wizardlyhel commented 1 year ago

I have a SEO requirement that I am hoping we can meet with this build. I want to make sure that

[Screenshot: Screen Shot 2022-10-17 at 11 27 05 AM]

We have the exact same problem on Hydrogen 1, and maintaining a list of bots to prevent content streaming isn't the way to go. The framework should not have to keep up with an ever-changing, ever-growing list of user agents.

Issue on Hydrogen as reference:

Slack conversation around this topic:

Helen Lin [2022-10-05 05:18PM UTC]
Curious what @ryanflorence and @jacob.ebey think about this …

I am constantly finding it difficult to draw the line between “what” to render on the first chunk of html. But ideally, this is what I would like to achieve.

loader - makes 2 query calls
• SEO and above the fold contents (product title, description, first image) <-- cannot be delayed
• The full product info query

Idea is kinda like critical css but for data queries .. but then this makes 2 api calls

The reason why I have this thought is because there are still crawlers out there that won’t wait for javascript to finish .. sometimes not even waiting for response streaming to finish. They will stop right after they find an instance of <meta property="og:url" /> .. and it is quite often that we have a placeholder value for that

but then the full product info query gets massive and would delay the generation of the entire response

[Slack Thread](https://shopify.slack.com/archives/C044LCP35TJ/p1664990319747069?thread_ts=1664990319.747069&cid=C044LCP35TJ)
**[Juan Pablo Prieto](https://github.com/juanpprieto)** [2022-10-05 05:30PM UTC]
I was thinking about this too. Guess the strategy is to await a small SEO-specific product query and defer another fuller product query
**[Helen Lin](https://github.com/wizardlyhel)** [2022-10-05 05:32PM UTC]
I’m also thinking to put longer cache control on the seo specific query
**[Juan Pablo Prieto](https://github.com/juanpprieto)** [2022-10-05 05:35PM UTC]
The challenge is that browser caching is per loader. I was playing around with server caching and splitting all product queries on this ~[branch](https://github.com/Shopify/h2-demo-store/blob/fe69035dddb8c625b6dd93f2391b901b6f01a9b5/app/routes/products/%24productHandle.tsx#L30)~ [here](https://github.com/Shopify/h2-demo-store/blob/80847651d8801e5b5ca8cb53f413e0fac76476de/app/routes/products/%24handle.tsx#L47) and comparing performance against the current `getFullProductData`
**[Ryan Florence](https://github.com/ryanflorence)** [2022-10-07 04:49PM UTC]
seems to me splitting queries by “above/below the fold” is less relevant than splitting up queries by volatility/cache-ability. With a SWR server-side caching strategy, fetch the whole product because that should come from cache 99.9% of the time (even when TTL is up because of swr), and then the volatile stuff you can put in a different query (like inventory) and defer it.
**[Helen Lin](https://github.com/wizardlyhel)** [2022-10-07 04:50PM UTC]
good point about the cache-ability
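
The strategy from this thread (serve the whole product from a stale-while-revalidate server cache, and leave only the volatile data uncached and deferred) can be sketched roughly as below. This is an in-memory illustration under stated assumptions, not Hydrogen's caching API: `SwrCache`, `loadProductPage`, and the fetcher parameters are hypothetical names.

```typescript
interface CacheEntry<T> {
  value: T;
  fetchedAt: number;
}

// Minimal stale-while-revalidate cache: a stale hit is returned immediately
// while a refresh runs in the background.
class SwrCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async get(key: string, fetcher: () => Promise<T>): Promise<T> {
    const entry = this.entries.get(key);
    if (!entry) {
      // Cold miss: the only case where the caller waits on the fetch.
      const value = await fetcher();
      this.entries.set(key, { value, fetchedAt: this.now() });
      return value;
    }
    if (this.now() - entry.fetchedAt > this.ttlMs) {
      // Stale: kick off a refresh but do not wait for it.
      void fetcher()
        .then((value) => {
          this.entries.set(key, { value, fetchedAt: this.now() });
        })
        .catch(() => {
          // Keep serving the stale value if the refresh fails.
        });
    }
    return entry.value;
  }
}

// Loader-style usage: await the cacheable product, hand back the volatile
// inventory as a pending promise so it can be streamed in later.
async function loadProductPage<P, I>(
  handle: string,
  cache: SwrCache<P>,
  fetchProduct: (handle: string) => Promise<P>,
  fetchInventory: (handle: string) => Promise<I>,
): Promise<{ product: P; inventory: Promise<I> }> {
  const product = await cache.get(`product:${handle}`, () =>
    fetchProduct(handle),
  );
  const inventory = fetchInventory(handle); // intentionally not awaited
  return { product, inventory };
}
```

With this split, SEO-critical fields come from the cached product and are available for the first chunk of HTML, while only the volatile query is left to stream.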

jplhomer commented 1 year ago

👍 as long as the export function meta is used, the content will be returned like normal tags AFAIK. HUGE benefit to Hydrogen v2 - no more competing with streaming boundaries, no more nesting <Head> tags, no more dealing with SEO bot detection 🙃

cartogram commented 1 year ago

Another thing that we've been discussing is the API: whether to produce a combined object describing the SEO information for the existing <Meta /> component (as is done in this PR currently), or to embrace the pattern of combining export const handle with useMatches in a root-level component (as seen in the LDJson Schema component in this PR). The latter might look like a <Seo {...defaults} /> component that a user renders at the root, with individual routes providing their own config via the handle export.

Example:

/** root.tsx */

/**  ... */
<head>
  <Meta />
  <Links />
  {/* render the component with some defaults */}
  <Seo
    defaultTitle="Hydrogen"
    titleTemplate="%s - Hydrogen"
    omitGoogleBots
  />
</head>

/** $productHandle.tsx */

export const handle = {
  seo: (data) => ({
    /**
     * Most routes would not need to define this because we would make a best guess at the right Seo primitives to use.
     * For example, if a particular piece of information requires a custom `<meta />` tag, a `<link />` component
     * and/or some LD-JSON key, we would handle that in the Seo component and accept any overrides here.
     */
    title: `Custom title - ${data.seo.title}`,
    disableTitleTemplate: true,
  }),
};

The benefit to this approach is that it gives users less to think about when it comes to Seo and can work to establish a good pattern for extending Remix that we may use in other aspects of Hydrogen.
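
To illustrate how the root-level component might resolve its output from the matched routes, here is a sketch of the merging logic on its own, without React. The precedence rules (deeper routes override shallower ones, the template applies only to route-provided titles) are assumptions about how this could behave, not settled API.

```typescript
// Assumed config a route can return from handle.seo.
interface SeoConfig {
  title?: string;
  description?: string;
  disableTitleTemplate?: boolean;
}

// Assumed defaults passed as props to the root-level component.
interface SeoDefaults {
  defaultTitle: string;
  titleTemplate: string; // "%s" is replaced with the route title
}

// Simplified, assumed shape of a matched route.
interface RouteMatch {
  data: unknown;
  handle?: { seo?: (data: any) => SeoConfig };
}

function resolveSeo(
  matches: RouteMatch[],
  defaults: SeoDefaults,
): { title: string; description: string | undefined } {
  // Merge configs in match order so deeper routes override their parents.
  let merged: SeoConfig = {};
  for (const match of matches) {
    const seoFn = match.handle?.seo;
    if (seoFn) merged = { ...merged, ...seoFn(match.data) };
  }

  const routeTitle = merged.title;
  if (routeTitle === undefined) {
    // No route supplied a title: fall back to the default, untemplated.
    return { title: defaults.defaultTitle, description: merged.description };
  }

  // A replacer function avoids "$" patterns in the title being interpreted.
  const title = merged.disableTitleTemplate
    ? routeTitle
    : defaults.titleTemplate.replace("%s", () => routeTitle);
  return { title, description: merged.description };
}
```

For example, with `titleTemplate="%s - Hydrogen"`, a product route returning `{ title: "Snowboard" }` would resolve to "Snowboard - Hydrogen", and adding `disableTitleTemplate: true` would leave it as "Snowboard".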

I personally like the API of rendering a component at the root, and it feels like that is why handle and useMatches exist. I think the combination could be pretty powerful: one could easily imagine an analytics key and <Analytics /> component pair, and even the dynamic share image example I put together above could mostly be abstracted into a <ShareImage {...defaultOgData} /> component and a shareImage key, etc.

The drawback might be that it does introduce multiple ways of doing something in Remix (namely adding meta/link tags). A simple lint rule and good docs could probably go a long way in preventing most of the confusion of where to add the Seo info.

cc @Shopify/hydrogen

juanpprieto commented 1 year ago

I think the handle x root component approach would really help us in creating a solid notion of what Hydrogen is and how we document what it is. eg: Analytics, Seo, LdJson, Scripts could all fall into this model.

nit: I think <Schema /> or <LdJson /> is often rendered at the end of the <body>. It might be better to keep it separate from <Seo />. This could also make it easier for end users to opt in/out and customize them independently.

cartogram commented 1 year ago

@Shopify/hydrogen This is still slight WIP, but is ready for another look. I'm now using the README.md file to describe what this PR is adding.

I would try giving it a run locally and looking at the PDP, changing some of the values on the handle.seo object and using the debugger to see the results. All feedback is welcome at this point.

In addition to this PR, you can also check out the companion PRs I've added to support it.

wizardlyhel commented 1 year ago

I really like this - I can also see that SSR rendering of the meta tags is fully working as well 🎉 NO MORE BOTS LIST!

caution-tape-bot[bot] commented 1 year ago

We noticed that this PR either modifies or introduces usage of the dangerouslySetInnerHTML attribute, which can cause cross-site scripting (XSS) vulnerabilities when user controlled values are passed in. We recommend reviewing your code to ensure that this is what you intended to use and that there is not a safe alternative available.

Docs are available here.

If unavoidable, we recommend using an HTML sanitizer like DOMPurify to sanitize content before rendering it as HTML.

If you have any questions or are unsure about how to move forward with this, ping #help-appsec and we would be happy to help you out! cc: @Shopify/xss-extermination-squad

_View the source of this rule in Services DB_

benjaminsehl commented 1 year ago

@cartogram — I was positive I left a long comment on here but couldn't find it.

The TL:DR;