fedeya / remix-sitemap

Sitemap generator for Remix applications
https://npmjs.com/remix-sitemap
MIT License

Sitemap works fine in browser, but not in google search console #40

Closed BryceBlankinship closed 1 year ago

BryceBlankinship commented 1 year ago

Hey there, great tool! I can view my sitemap in the browser, but when I run a live URL test in the Google Search Console URL Inspection Tool (to submit the sitemap manually), it reports 404 Not Found. Any ideas?

BryceBlankinship commented 1 year ago

OK, this may add more insight: when I load it from localhost, I briefly get this error in the terminal until the page reloads and it starts working again:

ErrorResponse {
  status: 404,
  statusText: 'Not Found',
  internal: true,
  data: 'Error: No route matches URL "/sitemap.xml"',
  error: Error: No route matches URL "/sitemap.xml"
      at getInternalRouterError (/Users/bryce/Documents/GitHub/mellowdepot/web/my-remix-app/node_modules/@remix-run/router/router.ts:4087:5)
      at Object.query (/Users/bryce/Documents/GitHub/mellowdepot/web/my-remix-app/node_modules/@remix-run/router/router.ts:2592:19)
      at handleDocumentRequestRR (/Users/bryce/Documents/GitHub/mellowdepot/web/my-remix-app/node_modules/@remix-run/server-runtime/dist/server.js:163:35)
      at requestHandler (/Users/bryce/Documents/GitHub/mellowdepot/web/my-remix-app/node_modules/@remix-run/server-runtime/dist/server.js:61:24)
      at /Users/bryce/Documents/GitHub/mellowdepot/web/my-remix-app/node_modules/@remix-run/express/dist/server.js:39:28
      at processTicksAndRejections (node:internal/process/task_queues:95:5)
}

This leads me to believe that Google's crawlers aren't waiting for the reload, and that the reload is triggered by the data fetching I'm doing. If so, this won't work for data fetches that take longer than a couple of milliseconds.
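
One way to test that hypothesis is to request the sitemap with different User-Agent headers and compare the status codes. The sketch below is a minimal diagnostic and not part of remix-sitemap: `probeSitemap`, `buildProbeHeaders`, `USER_AGENTS`, and the `Fetcher` type are hypothetical names, and the browser user-agent string is just a sample.

```typescript
// Minimal response shape the probe needs; the global fetch in Node 18+ satisfies it.
interface ProbeResponse {
  status: number;
  headers: { get(name: string): string | null };
}

type Fetcher = (
  url: string,
  init: { headers: Record<string, string> }
) => Promise<ProbeResponse>;

// Sample user agents (assumptions): one browser-like, one crawler-like.
const USER_AGENTS: Record<string, string> = {
  browser:
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
  googlebot:
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
};

// Build the header set for one probe.
function buildProbeHeaders(userAgent: string): Record<string, string> {
  return { "User-Agent": userAgent, Accept: "application/xml,text/xml,*/*" };
}

// Request the same URL once per user agent and collect the status codes.
async function probeSitemap(
  url: string,
  fetcher: Fetcher
): Promise<Record<string, number>> {
  const statuses: Record<string, number> = {};
  for (const [label, userAgent] of Object.entries(USER_AGENTS)) {
    const response = await fetcher(url, {
      headers: buildProbeHeaders(userAgent),
    });
    statuses[label] = response.status;
  }
  return statuses;
}
```

Assuming a local dev server on port 3000 and Node 18+, `probeSitemap("http://localhost:3000/sitemap.xml", fetch).then(console.log)` prints one status per user agent. If the two statuses differ, the server is branching on the user agent rather than on timing.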

fedeya commented 1 year ago

Hi @BryceBlankinship, can you create a repo that reproduces the error? Thanks.

BryceBlankinship commented 1 year ago

I figured out the issue. Newer Remix uses two different request handlers, one for bots and one for browser requests. You should update the documentation to include v2 for entry.server.js. I had put the isSitemapUrl check only in the browser request handler, when it needs to run for both browser and bot requests. That's why the sitemap showed up in my browser while Google and other tools couldn't find it with their bots. See the example (note I'm using Mantine for styles, so there's some extra stuff):

import { PassThrough } from "stream";

import { Response } from "@remix-run/node";
import { RemixServer } from "@remix-run/react";
import isbot from "isbot";
import { renderToPipeableStream, renderToString } from "react-dom/server";
import { injectStyles, createStylesServer } from "@mantine/remix";
import { createSitemapGenerator } from "remix-sitemap";

const { isSitemapUrl, sitemap } = createSitemapGenerator({
  siteUrl: "https://example.com",
  generateRobotsTxt: true,
});

const server = createStylesServer();

const ABORT_DELAY = 5000;

export default function handleRequest(
  request,
  responseStatusCode,
  responseHeaders,
  remixContext
) {
  return isbot(request.headers.get("user-agent"))
    ? handleBotRequest(
      request,
      responseStatusCode,
      responseHeaders,
      remixContext
    )
    : handleBrowserRequest(
      request,
      responseStatusCode,
      responseHeaders,
      remixContext
    );
}

function handleBotRequest(
  request,
  responseStatusCode,
  responseHeaders,
  remixContext
) {
  return new Promise((resolve, reject) => {
    // Serve the sitemap before rendering, so bot requests get it too.
    if (isSitemapUrl(request)) {
      return resolve(sitemap(request, remixContext));
    }

    let didError = false;

    const { pipe, abort } = renderToPipeableStream(
      <RemixServer context={remixContext} url={request.url} />,
      {
        onAllReady() {
          const body = new PassThrough();

          responseHeaders.set("Content-Type", "text/html");

          resolve(
            new Response(body, {
              headers: responseHeaders,
              status: didError ? 500 : responseStatusCode,
            })
          );

          pipe(body);
        },
        onShellError(error) {
          reject(error);
        },
        onError(error) {
          didError = true;

          console.error(error);
        },
      }
    );

    setTimeout(abort, ABORT_DELAY);
  });
}

function handleBrowserRequest(
  request,
  responseStatusCode,
  responseHeaders,
  remixContext
) {
  return new Promise((resolve, reject) => {
    if (isSitemapUrl(request)) {
      return resolve(sitemap(request, remixContext));
    }

    let didError = false;

    // Mantine's injectStyles needs the full markup as a string.
    const markup = renderToString(<RemixServer context={remixContext} url={request.url} />);

    const { pipe, abort } = renderToPipeableStream(
      <RemixServer context={remixContext} url={request.url} />,
      {
        onShellReady() {
          const body = new PassThrough();

          responseHeaders.set("Content-Type", "text/html");

          resolve(
            new Response(`<!DOCTYPE html>${injectStyles(markup, server)}`, {
              headers: responseHeaders,
              status: didError ? 500 : responseStatusCode,
            })
          );

          pipe(body);
        },
        onShellError(err) {
          reject(err);
        },
        onError(error) {
          didError = true;

          console.error(error);
        },
      }
    );

    setTimeout(abort, ABORT_DELAY);
  });
}
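
To illustrate why the browser-only placement failed, here is a stripped-down sketch of the bot/browser dispatch. Everything in it is a hypothetical stand-in: `looksLikeBot` is a crude regex rather than the `isbot` package, `isSitemapPath` replaces `isSitemapUrl`, and responses are plain strings instead of `Response` objects.

```typescript
// Crude stand-in for the `isbot` package: a regex on the user agent.
const looksLikeBot = (userAgent: string): boolean =>
  /bot|crawler|spider/i.test(userAgent);

// Stand-in for `isSitemapUrl` from remix-sitemap.
const isSitemapPath = (pathname: string): boolean =>
  pathname === "/sitemap.xml";

// Stand-in for rendering: no Remix route matches the sitemap, so it 404s.
const renderRoute = (pathname: string): string =>
  isSitemapPath(pathname) ? "404 Not Found" : "200 HTML";

// Simulates handleRequest. The browser branch always runs the sitemap
// check; `checkInBotHandler` says whether the bot branch runs it too.
function dispatch(
  pathname: string,
  userAgent: string,
  checkInBotHandler: boolean
): string {
  const branchChecksSitemap = looksLikeBot(userAgent)
    ? checkInBotHandler
    : true;
  if (branchChecksSitemap && isSitemapPath(pathname)) {
    return "200 sitemap XML";
  }
  return renderRoute(pathname);
}
```

With `checkInBotHandler` set to `false`, a crawler user agent falls through to the route renderer and gets the 404, while a browser user agent still receives the sitemap. That matches the symptom described at the top of this issue.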

fedeya commented 1 year ago

@BryceBlankinship in this case the isSitemapUrl and sitemap calls must go at the beginning of the handleRequest function, before the bot/browser split.

example:

export default function handleRequest(
  request: Request,
  responseStatusCode: number,
  responseHeaders: Headers,
  remixContext: EntryContext,
  loadContext: AppLoadContext
) {
  if (isSitemapUrl(request)) {
    return sitemap(request, remixContext);
  }

  return isbot(request.headers.get("user-agent"))
    ? handleBotRequest(
      request,
      responseStatusCode,
      responseHeaders,
      remixContext
    )
    : handleBrowserRequest(
      request,
      responseStatusCode,
      responseHeaders,
      remixContext
    );
}

BryceBlankinship commented 1 year ago

Good point, I moved it.
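
The early return fedeya describes can also be read as a small wrapper that runs the sitemap check before any other handler. This is only a sketch of the control flow: `withSitemapFirst` and `Responder` are hypothetical names, not part of remix-sitemap, and the two-argument signature is a simplification of the real `handleRequest`.

```typescript
// Generic two-argument handler; the real handleRequest takes more parameters.
type Responder<Req, Ctx, Res> = (request: Req, context: Ctx) => Res;

// Hypothetical helper: answer sitemap requests first, delegate everything else.
function withSitemapFirst<Req, Ctx, Res>(
  isSitemapUrl: (request: Req) => boolean,
  sitemap: Responder<Req, Ctx, Res>,
  next: Responder<Req, Ctx, Res>
): Responder<Req, Ctx, Res> {
  return (request, context) =>
    isSitemapUrl(request) ? sitemap(request, context) : next(request, context);
}

// Fake wiring with plain strings, only to show the control flow.
const handle = withSitemapFirst<string, null, string>(
  (path) => path === "/sitemap.xml",
  () => "sitemap",
  (path) => `page:${path}`
);
```

Because the check sits in front of `next`, it no longer matters whether `next` later splits into bot and browser branches.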

idroid007 commented 7 months ago

Hey @BryceBlankinship, I am having the same problem. This is my script:


  import { PassThrough } from "node:stream";
  import { createSitemapGenerator } from "remix-sitemap";
  import type { AppLoadContext, EntryContext } from "@remix-run/node";
  import { Response } from "@remix-run/node";
  import { RemixServer } from "@remix-run/react";
  import isbot from "isbot";
  import { renderToPipeableStream } from "react-dom/server";

  const ABORT_DELAY = 5_000;
  const { isSitemapUrl, sitemap } = createSitemapGenerator({
    siteUrl: "https://sticky-notes.cc",
    generateRobotsTxt: true,
    // configure other things here
  });

  export default async function handleRequest(
    request: Request,
    responseStatusCode: number,
    responseHeaders: Headers,
    remixContext: EntryContext,
    loadContext: AppLoadContext
  ) {
    if (isSitemapUrl(request)) return await sitemap(request, remixContext);

    return isbot(request.headers.get("user-agent"))
      ? handleBotRequest(
          request,
          responseStatusCode,
          responseHeaders,
          remixContext
        )
      : handleBrowserRequest(
          request,
          responseStatusCode,
          responseHeaders,
          remixContext
        );
  }

  function handleBotRequest(
    request: Request,
    responseStatusCode: number,
    responseHeaders: Headers,
    remixContext: EntryContext
  ) {
    // No async executor needed: nothing inside is awaited.
    return new Promise((resolve, reject) => {
      let shellRendered = false;

      const { pipe, abort } = renderToPipeableStream(
        <RemixServer
          context={remixContext}
          url={request.url}
          abortDelay={ABORT_DELAY}
        />,
        {
          onAllReady() {
            shellRendered = true;
            const body = new PassThrough();

            responseHeaders.set("Content-Type", "text/html");

            resolve(
              new Response(body, {
                headers: responseHeaders,
                status: responseStatusCode,
              })
            );

            pipe(body);
          },
          onShellError(error: unknown) {
            reject(error);
          },
          onError(error: unknown) {
            responseStatusCode = 500;
            // Log streaming rendering errors from inside the shell.  Don't log
            // errors encountered during initial shell rendering since they'll
            // reject and get logged in handleDocumentRequest.
            if (shellRendered) {
              console.error(error);
            }
          },
        }
      );

      setTimeout(abort, ABORT_DELAY);
    });
  }

  function handleBrowserRequest(
    request: Request,
    responseStatusCode: number,
    responseHeaders: Headers,
    remixContext: EntryContext
  ) {
    // No async executor needed: nothing inside is awaited.
    return new Promise((resolve, reject) => {
      let shellRendered = false;

      const { pipe, abort } = renderToPipeableStream(
        <RemixServer
          context={remixContext}
          url={request.url}
          abortDelay={ABORT_DELAY}
        />,
        {
          onShellReady() {
            shellRendered = true;
            const body = new PassThrough();

            responseHeaders.set("Content-Type", "text/html");

            resolve(
              new Response(body, {
                headers: responseHeaders,
                status: responseStatusCode,
              })
            );

            pipe(body);
          },
          onShellError(error: unknown) {
            reject(error);
          },
          onError(error: unknown) {
            responseStatusCode = 500;
            // Log streaming rendering errors from inside the shell.  Don't log
            // errors encountered during initial shell rendering since they'll
            // reject and get logged in handleDocumentRequest.
            if (shellRendered) {
              console.error(error);
            }
          },
        }
      );

      setTimeout(abort, ABORT_DELAY);
    });
  }

Still, it's not working with Google Search Console; through the browser it works fine.

BryceBlankinship commented 7 months ago

I'll get back to you soon. I think it has to do with the HTTPS configuration, but I can't remember, so I'll check my code and see if I can recall what the issue was.
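
The HTTPS guess is unconfirmed in this thread. One way to check it would be to follow the redirect chain by hand while sending a crawler user agent, and see whether any hop changes scheme or host. The sketch below is a diagnostic under those assumptions; `traceRedirects`, `nextHop`, `changesOrigin`, and `ManualFetcher` are hypothetical names.

```typescript
interface HopResponse {
  status: number;
  headers: { get(name: string): string | null };
}

type ManualFetcher = (
  url: string,
  init: { redirect: "manual"; headers: Record<string, string> }
) => Promise<HopResponse>;

const GOOGLEBOT_UA =
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";

// Resolve a Location header (possibly relative) against the URL that sent it.
function nextHop(currentUrl: string, location: string): string {
  return new URL(location, currentUrl).toString();
}

// True when a hop changes scheme, host, or port (e.g. http to https).
function changesOrigin(fromUrl: string, toUrl: string): boolean {
  return new URL(fromUrl).origin !== new URL(toUrl).origin;
}

// Follow redirects manually and record "status url" for each hop.
async function traceRedirects(
  startUrl: string,
  fetcher: ManualFetcher,
  maxHops = 5
): Promise<string[]> {
  const trail: string[] = [];
  let url = startUrl;
  for (let hop = 0; hop <= maxHops; hop++) {
    const response = await fetcher(url, {
      redirect: "manual",
      headers: { "User-Agent": GOOGLEBOT_UA },
    });
    trail.push(`${response.status} ${url}`);
    const location = response.headers.get("location");
    const isRedirect = response.status >= 300 && response.status < 400;
    if (!isRedirect || !location) break;
    url = nextHop(url, location);
  }
  return trail;
}
```

In Node 18+, passing the global `fetch` as `fetcher` should work, for example `traceRedirects("http://sticky-notes.cc/sitemap.xml", fetch).then(console.log)`. A final line that starts with 404, or a hop that lands on a different origin than the configured `siteUrl`, would be worth investigating.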